DATA4ECOLOGY


Name:

US Adult Income

Description:

US Adult Census data relating income to social factors such as age, education, and race.

The US Adult Income dataset was extracted by Barry Becker from the 1994 US Census Database. The dataset consists of anonymous information such as occupation, age, native country, race, capital gain, capital loss, education, work class and more.
Each row is labelled with an income of either ">50K" or "<=50K".

This dataset is split into two CSV files, named adult-training.txt and adult-test.txt.

The goal is to train a binary classifier on the training dataset to predict the income_bracket column, which takes one of two values, ">50K" or "<=50K", and then to evaluate the classifier's accuracy on the test dataset.
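
The train-then-evaluate workflow described above can be sketched in R as follows. This is a minimal illustration, not the method used by the dataset's authors: it swaps in a plain logistic regression (glm) as the binary classifier, the helper names fit_and_score and make_toy are hypothetical, and the data is randomly generated toy data standing in for adult-training.txt and adult-test.txt.

```r
# Hypothetical helper: fit a logistic regression on `train` and return the
# accuracy on `test`. Assumes both data frames carry an income_bracket column
# whose values are ">50K" or "<=50K".
fit_and_score <- function(train, test, predictors) {
  train$high_income <- as.integer(trimws(as.character(train$income_bracket)) == ">50K")
  test$high_income  <- as.integer(trimws(as.character(test$income_bracket)) == ">50K")
  model <- glm(reformulate(predictors, response = "high_income"),
               data = train, family = binomial())
  prob <- predict(model, newdata = test, type = "response")
  predicted <- as.integer(prob > 0.5)
  mean(predicted == test$high_income)  # fraction of correct predictions
}

# Made-up toy data, only so the sketch runs without downloading anything.
set.seed(1)
make_toy <- function(n) {
  age <- sample(20:65, n, replace = TRUE)
  education_num <- sample(5:16, n, replace = TRUE)
  score <- 0.05 * (age - 40) + 0.4 * (education_num - 10) + rnorm(n)
  data.frame(age = age,
             education_num = education_num,
             income_bracket = ifelse(score > 0.5, ">50K", "<=50K"))
}

accuracy <- fit_and_score(make_toy(500), make_toy(200),
                          predictors = c("age", "education_num"))
print(accuracy)
```

With the real files, the two toy data frames would be replaced by the training and test sets, and the predictor list extended to the columns of interest.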

Note that the dataset contains both categorical and continuous features, as well as missing values.
The categorical columns are: workclass, education, marital_status, occupation, relationship, race, gender, native_country

The continuous columns are: age, education_num, capital_gain, capital_loss, hours_per_week
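
As a rough sketch of how the two column groups might be handled in R, the snippet below converts the categorical columns to factors and the continuous columns to numeric. Several things here are assumptions rather than facts from this page: the underscore-separated column spellings, the use of "?" as the missing-value marker (as in the UCI copy of the data), and the helper name prepare_adult, which is hypothetical. The three-row data frame is made up for demonstration.

```r
# Column groups; the underscore-separated header names are an assumption.
categorical_cols <- c("workclass", "education", "marital_status", "occupation",
                      "relationship", "race", "gender", "native_country")
continuous_cols <- c("age", "education_num", "capital_gain", "capital_loss",
                     "hours_per_week")

# Hypothetical helper: mark "?" entries as NA, then set the column types.
prepare_adult <- function(df) {
  for (col in intersect(categorical_cols, names(df))) {
    values <- trimws(as.character(df[[col]]))
    values[values %in% "?"] <- NA       # assumed missing-value marker
    df[[col]] <- factor(values)
  }
  for (col in intersect(continuous_cols, names(df))) {
    df[[col]] <- as.numeric(df[[col]])
  }
  df
}

# Tiny made-up data frame, only to show what the helper does.
toy <- data.frame(age = c("39", "50", "38"),
                  workclass = c(" Private", " ?", " Private"),
                  stringsAsFactors = FALSE)
clean <- prepare_adult(toy)
missing_per_column <- colSums(is.na(clean))
print(missing_per_column)
```

Counting NA values per column after this step gives a quick view of where the missing values sit before any rows are dropped or imputed.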

This dataset was obtained from the UCI repository. It can be found at:

https://archive.ics.uci.edu/ml/datasets/census+income
http://mlr.cs.umass.edu/ml/machine-learning-databases/adult/

Usage
This dataset is well suited to developing and testing wide linear classifiers, deep neural network classifiers, and combinations of both. For more information on combined wide and deep classifiers, refer to the research paper by Google: https://arxiv.org/abs/1606.07792

Refer to this kernel for sample usage: https://www.kaggle.com/johnolafenwa/wage-prediction

A complete tutorial is available at http://johnolafenwa.blogspot.com.ng/2017/07/machine-learning-tutorial-1-wage.html?m=1

Variables:

Link To Google Sheets:

Rows:

32560

Columns:

Files:

US Adult Income_1617_1.csv

License Type:

Public Domain (CC0)

References/Notes/Attributions:

R Dataset Upload:

Use the following R code to directly access this dataset in R.

d <- read.csv("https://www.key2stats.com/US_Adult_Income_1617_1.csv")
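
Once the file is loaded, a quick sanity check is to compare the row count with the figure listed above and to look at the balance of the two income labels. The sketch below shows one way to do that; summarise_income is a hypothetical helper, the label column is assumed to be named income_bracket as in the description, and the three-row data frame is made up so the example runs without a download.

```r
# Hypothetical helper: row count and share of each income label.
# Assumes the label column is named income_bracket.
summarise_income <- function(d, label_col = "income_bracket") {
  labels <- trimws(as.character(d[[label_col]]))
  list(rows = nrow(d),
       share = prop.table(table(labels)))
}

# Made-up three-row frame standing in for the downloaded data.
toy <- data.frame(income_bracket = c("<=50K", ">50K", "<=50K"))
info <- summarise_income(toy)
print(info$rows)
print(info$share)
```

With the real data, summarise_income(d) would be called on the data frame returned by read.csv, and info$rows compared with the row count shown on this page.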