
Name:

Spam E-mail Data

Description:

The data consist of 4601 email items, of which 1813 items were identified as spam.

Variables:

This data frame contains the following columns:

crl.tot

total length of words in capitals

dollar

number of occurrences of the $ symbol

bang

number of occurrences of the ! symbol

money

number of occurrences of the word ‘money’

n000

number of occurrences of the string ‘000’

make

number of occurrences of the word ‘make’

yesno

outcome variable, a factor with levels ‘n’ (not spam) and ‘y’ (spam)

Rows:

4601

Columns:

7


References/Notes/Attributions:

Source

George Forman, Hewlett-Packard Laboratories

These data are available from the University of California at Irvine Repository of Machine Learning Databases and Domain Theories. The address is: http://www.ics.uci.edu/~Here

R Dataset Upload:

Use the following R code to load this dataset directly into R.

d <- read.csv("https://www.key2stats.com/Spam_E-mail_Data_336_10.csv")
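Once `d` is loaded, a natural first step is a baseline model of spam status on the six predictors. The following is a minimal sketch, assuming the CSV's column names match the Variables list above and that yesno is stored as the strings "n" and "y". The helper name fit_spam_model is hypothetical, and logistic regression is only one reasonable baseline, not a method prescribed by this page.

```r
# Hypothetical helper: logistic regression of spam status on the six
# documented predictors. Assumes `d` uses the column names from the
# Variables list (crl.tot, dollar, bang, money, n000, make, yesno).
fit_spam_model <- function(d) {
  predictors <- c("crl.tot", "dollar", "bang", "money", "n000", "make")

  # Fail early with a readable message if the assumed columns are absent.
  absent <- setdiff(c(predictors, "yesno"), names(d))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }

  # Make "n" (not spam) the reference level, so the model estimates
  # the log-odds that an item is "y" (spam).
  d$yesno <- factor(d$yesno, levels = c("n", "y"))

  # Build yesno ~ crl.tot + dollar + bang + money + n000 + make.
  f <- stats::reformulate(predictors, response = "yesno")
  stats::glm(f, data = d, family = stats::binomial)
}
```

For example, after running the read.csv line, table(d$yesno) should show 2788 ‘n’ and 1813 ‘y’ items if the file matches the description, and summary(fit_spam_model(d)) prints the fitted coefficients.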
