DATA4ECOLOGY

View Data Set

Name:

Names with Character Set Problems

Description:

A data.frame of names containing characters that are rare or absent in standard English text (for example, letters with accent marks), which may be encoded inconsistently across locales or by different software.
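
As a rough illustration of the problem described above, the following sketch flags names that contain characters outside plain ASCII and masks those characters with "_". The example names, the helper `mask_non_ascii`, and the use of ASCII (code points up to 127) as the definition of "standard" are assumptions made for this sketch; they are not taken from the dataset or from the functions listed under See Also.

```r
# Hypothetical example names; "\u00fa" and "\u00e9" are escapes for
# u-acute and e-acute, so the script itself stays pure ASCII.
x <- c("Ra\u00fal", "Smith", "Jos\u00e9")

# TRUE where a name contains any code point above 127 (outside ASCII).
has_non_ascii <- vapply(
  x,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

# Hypothetical helper: replace every non-ASCII code point with "_".
mask_non_ascii <- function(s) {
  codes <- utf8ToInt(s)
  codes[codes > 127L] <- utf8ToInt("_")
  intToUtf8(codes)
}

masked <- vapply(x, mask_non_ascii, character(1), USE.NAMES = FALSE)
```

Working on code points rather than bytes means an accented letter is treated as one character, whatever its byte length in the underlying encoding.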

Variables:

A data.frame with two columns:

nonEnglish: a character vector of names that often contain non-standard characters, with those characters replaced by "_"
English: a character vector giving the standard English-character equivalent of each name in nonEnglish
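
One way to use a table with these two columns is as a lookup: match masked names against nonEnglish and take the corresponding English value. The sketch below assumes exact whole-name matching and uses a small hypothetical table in the same layout; the rows shown are illustrative and are not copied from the dataset.

```r
# Hypothetical rows in the documented two-column layout (not real dataset rows).
lookup <- data.frame(
  nonEnglish = c("Ra_l", "Jos_"),
  English    = c("Raul", "Jose"),
  stringsAsFactors = FALSE
)

# Names as they might look after non-standard characters were replaced by "_".
x <- c("Jos_", "Smith", "Ra_l")

# Position of each name in the lookup; NA where there is no match.
idx <- match(x, lookup$nonEnglish)

# Use the English form where a match exists, otherwise keep the name as is.
fixed <- ifelse(is.na(idx), x, lookup$English[idx])
```

Exact matching is the simplest choice; names embedded in longer strings would need a different approach, such as pattern-based substitution.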

Rows:

Columns:

Files:

Names with Character Set Problems_635_15.csv

License Type:

Public Domain (CC0)

References/Notes/Attributions:

See Also

grepNonStandardCharacters, subNonStandardCharacters

R Dataset Upload:

Use the following R code to load this dataset directly into R.

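# Read the CSV into a data.frame, keeping the names as character strings rather than factors.
d <- read.csv("https://www.key2stats.com/Names_with_Character_Set_Problems_635_15.csv", stringsAsFactors = FALSE)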