A data.frame of names containing characters that are rare or absent in standard English text, e.g., letters with accent marks, which may not be encoded consistently across locales or by different software.
A data.frame with two columns:
nonEnglish
a character vector of names that often contain non-standard characters, with those characters replaced by "_"
English
a character vector giving the standard English-character equivalent of each entry in nonEnglish
See Also
grepNonStandardCharacters, subNonStandardCharacters
Use the following R code to directly access this dataset in R.
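A minimal sketch of loading and inspecting the data. The dataset name nonEnglishNames and the package name Ecfun are assumptions, since neither is stated on this page; substitute the actual names if they differ.

```r
# Assumption: the dataset is called `nonEnglishNames` and ships with the
# Ecfun package. Neither name is given on this page.
if (requireNamespace("Ecfun", quietly = TRUE)) {
  # Load the dataset into the current environment.
  data("nonEnglishNames", package = "Ecfun")

  # Inspect the structure: two character columns are expected,
  # `nonEnglish` and `English`.
  str(nonEnglishNames)

  # Show the first few name pairs.
  head(nonEnglishNames)
} else {
  message("Package 'Ecfun' is not installed; try install.packages('Ecfun').")
}
```

The requireNamespace() guard keeps the snippet from failing when the package is not installed.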