Find the race of a surname or first name — predict

Predicts the most likely race associated with a surname or first name. Surname data come from the United States Census, and first-name data come from Tzioumis (2018, <dx.doi.org/10.1038/sdata.2018.25>).

predict_race(name, probability = TRUE, surname = TRUE)

Arguments

name	A string or character vector of surnames or first names whose race you want to predict.
probability	If TRUE (default), returns one column per race giving the probability that the name belongs to that race. If FALSE, returns only the input name, the cleaned name used for matching, and the most likely race.
surname	If TRUE (default), treats name as a surname and predicts race from the Census surname data. If FALSE, treats name as a first name and predicts race from the first-name data.

Value

A data.frame with three or nine columns. The first column, name, contains the name as entered. The second, match_name, contains the cleaned name used for matching (spaces and punctuation removed, all lowercase). The third, likely_race, gives the most likely race for the surname or first name; if several races share the highest probability, it is a single string listing each of those races separated by commas. If probability = FALSE, only these three columns are returned. Otherwise, columns 4 to 9 give the probability that the surname or first name belongs to each race: probability_american_indian, probability_asian, probability_black, probability_hispanic, probability_white, and probability_2races.
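The cleaning that produces match_name can be approximated in base R. This is a minimal sketch that assumes cleaning means lowercasing the input and deleting whitespace and punctuation characters; clean_name is a hypothetical helper, not part of this package, and the package's own rules may differ (for example, in how it treats accented letters or digits).

```r
# Hypothetical helper approximating the match_name cleaning described above:
# lowercase the input, then delete whitespace and punctuation characters.
clean_name <- function(x) {
  x <- tolower(x)
  gsub("[[:space:][:punct:]]", "", x)
}

clean_name(c("Washington", "Van Buren", "O'Neil", "Smith-Jones"))
# returns c("washington", "vanburen", "oneil", "smithjones")
```

Because the sketch is vectorized, it cleans a whole character vector in one call, just as name accepts a vector.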

Examples

predict_race("franklin")
#>       name match_name likely_race probability_american_indian probability_asian
#> 1 franklin   franklin       white                      0.0085             0.005
#>   probability_black probability_hispanic probability_white probability_2races
#> 1            0.3828               0.0222            0.5577             0.0238

predict_race(c("franklin", "Washington", "Jefferson", "Sotomayor", "Liu"))
#>         name match_name likely_race probability_american_indian
#> 1   franklin   franklin       white                      0.0085
#> 2 Washington washington       black                      0.0066
#> 3  Jefferson  jefferson       black                      0.0188
#> 4  Sotomayor  sotomayor    hispanic                      0.0000
#> 5        Liu        liu       asian                      0.0003
#>   probability_asian probability_black probability_hispanic probability_white
#> 1            0.0050            0.3828               0.0222            0.5577
#> 2            0.0028            0.8865               0.0202            0.0517
#> 3            0.0033            0.7472               0.0204            0.1806
#> 4            0.0068            0.0057               0.8924            0.0918
#> 5            0.9556            0.0016               0.0051            0.0181
#>   probability_2races
#> 1             0.0238
#> 2             0.0323
#> 3             0.0298
#> 4             0.0014
#> 5             0.0194
predict_race("franklin", probability = FALSE)
#>       name match_name likely_race
#> 1 franklin   franklin       white
predict_race("jacob", probability = FALSE, surname = FALSE)
#>    name match_name likely_race
#> 1 jacob      jacob       white
predict_race("jacob", probability = TRUE, surname = FALSE)
#>    name match_name likely_race probability_american_indian probability_asian
#> 1 jacob      jacob       white                      0.0015            0.0321
#>   probability_black probability_hispanic probability_white probability_2races
#> 1            0.0163               0.0301            0.9184             0.0015
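The likely_race column follows from the probability columns: it is the race with the highest probability, with tied races joined by commas. The sketch below re-enters the printed output of the five-name example by hand and recomputes likely_race in base R. pick_likely_race is a hypothetical helper, and the exact tie separator (a bare comma here) is an assumption; the package itself works from unrounded probabilities.

```r
# Probabilities copied from the five-name example output (rounded to 4 decimals).
probs <- data.frame(
  name = c("franklin", "Washington", "Jefferson", "Sotomayor", "Liu"),
  probability_american_indian = c(0.0085, 0.0066, 0.0188, 0.0000, 0.0003),
  probability_asian    = c(0.0050, 0.0028, 0.0033, 0.0068, 0.9556),
  probability_black    = c(0.3828, 0.8865, 0.7472, 0.0057, 0.0016),
  probability_hispanic = c(0.0222, 0.0202, 0.0204, 0.8924, 0.0051),
  probability_white    = c(0.5577, 0.0517, 0.1806, 0.0918, 0.0181),
  probability_2races   = c(0.0238, 0.0323, 0.0298, 0.0014, 0.0194)
)

# Hypothetical helper: for each row, return the race(s) with the maximum
# probability; tied races are collapsed into one comma-separated string.
pick_likely_race <- function(df) {
  prob_cols <- grep("^probability_", names(df), value = TRUE)
  races <- sub("^probability_", "", prob_cols)
  m <- as.matrix(df[prob_cols])
  apply(m, 1, function(row) paste(races[row == max(row)], collapse = ","))
}

pick_likely_race(probs)
# returns c("white", "black", "black", "hispanic", "asian")
```

These values match the likely_race column printed for the same five names.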