Find the gender of a first name: predict_gender

The first name data comes from the United States Social Security Administration (SSA). For each name, the data give the number of people identified as female and as male. The probability that a name is female (or male) is therefore the proportion of all people with that name who are female (or male). SSA data are available annually from 1880 to 2019; this function pools all years together.
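The R sketch below makes the proportion concrete by pooling yearly counts and converting them to probabilities. The data frame `ssa` and its columns (`name`, `year`, `sex`, `n`) are hypothetical and the counts are invented. This is not the package's own code. It only illustrates the calculation described above, assuming counts are summed across years.

```r
# Hypothetical SSA-style counts: one row per name, year and sex.
# All numbers are invented for illustration.
ssa <- data.frame(
  name = c("jordan", "jordan", "jordan", "jordan", "taylor", "taylor"),
  year = c(2018, 2018, 2019, 2019, 2019, 2019),
  sex  = c("F", "M", "F", "M", "F", "M"),
  n    = c(10, 30, 20, 40, 75, 25),
  stringsAsFactors = FALSE
)

# Sum counts over all years, separately for each name (rows) and sex (columns)
totals <- tapply(ssa$n, list(ssa$name, ssa$sex), sum)
totals[is.na(totals)] <- 0  # a name never recorded for one sex counts as 0

# Proportion of all people with each name who are female / male
total <- totals[, "F"] + totals[, "M"]
result <- data.frame(
  name = rownames(totals),
  probability_female = unname(totals[, "F"] / total),
  probability_male   = unname(totals[, "M"] / total),
  stringsAsFactors = FALSE
)
print(result)
```

With these invented counts, "jordan" pools to 30 female and 70 male records across both years, giving probabilities of 0.3 and 0.7.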

predict_gender(name, probability = TRUE)

Arguments

name	String or vector of strings of the first name that you want to know the gender of.
probability	If TRUE (default), the output includes a column for each gender with the probability that the first name is of that gender. If FALSE, only the name, the matched name from the SSA data (match_name), and the most likely gender are returned.

Value

A data.frame with three or five columns. The first column (name) has the name as entered. The second column (match_name) has the cleaned-up name (no spaces or punctuation, all lowercase). The third column (likely_gender) gives the most likely gender of the first name; if several genders are equally likely, it is a single string with each gender separated by a comma. If probability is FALSE, only these three columns are returned. Otherwise, columns 4 and 5 (probability_female and probability_male) give the probability that the first name is female or male. Names not found in the SSA data return NA in the gender and probability columns.
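The cleaning and tie rules described above can be sketched in base R as follows. Both helpers, `clean_name()` and `likely_gender()`, are hypothetical names used for illustration, not exported package functions. The `", "` separator for ties is also an assumption, since the documentation only says the genders are separated by a comma.

```r
# Hypothetical helper: lowercase a name and strip spaces and punctuation
clean_name <- function(x) {
  gsub("[[:space:][:punct:]]", "", tolower(x))
}

# Hypothetical helper: return the gender(s) with the highest probability.
# Ties are joined into one string (the ", " separator is an assumption);
# rows with missing probabilities (name not found) return NA.
likely_gender <- function(probability_female, probability_male) {
  probs <- cbind(female = probability_female, male = probability_male)
  as.vector(apply(probs, 1, function(p) {
    if (anyNA(p)) return(NA_character_)
    paste(names(p)[p == max(p)], collapse = ", ")
  }))
}

clean_name(c("DEAN", "Mary-Ann ", "o'neil"))
# returns "dean" "maryann" "oneil"
likely_gender(c(0.2, 0.5, NA), c(0.8, 0.5, NA))
# returns "male" "female, male" NA
```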

Examples

predict_gender("tyrion")
#>     name match_name likely_gender probability_female probability_male
#> 1 tyrion     tyrion          male          0.0115894        0.9884106

predict_gender(c("harry", "ron", "hermione", "DEAN", "NEVILLE", "Cho"))
#>       name match_name likely_gender probability_female probability_male
#> 1    harry      harry          male        0.004759343        0.9952407
#> 2      ron        ron          male        0.001931194        0.9980688
#> 3 hermione   hermione        female        1.000000000        0.0000000
#> 4     DEAN       dean          male        0.015914871        0.9840851
#> 5  NEVILLE    neville          male        0.031820110        0.9681799
#> 6      Cho        cho          <NA>                 NA               NA

predict_gender("franklin", probability = FALSE)
#>       name match_name likely_gender
#> 1 franklin   franklin          male

predict_gender("jacob", probability = FALSE)
#>    name match_name likely_gender
#> 1 jacob      jacob          male

predict_gender("jacob", probability = TRUE)
#>    name match_name likely_gender probability_female probability_male
#> 1 jacob      jacob          male        0.002394497        0.9976055