Assigns a likely gender to each person in a database table tbl, based on the person's first name. The first name must be present as a column in tbl, and the name of that column is passed as the argument firstname_left.

define_gender(tbl, conn, firstname_left, drop_missing)

Arguments

tbl

A lazily evaluated dbplyr query on conn.

conn

An object of class SQLiteConnection, connected to an SQLite database.

firstname_left

Name of the column in tbl that contains the first name. Gender is joined on this column.

drop_missing

If TRUE, drops records without a clear gender assignment. An assignment is clear when the probability of either gender is 0.8 or higher.
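
The 0.8 rule can be illustrated with a minimal sketch in base R. This is not the package's implementation: the data frame names_gender and its columns firstname and prob_female are hypothetical, and the sketch assumes a single probability per name, so that the probability of the other gender is 1 - prob_female.

```r
# Hypothetical lookup data; the real internal table may be laid out differently.
names_gender <- data.frame(
  firstname   = c("Anna", "Kim", "Peter"),
  prob_female = c(0.98, 0.55, 0.01)
)

threshold <- 0.8

# An assignment is "clear" when either gender reaches the threshold.
clear <- pmax(names_gender$prob_female, 1 - names_gender$prob_female) >= threshold

# Mimics drop_missing = TRUE: keep only rows with a clear assignment.
kept <- names_gender[clear, ]
```

Under these assumptions, "Kim" would be dropped because neither probability reaches 0.8.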

Value

tbl augmented by a gender column.

Details

The function uses the internal table FirstNamesGender, which assigns the likely gender to each first name. The table is generated from genderize.io.

The column named by firstname_left should be free of middle names and middle initials. Otherwise the gender assignment fails, even when the first name alone would yield a high-confidence assignment.
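
One way to prepare such a column is to keep only the first whitespace-separated token of each name before calling define_gender. The helper keep_first_token below is hypothetical and not part of the package. It is a sketch on an in-memory character vector; on a lazy dbplyr table the same step would have to be expressed in a form that translates to SQL.

```r
# Hypothetical helper: strip everything after the first whitespace.
keep_first_token <- function(x) {
  sub("\\s.*$", "", trimws(x))
}

cleaned <- keep_first_token(c("Mary Ann", "John Q.", " Alice "))
```

Note that this also truncates compound first names such as "Mary Ann" to "Mary", which may or may not be desired.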

Examples

if (FALSE) {
# The argument is named tbl, matching the usage above.
new_table <- define_gender(
  tbl = old_table, conn = conn,
  firstname_left = "firstname_old", drop_missing = TRUE
)
}