define_gender.Rd
Given a database table tbl, assigns the likely gender of each person based on their first name. The first name must be present as a column in tbl, and the name of that column is passed as the argument firstname_left.
define_gender(tbl, conn, firstname_left, drop_missing)
tbl: A query on conn, built with dbplyr and lazily evaluated.
conn: A connection to a SQLite database, as an object of class SQLiteConnection.
firstname_left: Name of the column in tbl that contains the first name; gender is joined on this column.
drop_missing: If TRUE, drops records without a clear gender assignment. An assignment is clear when the probability of either gender is 0.8 or higher.
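A minimal base R sketch of the drop_missing rule. It assumes, hypothetically, that each first name has one row giving the more likely gender and its probability, as genderize.io reports them; the actual columns of FirstNamesGender are not documented here. Under that assumption, "either gender at 0.8 or higher" reduces to the reported probability being at least 0.8, since the reported gender is always the likelier one. The function name is_clear_gender is illustrative only.

```r
# Hypothetical sketch of the drop_missing rule. Assumes one row per first
# name holding the more likely gender and its probability (genderize.io
# style); the real FirstNamesGender columns may differ.
is_clear_gender <- function(gender, probability, threshold = 0.8) {
  # Clear when the likelier gender reaches the threshold;
  # a missing gender or probability is never clear.
  !is.na(gender) & !is.na(probability) & probability >= threshold
}

is_clear_gender(c("female", "male", NA), c(0.95, 0.6, NA))
```

Records for which this returns FALSE are the ones drop_missing = TRUE would remove.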
Returns tbl augmented by a gender column.
The function uses the internal table FirstNamesGender, which assigns the likely gender to each first name. The table is generated from genderize.io.
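To illustrate the join described above, here is an in-memory analogue written in base R with merge(). It is only a sketch: the real function works lazily through dbplyr on a SQLite connection, and the lookup data frame, its column names (firstname, gender, probability) and the function name define_gender_df are hypothetical stand-ins for FirstNamesGender and define_gender().

```r
# In-memory analogue of define_gender(); the real function runs lazily via
# dbplyr. `lookup` stands in for FirstNamesGender, and its column names
# (firstname, gender, probability) are assumptions for this sketch.
define_gender_df <- function(df, lookup, firstname_left, drop_missing = FALSE,
                             threshold = 0.8) {
  # Left join: keep every row of df, attach gender where the name matches
  out <- merge(df, lookup, by.x = firstname_left, by.y = "firstname",
               all.x = TRUE, sort = FALSE)
  if (drop_missing) {
    # Keep only rows whose likelier gender reaches the threshold
    clear <- !is.na(out$gender) & !is.na(out$probability) &
      out$probability >= threshold
    out <- out[clear, , drop = FALSE]
  }
  out$probability <- NULL  # drop the helper column so only gender is added
  out
}

lookup <- data.frame(firstname = c("Anna", "Kim"),
                     gender = c("female", "male"),
                     probability = c(0.98, 0.55))
people <- data.frame(firstname_old = c("Anna", "Kim", "Zorblax"), id = 1:3)
define_gender_df(people, lookup, "firstname_old", drop_missing = TRUE)
```

Here Kim (probability 0.55) and the unmatched Zorblax are dropped, leaving only the Anna row; with drop_missing = FALSE all three rows are kept and Zorblax gets a missing gender.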
The column named by firstname_left should be free of middle names and middle initials; otherwise the gender assignment fails, even when the first name alone would yield a high-confidence assignment.
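One way to meet this requirement for data held in R, before it is written to the database, is to keep only the first whitespace-separated token of each name. The helper below, first_token, is hypothetical and not part of the package; it assumes first names themselves contain no spaces, so a compound name such as "Mary Ann" would be cut to "Mary".

```r
# Hypothetical helper: keep only the first whitespace-separated token so
# middle names and initials do not break the lookup. Assumes first names
# contain no spaces; hyphenated names such as "Jean-Luc" are kept whole.
first_token <- function(x) {
  sub("\\s.*$", "", trimws(x))
}

first_token(c("John F.", "Anna Maria", " Kim", "Lee"))
# returns "John" "Anna" "Kim" "Lee"
```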
if (FALSE) {
# conn is an open SQLiteConnection; old_table is a lazy dbplyr query on conn
new_table <- define_gender(
tbl = old_table, conn = conn,
firstname_left = "firstname_old", drop_missing = TRUE
)
}