The goal of magutils is to facilitate loading and extracting data from a database with records from Microsoft Academic Graph and ProQuest Dissertations and make the functions available to co-authors and RAs. In the future, we may publish a “back-end” package to generate the database.
You can install the development version of magutils from GitHub with:
# install.packages("devtools")
devtools::install_github("f-hafner/magutils", build_vignettes = TRUE)
If you do not have access to the full database, use the example database like this:
library(magutils)
db_file <- db_example("AcademicGraph.sqlite")
conn <- connect_to_db(db_file)
#> The database connection is:
#> src: sqlite 3.38.5 [/tmp/RtmpOOS3Cx/temp_libpath3eff04106586a/magutils/extdata/AcademicGraph.sqlite]
#> tbls: author_coauthor, author_output, AuthorAffiliation, current_links,
#> current_links_advisors, FieldsOfStudy, FirstNamesGender, pq_advisors,
#> pq_authors, pq_fields_mag, pq_unis
Then query the graduate links:
links <- get_links(conn, from = "graduates", lazy = TRUE)
Or query info on graduates:
graduates <- get_proquest(conn, from = "graduates", lazy = FALSE, limit = 3)
You can join the two together
library(magrittr)
links <- get_links(conn, from = "graduates", lazy = TRUE)
d_full <- get_proquest(conn, from = "graduates", limit = 5) %>%
dplyr::left_join(links, by = "goid") %>%
dplyr::collect()
At the end, do not forget to disconnect from the database:
DBI::dbDisconnect(conn)
Extracting key tables
get_proquest
: Source data on dissertations in United States from ProQuest.
get_links
: Load links between ProQuest and MAG. Can be links from PhD graduates to MAG authors, or from PhD advisors to MAG authors
define_field
: define the field of study for records in a table.
define_gender
: define gender of a table of persons with firstnames.
augment_tbl
: augment a table with various additional information: output, affiliations, co-authors. Because output
and affiliations
are at the unit-year level, the result will be a table at the unit-year level. I am not sure if this is the best way to do it (also the naming wrt to the previous functions), but we have to see how it works in practice.