The goal of magutils is to make it easy to load and extract data from a database of records from Microsoft Academic Graph (MAG) and ProQuest Dissertations, and to make these functions available to co-authors and research assistants. In the future, we may publish a "back-end" package to generate the database.
You can install the development version of magutils from GitHub with:
# install.packages("devtools")
devtools::install_github("f-hafner/magutils", build_vignettes = TRUE)
If you do not have access to the full database, use the example database like this:
library(magutils)
db_file <- db_example("AcademicGraph.sqlite")
conn <- connect_to_db(db_file)
#> The database connection is: 
#> src:  sqlite 3.38.5 [/tmp/RtmpOOS3Cx/temp_libpath3eff04106586a/magutils/extdata/AcademicGraph.sqlite]
#> tbls: author_coauthor, author_output, AuthorAffiliation, current_links,
#>   current_links_advisors, FieldsOfStudy, FirstNamesGender, pq_advisors,
#>   pq_authors, pq_fields_mag, pq_unis
Then query the graduate links:
links <- get_links(conn, from = "graduates", lazy = TRUE)
Or query info on graduates:
graduates <- get_proquest(conn, from = "graduates", lazy = FALSE, limit = 3)
You can join the two together:
library(magrittr)
links <- get_links(conn, from = "graduates", lazy = TRUE)
d_full <- get_proquest(conn, from = "graduates", limit = 5) %>%
  dplyr::left_join(links, by = "goid") %>%
  dplyr::collect()
At the end, do not forget to disconnect from the database:
DBI::dbDisconnect(conn)
Extracting key tables
get_proquest: Source data on dissertations in the United States from ProQuest.
get_links: Load links between ProQuest and MAG. These can be links from PhD graduates to MAG authors, or from PhD advisors to MAG authors.
define_field: Define the field of study for records in a table.
define_gender: Define the gender of persons in a table, based on their first names.
augment_tbl: Augment a table with additional information: output, affiliations, co-authors. Because output and affiliations are at the unit-year level, the result is a table at the unit-year level. I am not sure whether this is the best approach (or the best naming relative to the previous functions); we have to see how it works in practice.
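To illustrate why augmenting with unit-year information changes the level of the table, here is a minimal sketch in base R. It does not use magutils and is not how augment_tbl is implemented; the data frames and the column names AuthorId, Year and PaperCount are assumptions made up for the example (only goid appears in the join above).

```r
# Hypothetical person-level table: one row per graduate.
graduates <- data.frame(
  goid = c(1, 2),
  AuthorId = c(101, 102)
)

# Hypothetical output table at the unit-year level:
# one row per author and year.
author_output <- data.frame(
  AuthorId = c(101, 101, 102),
  Year = c(2010, 2011, 2010),
  PaperCount = c(2, 1, 3)
)

# A left join on the author identifier repeats each graduate once per year
# with recorded output, so the result has one row per graduate-year.
augmented <- merge(graduates, author_output, by = "AuthorId", all.x = TRUE)

# Compare the number of rows before and after the join.
nrow(graduates)
nrow(augmented)
```

If a person-level table is needed afterwards, the yearly rows would have to be aggregated again, for example by summing the output over years for each goid.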