Export PDF + Metadata from Zotero into R


I am trying to export PDF files + metadata from Zotero into R. I found this guide that includes a script and have successfully implemented it. However, the only metadata it imports are fields like "ID", "date last modified" and "date last added", which are not very useful for analysis.

I am looking for a way to export other metadata such as "author", "tags", "organization", "type" using this script or any other script.

# dplyr is added here because the last step calls dplyr::rename(), left_join() and filter()
install.packages(c("magrittr", "DBI", "RSQLite", "dplyr", "quanteda", "readtext"))
library(magrittr)  # provides %>%
library(DBI)       # provides dbConnect(), dbListTables(), dbGetQuery()
library(readtext)  # provides readtext()

# connect to Zotero's SQLite database

con = dbConnect(drv = RSQLite::SQLite(),
                dbname = "~/Zotero/zotero.sqlite")

# get names of all tables in the database

alltables = dbListTables(con)

# bring the items and itemNotes tables into R

table.items <- dbGetQuery(con, 'select * from items')
table.itemNotes <- dbGetQuery(con, 'select * from itemNotes')

# bring in Zotero fulltext cache plaintext

textDF <- readtext(paste0("~/Zotero/storage", "/*/.zotero-ft-cache"),
                   docvarsfrom = "filepaths")

# isolate "key" (8-character alphanumeric directory in storage/) in docvar1 associated with plaintext

# drop everything up to and including "storage/"
textDF$docvar1 <- gsub(pattern = "^.*storage/", replacement = "", x = textDF$docvar1)
# drop everything from the next "/" onwards, leaving only the key
textDF$docvar1 <- gsub(pattern = "/.*", replacement = "", x = textDF$docvar1)

# bring in itemID (and some other metadata) and that's all

textDF <- textDF %>%
  dplyr::rename(key = docvar1) %>%
  dplyr::left_join(table.items, by = "key") %>%
  dplyr::filter(!is.na(itemID), !itemID %in% table.itemNotes$itemID)

I don't have extensive experience with R, so any help would be really appreciated!


Have you looked at the RefManageR package?

You seem to be connecting to the database directly. The API may contain what you need?


The alltables vector gives you a list of all the SQLite tables; there seem to be about 60. You can just work through the ones you want.

table.creators <- dbGetQuery(con, 'select * from creators')

gives the authors, editors etc.

table.creatorTypes <- dbGetQuery(con, 'select * from creatorTypes')

tells you what kind of creators there are.
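The creators table on its own does not say which item each person belongs to. A minimal sketch of the join, assuming the linking table is itemCreators with columns itemID, creatorID, creatorTypeID and orderIndex (this is what I would expect from a recent Zotero schema, but check with dbListFields(con, "itemCreators")). It builds a tiny in-memory stand-in for the database so it runs anywhere; the rows are made up for illustration. Against the real database, reuse the existing con and run only the query.

```r
library(DBI)

# Stand-in for zotero.sqlite: a few tables with assumed column names and made-up rows
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "items",
             data.frame(itemID = 1:2, key = c("ABCD1234", "EFGH5678")))
dbWriteTable(con_demo, "creators",
             data.frame(creatorID = 1:2,
                        firstName = c("Ada", "Alan"),
                        lastName  = c("Lovelace", "Turing")))
dbWriteTable(con_demo, "creatorTypes",
             data.frame(creatorTypeID = 1:2, creatorType = c("author", "editor")))
dbWriteTable(con_demo, "itemCreators",
             data.frame(itemID = c(1, 1, 2), creatorID = c(1, 2, 2),
                        creatorTypeID = c(1, 2, 1), orderIndex = c(0, 1, 0)))

# One row per (item, creator), with the creator's role spelled out
item_creators <- dbGetQuery(con_demo, "
  SELECT i.itemID, i.key, c.firstName, c.lastName, ct.creatorType
  FROM items i
  JOIN itemCreators ic ON ic.itemID = i.itemID
  JOIN creators c      ON c.creatorID = ic.creatorID
  JOIN creatorTypes ct ON ct.creatorTypeID = ic.creatorTypeID
  ORDER BY i.itemID, ic.orderIndex
")
print(item_creators)

dbDisconnect(con_demo)
```

The result is in long format (one row per creator), so an item with several authors appears several times.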

table.tags <- dbGetQuery(con, 'select * from tags')

lists all the tags in the database.
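To see which tags belong to which item you need the linking table as well. A rough sketch, assuming that table is called itemTags with columns itemID and tagID, and that tags has columns tagID and name (worth confirming with dbListFields()). The in-memory database and its rows are invented for the example; with the real database, run the query on the existing con.

```r
library(DBI)

# Stand-in tables with assumed column names and made-up rows
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "tags",
             data.frame(tagID = 1:3, name = c("policy", "health", "survey")))
dbWriteTable(con_demo, "itemTags",
             data.frame(itemID = c(1, 1, 2), tagID = c(1, 2, 3)))

# Collapse each item's tags into one string so there is one row per item
item_tags <- dbGetQuery(con_demo, "
  SELECT it.itemID, GROUP_CONCAT(t.name, '; ') AS tags
  FROM itemTags it
  JOIN tags t ON t.tagID = it.tagID
  GROUP BY it.itemID
  ORDER BY it.itemID
")
print(item_tags)

dbDisconnect(con_demo)
```

Because the result has one row per itemID, it can be joined onto another data frame by itemID without duplicating rows. The order of tags inside each string is not guaranteed.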

And so on.
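Two things are easy to miss here, so a sketch may help. First, as far as I can tell the folder name in storage/ is the key of the PDF attachment item, not of the paper or report it is attached to, so the metadata sits on the parent item. Second, fields such as title or institution are not columns of items but are stored in a long key/value layout. The code below assumes the tables itemAttachments (itemID, parentItemID), itemTypes (itemTypeID, typeName), itemData (itemID, fieldID, valueID), fields (fieldID, fieldName) and itemDataValues (valueID, value); please check these against your own alltables output. All rows are made up.

```r
library(DBI)

# Stand-in database: item 1 is a report, item 2 is its PDF attachment
con_demo <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con_demo, "items",
             data.frame(itemID = 1:2, itemTypeID = c(10, 20),
                        key = c("PARENT01", "ATTACH01")))
dbWriteTable(con_demo, "itemTypes",
             data.frame(itemTypeID = c(10, 20),
                        typeName = c("report", "attachment")))
dbWriteTable(con_demo, "itemAttachments",
             data.frame(itemID = 2, parentItemID = 1))
dbWriteTable(con_demo, "fields",
             data.frame(fieldID = 1:2, fieldName = c("title", "institution")))
dbWriteTable(con_demo, "itemDataValues",
             data.frame(valueID = 1:2, value = c("Annual Report", "Example Org")))
dbWriteTable(con_demo, "itemData",
             data.frame(itemID = c(1, 1), fieldID = 1:2, valueID = 1:2))

# Start from the attachment key, hop to the parent, then pull its type and fields
long <- dbGetQuery(con_demo, "
  SELECT a.key, p.itemID AS parentItemID, t.typeName, f.fieldName, v.value
  FROM items a
  JOIN itemAttachments ia ON ia.itemID = a.itemID
  JOIN items p            ON p.itemID = ia.parentItemID
  JOIN itemTypes t        ON t.itemTypeID = p.itemTypeID
  JOIN itemData d         ON d.itemID = p.itemID
  JOIN fields f           ON f.fieldID = d.fieldID
  JOIN itemDataValues v   ON v.valueID = d.valueID
")

# Reshape to one row per attachment key; field columns become value.<fieldName>
wide <- reshape(long,
                idvar = c("key", "parentItemID", "typeName"),
                timevar = "fieldName", direction = "wide")
print(wide)

dbDisconnect(con_demo)
```

The wide data frame is keyed by the attachment key, so it should line up with the key column built from the storage/ folder names. Which field holds the "organization" will depend on the item type; institution is only a guess for reports.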

Depending on what you are doing, this may work, or you may find that exporting the database to BibTeX and following @CALUM_POLWART's suggestion of RefManageR is a better choice.
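For the BibTeX route, something along these lines should be enough to get the metadata into a data frame. This is only a sketch: the .bib file is written to a temporary path with one made-up entry so the example is self-contained, whereas in practice it would be the file exported from Zotero. Depending on the RefManageR version, ReadBib() may also need the bibtex package installed.

```r
library(RefManageR)

# Made-up entry standing in for a Zotero BibTeX export
bib_file <- tempfile(fileext = ".bib")
writeLines(c(
  "@article{doe2020,",
  "  author = {Doe, Jane},",
  "  title = {An Example Title},",
  "  journal = {Journal of Examples},",
  "  year = {2020},",
  "  keywords = {policy, health}",
  "}"
), bib_file)

# check = FALSE keeps entries even if a required field is missing
bib <- ReadBib(bib_file, check = FALSE)

# One row per entry, one column per field found in the file
meta <- as.data.frame(bib)
print(names(meta))
```

The export carries no link to the fulltext cache by itself, so matching these rows back to the PDF text would need some shared identifier, which is not covered here.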



Thank you for the advice. I will check it out!


Just wanted to say that I have seen this, but haven't had time to try it out yet.

Will be working on it soon.

Thank you for your input!
