So I want to mine some #altmetrics data for some research I'm thinking about doing. The steps would be:
It's early days, so I'm just working on the first step. However, getting a list of journals in ecology and evolution turns out to be frustratingly hard if you (1) are trying to avoid Thomson Reuters, and (2) want a machine-readable way to do it (read: an API).
Unfortunately, Mendeley's API does not have methods for getting a list of journals by field, or at least I don't know how to do it using their API. No worries though - Crossref comes to save the day. Here's my attempt at this using the Crossref OAI-PMH.
library(XML)
library(RCurl)

token <- "characters" # initial placeholder; later holds each resumptionToken
nameslist <- list() # empty list to collect journal titles, one element per page
baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"
while (is.character(token)) {
  if (token == "characters") {
    tok2 <- NULL # first request: no resumptionToken yet
  } else {
    tok2 <- paste("&resumptionToken=", token, sep = "")
  }
  query <- paste(baseurl, tok2, sep = "")
  crsets <- xmlToList(xmlParse(getURL(query)))
  setnames <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
  nameslist[[token]] <- setnames
  # look for the next resumptionToken; leave the loop when there is none
  nexttoken <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
  if (inherits(nexttoken, "try-error")) {
    break
  } else {
    token <- nexttoken
  }
}
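The paging in the loop above follows the usual OAI-PMH pattern: the first request carries only the verb, and each later request adds the resumptionToken returned by the previous response. Here is a minimal sketch of just the URL-building step, in base R only. Note that `build_listsets_url` is a hypothetical helper name I made up for illustration, not something from a package, and the example token is just one of those that shows up in the output further down.

```r
# Hypothetical helper (not from any package): build the ListSets request URL
# for one page. token = NULL means "first page"; otherwise the resumptionToken
# from the previous response is appended.
build_listsets_url <- function(token = NULL,
                               baseurl = "http://oai.crossref.org/OAIHandler?verb=ListSets") {
  if (is.null(token)) {
    return(baseurl)
  }
  # URLencode guards against tokens that contain reserved characters
  paste0(baseurl, "&resumptionToken=", utils::URLencode(token, reserved = TRUE))
}

build_listsets_url()
build_listsets_url("c65ebc3f-b540-4672-9c00-f3135bf849e3")
```

Keeping the URL construction in one small function makes it easy to check without hitting the network.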
sapply(nameslist, length) # number of titles returned per page
characters c65ebc3f-b540-4672-9c00-f3135bf849e3
10001 10001
6f61b343-a8f4-48f1-8297-c6f6909ca7f7
6864
allnames <- do.call(c, nameslist) # combine into a single character vector
length(allnames)
[1] 26866
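One thing worth knowing about `do.call(c, ...)` on a named list: the result is a named character vector, with names built from the list names plus an index. A small sketch on made-up data (the `toy` list and its titles are invented for illustration) shows this, along with `unlist(..., use.names = FALSE)` as one way to get the same values without the names.

```r
# Toy stand-in for nameslist: a list of character vectors keyed by token
toy <- list(characters = c("Journal A", "Journal B"), tok2 = "Journal C")

combined <- do.call(c, toy)             # named character vector
flat <- unlist(toy, use.names = FALSE)  # same values, names dropped

names(combined)  # "characters1" "characters2" "tok2"
length(flat)     # 3
```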
Now use a regex to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^ symbol says "the string must start here". The \\s means whitespace. The [] lets you specify a set of characters you are looking for, e.g., [Ee] means capital E OR lowercase e. I threw in titles that had the words systematic and naturalist too. I also trimmed any whitespace using the stringr package.

library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
# the second alternative originally read \\s[Ss]systematic (doubled "s"),
# so the count printed below may differ slightly with this corrected pattern
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])

ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist)) # combine and drop duplicates
ecoevotitles <- str_trim(ecoevotitles, side = "both") # trim whitespace, if any
length(ecoevotitles)
[1] 188
# Just the first ten titles
ecoevotitles[1:10]
[1] "Microbial Ecology in Health and Disease"
[2] "Population Ecology"
[3] "Researches on Population Ecology"
[4] "Behavioral Ecology and Sociobiology"
[5] "Microbial Ecology"
[6] "Biochemical Systematics and Ecology"
[7] "FEMS Microbiology Ecology"
[8] "Journal of Experimental Marine Biology and Ecology"
[9] "Applied Soil Ecology"
[10] "Forest Ecology and Management"
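To see what the ecology pattern does and does not catch, here is a quick check on a few made-up titles (these are invented for illustration, not taken from the Crossref list). I'm using base R's `grepl` with `perl = TRUE` so the sketch runs without stringr; I'd expect `str_detect` to agree on these cases, but I haven't checked that here.

```r
# Made-up titles purely for illustration
titles <- c("Ecology Letters", "Journal of Animal Ecology", "Paleoecology Today",
            "molecular ecology", "Journal of Physics")

# Same pattern as used for ecotitles: "Ecology"/"ecology" at the start of the
# string, or preceded by whitespace
pattern <- "^[Ee]cology|\\s[Ee]cology"
hits <- grepl(pattern, titles, perl = TRUE)

# A more compact way to write the same idea: start-of-string OR whitespace,
# then the word
compact <- grepl("(^|\\s)[Ee]cology", titles, perl = TRUE)

titles[hits]
identical(hits, compact)
```

Note that "Paleoecology Today" is skipped because "ecology" there is neither at the start of the string nor preceded by whitespace, which is the point of anchoring the pattern this way.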