Getting ecology and evolution journal titles from R

August 31, 2012


Scott Chamberlain


  R altmetrics ecology evolution doi


  



So I want to mine some #altmetrics data for some research I'm thinking about doing. The steps would be:

  • Get journal titles for ecology and evolution journals.
  • Get DOIs for all papers in all the above journal titles.
  • Get altmetrics data on each DOI.
  • Do some fancy analyses.
  • Make some pretty figs.
  • Write up results.

It's early days, so I'm just working on the first step. However, getting a list of ecology and evolution journals turns out to be frustratingly hard if you (1) are trying to avoid Thomson Reuters, and (2) want a machine-readable way to do it (read: API).

Unfortunately, Mendeley's API does not have methods for getting a list of journals by field, or at least I don't know how to do it using their API. No worries though - Crossref comes to save the day. Here's my attempt at this using the Crossref OAI-PMH.


I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine - hopefully yours too!

library(XML)
library(RCurl)

token <- "characters"  # iterator; after the first request it holds the resumptionToken
nameslist <- list()  # empty list to put journal titles into
while (is.character(token)) {
    baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"
    if (token == "characters") {
        tok2 <- NULL  # first request: no resumptionToken yet
    } else {
        tok2 <- paste("&resumptionToken=", token, sep = "")
    }
    query <- paste(baseurl, tok2, sep = "")
    crsets <- xmlToList(xmlParse(getURL(query)))
    names <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
    nameslist[[token]] <- names
    # leave the loop cleanly once the response carries no further resumptionToken
    next_token <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
    if (inherits(next_token, "try-error")) {
        break
    } else token <- next_token
}
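
The token-following pattern in the loop above can also be tried out without hitting the network. The sketch below is only an illustration under stated assumptions: `build_query`, `fetch_page`, `collect_all` and the `fake_pages` data are hypothetical names and made-up values, and only the base URL and the `resumptionToken` parameter come from the code in this post.

```r
# Build the request URL: no token on the first call, a resumptionToken afterwards.
# (build_query is a hypothetical helper, not part of any package.)
build_query <- function(token = NULL,
                        baseurl = "http://oai.crossref.org/OAIHandler?verb=ListSets") {
  if (is.null(token)) baseurl else paste0(baseurl, "&resumptionToken=", token)
}

# Made-up stand-in for the server: two "pages" of set names, the second with no next token.
fake_pages <- list(
  start   = list(names = c("Journal A", "Journal B"), next_token = "tok-1"),
  `tok-1` = list(names = c("Journal C"), next_token = NULL)
)
fetch_page <- function(token) {
  fake_pages[[if (is.null(token)) "start" else token]]
}

# Keep requesting pages until a page comes back without a token.
collect_all <- function(fetch) {
  out <- list()
  token <- NULL
  repeat {
    page <- fetch(token)
    out[[length(out) + 1]] <- page$names
    token <- page$next_token
    if (is.null(token)) break
  }
  unlist(out)
}

first_url <- build_query()
next_url <- build_query("tok-1")
all_names <- collect_all(fetch_page)
```

Swapping `fetch_page` for a function that downloads and parses `build_query(token)` would give roughly the same behaviour as the while loop, with the stopping rule kept in one place.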

Yay! Hopefully it worked if you tried it. Let's see how long the list of journal titles is.

sapply(nameslist, length)  # length of each list
                          characters c65ebc3f-b540-4672-9c00-f3135bf849e3 
                               10001                                10001 
6f61b343-a8f4-48f1-8297-c6f6909ca7f7 
                                6864 
allnames <- do.call(c, nameslist)  # combine into one character vector
length(allnames)
[1] 26866

Now, let's use some regex to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^ symbol says "the string must start here". The \\s means a whitespace character. The [] lets you specify a set of letters you are looking for, e.g., [Ee] means capital E OR lowercase e. I threw in titles that contain the words systematic and naturalist too. I also trimmed any leading or trailing whitespace using the stringr package.
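
To see what those pattern pieces do on their own, here is a tiny base R sketch using `grepl` instead of stringr. The `titles` vector is a toy example: the first two entries appear in the output further down, while the last three are included purely for illustration.

```r
# Toy input: two titles from this post plus three illustrative ones.
titles <- c(
  "Population Ecology",
  "Forest Ecology and Management",
  "Ecology Letters",     # starts with the word, so the ^ branch applies
  "Agroecology Today",   # made up: "ecology" sits mid-word, no whitespace before it
  "Journal of Geology"   # made up: no match at all
)

# Same pattern as used for the ecology titles in this post.
pattern <- "^[Ee]cology|\\s[Ee]cology"

hits <- grepl(pattern, titles)
matched <- titles[hits]
```

Requiring either the start of the string or a preceding whitespace character is what keeps compound words like the fourth title out of the results.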

library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])

ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist))  # combine into one vector
ecoevotitles <- str_trim(ecoevotitles, side = "both")  # trim whitespace, if any
length(ecoevotitles)
[1] 188
# Just the first ten titles
ecoevotitles[1:10]
 [1] "Microbial Ecology in Health and Disease"           
 [2] "Population Ecology"                                
 [3] "Researches on Population Ecology"                  
 [4] "Behavioral Ecology and Sociobiology"               
 [5] "Microbial Ecology"                                 
 [6] "Biochemical Systematics and Ecology"               
 [7] "FEMS Microbiology Ecology"                         
 [8] "Journal of Experimental Marine Biology and Ecology"
 [9] "Applied Soil Ecology"                              
[10] "Forest Ecology and Management"                     
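
One detail worth a quick check is the order of de-duplicating and trimming. The sketch below uses a made-up `raw_titles` vector and base R's `trimws`; whether the Crossref set names actually contain whitespace variants like this is an assumption, not something tested in this post.

```r
# Made-up titles: the second differs from the first only by surrounding spaces.
raw_titles <- c("Population Ecology", " Population Ecology ", "Microbial Ecology")

# De-duplicating first treats the padded title as a distinct entry.
n_unique_first <- length(unique(raw_titles))

# Trimming first lets unique() collapse the two variants.
cleaned <- unique(trimws(raw_titles))
n_trim_first <- length(cleaned)
```

If such variants do occur, trimming before calling `unique` would give a slightly shorter list than trimming afterwards.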

Get the .Rmd file used to create this post at my GitHub account.


Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.

  







CC0