Import, assemble, deduplicate, and write bibliographic data with synthesisr

Eliza M. Grames and Martin J. Westgate

Introduction

Systematic review searches typically span multiple databases, which export results in a variety of formats and overlap in coverage. To streamline the process of importing, assembling, and deduplicating results, synthesisr recognizes bibliographic files exported from databases commonly used for systematic reviews and merges results into a standardized format.

If you run into issues with the package, please open an issue at https://github.com/rmetaverse/synthesisr or email martinjwestgate@gmail.com or eliza.grames@uconn.edu.

Read and assemble bibliographic files

synthesisr can read any BibTeX or RIS formatted bibliographic data files. It detects whether files are more bib-like or ris-like and imports them accordingly. Note that files from some databases may contain non-standard fields or non-standard characters that cause import failure in rare cases; if this happens, we recommend converting the file in open source bibliographic management software such as Zotero.

In the code below, we demonstrate how to read and assemble bibliographic data files with example datasets included in the synthesisr package. Note that if you are using the code with your own data, you will not need system.file() and should instead pass a character vector of the path(s) to the file(s) you want to import. For example, if you have saved all your search results in a directory called "search_results", you may want to use list.files("./search_results/", full.names = TRUE) instead, so that the returned paths include the directory name.

# system.file will look for the path to where synthesisr is installed
# by using the example bibliographic data files, you can reproduce the vignette
bibfiles <- list.files(
  system.file("extdata/", package = "synthesisr"),
  full.names = TRUE
)

# we can print the list of bibfiles to confirm what we will import
# in this example, we have bibliographic data exported from Scopus and Zoological Record
print(bibfiles)
#> [1] "/private/var/folders/s4/06ssj1yx0wgfpx500t9vp6b00000gn/T/RtmpN2kzwE/Rinst241c139586b6/synthesisr/extdata//scopus.ris"
#> [2] "/private/var/folders/s4/06ssj1yx0wgfpx500t9vp6b00000gn/T/RtmpN2kzwE/Rinst241c139586b6/synthesisr/extdata//zoorec.txt"

# now we can use read_refs to read in our bibliographic data files
# we save them to a data.frame object (because return_df=TRUE) called imported_files
library(synthesisr)
imported_files <- read_refs(
  filename = bibfiles,
  return_df = TRUE)
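
When working with your own exports, the vector of paths can be built in the same way. The sketch below is only an illustration in base R: the directory and file names are hypothetical, and the files are empty placeholders created in a temporary directory so that the listing step can be run anywhere.

```r
# create a throwaway directory with empty placeholder files (hypothetical names)
search_dir <- file.path(tempdir(), "search_results")
dir.create(search_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(search_dir, c("scopus.ris", "zoorec.txt", "notes.docx"))
))

# full.names = TRUE returns paths that work from any working directory
# pattern keeps only the extensions we expect bibliographic exports to have
my_bibfiles <- list.files(
  search_dir,
  pattern = "\\.(ris|bib|txt)$",
  full.names = TRUE
)

# confirm which files were selected before importing anything
basename(my_bibfiles)
```

With real exports in place of the placeholders, my_bibfiles could then be passed to read_refs() through the filename argument, as in the example above.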

Deduplicate bibliographic data

Many journals are indexed in multiple databases, so searching across databases will retrieve duplicates. After import, synthesisr can detect duplicates and retain only unique bibliographic records using a variety of methods, such as string distance or fuzzy matching. A good place to start is removing articles that have identical titles, especially since this reduces computational time for more sophisticated deduplication methods.

# first, we will remove articles that have identical titles
# this is a fairly conservative approach, so we will remove them without review
df <- deduplicate(
  imported_files,
  match_by = "title",
  method = "exact"
)
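
To see why exact matching is conservative, it can help to look at what counts as identical. The sketch below uses base R's duplicated() as a stand-in for exact title matching; it is not the internals of deduplicate(), and the titles are made-up variants for illustration.

```r
# hypothetical titles: one exact copy and two near-copies of the first title
titles <- c(
  "Nest success of black-backed woodpeckers",
  "Nest success of black-backed woodpeckers",  # exact copy
  "Nest success of Black-backed Woodpeckers",  # differs only in capitalization
  "Nest success of black-backed  woodpeckers"  # contains an extra space
)

# exact comparison flags only the character-for-character copy
exact_dup <- duplicated(titles)
exact_dup

# lower-casing and collapsing whitespace before comparing catches the rest
normalised <- gsub("[[:space:]]+", " ", tolower(titles))
normalised_dup <- duplicated(normalised)
normalised_dup
```

Only the second title is flagged by the exact comparison, which is why near-duplicates remain after this first pass and need a second, more tolerant method.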

In some cases, it may be useful to know which articles were identified as duplicates so they can be manually reviewed or so that information from two records can be merged. In the code below, we check a few titles in our partially deduplicated dataset, use string distance to find additional duplicates, and then remove them by extracting unique references. Although here we only use one secondary deduplication method (string distance), we could look for additional duplicates based on fuzzy matching abstracts, for example.

# there are still some duplicate articles that were not removed
# for example, titles 47 and 114 belong to the same article
# but title 47 has misplaced spaces and title 114 is in upper case
df$title[c(47, 114)]
#> [1] "Foraging-habitat selection of black-backed wood peckers info rest burns of southwestern Idaho"
#> [2] "FORAGING-HABITAT SELECTION OF BLACK-BACKED WOODPECKERS IN FOREST BURNS OF SOUTHWESTERN IDAHO"

# similarly, there is a missing space in the title for article 101
df$title[c(21, 101)]
#> [1] "An integrated occupancy and space-use model to predict abundance of imperfectly detected, territorial vertebrates"
#> [2] "An integrated occupancy and space-usemodel to predict abundance of imperfectly detected, territorial vertebrates"

# titles 91 and 96 also have near-identical counterparts elsewhere in the dataset
# the dash-like symbol in title 91, for example, is a special character, not punctuation
# so it was not classified as identical
df$title[c(91, 96)]
#> [1] "Composition of Bird Communities Following Stand-Replacement Fires in Northern Rocky Mountain (U.S.A.) Conifer Forests"
#> [2] "The persistence of Black-backed Woodpeckers following delayed salvage logging in the Sierra Nevada"

# in this example, we will use string distance to identify likely duplicates
duplicates_string <- find_duplicates(
  df$title,
  method = "string_osa",
  to_lower = TRUE,
  rm_punctuation = TRUE,
  threshold = 7
)
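
The idea behind string distance matching can be sketched in base R. The example below is an approximation under stated assumptions, not the implementation used by find_duplicates(): it uses utils::adist(), which computes a generalized Levenshtein distance rather than the optimal string alignment distance, and it assumes that pairs at or below the threshold are treated as likely duplicates. The helper clean_title() is hypothetical. The titles are taken from the example dataset.

```r
# hypothetical helper: lower-case and strip punctuation before comparing
clean_title <- function(x) {
  gsub("[[:punct:]]", "", tolower(x))
}

titles <- c(
  "An integrated occupancy and space-use model to predict abundance of imperfectly detected, territorial vertebrates",
  "An integrated occupancy and space-usemodel to predict abundance of imperfectly detected, territorial vertebrates",
  "2006 May species count of birds",
  "2002 May species count for birds"
)

# pairwise edit distances between the cleaned titles
distances <- utils::adist(clean_title(titles))

# keep each pair once (upper triangle) if it is within the threshold
threshold <- 7
close_pairs <- which(
  distances <= threshold & upper.tri(distances),
  arr.ind = TRUE
)
close_pairs
```

Both pairs fall within the threshold. The first is a true duplicate with a missing space, while the second is two different annual reports whose titles differ by only a few characters, which is why flagged matches are worth reviewing manually before removal.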

# we can extract the line numbers from the dataset that are likely duplicated
# this lets us manually review those titles to confirm they are duplicates

manual_checks <- review_duplicates(df$title, duplicates_string)
print(manual_checks)
#>                                                           title matches
#> 1  Few detections of black-backed woodpeckers (Picoides arcticu       4
#> 33 Few detections of Black-backed Woodpeckers (Picoides arcticu       4
#> 2  The persistence of black-backed woodpeckers following delaye      11
#> 35 The persistence of Black-backed Woodpeckers following delaye      11
#> 3  Fire-bird: A gis-based toolset for applying habitat suitabil      12
#> 32 FIRE-BIRD: A GIS-based toolset for applying habitat suitabil      12
#> 4  Harvesting interacts with climate change to affect future ha      14
#> 36 Harvesting interacts with climate change to affect future ha      14
#> 5  Novel function of flutter display in the black-backed woodpe      15
#> 34 NOVEL FUNCTION OF FLUTTER DISPLAY IN THE BLACK-BACKED WOODPE      15
#> 6  Tag-team takeover: Usurpation of woodpecker nests by western      17
#> 37 TAG-TEAM TAKEOVER: USURPATION OF WOODPECKER NESTS BY WESTERN      17
#> 7  An integrated occupancy and space-use model to predict abund      21
#> 38 An integrated occupancy and space-usemodel to predict abunda      21
#> 8  Contribution of Unburned Boreal Forests to the Population of      33
#> 40 Contribution of unburned boreal forests to the population of      33
#> 9  Drill, baby, drill: The influence of woodpeckers on post-fir      34
#> 39 Drill, baby, drill: the influence of woodpeckers on post-fir      34
#> 10 The role of wildfire, prescribed fire, and mountain pine bee      37
#> 42 The Role of Wildfire, Prescribed Fire, and Mountain Pine Bee      37
#> 11 Influence of old coniferous habitat on nestling growth of bl      39
#> 45 Influence of old coniferous habitat on nestling growth of Bl      39
#> 12  Roost sites of the Black-backed Woodpecker in burned forest      40
#> 44  ROOST SITES OF THE BLACK-BACKED WOODPECKER IN BURNED FOREST      40
#> 13 Occurrence patterns of black-backed woodpeckers in green for      41
#> 46 Occurrence patterns of Black-backed Woodpeckers in green for      41
#> 14 Habitat availability for multiple avian species under modele      42
#> 41 Habitat availability for multiple avian species under modele      42
#> 15 A comparison of avian habitat in forest management plans pro      43
#> 43 A Comparison of Avian Habitat in Forest Management Plans Pro      43
#> 16 Lethal procyrnea infection in a black-backed woodpecker (pic      46
#> 47 LETHAL PROCYRNEA INFECTION IN A BLACK-BACKED WOODPECKER (PIC      46
#> 17 Foraging-habitat selection of black-backed wood peckers info      47
#> 48 FORAGING-HABITAT SELECTION OF BLACK-BACKED WOODPECKERS IN FO      47
#> 18 High density nesting of black-backed woodpeckers (picoides a      48
#> 50 High Density Nesting of Black-backed Woodpeckers (Picoides a      48
#> 19 Pre-fire forest conditions and fire severity as determinants      49
#> 51 Pre-fire forest conditions and fire severity as determinants      49
#> 20 Modeling nest survival of cavity-nesting birds in relation t      50
#> 52 Modeling Nest Survival of Cavity-Nesting Birds in Relation t      50
#> 21 Occupancy modeling of black-backed woodpeckers on burned sie      51
#> 53 Occupancy modeling of Black-backed Woodpeckers on burned Sie      51
#> 22  Netguns: A technique for capturing Black-backed Woodpeckers      52
#> 49  Netguns: a technique for capturing Black-backed Woodpeckers      52
#> 23 Reproductive success of the black-backed woodpecker (Picoide      58
#> 54 Reproductive success of the black-backed woodpecker (Picoide      58
#> 24 Modeling the effects of environmental disturbance on wildlif      59
#> 55 Modeling the effects of environmental disturbance on wildlif      59
#> 25 Influences of postfire salvage logging on forest birds in th      61
#> 56 Influences of postfire salvage logging on forest birds in th      61
#> 26 Nest success of black-backed woodpeckers in forests with mou      66
#> 58 Nest success of black-backed woodpeckers in forests with mou      66
#> 27 The ecological importance of severe wildfires: Some like it       67
#> 57 The ecological importance of severe wildfires: some like it       67
#> 28 Boreal forest landbirds in relation to forest composition, s      72
#> 60 Boreal forest landbirds in relation to forest composition, s      72
#> 29 Avian communities of mature balsam fir forests in Newfoundla      87
#> 62 Avian communities of mature balsam fir forests in Newfoundla      87
#> 30 Immediate post-fire nesting by Black-backed Woodpeckers, Pic      90
#> 63 Immediate post-fire nesting by black-backed woodpeckers, Pic      90
#> 31 Composition of Bird Communities Following Stand-Replacement       91
#> 64 Composition of bird communities following stand-replacement       91
#> 59                              2006 May species count of birds      99
#> 61                             2002 May species count for birds      99
#> 65 Black-backed three-toed wood-pecker, Picoides arcticus, pred     140
#> 66 Black-backed three-toed woodpecker, Pieoides arcticus, preda     140

# the titles under match #99 are not duplicates, so we need to keep them both
# we can use the override_duplicates function to manually mark them as unique
new_duplicates <- synthesisr::override_duplicates(duplicates_string, 99)

# now we can extract unique references from our dataset
# we need to pass it the dataset (df) and the matching articles (new_duplicates)
results <- extract_unique_references(df, new_duplicates)
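
The logic of overriding a match and then keeping unique records can be illustrated with a small base R sketch. It assumes that matches are stored as a vector in which each record carries the index of the first record in its group, as the matches column above suggests. The function override_group() is a hypothetical stand-in, and keeping the first record of each group is a simplification rather than a description of how extract_unique_references() chooses which record to retain.

```r
# hypothetical titles and match vector
# each entry of matches is the index of the first record in its group
titles <- c("Title A", "Title B", "title a", "Title C", "Title B.")
matches <- c(1, 2, 1, 4, 2)

# hypothetical helper: give later members of a group fresh identifiers
# so that they are treated as unique records
override_group <- function(matches, group) {
  later <- which(matches == group)[-1]
  matches[later] <- max(matches) + seq_along(later)
  matches
}

# suppose manual review shows that records 2 and 5 are different articles
new_matches <- override_group(matches, 2)
new_matches

# keep the first record of each remaining group
unique_titles <- titles[!duplicated(new_matches)]
unique_titles
```

After the override, record 5 is no longer grouped with record 2, so both survive, while record 3 is still dropped as a duplicate of record 1.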

Write bibliographic files

To facilitate exporting results to other platforms after assembly and deduplication, synthesisr can write bibliographic data to .ris or .bib files. Optionally, write_refs can write directly to a text file stored locally; with file = FALSE, as in the example below, it instead returns the formatted text to the R session.
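
As a rough picture of what writing a record involves, the base R sketch below drops empty fields from a single record, formats the rest as BibTeX-style lines, and saves them to a temporary file. It is a simplified illustration that imitates the field={value} layout of the output in this section, not the implementation of write_refs, and the record is a shortened, hypothetical example.

```r
# hypothetical one-row record with a missing field
record <- data.frame(
  author = "Tingley, M.W. and Stillman, A.N.",
  year = "2020",
  volume = NA_character_,
  title = "Black-backed woodpecker occupancy in burned and beetle-killed forests",
  stringsAsFactors = FALSE
)

# keep only the columns that contain data
keep <- as.vector(!is.na(record[1, ]))
record <- record[1, keep, drop = FALSE]

# one line per field, wrapped in an @ARTICLE entry
fields <- paste0(names(record), "={", unlist(record[1, ]), "},")
bib_lines <- c("@ARTICLE{1,", fields, "}")

# write to a temporary file and read it back to check the result
out_file <- tempfile(fileext = ".bib")
writeLines(bib_lines, out_file)
readLines(out_file)
```

Removing the NA fields first keeps the output free of empty entries, which is the same reason the example below subsets the citation before writing it.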


# synthesisr can write the full dataset to a bibliographic file
# but in this example, we will just write the first citation
# we also want it to be a nice clean bibliographic file, so we remove NA data
# this makes it easier to view the output when working with a single article
citation <- df[1,!is.na(df[1,])]

format_citation(citation)
#>                                                                                                                                                              1 
#> "Tingley, M.W. et al. (2020) Black-Backed Woodpecker Occupancy in Burned and Beetle-Killed Forests: Disturbance Agent Matters. Forest Ecology and Management."

write_refs(citation,
  format = "bib",
  file = FALSE
)
#>  [1] "@ARTICLE{1,"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [2] "database={Scopus},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [3] "document_type={Article},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [4] "source_type={JOUR},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [5] "author={Tingley, M.W. and Stillman, A.N. and Wilkerson, R.L. and Sawyer, S.C. and Siegel, R.B.},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [6] "address={Ecology & Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, Unit 3043, Storrs, CT  06269, United States and The Institute for Bird Populations, P.O. Box 1346, Point Reyes StationCA  94956, United States and USDA Forest Service, Pacific Southwest Region, Vallejo, CA  94592, United States},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [7] "year={2020},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [8] "title={Black-backed woodpecker occupancy in burned and beetle-killed forests: Disturbance agent matters},"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
#>  [9] "source={Forest Ecology and Management},"
#> [10] "volume={455},"
#> [11] "article_number={117694},"
#> [12] "abstract={In the western United States, the black-backed woodpecker (Picoides arcticus) is a “snag specialist”, found predominantly in burned montane forests. While fire is a key disturbance agent in this system, recently, unprecedented large tracts of drought-stressed forest in the Sierra Nevada and Southern Cascades of California have succumbed to bark beetle outbreaks. Although this tree mortality could potentially be a boon for snag-dependent species, it is unclear whether the resulting snag forests provide sufficiently high-quality habitat for black-backed woodpeckers and other wildlife that are regionally associated with burned forests. We tested for differences in black-backed woodpecker occupancy between fire- and beetle-killed forests, and whether key environmental relationships driving woodpecker occupancy differed between stands affected by the two disturbance agents. Between 2016 and 2018, we surveyed for black-backed woodpeckers during 4448 surveys at 75 burned and 113 beetle-killed forest stands throughout the black-backed woodpecker's range in California, detecting at least one black-backed woodpecker on 448 surveys (16.2%) in burned forests and 115 surveys (6.8%) in beetle-killed forests. Controlling for a suite of environmental variables that can affect habitat quality, the odds of black-backed woodpeckers occurring in burned forests were predicted to be 12.6 times higher than in beetle-killed forest. Occupancy declined with time-since-disturbance in fire-killed but not beetle-killed forests, but occupancy increased similarly with snag density resulting from either disturbance agent. Across our broad study region, black-backed woodpeckers were more likely to occur in burned forests at higher latitudes and elevations; these patterns were even stronger in beetle-killed forests, where we found woodpeckers only at the more northerly and higher elevation sites. Our results demonstrate that for this disturbed-habitat specialist, disturbance agent matters; black-backed woodpeckers do not use habitat created by bark beetle outbreaks as readily as habitat created by fire. Given the likely increased magnitude and extent of bark beetle outbreaks in the future, further work is needed to assess the role of beetle-killed forests in longer-term population dynamics of black-backed woodpeckers beyond the first decade after disturbance, and to investigate whether these results can be generalized to other fire-associated wildlife species in the region. © 2019 Elsevier B.V.},"
#> [13] "keywords={Bark beetle; California; Drought; Habitat; Occupancy; Picoides arcticus; Wildfire},"
#> [14] "doi={10.1016/j.foreco.2019.117694},"
#> [15] "url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074800841&doi=10.1016%2fj.foreco.2019.117694&partnerID=40&md5=0eda2e05b4ee01a795e8eb2dd2ec45bb},"
#> [16] "notes={Export Date: 11 January 2020},"
#> [17] "filename={/private/var/folders/s4/06ssj1yx0wgfpx500t9vp6b00000gn/T/RtmpN2kzwE/Rinst241c139586b6/synthesisr/extdata//scopus.ris},"
#> [18] "n_duplicates={1},"
#> [19] "}"
#> [20] ""