Accessing Publication Data

Helen Miller

2020-08-21

DataSpace maintains a curated collection of relevant publications, which can be accessed through the Publications page in the app. Some publications also include datasets, which can be downloaded as a zip file. DataSpaceR provides an interface for browsing publications in DataSpace and downloading publication data where available.

Browsing publications in DataSpace

The DataSpaceConnection object includes methods for browsing and downloading publications and publication data.

library(DataSpaceR)
library(data.table)
con <- connectDS()
con
#> <DataSpaceConnection>
#>   URL: https://dataspace.cavd.org
#>   User: jkim2345@scharp.org
#>   Available studies: 260
#>     - 76 studies with data
#>     - 4994 subjects
#>     - 423195 data points
#>   Available groups: 6
#>   Available publications: 1403
#>     - 1 publications with data

The DataSpaceConnection print method summarizes the publications and publication data. More details about publications can be accessed through con$availablePublications.

knitr::kable(head(con$availablePublications[, -"link"]))
| publication_id | first_author | title | journal | publication_date | pubmed_id | related_studies | studies_with_data | publication_data_available |
|---|---|---|---|---|---|---|---|---|
| 1006 | Abbink P | Construction and evaluation of novel rhesus monkey adenovirus vaccine vectors | J Virol | 2015 Feb | 25410856 | NA | NA | FALSE |
| 1303 | Abbott RK | Precursor frequency and affinity determine B Cell competitive fitness in germinal centers, tested with germline-targeting HIV vaccine immunogens | Immunity | 2018 Jan 16 | 29287996 | NA | NA | FALSE |
| 1136 | Abu-Raddad LJ | Analytic insights into the population level impact of imperfect prophylactic HIV vaccines | J Acquir Immune Defic Syndr | 2007 Aug 1 | 17554215 | NA | NA | FALSE |
| 842 | Acharya P | Structural definition of an antibody-dependent cellular cytotoxicity response implicated in reduced risk for HIV-1 infection | J Virol | 2014 Nov | 25165110 | NA | NA | FALSE |
| 1067 | Ackerman ME | Polyfunctional HIV-specific antibody responses are associated with spontaneous HIV control | PLOS Pathog | 2016 Jan | 26745376 | NA | NA | FALSE |
| 1076 | Ackerman ME | Systems serology for evaluation of HIV vaccine trials | Immunol Rev | 2017 Jan | 28133810 | NA | NA | FALSE |

This table, a data.table, summarizes all publications with information such as first author, title, and journal. It also includes a PubMed URL, where available, under link; related studies under related_studies; and related studies with integrated data under studies_with_data. We can use data.table methods to filter and sort this table to browse available publications.

For example, we can filter this table to view only publications related to a particular study:

vtn096_pubs <- con$availablePublications[grepl("vtn096", related_studies)]
knitr::kable(vtn096_pubs[, -"link"])
| publication_id | first_author | title | journal | publication_date | pubmed_id | related_studies | studies_with_data | publication_data_available |
|---|---|---|---|---|---|---|---|---|
| 250 | Huang Y | Selection of HIV vaccine candidates for concurrent testing in an efficacy trial | Curr Opin Virol | 2016 Jan 28 | 26827165 | mrv144, vtn096 | NA | FALSE |
| 267 | Huang Y | Predictors of durable immune responses six months after the last vaccination in preventive HIV vaccine trials | Vaccine | 2017 Feb 22 | 28131393 | mrv144, vtn096, vtn205 | NA | FALSE |
| 268 | Huang Y | Statistical methods for down-selection of treatment regimens based on multiple endpoints, with application to HIV vaccine trials | Biostatistics | 2017 Apr 1 | 27649715 | mrv144, vtn096 | NA | FALSE |
| 1392 | Pantaleo G | Safety and immunogenicity of a multivalent HIV vaccine comprising envelope protein with either DNA or NYVAC vectors (HVTN 096): a phase 1b, double-blind, placebo-controlled trial | Lancet HIV | 2019 Oct 7 | 31601541 | vtn096 | NA | FALSE |
| 1420 | Westling T | Methods for comparing durability of immune responses between vaccine regimens in early-phase trials | Stat Methods Med Res | 2019 Jan 9 | 30623732 | vtn094, vtn096 | NA | FALSE |
| 281 | Yates NL | HIV-1 envelope glycoproteins from diverse clades differentiate antibody responses and durability among vaccinees | J Virol | 2018 Mar 28 | 29386288 | mrv144, vtn096 | NA | FALSE |

We can also view only publications that have related studies with integrated data in DataSpace:

pubs_with_study_data <- con$availablePublications[!is.na(studies_with_data)]
knitr::kable(head(pubs_with_study_data[, -"link"]))
| publication_id | first_author | title | journal | publication_date | pubmed_id | related_studies | studies_with_data | publication_data_available |
|---|---|---|---|---|---|---|---|---|
| 213 | Andrasik MP | Exploring barriers and facilitators to participation of male-to-female transgender persons in preventive HIV vaccine clinical trials. | Prev Sci | 2014 Jun | 23446435 | vtn505 | vtn505 | FALSE |
| 218 | Andrasik MP | Behavioral risk assessment in HIV Vaccine Trials Network (HVTN) clinical trials: A qualitative study exploring HVTN staff perspectives | Vaccine | 2013 Sep 13 | 23859840 | vtn069, vtn404, vtn502, vtn503, vtn504, vtn505, vtn802, vtn903, vtn906, vtn907 | vtn505 | FALSE |
| 225 | Andrasik MP | Bridging the divide: HIV prevention research and Black men who have sex with men | Am J Public Health | 2014 Apr | 24524520 | vtn505 | vtn505 | FALSE |
| 226 | Arnold MP | Sources of racial/ethnic differences in HIV vaccine trial awareness: Exposure, attention, or both? | Am J Public Health | 2014 Aug | 24922153 | vtn505 | vtn505 | FALSE |
| 7 | Asbach B | Potential to streamline heterologous DNA prime and NYVAC/protein boost HIV vaccine regimens in rhesus macaques by employing improved antigens | J Virol | 2016 Mar 28 | 26865719 | cvd277, cvd281 | cvd277, cvd281 | FALSE |
| 1394 | Boppana S | Cross-reactive CD8 T-cell responses elicited by Ad5-based HIV-1 vaccines contributed to early viral evolution in vaccine recipients who became infected | J Virol | 2019 Oct 23 | 31645444 | vtn502, vtn505 | vtn505 | FALSE |
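The filters above cover one half of "filter and sort". As a minimal sketch of the other half, the block below sorts a small stand-in table by publication year. The stand-in holds three rows copied from the first listing. It assumes, based only on how the values print, that publication_date is free text beginning with a four-digit year. The year column and the object names are introduced here for illustration.

```r
library(data.table)

# Stand-in for con$availablePublications: three rows copied from the first
# listing in this document, with most columns omitted. Column types are an
# assumption based on how the values print.
pubs <- data.table(
  publication_id = c("1006", "1303", "1136"),
  first_author = c("Abbink P", "Abbott RK", "Abu-Raddad LJ"),
  journal = c("J Virol", "Immunity", "J Acquir Immune Defic Syndr"),
  publication_date = c("2015 Feb", "2018 Jan 16", "2007 Aug 1")
)

# Filter: keep rows whose journal matches exactly.
jvirol <- pubs[journal == "J Virol"]

# Sort: publication_date is free text, so extract the leading four-digit
# year into a helper column and order by it, newest first.
pubs[, year := as.integer(substr(publication_date, 1, 4))]
by_year <- pubs[order(-year)]
by_year$first_author
```

Sorting on the extracted year rather than the raw string keeps the order meaningful even though the month and day parts are not consistently present.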

We can also use this information to connect to related studies and pull integrated data. Say we are interested in this Rouphael (2019) publication:

rouphael2019 <- con$availablePublications[first_author == "Rouphael NG"]
knitr::kable(rouphael2019[, -"link"])
| publication_id | first_author | title | journal | publication_date | pubmed_id | related_studies | studies_with_data | publication_data_available |
|---|---|---|---|---|---|---|---|---|
| 1390 | Rouphael NG | DNA priming and gp120 boosting induces HIV-specific antibodies in a randomized clinical trial | J Clin Invest | 2019 Sep 30 | 31566579 | vtn105 | vtn105 | FALSE |

We can find this publication in the available publications table, determine related studies, and pull data for those studies where available.

related_studies <- rouphael2019$related_studies
related_studies
#> [1] "vtn105"
rouphael2019_study <- con$getStudy(related_studies)
rouphael2019_study
#> <DataSpaceStudy>
#>   Study: vtn105
#>   URL: https://dataspace.cavd.org/CAVD/vtn105
#>   Available datasets:
#>     - BAMA
#>     - Demographics
#>     - ICS
#>     - NAb
#>   Available non-integrated datasets:
dim(rouphael2019_study$availableDatasets)
#> [1] 4 4

We can see that there are datasets available for this study. We can pull any of them using rouphael2019_study$getDataset().
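The Rouphael publication has a single related study, but many rows list several studies in one comma-separated value. A minimal sketch of splitting such a value into individual study names follows. It assumes related_studies is a character string formatted as it prints in the tables above. split_studies is a hypothetical helper, not part of DataSpaceR.

```r
# Hypothetical helper (not part of DataSpaceR): split one comma-separated
# related_studies value into individual study names.
split_studies <- function(x) {
  if (is.na(x)) return(character(0))
  trimws(strsplit(x, ",", fixed = TRUE)[[1]])
}

# Value copied from publication 267 in the vtn096 listing above.
studies <- split_studies("mrv144, vtn096, vtn205")
studies

# Exact matching on the split names avoids the partial matches that a
# substring search can produce.
"vtn096" %in% studies

# With a live connection, each name could then be passed to con$getStudy():
# study_list <- lapply(studies, function(s) con$getStudy(s))
```

The commented-out last line needs a live DataSpace connection, so it is left for the reader to run.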

Downloading Publication Data

DataSpace also includes publication datasets for some publications. The format of this data varies from publication to publication, and the data is stored in a zip file. The publication_data_available field indicates which publications have publication data available.

pubs_with_data <- con$availablePublications[publication_data_available == TRUE]
knitr::kable(head(pubs_with_data[, -"link"]))
| publication_id | first_author | title | journal | publication_date | pubmed_id | related_studies | studies_with_data | publication_data_available |
|---|---|---|---|---|---|---|---|---|
| 1461 | Westling T | Causal isotonic regression | J R Stat Soc | 2020 | NA | vtn044, vtn052, vtn060, vtn063, vtn064, vtn068, vtn069, vtn070, vtn080, vtn100, vtn204 | NA | TRUE |

Data for a publication can be downloaded through DataSpaceR with con$downloadPublicationData(). The publication ID must be specified, as found under publication_id in con$availablePublications. The data is delivered as a zip file; the unzip argument controls whether it is unzipped, and by default it is. The outputDir argument sets the directory the file is saved to, which defaults to your Downloads directory. This function invisibly returns the paths to the downloaded files.
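Since most publications have no publication data, it can help to check the publication_data_available flag before requesting a download. The sketch below does this against a small stand-in table built from values shown in this document. has_publication_data is a hypothetical helper, not part of DataSpaceR, and the type of publication_id is an assumption, so both sides are compared as character.

```r
library(data.table)

# Stand-in for con$availablePublications, using values shown in this document.
pubs <- data.table(
  publication_id = c("1390", "1461"),
  first_author = c("Rouphael NG", "Westling T"),
  publication_data_available = c(FALSE, TRUE)
)

# Hypothetical helper (not part of DataSpaceR): TRUE only if the ID appears
# exactly once and is flagged as having publication data.
has_publication_data <- function(pubs, target_id) {
  hit <- pubs[as.character(publication_id) == as.character(target_id)]
  nrow(hit) == 1L && isTRUE(hit$publication_data_available)
}

has_publication_data(pubs, "1461")
has_publication_data(pubs, "1390")

# With a live connection, the check could guard the download:
# if (has_publication_data(con$availablePublications, "1461")) {
#   con$downloadPublicationData("1461", outputDir = tempdir())
# }
```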

Here, we download data for publication with ID 1461 (Westling, 2020), and view the resulting downloads.

publicationDataFiles <- con$downloadPublicationData("1461", outputDir = tempdir(), unzip = TRUE, verbose = TRUE)
basename(publicationDataFiles)
#> [1] "causal.isoreg.fns.R"           "CD.SuperLearner.R"            
#> [3] "cd4_analysis.R"                "cd4_data.csv"                 
#> [5] "cd8_analysis.R"                "cd8_data.csv"                 
#> [7] "README.txt"                    "Westling_1461_file_format.pdf"

All zip files include a file format document as a PDF, as well as a README. These documents give an overview of the remaining contents of the zip file. In this case, the data is split into CD8+ T-cell responses and CD4+ T-cell responses, as described in README.txt.

cd4 <- fread(publicationDataFiles[grepl("cd4_data", publicationDataFiles)])
cd4
#>       pub_id age    sex      bmi   prot num_vacc dose response sexFemale studyHVTN052
#>   1: 069-071  23   Male 33.31000 vtn069        3  4.0        0         0            0
#>   2: 069-068  20 Female 32.74000 vtn069        3  4.0        1         1            0
#>   3: 069-020  19   Male 20.24000 vtn069        3  4.0        0         0            0
#>   4: 069-008  19   Male 29.98000 vtn069        3  4.0        0         0            0
#>   5: 069-058  27   Male 31.48000 vtn069        3  4.0        0         0            0
#>  ---                                                                                 
#> 368: 100-182  36 Female 24.32323 vtn100        4  1.5        0         1            0
#> 369: 100-034  20   Male 25.15590 vtn100        4  1.5        1         0            0
#> 370: 100-115  20   Male 22.94812 vtn100        4  1.5        1         0            0
#> 371: 100-144  19   Male 20.95661 vtn100        4  1.5        0         0            0
#> 372: 100-189  24   Male 19.03114 vtn100        4  1.5        1         0            0
#>      studyHVTN068 studyHVTN069 studyHVTN204 studyHVTN100 vacc_type vacc_typeSinglePlasmid
#>   1:            0            1            0            0   VRC4or6                      0
#>   2:            0            1            0            0   VRC4or6                      0
#>   3:            0            1            0            0   VRC4or6                      0
#>   4:            0            1            0            0   VRC4or6                      0
#>   5:            0            1            0            0   VRC4or6                      0
#>  ---                                                                                     
#> 368:            0            0            0            1   VRC4or6                      0
#> 369:            0            0            0            1   VRC4or6                      0
#> 370:            0            0            0            1   VRC4or6                      0
#> 371:            0            0            0            1   VRC4or6                      0
#> 372:            0            0            0            1   VRC4or6                      0
#>      vacc_typeVRC4or6
#>   1:                1
#>   2:                1
#>   3:                1
#>   4:                1
#>   5:                1
#>  ---                 
#> 368:                1
#> 369:                1
#> 370:                1
#> 371:                1
#> 372:                1
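The grepl() selection used to read cd4_data.csv can be generalized to pick out every CSV file in a download. The sketch below works on the file names listed earlier rather than real paths, so it runs without a connection. The object names are illustrative.

```r
# File names as listed in the download above; in practice these would be the
# full paths returned by con$downloadPublicationData().
files <- c(
  "causal.isoreg.fns.R", "CD.SuperLearner.R", "cd4_analysis.R", "cd4_data.csv",
  "cd8_analysis.R", "cd8_data.csv", "README.txt", "Westling_1461_file_format.pdf"
)

# Keep only CSV files and name each by its base name without the extension.
csv_files <- files[tolower(tools::file_ext(files)) == "csv"]
names(csv_files) <- tools::file_path_sans_ext(basename(csv_files))
csv_files

# With real paths, every table could then be read in one step, giving a
# list with one element per CSV file:
# tables <- lapply(csv_files, data.table::fread)
```

Matching on the file extension rather than a name fragment means the same code should carry over to other publications, whose file names will differ.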

This publication also includes the analysis scripts used for the publication, which allow users to reproduce the analysis and results.