2.A: Enrichr & rbioapi

Moosa Rezwani

2024-03-30


1 Introduction

Enrichr is a popular gene-set enrichment analysis tool developed in the Ma’ayan Lab.


2 Gene set library concept in Enrichr

Directly quoting from Enrichr’s help page:

A gene set library is a set of related gene sets or enrichment terms […] These libraries have been constructed from many sources such as published studies and major biological and biomedical online databases. Others have been created for and only available through Enrichr.

(source: https://maayanlab.cloud/Enrichr/help#background)

To get a list of the available libraries in Enrichr, use:

enrichr_libs <- rba_enrichr_libs()

In the returned data frame, the names of the available Enrichr libraries are in the “libraryName” column. As you will see in the following sections, you can use these names to request an enrichment analysis based on one or more selected libraries.
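As a quick illustration, the sketch below checks whether a library name is present before an analysis is requested. To stay runnable offline it uses a small stand-in data frame that reproduces only the “libraryName” column; this stand-in is an assumption for illustration, and the real return value of rba_enrichr_libs() contains many more rows. The four library names are the ones that appear elsewhere in this article, and "Not_A_Real_Library" is a made-up name.

```r
# Stand-in for the data frame returned by rba_enrichr_libs().
# Assumption for illustration: only the "libraryName" column is reproduced.
enrichr_libs <- data.frame(
  libraryName = c("MSigDB_Computational",
                  "MSigDB_Oncogenic_Signatures",
                  "MSigDB_Hallmark_2020",
                  "Table_Mining_of_CRISPR_Studies"),
  stringsAsFactors = FALSE
)

# Libraries we would like to use; the second name is deliberately made up
wanted <- c("MSigDB_Hallmark_2020", "Not_A_Real_Library")

# Which of the wanted names are actually available?
available <- wanted %in% enrichr_libs$libraryName
names(available) <- wanted
available
```

With the real data frame, the same `%in%` check can catch a misspelled library name before any request is sent.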


3 Enrichment analysis using Enrichr

To perform enrichment analysis on your gene set with Enrichr using rbioapi, you can take two approaches. We will begin with the simpler one. But first, we create a vector of gene identifiers (mostly gene symbols) to use as the example input in this article.

# Create a vector with our gene identifiers (mostly gene symbols)
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1", "KIF23", "PLK1",
           "RAC2", "RACGAP1", "RHOA", "RHOB", "PHF14", "RBM3", "MSL1")

3.1 Approach 1: Using the one-step Wrapper function

The only required input for this function is your gene set, supplied as a character vector. Optionally, you can also select one or more libraries. Please see the manual of the rba_enrichr() function for more details on the arguments.

# Request the enrichment analysis
results_all <- rba_enrichr(gene_list = genes)

Note that the default value of the gene_set_library argument in the rba_enrichr() function is “all”. This means that if you call the function as above, every Enrichr library will be used for the enrichment analysis of your uploaded gene list. In this case, the function returns a named list in which each element is a data frame containing the analysis results of your genes for one Enrichr library.

Alternatively, you can use the gene_set_library argument to specify the library (or libraries) to use. Here we demonstrate using “MSigDB_Hallmark_2020” library:

# Request the enrichment analysis by a specific library
results_msig_hallmark <- rba_enrichr(gene_list = genes,
                                     gene_set_library = "MSigDB_Hallmark_2020")

When you supply the gene_set_library argument, rbioapi treats it as a regex pattern by default. You can disable this by setting regex_library_name to FALSE. The regex behavior is useful when you need, for example, partial matches in the library names. Suppose you want to perform the enrichment analysis on every Enrichr library whose name contains “MSig”. You can do the following:

# Request the enrichment analysis
results_msig <- rba_enrichr(gene_list = genes,
                            gene_set_library = "msig",
                            regex_library_name = TRUE)

# You can drop `regex_library_name = TRUE`, as it is TRUE by default.
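To make the idea of a partial match concrete, the sketch below applies a pattern to a few library names with base R's grepl(). It does not reproduce rbioapi's internal matching. In particular, ignore.case = TRUE is an assumption, chosen because the lowercase pattern "msig" selects the “MSigDB_…” libraries in this article's output. The library names are the ones that appear in this article.

```r
# A few library names taken from this article
lib_names <- c("MSigDB_Computational",
               "MSigDB_Oncogenic_Signatures",
               "MSigDB_Hallmark_2020",
               "Table_Mining_of_CRISPR_Studies")

# Partial match; case-insensitivity is an assumption of this sketch
partial <- lib_names[grepl("msig", lib_names, ignore.case = TRUE)]

# Anchoring the pattern with ^ and $ restricts it to one exact name
exact <- lib_names[grepl("^MSigDB_Hallmark_2020$", lib_names)]

partial
exact
```

If a pattern could match more libraries than intended, anchoring it as above, or setting regex_library_name to FALSE, keeps the selection to a single library.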

Note that when only one Enrichr library is selected, a data frame with the enrichment analysis results will be returned.

str(results_msig_hallmark)
#> 'data.frame':    18 obs. of  9 variables:
#>  $ Term                : chr  "Mitotic Spindle" "G2-M Checkpoint" "E2F Targets" "Apoptosis" ...
#>  $ Overlap             : chr  "5/199" "4/200" "4/200" "3/161" ...
#>  $ P.value             : num  2.57e-07 1.22e-05 1.22e-05 2.17e-04 2.74e-03 ...
#>  $ Adjusted.P.value    : num  4.62e-06 7.29e-05 7.29e-05 9.76e-04 9.87e-03 ...
#>  $ Old.P.value         : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Old.Adjusted.P.value: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ Odds.Ratio          : num  51 36.7 36.7 31.4 29.7 ...
#>  $ Combined.Score      : num  774 416 416 265 175 ...
#>  $ Genes               : chr  "CDC42;RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;KIF23" "RACGAP1;PLK1;CDK1;BRCA1" "CDK2;BRCA1;RHOB" ...
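
The columns of such a data frame can be post-processed with base R. The sketch below retypes by hand the first four rows shown in the str() output above, so that it runs without a network request, and then splits two of the character columns. It assumes that “Overlap” reads as "genes from the input found in the term / genes in the term"; the 1e-04 cutoff is an arbitrary example threshold, not a recommendation.

```r
# First four rows of the output shown above, retyped by hand
res <- data.frame(
  Term = c("Mitotic Spindle", "G2-M Checkpoint", "E2F Targets", "Apoptosis"),
  Overlap = c("5/199", "4/200", "4/200", "3/161"),
  Adjusted.P.value = c(4.62e-06, 7.29e-05, 7.29e-05, 9.76e-04),
  Genes = c("CDC42;RACGAP1;PLK1;CDK1;KIF23", "RACGAP1;PLK1;CDK1;KIF23",
            "RACGAP1;PLK1;CDK1;BRCA1", "CDK2;BRCA1;RHOB"),
  stringsAsFactors = FALSE
)

# Split "hits/term size" into two integer columns
# (this reading of the Overlap column is an assumption)
overlap_parts <- strsplit(res$Overlap, "/", fixed = TRUE)
res$n_hits <- as.integer(vapply(overlap_parts, `[`, character(1), 1))
res$term_size <- as.integer(vapply(overlap_parts, `[`, character(1), 2))

# Turn the semicolon-separated gene strings into a named list of vectors
genes_per_term <- setNames(strsplit(res$Genes, ";", fixed = TRUE), res$Term)

# Keep the terms below an example cutoff
significant <- res[res$Adjusted.P.value < 1e-04, ]
```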

But when multiple libraries have been selected, the function’s output will be a list where each element is a data frame corresponding to one of the selected libraries.

str(results_msig, 1)
#> List of 3
#>  $ MSigDB_Computational       :'data.frame': 195 obs. of  9 variables:
#>  $ MSigDB_Oncogenic_Signatures:'data.frame': 26 obs. of  9 variables:
#>  $ MSigDB_Hallmark_2020       :'data.frame': 18 obs. of  9 variables:
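
When the result is a list of data frames, it is often convenient to stack them into a single table that records the source library. The sketch below shows one way to do this in base R. The list `results_toy`, its library names, and its values are invented placeholders used only to keep the example self-contained; they are not real Enrichr output.

```r
# Toy stand-in for a multi-library result; names and values are
# invented placeholders, not real Enrichr output.
results_toy <- list(
  Library_A = data.frame(Term = c("term_a1", "term_a2"),
                         Adjusted.P.value = c(0.01, 0.20),
                         stringsAsFactors = FALSE),
  Library_B = data.frame(Term = "term_b1",
                         Adjusted.P.value = 0.03,
                         stringsAsFactors = FALSE)
)

# Number of terms returned per library
n_terms <- vapply(results_toy, nrow, integer(1))

# Stack the data frames, recording the source library in a new column
combined <- do.call(
  rbind,
  Map(function(df, lib) data.frame(Library = lib, df, stringsAsFactors = FALSE),
      results_toy, names(results_toy))
)
rownames(combined) <- NULL
combined
```

The same pattern should apply to a real multi-library result such as results_msig, since every element has the same columns.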

3.2 Approach 2: Going step-by-step

rba_enrichr() is a wrapper function. It internally executes a sequence of functions necessary to run your analysis. Alternatively, you could go step by step. We demonstrate these steps in this section.

First, retrieve the list of available Enrichr libraries. This step is optional: you can skip it if you already know the names of your desired libraries or if you want to run the analysis over every available library.

# Get a list of available Enrichr libraries
libs <- rba_enrichr_libs(store_in_options = TRUE)

Next, upload your gene list to Enrichr. Enrichr assigns an identifier to the submitted list, which is needed for the next step.

# Submit your gene-set to enrichr
list_id <- rba_enrichr_add_list(gene_list = genes)

From the returned response, we need the numeric ID in the “userListId” element.

str(list_id)
#> List of 2
#>  $ shortId   : chr "2f1139e6d5f2cbe2e8ae7577f2e7a3ec"
#>  $ userListId: int 71015774
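
A minimal sketch of extracting that ID follows. The list is retyped from the str() output above so that the example runs without contacting Enrichr; the sanity checks are a suggestion of this sketch, not something rbioapi requires.

```r
# Response retyped from the str() output above
list_id <- list(shortId = "2f1139e6d5f2cbe2e8ae7577f2e7a3ec",
                userListId = 71015774L)

# Pull out the numeric ID and check it before using it in the next step
user_list_id <- list_id$userListId
stopifnot(is.numeric(user_list_id),
          length(user_list_id) == 1L,
          !is.na(user_list_id))
user_list_id
```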

Finally, we are ready to submit the enrichment analysis request to Enrichr. As explained above for the wrapper function rba_enrichr(), we can supply the “gene_set_library” argument in different ways. Here we will select only the “Table_Mining_of_CRISPR_Studies” library:

# Request the analysis
results_crispr <- rba_enrichr_enrich(user_list_id = list_id$userListId,
                                      gene_set_library = "Table_Mining_of_CRISPR_Studies")

4 Working with Other Species

Enrichr also provides libraries for model organisms. The following functions have an organism argument that allows you to perform the analysis on species other than humans:

  1. rba_enrichr()

  2. rba_enrichr_enrich()

  3. rba_enrichr_gene_map()

  4. rba_enrichr_libs()

The available options for the organism argument are “human” (H. sapiens & M. musculus), “fly” (D. melanogaster), “yeast” (S. cerevisiae), “worm” (C. elegans), and “fish” (D. rerio).
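
If the organism comes from user input or a configuration file, it can be validated against the options listed above before calling any of these functions. The helper `check_organism()` below is hypothetical and not part of rbioapi; its default of “human” is this sketch's choice.

```r
# Organism options as listed above
enrichr_organisms <- c("human", "fly", "yeast", "worm", "fish")

# Hypothetical helper (not part of rbioapi): returns the validated
# organism, or raises an error for an unsupported value.
check_organism <- function(organism = enrichr_organisms) {
  match.arg(organism, choices = enrichr_organisms)
}

check_organism("fly")
check_organism()  # no argument: falls back to the first option, "human"
```

Calling `check_organism("mouse")` raises an error; per the list above, M. musculus is covered by the “human” option.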


5 See also in Functions’ manuals

Some rbioapi Enrichr functions were not covered in this vignette; be sure to check their manuals:


6 How to Cite?

To cite Enrichr (Please see https://maayanlab.cloud/Enrichr/help#terms):

To cite rbioapi:


7 Over-representation analysis Using Other Services

Other services supported by rbioapi also provide over-representation analysis tools. Please see the vignette article Do with rbioapi: Over-Representation (Enrichment) Analysis in R for an in-depth review.


9 Session info

#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Brussels
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rbioapi_0.8.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.35     R6_2.5.1          fastmap_1.1.1     xfun_0.43        
#>  [5] magrittr_2.0.3    cachem_1.0.8      knitr_1.45        htmltools_0.5.8  
#>  [9] rmarkdown_2.26    lifecycle_1.0.4   DT_0.32           cli_3.6.2        
#> [13] sass_0.4.9        jquerylib_0.1.4   compiler_4.3.3    httr_1.4.7       
#> [17] rstudioapi_0.16.0 tools_4.3.3       curl_5.2.1        evaluate_0.23    
#> [21] bslib_0.6.2       yaml_2.3.8        htmlwidgets_1.6.4 rlang_1.1.3      
#> [25] jsonlite_1.8.8    crosstalk_1.2.1