Cell atlases such as Tabula Muris and Tabula Sapiens are multi-organ single-cell omics data sets describing entire organisms. A cell atlas approximation is a lossy, lightweight compression of a cell atlas that can be streamed over the internet.
This project enables biologists, doctors, and data scientists to quickly find answers to questions such as:
NOTE: These questions can also be asked in R, Python, JavaScript, or in a language-agnostic manner using the REST API (see https://atlasapprox.readthedocs.io).
To install the R interface of atlasapprox from CRAN, use:
install.packages("atlasapprox")
To use the package, you must first load it:
library(atlasapprox)
Now you have all atlasapprox functions available.
The easiest way to explore atlas approximations is to query a list of available organisms:
## [1] "a_queenslandica" "a_thaliana" "c_elegans" "c_gigas"
## [5] "c_hemisphaerica" "d_melanogaster" "d_rerio" "f_vesca"
## [9] "h_miamia" "h_sapiens" "h_vulgaris" "i_pulchra"
## [13] "l_minuta" "m_leidyi" "m_murinus" "m_musculus"
## [17] "n_vectensis" "o_sativa" "p_crozieri" "p_dumerilii"
## [21] "s_lacustris" "s_mansoni" "s_mediterranea" "s_pistillata"
## [25] "s_purpuratus" "t_adhaerens" "t_aestivum" "x_laevis"
## [29] "z_mays"
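The organism identifiers appear to follow a "<genus initial>_<species epithet>" pattern (for example, h_sapiens for Homo sapiens). A minimal sketch, assuming that convention holds for every entry, of splitting the codes and filtering them in base R; the vector is copied from the output above so the example runs offline:

```r
# Organism identifiers as printed above (copied verbatim)
organisms <- c("a_queenslandica", "a_thaliana", "c_elegans", "c_gigas",
               "c_hemisphaerica", "d_melanogaster", "d_rerio", "f_vesca",
               "h_miamia", "h_sapiens", "h_vulgaris", "i_pulchra",
               "l_minuta", "m_leidyi", "m_murinus", "m_musculus",
               "n_vectensis", "o_sativa", "p_crozieri", "p_dumerilii",
               "s_lacustris", "s_mansoni", "s_mediterranea", "s_pistillata",
               "s_purpuratus", "t_adhaerens", "t_aestivum", "x_laevis",
               "z_mays")

# Assumed convention: "<genus initial>_<species epithet>", split on the underscore
parts <- strsplit(organisms, "_", fixed = TRUE)
genus_initial <- vapply(parts, `[`, character(1), 1)
species_epithet <- vapply(parts, `[`, character(1), 2)

# Example: all organisms whose genus starts with "m"
m_organisms <- organisms[genus_initial == "m"]
print(m_organisms)
```

The same vectors can be used to look up a species by its epithet, e.g. `organisms[species_epithet == "musculus"]`.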
Once you know what species you are interested in, you can explore the list of organs from that species for which an atlas approximation is available:
## [1] "bladder" "blood" "colon" "eye" "fat" "gut"
## [7] "heart" "kidney" "liver" "lung" "lymphnode" "mammary"
## [13] "marrow" "muscle" "pancreas" "prostate" "salivary" "skin"
## [19] "spleen" "thymus" "tongue" "trachea" "uterus"
The next level of zoom is to query the list of cell types that make up an organ of choice, e.g.:
## [1] "neutrophil" "basophil" "monocyte"
## [4] "macrophage" "dendritic" "B"
## [7] "plasma" "T" "NK"
## [10] "plasmacytoid" "goblet" "AT1"
## [13] "AT2" "club" "ciliated"
## [16] "basal" "serous" "mucous"
## [19] "arterial" "venous" "capillary"
## [22] "CAP2" "lymphatic" "fibroblast"
## [25] "alveolar fibroblast" "smooth muscle" "vascular smooth muscle"
## [28] "pericyte" "mesothelial" "ionocyte"
NOTE: Although cell atlases aim to cover all cell types from a tissue, rare types might be missing because of limited sampling or inaccurate annotation. If you think a cell type is missing from a tissue, please contact fabio DOT zanini AT unsw DOT edu DOT au.
If you have some genes you are interested in, you can query their expression across cell types in the organ of choice:
expression <- GetAverage(organism = 'h_sapiens', organ = 'Lung', features = c('PTPRC', 'COL1A1'))
print(expression)
## PTPRC COL1A1
## neutrophil 2.231271e+01 0.014522638
## basophil 2.443684e+00 0.005077871
## monocyte 7.794549e+00 0.003399504
## macrophage 2.801027e+00 0.002812853
## dendritic 4.313318e+00 0.013302779
## B 3.000779e+00 0.000000000
## plasma 4.200674e-01 0.009642163
## T 1.051312e+01 0.009203196
## NK 1.143152e+01 0.063305810
## plasmacytoid 2.168309e+00 0.000000000
## goblet 1.898965e-01 0.145349205
## AT1 9.707276e-02 0.109001435
## AT2 1.457898e-01 0.058521412
## club 3.052110e-01 0.071080528
## ciliated 2.264476e-01 0.060997065
## basal 2.570614e-01 0.064534329
## serous 3.813045e-01 0.000000000
## mucous 0.000000e+00 0.116527453
## arterial 1.409595e-01 0.031918123
## venous 3.115328e-01 0.007172978
## capillary 1.500604e-01 0.004225238
## CAP2 1.768180e-01 0.022919910
## lymphatic 2.947334e-04 0.000000000
## fibroblast 5.332901e-02 10.089125633
## alveolar fibroblast 1.934833e-01 4.771382809
## smooth muscle 5.999142e-01 2.049613953
## vascular smooth muscle 4.121004e-01 2.203665972
## pericyte 6.380575e-01 0.038223870
## mesothelial 5.869431e-01 1.449272752
## ionocyte 5.413984e-01 0.000000000
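The printed table suggests that `GetAverage` returns cell types as rows and features as columns. Assuming that layout, a short sketch of finding the cell type with the highest average expression for each gene; a few rows are copied from the output above so the example runs without a network connection:

```r
# Subset of the average expression table printed above, rebuilt locally
# (assumption: rows = cell types, columns = features, as the printout suggests)
expression <- data.frame(
  PTPRC  = c(22.31271, 11.43152, 10.51312, 0.05332901, 0.1934833, 0.5869431),
  COL1A1 = c(0.014522638, 0.063305810, 0.009203196, 10.089125633, 4.771382809, 1.449272752),
  row.names = c("neutrophil", "NK", "T", "fibroblast", "alveolar fibroblast", "mesothelial")
)

# For each feature (column), return the cell type (row name) with the highest average
top_cell_type <- vapply(
  expression,
  function(x) rownames(expression)[which.max(x)],
  character(1)
)
print(top_cell_type)
```

On this subset, PTPRC peaks in neutrophils and COL1A1 in fibroblasts, matching what the full table shows.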
Besides the average expression level, you can also request the fraction of cells within each type that express each gene:
fraction_expressing <- GetFractionDetected(organism = 'h_sapiens', organ = 'Lung', features = c('PTPRC', 'COL1A1'))
print(fraction_expressing)
## PTPRC COL1A1
## neutrophil 0.92528737 0.011494253
## basophil 0.65014577 0.002915452
## monocyte 0.93330902 0.002186589
## macrophage 0.94777960 0.004276316
## dendritic 0.94303799 0.009493670
## B 0.60919541 0.000000000
## plasma 0.36567163 0.014925373
## T 0.93114001 0.003825555
## NK 0.95454544 0.007575758
## plasmacytoid 0.72222221 0.000000000
## goblet 0.16710876 0.172413796
## AT1 0.07109005 0.085308060
## AT2 0.11589766 0.115569651
## club 0.10886320 0.047206167
## ciliated 0.11627907 0.069767445
## basal 0.12568556 0.049360145
## serous 0.20000000 0.000000000
## mucous 0.00000000 0.375000000
## arterial 0.05347594 0.010695187
## venous 0.10236221 0.007874016
## capillary 0.05678023 0.002672011
## CAP2 0.05975395 0.003514939
## lymphatic 0.02127660 0.000000000
## fibroblast 0.03116883 0.890909076
## alveolar fibroblast 0.07098766 0.595678985
## smooth muscle 0.11206897 0.534482777
## vascular smooth muscle 0.11250000 0.500000000
## pericyte 0.13145539 0.014084507
## mesothelial 0.29411766 0.764705896
## ionocyte 0.21052632 0.000000000
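Average expression and fraction detected complement each other: a high average can come from a few strongly expressing cells, while a high fraction indicates broad expression within the type. A minimal sketch combining both for COL1A1, using values copied from the two tables above; the 0.5 cutoff is an arbitrary choice for illustration:

```r
# COL1A1 values for a few lung cell types, copied from the two tables above
col1a1 <- data.frame(
  cell_type = c("fibroblast", "alveolar fibroblast", "smooth muscle",
                "vascular smooth muscle", "mesothelial", "mucous"),
  average   = c(10.089125633, 4.771382809, 2.049613953,
                2.203665972, 1.449272752, 0.116527453),
  fraction  = c(0.890909076, 0.595678985, 0.534482777,
                0.500000000, 0.764705896, 0.375000000),
  stringsAsFactors = FALSE
)

# Keep cell types in which more than half of the cells express the gene
# (arbitrary illustrative cutoff), then sort by average expression, highest first
broad <- col1a1[col1a1$fraction > 0.5, ]
broad <- broad[order(broad$average, decreasing = TRUE), ]
print(broad$cell_type)
```

Vascular smooth muscle sits exactly at 0.5 and is therefore excluded by the strict `>` comparison.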
To get a list of all available features (e.g. genes) for an organism, you can use:
genes <- GetFeatures(organism = 'h_sapiens')
# To show just the first 20 genes
print(head(genes, 20))
## [1] "A1BG" "A1BG-AS1" "A1CF" "A2M" "A2M-AS1"
## [6] "A2ML1" "A2ML1-AS1" "A2ML1-AS2" "A2MP1" "A3GALT2"
## [11] "A4GALT" "A4GNT" "AAAS" "AACS" "AACSP1"
## [16] "AADAC" "AADACL2" "AADACL2-AS1" "AADACL3" "AADACL4"
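Before querying expression, it can help to check that your genes of interest are actually in the feature list. A short sketch using the 20 features printed above in place of the full `genes` vector; `"NOT_A_GENE"` is a hypothetical placeholder for a misspelled or unavailable symbol:

```r
# First 20 features printed above; in practice, use the full `genes` vector
genes <- c("A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1",
           "A2ML1", "A2ML1-AS1", "A2ML1-AS2", "A2MP1", "A3GALT2",
           "A4GALT", "A4GNT", "AAAS", "AACS", "AACSP1",
           "AADAC", "AADACL2", "AADACL2-AS1", "AADACL3", "AADACL4")

# "NOT_A_GENE" is a hypothetical placeholder for a symbol that is not available
wanted <- c("A2M", "AACS", "NOT_A_GENE")
found <- wanted[wanted %in% genes]
not_found <- setdiff(wanted, genes)

# Pattern search: HGNC symbols ending in "-AS<number>" are antisense RNA genes
antisense <- grep("-AS[0-9]+$", genes, value = TRUE)

print(found)
print(not_found)
print(antisense)
```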
Each cell type expresses specific genes, called markers, that contribute to its unique biological function. To request a list of markers for your cell type of choice:
markers <- GetMarkers(organism = 'h_sapiens', organ = 'Lung', cell_type = 'fibroblast', number = 5)
print(markers)
## [1] "MFAP5" "PI16" "RPL10P6" "EEF1A1P11" "RPL7P9"
NOTE: There are multiple methods to compute marker genes. The current version of the API uses one specific method, but future versions aim to let users choose the method they prefer.
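To illustrate what "a method to compute marker genes" can mean, here is one simple, hypothetical scoring rule. It is not the method used by the API: each gene's score is its average expression in the target cell type minus its highest average among all other cell types. The matrix reuses a few rows of the average expression table above:

```r
# Average expression subset copied from the table above (rows = cell types)
avg <- matrix(
  c(22.31271,   0.014522638,
    11.43152,   0.063305810,
    10.51312,   0.009203196,
    0.05332901, 10.089125633,
    0.1934833,  4.771382809,
    0.5869431,  1.449272752),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("neutrophil", "NK", "T", "fibroblast", "alveolar fibroblast", "mesothelial"),
    c("PTPRC", "COL1A1")
  )
)

# Hypothetical marker score (illustration only, not the API's method):
# expression in the target type minus the maximum over all other types
marker_scores <- function(avg, target) {
  others <- avg[rownames(avg) != target, , drop = FALSE]
  score <- avg[target, ] - apply(others, 2, max)
  sort(score, decreasing = TRUE)
}

fibroblast_scores <- marker_scores(avg, "fibroblast")
print(fibroblast_scores)
```

A positive score means the gene is higher in the target type than in any other type considered. This rule is strict and sensitive to a single high-expressing neighbour, which is one reason different marker methods can disagree.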
If you’re interested in which cell types express your gene of interest most highly across all organs:
highest_expressors <- GetHighestMeasurement(organism = 'h_sapiens', feature = 'PTPRC', number = 5)
print(highest_expressors)
## Cell type Organ Average
## 1 neutrophil fat 32.03100
## 2 neutrophil blood 23.67732
## 3 neutrophil spleen 23.54425
## 4 neutrophil prostate 23.49050
## 5 neutrophil trachea 23.17156
To find other features (genes) whose expression patterns across cell types are similar to that of your gene of choice:
similar_genes <- GetSimilarFeatures(organism = 'h_sapiens', organ = 'lung', feature = 'PTPRC', number = 5, method = 'correlation')
print(similar_genes)
## Similar features distances
## 1 LCP1 0.01616174
## 2 CD53 0.03778654
## 3 WAS 0.04407340
## 4 HCLS1 0.05033290
## 5 ARHGAP30 0.05041230
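The exact distance behind `method = 'correlation'` is not specified here. As an assumption for illustration only, a common choice is 1 minus the Pearson correlation of two genes' average expression profiles across cell types: 0 for perfectly correlated profiles, values above 1 for anticorrelated ones. A minimal sketch of ranking features this way, on a few rows of the average expression table above:

```r
# Average expression subset copied from the table above (rows = cell types)
avg <- matrix(
  c(22.31271,   0.014522638,
    11.43152,   0.063305810,
    10.51312,   0.009203196,
    0.05332901, 10.089125633,
    0.1934833,  4.771382809,
    0.5869431,  1.449272752),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("neutrophil", "NK", "T", "fibroblast", "alveolar fibroblast", "mesothelial"),
    c("PTPRC", "COL1A1")
  )
)

# Assumed distance: 1 - Pearson correlation between expression profiles
similar_by_correlation <- function(avg, feature, number = 5) {
  d <- 1 - cor(avg[, feature], avg)[1, ]  # distance of every column to `feature`
  d <- d[names(d) != feature]             # drop the query feature itself
  head(sort(d), number)                   # smallest distance = most similar
}

ptprc_similar <- similar_by_correlation(avg, "PTPRC")
print(ptprc_similar)
```

With only two genes the ranking is trivial, but the distance already reflects biology: PTPRC (immune cells) and COL1A1 (fibroblasts) are anticorrelated, so their distance exceeds 1.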
NOTE: There are multiple methods to compute feature similarity. The available methods are:
atlasapprox relies upon available cell atlases kindly released for public use:
We are grateful to all authors above for their help and commitment to open science.
To get the data sources in the package, call:
NOTE: Although the original cell type annotations of these data sets are mostly preserved, a quality check is performed before computing approximations. During this step, some cell types might be filtered out, renamed, or split into multiple subannotations. If you find a problem in the data that suggests a misannotation, please reach out to fabio DOT zanini AT unsw DOT edu DOT au and we will endeavour to fix it.