# Quick Start

This short tutorial gives a quick look at pagoo and shows the core concepts, so you can start playing with pangenome objects right away. Let’s start by loading a Campylobacter spp. dataset included in the package.

    library(pagoo, quietly = TRUE, warn.conflicts = FALSE) # Load package
    rds <- system.file('extdata', 'campylobacter.RDS', package = 'pagoo')
    campy <- load_pangenomeRDS(rds) # Load pangenome

Now that the object (campy) is loaded, we can start querying it. pagoo was designed around the idea that, in a pangenome, each individual gene belongs to a given organism and is assigned to a cluster of orthologs. These three variables (gene, organism, and cluster) are interconnected, but each can carry metadata that is specific to it. For example, an individual gene has coordinates within a genome, which makes no sense for a whole cluster; likewise, an organism has a host from which it was isolated, but that information does not apply to an individual gene.
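To make this data model concrete, here is a minimal base R sketch on made-up data. It is not pagoo's internal representation: the toy_* names and all values are hypothetical. Each table holds only the metadata that makes sense at its own level, and the gene table carries the keys that link it to the other two.

```r
# Toy data only: every identifier and value below is invented for illustration.
toy_organisms <- data.frame(
  org  = c("orgA", "orgB"),
  host = c("Human", "Bovine"),          # organism-level metadata
  stringsAsFactors = FALSE
)
toy_clusters <- data.frame(
  cluster = c("group1", "group2"),
  domain  = c("domX", "domY"),          # cluster-level metadata
  stringsAsFactors = FALSE
)
toy_genes <- data.frame(
  cluster = c("group1", "group1", "group2"),
  org     = c("orgA", "orgB", "orgA"),
  gene    = c("orgA_001", "orgB_001", "orgA_002"),
  start   = c(10L, 250L, 900L),         # gene-level metadata (coordinates)
  stringsAsFactors = FALSE
)

# Gene rows can pull in organism- or cluster-level metadata through the
# shared key columns, while each table stays free of irrelevant columns.
genes_with_host <- merge(toy_genes, toy_organisms, by = "org")
genes_full <- merge(genes_with_host, toy_clusters, by = "cluster")
print(genes_full)
```

Keeping the tables separate avoids repeating the host on every gene row, yet the join recovers it whenever a gene-level question needs it.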

## Basic Fields

These three variables are stored as three separate tables that can be queried:

    campy$organisms
    ## DataFrame with 7 rows and 8 columns
    ##          org          id      strain      year     country        host
    ##     <factor> <character> <character> <integer> <character> <character>
    ## 1  16244_6_6        FR15   2008/170h      2008      France       Human
    ## 2 16244_6_18        FR27   2012/185h      2012      France       Human
    ## 3 17059_2_16         AR1      99/801      1999   Argentina      Bovine
    ## 4 17059_2_23         AR8      04/875      2004   Argentina      Bovine
    ## 5 17059_2_27        AR12      06/195      2006   Argentina      Bovine
    ## 6 17150_1_73         CA1   001A-0374      2005      Canada       Human
    ## 7 17059_2_42         TW6        1830      2008      Taiwan       Human
    ##        source   accession
    ##   <character> <character>
    ## 1       Feces   ERS672247
    ## 2       Blood   ERS672259
    ## 3     Prepuce   ERS739235
    ## 4       Fetus   ERS739242
    ## 5          VM   ERS739246
    ## 6       Blood   ERS686652
    ## 7       Blood   ERS739261

(Tip: To see all fields and methods, in any R console type campy$ and press the [TAB] key twice.)

This dataset consists of 7 Campylobacter spp. genomes. Each organism has one row with its associated metadata. The first column, org, identifies the organism.
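Because the organism table behaves like an ordinary table, it can be filtered on its metadata columns. The sketch below reloads the bundled dataset so that it runs on its own, and relies only on the columns shown in the printed output; the variable names orgs and human are illustrative.

```r
library(pagoo, quietly = TRUE, warn.conflicts = FALSE)
rds <- system.file('extdata', 'campylobacter.RDS', package = 'pagoo')
campy <- load_pangenomeRDS(rds)

orgs <- campy$organisms

# Count organisms per host, then keep only the human isolates.
print(table(orgs$host))
human <- orgs[orgs$host == "Human", ]
print(human$strain)
```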

    campy$clusters
    ## DataFrame with 2588 rows and 2 columns
    ##        cluster              Pfam_Arch
    ##       <factor>            <character>
    ## 1    group0001           2-Hacid_dh_C
    ## 2    group0002 2-Hacid_dh_C;2-Hacid..
    ## 3    group0003 2-Hacid_dh_C;ACT;2-H..
    ## 4    group0004        2Fe-2S_thioredx
    ## 5    group0005    4HB_MCP_1;MCPsignal
    ## ...        ...                    ...
    ## 2584 group2584              zf-RING_7
    ## 2585 group2585               zf-TFIIB
    ## 2586 group2586                   ZinT
    ## 2587 group2587                   ZnuA
    ## 2588 group2588               ZT_dimer

The $clusters field returns a table with the metadata associated with each group of orthologs; in this case, it is the Pfam domain architecture (second column).

The last and most important field is $genes, which returns a list of DataFrame objects holding the information for each individual gene, grouped by cluster. We leave it to the reader to inspect this field:

    campy$genes

The first 3 columns (cluster, org, and gene) are the glue that interconnects the 3 “variables”.
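One consequence of every gene row carrying both its organism and its cluster is that genes can be counted per organism and cluster directly. A minimal base R sketch on invented data (toy_genes and its values are hypothetical, not taken from the campy dataset):

```r
# Toy gene table with the three key columns; all values are invented.
toy_genes <- data.frame(
  cluster = c("group1", "group1", "group2", "group2", "group2"),
  org     = c("orgA", "orgB", "orgA", "orgA", "orgB"),
  gene    = c("orgA_001", "orgB_001", "orgA_002", "orgA_003", "orgB_002"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: organisms as rows, clusters as columns, cells = gene counts.
counts <- table(org = toy_genes$org, cluster = toy_genes$cluster)
print(counts)
```

In this toy table orgA has two genes assigned to group2, so that cell holds 2 rather than 1: the counts record abundance, not just presence or absence.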

Another useful field is $pan_matrix, which returns a matrix of gene abundance with one row per organism and one column per cluster.

## Basic Methods

pagoo objects contain basic methods to analyze the pangenome, from general statistics to some basic plotting capabilities. Some of these methods can also take arguments. For example:

    campy$dist(method = "bray")
    ##             16244_6_6 16244_6_18 17059_2_16 17059_2_23 17059_2_27 17150_1_73
    ## 16244_6_18 0.06594656
    ## 17059_2_16 0.12122816 0.12500000
    ## 17059_2_23 0.09622745 0.09632517 0.07632399
    ## 17059_2_27 0.09245937 0.11310008 0.10311629 0.08230990
    ## 17150_1_73 0.08203991 0.09034444 0.13624408 0.12275937 0.12999735
    ## 17059_2_42 0.08920705 0.09927089 0.14532148 0.12682137 0.13706919 0.09518600
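For intuition about what these numbers mean, the following base R sketch computes a Bray-Curtis dissimilarity by hand on a toy abundance matrix laid out like $pan_matrix (organisms in rows, clusters in columns). It uses the textbook definition, sum(|x - y|) / sum(x + y). Whether pagoo's dist method computes it exactly this way internally is an assumption, and the function name bray_curtis and the toy values are made up for illustration.

```r
# Textbook Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Toy matrix: 3 organisms (rows) x 4 clusters (columns); values are invented.
toy_pm <- matrix(
  c(1, 1, 0, 2,
    1, 1, 1, 0,
    1, 0, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("orgA", "orgB", "orgC"), paste0("group", 1:4))
)

# Fill a symmetric matrix pairwise, then convert it to a 'dist' object,
# which prints as a lower triangle like the output above.
n <- nrow(toy_pm)
d <- matrix(0, n, n, dimnames = list(rownames(toy_pm), rownames(toy_pm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(toy_pm[i, ], toy_pm[j, ])
  }
}
toy_dist <- as.dist(d)
print(toy_dist)
```

Values near 0 indicate two organisms with nearly identical gene content; values near 1 indicate that they share almost nothing.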

Methods that produce plots are called the same way:

    campy$gg_barplot()

## Sequence Manipulation

One of the main advantages of using pagoo is how easily sequences can be manipulated. Sequences are stored as a list of DNAStringSet objects (a DNAStringSetList) from the Biostrings package.

    campy$sequences
    ## DNAStringSetList of length 2588
    ## [["group0001"]] 16244_6_6__16244_6_6_00150=ATGGCGATAACAGTTTATTACGACAAAGATTGCG...
    ## [["group0002"]] 16244_6_6__16244_6_6_01290=ATGAAAATAGTATGCTTAGATGCCGACACGCTTG...
    ## [["group0003"]] 16244_6_6__16244_6_6_01710=ATGAAAACAGTTATAGTTTGCGATGCAATACATC...
    ## [["group0004"]] 16244_6_6__16244_6_6_01754=ATGAAATTCGAATTTACTCATGAGCAATTATCGG...
    ## [["group0005"]] 16244_6_6__16244_6_6_00049=ATGTCAAATTTAACTACTAACTTAACTACCAAAA...
    ## [["group0006"]] 16244_6_6__16244_6_6_01069=ATGAATTATTTTGAGAATTTAAAAGTTTCAACAA...
    ## [["group0007"]] 16244_6_6__16244_6_6_01612=ATGCGAATTAGAATTTATTATGAAGATACCGATG...
    ## [["group0008"]] 16244_6_6__16244_6_6_01679=ATGATGAAAGATATGGGCGAGCCACGTATAAAAA...
    ## [["group0009"]] 16244_6_18__16244_6_18_01216=ATGGGGCTTACTACGAGTACGACAAAGTATAT...
    ## [["group0010"]] 16244_6_6__16244_6_6_00758=ATGAAAAGAGTGGTTATAAAAGTAGGCTCTCACG...
    ## ...
    ## <2578 more elements>
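A sketch of pulling the sequences of one cluster out of this list. It reloads the bundled dataset so that it runs on its own. The cluster name group0001 and the org__gene naming pattern are taken from the printed output above; treating the double underscore as a separator is an inference from that output, not a documented guarantee, and the variable names grp1 and seq_orgs are illustrative.

```r
library(pagoo, quietly = TRUE, warn.conflicts = FALSE)
rds <- system.file('extdata', 'campylobacter.RDS', package = 'pagoo')
campy <- load_pangenomeRDS(rds)

# Extract the sequences belonging to a single cluster by name.
grp1 <- campy$sequences[["group0001"]]
print(grp1)

# Recover the organism part of each sequence name (the text before "__").
seq_orgs <- sub("__.*$", "", names(grp1))
print(seq_orgs)
```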