Wed Dec 28 16:51:25 2022

Introduction

We detected horizontally transferred (HT) segments in the P. sonneborni MIC genome by searching for regions that show an excess of sequence similarity with a distantly related species.

The MIC genome was cut into 500 bp-long windows, which were compared (BLASTN) to all available MAC and MIC genomes from the 13 aurelia species.
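
As an illustration of the windowing step, here is a minimal Python sketch. The function name `cut_into_windows` and the handling of the last, shorter window are assumptions rather than a description of the actual pipeline. The 1-based inclusive coordinates mirror the segment coordinates reported later in this document (e.g. 11501 to 23000).

```python
def cut_into_windows(sequence, window_size=500):
    """Yield (start, end, subsequence) using 1-based inclusive coordinates.

    Assumption: a trailing window shorter than window_size is kept as-is.
    """
    for offset in range(0, len(sequence), window_size):
        chunk = sequence[offset:offset + window_size]
        yield offset + 1, offset + len(chunk), chunk


# Toy 1200 bp contig (not real data).
toy_contig = "ACGT" * 300
toy_windows = list(cut_into_windows(toy_contig))
window_coords = [(start, end) for start, end, _ in toy_windows]
```

Each window would then be written out as a separate BLASTN query.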

We first searched for windows with strong evidence of recent HT:

Then, starting from these ‘seed windows’, we extended each HT segment by aggregating neighboring windows whose best hits fall on the same scaffold of the non-sister species.
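
The extension step could be sketched as follows in Python. This is only one possible reading of the procedure: the function name, the representation of best hits as (species, scaffold) pairs, and the `max_gap` tolerance for consecutive non-matching windows are all hypothetical and are not taken from the original analysis.

```python
def extend_from_seed(best_hits, seed_index, max_gap=2):
    """Return (first, last) window indices of the segment grown around a seed.

    best_hits: per-window best hit as a (species, scaffold) tuple, or None,
               listed in contig order.
    max_gap:   hypothetical number of consecutive non-matching windows
               tolerated before the extension stops.
    """
    target = best_hits[seed_index]
    bounds = {}
    for step in (-1, 1):  # extend to the left, then to the right
        i, gap, edge = seed_index + step, 0, seed_index
        while 0 <= i < len(best_hits) and gap <= max_gap:
            if best_hits[i] == target:
                edge, gap = i, 0  # matching window: move the segment edge
            else:
                gap += 1          # non-matching window: count it
            i += step
        bounds[step] = edge
    return bounds[-1], bounds[1]


# Toy example (hypothetical scaffold names for the non-target hits).
Q = ("pqua", "scaffold_0022")
B = ("psex", "scaffold_0001")
toy_hits = [B, B, Q, Q, None, Q, Q, B, B, B, Q]
toy_segment = extend_from_seed(toy_hits, seed_index=3)
```

Tolerating a few non-matching windows is consistent with the later filter, which requires only a fraction of windows in a segment to share the same best-hit scaffold.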

Among candidate HT segments identified by Florian, we retained the ones that match our filtering criteria:

# Minimal number of seed windows
MinSeedWindows=4

# Minimal fraction of windows having a best hit on the same scaffold
MinFractionBestHits=0.7

# Focus on HT segments inferred from the analysis of MAC genomes
MAC_Filter=TRUE

NB: we detected many PSON windows whose best hits fall in MIC-specific regions of the PSEX genome. However, because the MIC genome of PJEN is not available (only the MAC), we cannot check whether the similarity between PSON and PSEX MIC-specific regions is higher than the similarity with PJEN. We therefore retained only candidate windows whose best BLAST hit is on a MAC genome.
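
A minimal Python sketch of how the three criteria above could be applied to candidate segments. The dictionary keys and the toy candidates are hypothetical. The comparison operators follow the wording used in the Results (at least 4 seed windows, more than 70% of windows with their best hit on the same scaffold).

```python
MIN_SEED_WINDOWS = 4          # MinSeedWindows
MIN_FRACTION_BEST_HITS = 0.7  # MinFractionBestHits
MAC_FILTER = True             # MAC_Filter


def passes_filters(segment):
    """Return True if a candidate segment satisfies all three criteria.

    segment: dict with hypothetical keys n_seed_windows, n_same_scaffold,
             n_windows and target_genome ("MAC" or "MIC").
    """
    if segment["n_seed_windows"] < MIN_SEED_WINDOWS:
        return False
    fraction = segment["n_same_scaffold"] / segment["n_windows"]
    if fraction <= MIN_FRACTION_BEST_HITS:
        return False
    if MAC_FILTER and segment["target_genome"] != "MAC":
        return False
    return True


# Toy candidates (not real data).
candidates = [
    {"id": "toy1", "n_seed_windows": 6, "n_same_scaffold": 180, "n_windows": 214, "target_genome": "MAC"},
    {"id": "toy2", "n_seed_windows": 3, "n_same_scaffold": 40, "n_windows": 42, "target_genome": "MAC"},
    {"id": "toy3", "n_seed_windows": 5, "n_same_scaffold": 10, "n_windows": 20, "target_genome": "MAC"},
    {"id": "toy4", "n_seed_windows": 8, "n_same_scaffold": 90, "n_windows": 100, "target_genome": "MIC"},
]
retained = [c["id"] for c in candidates if passes_filters(c)]
```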

Results

In total, we identified 521 HT segments in 471 MIC contigs. An example of an HT segment is shown below:

Fig. 1: Similarity profile along Pson MIC contig 00093

This figure shows the profile of sequence similarity along contig00093 (110 kb). Colored lines represent the sequence identity of the best BLASTN hit of each query sequence (500 bp windows of contig00093) in each of the target genomes. In this contig, most windows have their best hit on the MAC genome of PQUAD (orange). These hits are located on scaffold_0022 of the PQUAD MAC genome assembly. The thick orange line below the profile indicates the detected HT segment (i.e. the segment of PSON contig00093 containing at least 4 ‘seed windows’ and in which >70% of windows have their best match on PQUAD scaffold_0022).

This profile shows that the highest similarity is detected with PQUAD, and then with sister species of PQUAD (PTRED, PNOV), and the lowest similarity is detected with the PSEX, PJEN and PSON (MAC) genomes. This implies that the direction of the transfer was from the PQUAD lineage towards the PSON lineage.

The bottom plot shows the profile of MAC and MIC sequencing read depth along contig00093. In this contig, the MAC read depth is very low, which indicates that the contig is located in the MIC-specific compartment.

We provide below some summary information on HT segments (their length distribution, their distribution in MIC-specific vs MAC-destined regions), and provide links to interesting examples of HT segments:

Length distribution of HT segments

The cumulative length of HT segments is 9560.5 kb. There are 95 contigs with more than 30.5 kb of HT segments (together representing 50% of the total HT length). The longest HT segment is 118.5 kb long.
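
The "contigs representing 50% of the total HT length" figure is an N50-style statistic. A minimal Python sketch of that calculation is given here. The function name and the toy lengths are illustrative only and do not reproduce the real data.

```python
def contigs_covering_fraction(ht_lengths_kb, fraction=0.5):
    """Return (count, threshold_kb) for an N50-style summary.

    count:        number of contigs, taken from the largest HT length
                  downwards, needed to reach `fraction` of the total.
    threshold_kb: HT length of the last contig included.
    """
    ordered = sorted(ht_lengths_kb, reverse=True)
    target = fraction * sum(ordered)
    running = 0.0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    return len(ordered), (ordered[-1] if ordered else 0.0)


# Toy per-contig HT lengths in kb (not real data).
toy_lengths = [10.0, 4.0, 8.0, 3.0, 5.0]
toy_summary = contigs_covering_fraction(toy_lengths)
```

On the toy input, the two largest contigs (10 and 8 kb) already exceed half of the 30 kb total, so the summary is 2 contigs with a threshold of 8 kb.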

Distribution of HT segments in MIC-specific vs MAC-destined regions

To determine whether HT segments are located in MAC-destined or in MIC-specific regions, we measured the MAC read depth (normalized by the mean read depth in MAC-destined regions) and the MIC read depth (normalized by the mean read depth in MIC-specific regions), and computed the ratio MAC read depth/MIC read depth.
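
The ratio described above can be written out as a short Python sketch. Function and argument names are hypothetical. The 0.5 cutoff used for classification is the one applied to HT segments later in this report.

```python
def mac_mic_ratio(mac_depth, mic_depth,
                  mean_mac_depth_mac_destined, mean_mic_depth_mic_specific):
    """Ratio of normalized MAC read depth to normalized MIC read depth."""
    normalized_mac = mac_depth / mean_mac_depth_mac_destined
    normalized_mic = mic_depth / mean_mic_depth_mic_specific
    return normalized_mac / normalized_mic


def classify_region(ratio, cutoff=0.5):
    """Label a segment from its MAC/MIC ratio (cutoff taken from this report)."""
    return "MAC-destined" if ratio > cutoff else "MIC-specific"


# Toy depths (not real data): MAC depth 4x where MAC-destined regions
# average 100x, and MIC depth 50x where MIC-specific regions average 50x.
toy_ratio = mac_mic_ratio(4, 50, 100, 50)
toy_label = classify_region(toy_ratio)
```

A segment that is absent from the MAC has a normalized MAC depth near 0 and therefore a ratio near 0, whereas a segment retained in the MAC has a ratio near 1.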

The distribution of this ratio is shown below:

Contigs with the longest HT segments

In total, we identified 521 HT segments in 471 MIC contigs. We present below the 33 contigs with the longest HT segments (representing 25% of the total length of HT segments). The longest HT segments are more than 100 kb long.

You can click on the contig name to view the location of HT segments in the contig, and its profile of similarity with P. aurelia genomes:

Pson_contig NbSegments LgHT_kb DonorSpecies RatioMacMic
contig00046 1 118.5 :ptre 0.04
contig00093 1 107.0 :pqua 0.04
contig00131 1 101.5 :pqua 0.04
contig00047 3 97.5 :pqua 0.04
contig00015 2 88.0 :psex:ptre 0.03
contig00211 2 85.0 :psex 0.04
contig00119 2 84.5 :pqua 0.03
contig00253 1 84.0 :pqua 0.04
contig00240 1 83.5 :psex 0.04
contig00073 1 80.5 :ptre 0.04
contig00249 1 80.0 :pqua 0.05
contig00023 2 75.0 :psex:pqua 0.03
contig00264 2 74.0 :psex 0.05
contig00306 1 74.0 :ptre 0.04
contig00397 1 70.0 :pqua 0.03
contig00414 1 70.0 :pqua 0.02
contig00335 1 68.5 :ptre 0.04
contig00481 1 65.0 :ptre 0.05
contig00011 2 64.5 :psex:ptre 0.03
contig00388 2 63.5 :psex 0.03
contig00116 1 62.5 :pqua 0.02
contig00250 4 61.5 :psex:ptre 0.02
contig00178 3 60.0 :ptre:pqua 0.05
contig00605 1 60.0 :ptre 0.05
contig00606 1 60.0 :psex 0.04
contig00600 1 58.5 :pqua 0.05
contig00664 1 57.5 :pqua 0.06
contig00230 3 56.0 :psex:pqua 0.02
contig00456 1 55.5 :psex 0.04
contig00734 1 55.5 :ptre 0.05
contig00412 2 55.0 :ptre 0.02
contig00705 2 55.0 :psex 0.03
contig00724 2 55.0 :pqua 0.04

Contigs with HT segments coming from distinct lineages

Several contigs (N=15) have received HT segments that derive from distinct lineages:

Pson_contig NbSegments LgHT_kb DonorSpecies RatioMacMic
contig00015 2 88.0 :psex:ptre 0.03
contig00023 2 75.0 :psex:pqua 0.03
contig00011 2 64.5 :psex:ptre 0.03
contig00250 4 61.5 :psex:ptre 0.02
contig00178 3 60.0 :ptre:pqua 0.05
contig00230 3 56.0 :psex:pqua 0.02
contig00362 2 43.0 :psex:ptre 0.01
contig00172 2 42.0 :ptre:pqua 0.02
contig00027 2 35.0 :psex:ptre 0.03
contig00873 2 34.0 :psex:ptre 0.06
contig01207 2 27.0 :ptre:pnov 0.01
contig00820 2 23.5 :psex:ptre 0.02
contig00282 2 21.5 :psex:ptre 0.03
contig00525 2 18.5 :psex:ptre 0.01
contig03175 2 10.0 :psex:pqua 0.01
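
For readers who want to reproduce this selection from the table, here is a minimal Python sketch that parses the colon-prefixed DonorSpecies field and keeps contigs with more than one donor. It assumes that each species label stands for one donor lineage. The four rows are copied from the tables above.

```python
def donor_species(field):
    """Split a DonorSpecies field such as ':psex:ptre' into ['psex', 'ptre']."""
    return [name for name in field.split(":") if name]


# (Pson_contig, DonorSpecies) pairs taken from the tables above.
rows = [
    ("contig00015", ":psex:ptre"),
    ("contig00093", ":pqua"),
    ("contig00178", ":ptre:pqua"),
    ("contig00211", ":psex"),
]

multi_donor_contigs = [
    contig for contig, field in rows
    if len(set(donor_species(field))) > 1
]
```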

Psonn contigs with HT segments in MAC-destined regions

The vast majority (97%) of HT segments are located in MIC-specific regions. There are, however, 16 HT segments with a MAC/MIC ratio > 0.5, i.e. located in MAC-destined regions (MAC-constitutive or MAC-variable). Most of these HT segments are quite short (max = 11.5 kb):

Pson_contig ContigLength NbSegments SegStart SegEnd LgHT_kb DonorSpecies RatioMacMic
contig01703 35379 1 11501 23000 11.5 :pqua 0.92
contig02752 23839 1 11001 22000 11.0 :ptre 0.81
contig00726 55893 1 1 10000 10.0 :ptre 0.74
contig01697 35423 1 24001 32500 8.5 :psex 1.04
contig01936 32372 1 1 8500 8.5 :psex 0.52
contig04884 11743 1 3001 11500 8.5 :ptre 0.92
contig03962 16010 1 8501 16000 7.5 :psex 1.02
contig06091 7870 1 1 7500 7.5 :pqua 0.94
contig06306 7334 1 1 7000 7.0 :pqua 0.75
contig06598 6669 1 1 6500 6.5 :pqua 0.80
contig05433 9840 1 2001 8000 6.0 :ptre 0.61
contig07340 5239 1 1 4500 4.5 :ptre 0.62
contig05439 9812 1 1 3500 3.5 :pqua 0.80
contig06456 6931 1 1 3500 3.5 :pqua 0.62
contig02903 22816 1 19501 22500 3.0 :pqua 0.63
contig00722 55960 1 51001 53500 2.5 :pqua 1.01

We mapped these segments to the Psonn MAC genome assembly: 2 segments are not present in the MAC assembly (probably MAC-variable regions). The 14 other segments correspond to 10 regions in the MAC genome (the MIC genome assembly is highly fragmented; hence, some HT regions are split across several contigs in the MIC genome assembly). Thus, in the end, we have only 10 HT segments in MAC-destined regions.

Most of these HT segments (8/10) are located at an extremity of the corresponding MAC scaffold.

These HT regions located in MAC-destined regions include 48 annotated genes (based on the annotation of the MAC genome assembly). We performed a sequence similarity search (BLASTP) against the proteins of each of the 13 P. aurelia species plus 2 outgroups (PCAUD and PMUL). We retained the two best hits in each species (excluding self-hits for PSONN). For each gene, we computed a multiple alignment of the detected homologs and inferred the phylogenetic tree.
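
The hit-selection rule (two best hits per species, self-hits excluded) could look like the following Python sketch. Ranking hits by bit score is an assumption, since the ranking criterion is not stated here. The protein identifiers and scores are invented for illustration.

```python
from collections import defaultdict


def two_best_hits_per_species(hits, query_id, n_best=2):
    """Keep the n_best highest-scoring hits per species, skipping self-hits.

    hits: list of dicts with hypothetical keys 'subject', 'species', 'bitscore'.
    Assumption: hits are ranked by bit score.
    """
    by_species = defaultdict(list)
    for hit in hits:
        if hit["subject"] == query_id:
            continue  # self-hit of the query protein
        by_species[hit["species"]].append(hit)
    return {
        species: sorted(group, key=lambda h: h["bitscore"], reverse=True)[:n_best]
        for species, group in by_species.items()
    }


# Toy BLASTP hits (identifiers and scores are invented).
toy_hits = [
    {"subject": "PSON_g1", "species": "pson", "bitscore": 900.0},
    {"subject": "PSON_g2", "species": "pson", "bitscore": 350.0},
    {"subject": "PQUA_g7", "species": "pqua", "bitscore": 820.0},
    {"subject": "PQUA_g8", "species": "pqua", "bitscore": 400.0},
    {"subject": "PQUA_g9", "species": "pqua", "bitscore": 610.0},
]
selected = two_best_hits_per_species(toy_hits, query_id="PSON_g1")
selected_ids = {sp: [h["subject"] for h in group] for sp, group in selected.items()}
```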

Among the 48 annotated genes in HT segments of the Pson MAC assembly, 4 do not have any significant BLASTP hit. In a majority of the other cases (31/44), the annotated genes are truncated compared to their homologs. This suggests that most of the ‘genes’ in HT segments located in MAC-destined regions are pseudogenes that have been mis-annotated as genes.

Most phylogenetic trees (25/44 genes, representing 8/10 segments) indicate that Pson was the recipient of the HT event. The other gene trees were not informative regarding the direction of the HT: either the tree was unresolved (n=4) or it showed no congruence with the species tree (owing to multiple HTs or to complex patterns of duplications/losses).

During hybridization events, recombination between orthologous loci may lead to the replacement of the original allele by an allele from the other species (introgression). However, given the large genetic distance between Pson and the other species with which we find evidence of HT, allelic recombination seems a priori unlikely. Furthermore, the patterns that we observe do not fit the hypothesis that the original Psonn allele has been replaced by the new allele from the donor species:

  • most (31/44) acquired genes are pseudogenized
  • most of the HT segments (8/10) are located at the extremity of MAC chromosomes
  • for 23 of these genes (representing 8 of the 10 HT segments, including the 2 HT segments that are not at a chromosome end), phylogenetic analysis indicates that full-length orthologous genes are present on other scaffolds of the Pson MAC assembly (generally with conserved gene neighborhood)

Thus, these HT segments correspond to genomic regions of the donor species that have been translocated on Psonn chromosomes (generally at the end of MAC chromosomes).

Details on these 48 genes (or pseudogenes) located in these 10 MAC HT segments are provided in file Pson_MAC_HT_segments.summary.xlsx (with information on conserved synteny and ohnology relationships).