UV-targeted dinucleotides are not depleted in
light-exposed Prokaryotic genomes

Online reproduction of the figures from the paper by

Laboratoire BBE CNRS UMR 5558
INRIA Helix Project, Univ. C. Bernard - LYON I
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE

Abstract: We have investigated the hypothesis that pyrimidine dinucleotides are avoided in light-exposed genomes, as the result of selective pressure due to high UV exposure. The main damage to DNA produced by UV radiation is known to be the formation of pyrimidine photoproducts: it is estimated that about ten dimers per minute are formed in an Escherichia coli chromosome exposed to the ultraviolet light in direct overhead sunlight at sea level. It is also known that on an Escherichia coli chromosome exposed to UVb wavelengths (290 to 320 nm), pyrimidine photoproducts are formed in the following proportions: 59% TpT, 7% CpC, and 34% CpT plus TpC. We have analyzed all available complete prokaryotic genomes and the model organism Prochlorococcus marinus, and have found that pyrimidine dinucleotides are not systematically avoided. This suggests that prokaryotes must have sufficiently effective protection and repair systems for UV exposure not to affect their dinucleotide composition.

Contact: Palmeira, L.

Figure 1 -- Density of phototargets, weighted by the proportions in which the corresponding photoproducts are formed in the E. coli chromosome, and calculated for different G+C contents and for three kinds of random genomes. The weighted density is 0.59*f(tt)+0.34*[f(tc)+f(ct)]+0.07*f(cc), where f(xy) is the frequency of dinucleotide xy in the specified genome. Three models of random genomes are analyzed. In the worst case (solid curve), the genome is the concatenation of a sequence of pyrimidines and a sequence of purines: all pyrimidines are involved in a pyrimidine dinucleotide. In the best case (dotted curve), the genome is an unbroken succession of pyrimidine-purine dinucleotides: no pyrimidine is involved in a pyrimidine dinucleotide. In the "random case" (dashed curve), the frequency of a pyrimidine dinucleotide is the result of chance (f(xy) = f(x)*f(y)).
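
No script for Figure 1 is given on this page. As a minimal sketch of the "random case" curve only, the weighted density can be computed directly from the G+C content. The function name phototarget_density_random is hypothetical, and the sketch assumes f(C) = f(G) = gc/2 and f(A) = f(T) = (1 - gc)/2, which the caption does not state. In the best case every pyrimidine dinucleotide frequency is zero, so the density is 0 at all G+C contents. The worst case is not sketched because it depends on how the pyrimidines are arranged within their block.

```r
# Weighted phototarget density under the "random case" model, f(xy) = f(x)*f(y).
# ASSUMPTION (not from the caption): f(C) = f(G) = gc/2 and f(A) = f(T) = (1 - gc)/2.
phototarget_density_random <- function(gc) {
  fc <- gc / 2        # frequency of C
  ft <- (1 - gc) / 2  # frequency of T
  0.59 * ft * ft + 0.34 * (ft * fc + fc * ft) + 0.07 * fc * fc
}

# Evaluate the curve on a coarse grid of G+C contents
gc <- seq(0, 1, by = 0.25)
density_random <- phototarget_density_random(gc)
```

Under these assumptions the density falls steadily as G+C content rises, because TpT carries the largest weight.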

Figure 2 -- Plot of the mean z-score statistics for intergenic sequences (x-axis) and for coding sequences (y-axis), for each of the four pyrimidine dinucleotides. On each plot, a dot corresponds to the mean of these two statistics in a given prokaryote chromosome. The zero lines of the x and y axes (dotted lines) and the 5% limits of significance for the standard normal distribution (dashed lines) are plotted as benchmarks. It should be noted (see Fig 3) that the variability within one chromosome is sometimes as great as that between different chromosomes.

#
# Figure 2
#

par(mfrow=c(2,2),mar=c(2,2,2,2))
zinterg <- read.table('/www/htdocs/members/lobry/repro/Leo/data/uv/zinterg')
zcodonCDS <- read.table('/www/htdocs/members/lobry/repro/Leo/data/uv/zcodonCDS')
# One panel per pyrimidine dinucleotide: column name -> panel title
panels <- c(CT="CpT bias",TC="TpC bias",CC="CpC bias",TT="TpT bias")
for (dn in names(panels)) {
  # One dot per chromosome: mean z-score in intergenic (x) versus coding (y) sequences
  plot(zinterg[[dn]],zcodonCDS[[dn]],xlab="intergenic",ylab="coding",main=panels[[dn]],las=1,ylim=c(-6,4),xlim=c(-3,3),pch=21,cex=0.5,bg=rgb(0.5,0.5,0.5),col=rgb(0.25,0.25,0.25,0.25))
  # Zero lines of both axes (dotted)
  abline(h=0,v=0,lty=3)
  # 5% limits of significance for the standard normal distribution (dashed)
  abline(h=c(-1.96,1.96),v=c(-1.96,1.96),lty=2)
}
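
The benchmark lines at -1.96 and +1.96 are the two-sided 5% limits of the standard normal distribution. As a hedged illustration of how a mean z-score can be read against them, the sketch below recovers the limits with qnorm and labels a few values. The vector z_mean holds hypothetical values chosen for illustration; it is not data from the paper.

```r
# Two-sided 5% limits of the standard normal distribution
lim <- qnorm(c(0.025, 0.975))   # approximately -1.96 and +1.96

# HYPOTHETICAL mean z-scores for four chromosomes (illustration only)
z_mean <- c(-2.4, -0.3, 1.1, 2.2)

# Label each value: significantly avoided, not significant, or significantly over-represented
status <- cut(z_mean, breaks = c(-Inf, lim, Inf),
              labels = c("avoided", "not significant", "over-represented"))
```

A dinucleotide would count as significantly avoided in a chromosome only when its dot falls below the lower limit.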

Figure 3 -- Each panel shows the distribution of the z-score over all coding sequences in each of the three strains of Prochlorococcus marinus. In each panel, the distribution for MED4 (a high-light adapted strain) is shown as a solid line, the distribution for SS120 (a low-light adapted strain) is shown as a dashed line, and the distribution for MIT 9313 (a low-light adapted strain) is shown as a dotted line. The 5% limits of significance for the standard normal distribution (dashed vertical lines) are plotted as benchmarks.

#
# Figure 3
#

BX5cds <- read.table('/www/htdocs/members/lobry/repro/Leo/data/uv/zrhotheo_cds_BX548175') #MED4 strain
AEcds <- read.table('/www/htdocs/members/lobry/repro/Leo/data/uv/zrhotheo_cds_AE017126')  #SS120 strain
BX4cds <- read.table('/www/htdocs/members/lobry/repro/Leo/data/uv/zrhotheo_cds_BX548174') #MIT9313 strain

par(mfrow=c(2,2),mar=c(2,2.5,2,2))
# Panel title -> column index of the corresponding dinucleotide z-score
cols <- c("CpT bias"=8,"TpC bias"=14,"CpC bias"=6,"TpT bias"=16)
for (title in names(cols)) {
  j <- cols[[title]]
  # MIT9313 strain, dotted line
  plot(density(BX4cds[,j]),ylim=c(0,0.4),xlim=c(-4,4),lty=3,lwd=1,main=title,xlab="",ylab="",las=1)
  # SS120 strain, dashed line
  lines(density(AEcds[,j]),lwd=1,lty=2)
  # MED4 strain, solid line
  lines(density(BX5cds[,j]),lwd=1,lty=1)
  # 5% limits of significance for the standard normal distribution
  abline(v=c(-1.96,1.96),lty=5)
}
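
The density curves can be complemented by a numerical summary of how much of each distribution lies outside the 5% limits. The following is a minimal sketch, not part of the original scripts: tail_shares is a hypothetical helper, and z_example is made-up data used only to show the call. In practice the argument would be one z-score column of a table read above, for example BX5cds[,16].

```r
# Share of coding sequences whose z-score lies below -limit or above +limit.
# `z` is any numeric vector of per-CDS z-scores; missing values are dropped.
tail_shares <- function(z, limit = 1.96) {
  z <- z[!is.na(z)]
  c(below = mean(z < -limit), above = mean(z > limit))
}

# HYPOTHETICAL z-scores, for illustration only
z_example <- c(-2.5, -1.0, 0.0, 1.0, 3.0)
shares <- tail_shares(z_example)
```

Comparing the two shares between the high-light and low-light strains gives a quick check on whether one tail is heavier in one of them.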

If you have any problems or comments, please contact Palmeira, L.
