Discovery of Structural Features in RNA by Computational Methods

Jacob V. Maizel, Jr.
Laboratory of Experimental and Computational Biology
National Cancer Institute/National Institutes of Health
Frederick Cancer Research and Development Center
Frederick, MD 21702
USA
E-Mail: jmaizel@ncifcrf.gov

Single-stranded nucleic acid molecules readily fold into complex structures, based in part on the formation of duplexes between complementary regions and on higher-order interactions. Predicting RNA structure from nucleic acid sequence information has long been a challenge. Efforts by many groups to develop computer programs for this purpose have been partially successful. We have used a number of computational techniques to discover regions in sequences where potential structural features are located; in many cases these regions correlate with functional sites known from experimental studies.

The workhorse approach uses dynamic programming methods to fold successive overlapping segments of various fixed lengths and calculate the free energy of the most stable structure of each segment. For each segment, a number of randomized sequences with the same base composition are generated, and their minimum energies are calculated and averaged. A z-score is calculated by dividing the difference between the real sequence energy and the average of the random sequence energies by the standard deviation of the random sequence energies. Plots of these z-scores along the sequence reveal regions whose folding energy differs unusually from what would be expected for a random sequence of the same composition as the segment. When unusual features are found using this program, called SIGSTB, additional procedures are used to predict potential structures.
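The z-score scan described above can be illustrated with a minimal sketch. This is not the SIGSTB code. The function names (`toy_energy`, `segment_z_score`, `scan`) are hypothetical, and the energy function is a deliberately crude stand-in: it counts the maximum number of nested base pairs (a Nussinov-style dynamic program, -1 per pair) rather than evaluating a thermodynamic free-energy model. The use of the sample standard deviation and of simple shuffling to preserve base composition are also assumptions made for illustration.

```python
import random
import statistics

# Watson-Crick and G-U wobble pairs (assumed pairing rules for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq, min_loop=3):
    """Stand-in for a minimum free energy: -1 per base pair, with the number
    of nested pairs maximized by Nussinov-style dynamic programming."""
    n = len(seq)
    if n == 0:
        return 0.0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return float(-best[0][n - 1])


def segment_z_score(segment, n_random=50, rng=None):
    """z = (E_real - mean(E_random)) / stdev(E_random) for one segment."""
    rng = rng or random.Random(0)
    real = toy_energy(segment)
    bases = list(segment)
    energies = []
    for _ in range(n_random):
        rng.shuffle(bases)  # shuffling keeps the base composition fixed
        energies.append(toy_energy("".join(bases)))
    sd = statistics.stdev(energies)  # sample standard deviation (assumption)
    if sd == 0:
        return 0.0  # no spread among random sequences; z is undefined
    return (real - statistics.mean(energies)) / sd


def scan(seq, window, step=1, n_random=50, seed=0):
    """Slide a fixed-length window along seq; return (start, z-score) pairs."""
    rng = random.Random(seed)
    return [
        (start, segment_z_score(seq[start:start + window], n_random, rng))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Calling `scan(sequence, window=100)` would give one z-score per window start. Strongly negative values mark segments that fold more stably than shuffled sequences of the same composition, which is the kind of signal the plots above are meant to reveal. A realistic analysis would replace `toy_energy` with a proper thermodynamic folding program.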

SEGFOLD employs a systematic search with automatically varying segment size to define the limits of the unusual folding regions (UFRs). EFFOLD explores alternative structures as the folding energy rules are perturbed. COMFOLD examines the covariation in paired bases between foldings of phylogenetically related sequences, if available, or between alternative energy foldings, to produce a "consensus" folded structure. Favorable tertiary interactions between loops and open strands, loosely called pseudoknots, are searched for using RNAKNOT. Particular structural motifs are sometimes extracted and searched against databases using the RNAMOT program of Gautheret and Laferriere. In some cases three-dimensional, atomic-scale models are built manually or with the RNA2D3D program, and refined using molecular mechanics/molecular dynamics programs.

Using these tools, a number of interesting features in viral and cellular RNAs have been found, including retroviral rev-response elements (RREs) and frame-shift sites, picornavirus internal ribosome entry sites (IRESs), cellular IRESs, and other unusual features in the 5'- and 3'-untranslated regions (UTRs) of mRNAs.

Results representing the work of Shu-yun Le and other colleagues will be presented to demonstrate some of these features. As time permits, a brief sketch may be given of other bioinformatics groups in the Laboratory, some of which are described at http://www.lecb.ncifcrf.gov.
