Purpose Participants Meetings Publications Calls Reports Private Contact

Associated Team HELIX-USP

Report for 2005

Various visits in both directions happened in 2005, or will still happen before the end of the year.

Following Cristina Gomes Fernandes' visit to France for 7 weeks in December 2004 (visit funded by the INRIA for the stay and a Brazilian project including only the combinatorics group at the USP for the flight ticket), Eric Tannier (CR2 INRIA) went to Brazil for approximately 2 weeks to attend GRACO 2005 (which was co-organised by people from the combinatorics group), to give a talk at the USP, and to continue the discussions started in December with Cristina on the problem of the transposition distance between two genomes. The complexity of this problem has been open for many years now. Eric and Cristina have made some progress on it. Work will continue on the topic during Cristina's second visit to us later this year, from December 3 to 19. The more general topic of genome rearrangements and dynamics will be investigated also in the next years by a currently Master student of Yoshiko Wakabayashi (coordinator of the associated team), Andrea Nakasato, in close collaboration with the French group.

During Cristina's visit to France in 2004, she also worked with Vincent Lacroix (PhD student with M.-F. Sagot) on the problem of modelling and searching for reaction motifs in metabolic networks. Very informally, metabolic networks are graphs with labels at the nodes (accessorily at the edges also) and reaction motifs in such graphs are multisets of labels connected in the graph (but not necessarily topologically equivalent). Efficiently finding a given motif in a metabolic network is an original and interesting combinatorial problem for which a first algorithm was provided in a paper submitted in 2005 and accepted at WABI [4]. This paper is currently being extended for a Journal version to be submitted towards the end of 2005.

This latter work with Cristina continued when a group of 7 French researchers or students from BAOBAB-HELIX went to Brazil from August 28 to September 10, to visit the associated team at the USP. The 7 visitors were Christian Gautier (Professor), Vincent Lacroix, Leonor Palmeira (PhD student with Laurent Guéguen and Jean Lobry from HELIX), Pierre Peterlongo (PhD student with M.-F. Sagot and Maxime Crochemore from the University of Marne-la-Vallée; associated with BAOBAB), Claire Lemaitre (just starting her PhD with M.-F. Sagot and C. Gautier), Ludovic Cottret (trainee).

In the case of reaction motifs, work continued during this visit with the more difficult problem of inferring motifs given not the motifs themselves, but only some very general properties concerning them, among which the most important is the fact that motifs of potential interest appear approximately repeated at least twice in the metabolic network. The main motivation of looking for repeated reaction motifs in metabolic networks comes from both a functional and an evolutionary perspective. The systematic identification and analysis of regularities such as repeated reaction motifs may indeed help us understand the inner structure of biological systems, if such structure exist, how the systems have been set up over the course of evolution, and what could have been alternative routes followed. The problem of inferring reaction motifs further stresses difficulties that appeared already in the search for a single motif and that we have been eluding until now. These concern providing a sensible way of presenting the occurrences of a motif (numerous in some cases) that is intimately related to the problem of identifying which further constraints/characteristics of the motifs are important for extracting the maximum information from them for evolutionary and functional purposes. Such constraints could concern, for instance, inputs and outputs to the reactions, topology, correlation with genome organisation.

During the group of 7 visit to the USP, all these questions were intensively discussed with Cristina and initial ideas were explored. These ideas are being currently pursued and should lead to a new algorithm and paper in the next months.

This visit was also the occasion to continue work with Alair Pereira do Lago (Professor) who had been visiting our group in France for one month, from June 28 to July 22. During this visit, Alair, who has been doing a lot of work on sequence alignments since starting to be interested in computational biology, interacted essentially with two persons, Pierre Peterlongo and Claire Lemaitre who both were part of the French group that went to Brazil later during the (French) summer. Both are working on alignment or alignment related problems. Pierre has been more concerned with developing lossless filtering techniques that use multiple seeds and are applicable to multiple alignments as opposed to only pairwise ones as has been considered until now. His motivation is to make feasible in both theory and practice the identification of long approximate repeats in huge datasets (represented by a single very long sequence such as a whole genome, or by multiple sequences). Claire is interested in understanding the mechanisms that lead to genome rearrangements such as sequence inversions, transpositions (i.e. sequence displacements along a genome), translocations (exchange of sequences between different chromosomes of an eukaryotic genome), deletions, duplications etc. For this, she intends to closely examine the regions around which breaks in the genome sequence have been observed (so-called breakpoint regions) to see whether these regions contain features that uniquely characterise them. Such features could be repeats of a certain type, or special compositional landscapes, etc. In fact, should such features exist, we have no really clear idea of what could be their nature. In order to find the features, if they exist, the breakpoints must first be precisely localized. For this, an alignment of the potentially broken region in one species must be performed with the two sequences that appear separated along the genome of at least one other species. Since these regions are in general very long, and the breakpoints many, fast techniques must be found and applied, possibly in a multi-step approach, each step representing an increasingly more refined filter enabling to narrow down on the precise location of the breakpoint.

For both Pierre's and Claire's work, the discussions with Alair during his visit to France and ours to Brazil have already lead to very interesting results that are turning into efficient algorithms and that should be validated in the next months. Alair is planning a visit again to France in 2006, possibly around the same period as in 2005. On the same topic of alignment related problems, during the year 2005, Said Sadique Adi defended his PhD (in May 23) on the gene identification problem (genes may be seen as sequences "repeated" across species) done under the supervision of Carlos Eduardo Ferreira in close collaboration with Marie-France Sagot. A paper concerning this work is currently submitted [5].

The visit of 7 French researchers and students to Brazil in August was also the occasion for three persons to present their work at the "Instituto de Matemática e Estatística" (IME) of the USP. These persons were Christian Gautier ("Similarity between neighbors: selection or mutational bias inside genomes"), Leonor Palmeira ("Dinucleotide over- and under-representation in complete bacterial genomes (Bacteria and Archea)") and Pierre Peterlongo ("Lossless filter for finding long multiple approximate repetitions using a new data structure, the bi-factor array"). M.-F. Sagot did an initial general presentation of the work done in HELIX. During this visit, we also took the opportunity to talk again to researchers not directly involved with the combinatorics group but who belong to the IME-USP and have been doing work in the area of computational biology. Notably included among these persons are Eduardo Jordão Neves and Roberto Marcondes Cesar Junior. In February 2006, the latter will be visiting the ENST in Paris (where he did a sabbatical stay in 2001 and with which he maintains collaborations) and we have already invited him to visit us in Lyon. Roberto has done very interesting work in the area of microarray data analysis and gene network inference. We believe a further fruitful collaboration can be established between us.

Before the end of the year, three visits of Brazilian researchers or student are planned: Cristina Gomes Fernandes and Yoshiko Wakabayashi from Dec 3 to 19, and Alair's PhD student, Augusto Vellozo from Dec 3 to 24.

In general, the fact that this long term, sometimes official, at other times officious, collaboration has now been made formal through the establishment of an associated team has tremendously boosted the work, and energy, that the French (Brazilian-born) coordinator of the team had been putting into building links with what is her ex-University whose involvement with computational biology was very timid for many years. The visit of six other French people to Brazil, notably Christian Gautier, head of the "Laboratoire de Biométrie et Biologie Évolutive" of the University of Lyon I, has also enabled to show to non Brazilian persons that the work being done at the University of São Paulo is up to the level of the very good universities in Europe, despite often more difficult scientific and economic conditions. Two PhDs have already been conducted in collaboration between a member of HELIX and the IME-USP (Estela Maris Rodrigues and Said Sadique Adi). More are expected in the coming years. The year 2006 should also see a high level of visits from and to Brazil. Finally, the 2005 visits enabled us already to start (re)establishing strong contacts with ex-PhD students of the combinatorics group who are now at other universities in Brazil, the main one at the current time being the UFMS ("Universidade Federal do Mato Grosso do Sul"). This should lead to further collaborative projects in the years to come.

Reminder of references before 2005

[1] E.M. Rodrigues, M.-F. Sagot, and Y. Wakabayashi, Some approximation results for the maximum agreement forest problem, Proceedings of APPROX-RANDOM 2001 (Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques) (M. Goemans and K. Jansen and J.D.P. Rolim abd L. Trevisan, eds.), Lecture Notes in Computer Siences, vol. 2129, Springer-Verlag, 2001, pp. 159--169.
[2] E.M. Rodrigues, M.-F. Sagot, and Y. Wakabayashi, The Maximum Agreement Forest Problem: approximation algorithms and computational experiments, submitted.
[3] M.-F. Sagot and Y. Wakabayashi, Pattern inference under many guises. Recent advances in algorithms and combinatorics, 245--287, CMS Books Math./Ouvrages Math. SMC, 11, Springer, New York, 2003.

References for 2005

[4] V. Lacroix, C. G. Fernandes, and M.-F. Sagot, Reaction motifs in metabolic networks, Proceedings of WABI 2005 (Workshop on Algorithms in BioInformatics) (R. Casadio and G. Myers, eds.), Lecture Notes in BioInformatics (LNBI), topical subseries of Lecture Notes in Computer Siences, vol. 3692, Springer-Verlag, 2005, pp. 178--191.
[5]R. Tavares, S. S. Adi, P. Blayo, and M.-F. Sagot, Utopia: an exact generic core algorithm for gene prediction using homology, submitted.