Purpose Participants Meetings Publications Calls Reports Private Contact

Associated Team HELIX-USP


Report for 2006

Various visits in both directions have taken place again in 2006, or are going to take place before the end of the year.

The visits in the direction Brazil to France concerned Said Sadique Adi (two weeks in January), Fabio Henrique Viduani Martinez (three weeks in July) and Alair Pereira do Lago (one week in early September). The visits in the other direction will take place from November 25 to December 10, 2006, and will involve a group of ten persons from HELIX. These include Christian Gautier (head of the Laboratory of Biometry and Evolutionary Biology - LBBE - of the University Claude Bernard) and the French-side coordinator of ArcoIris (Marie-France Sagot), as well as eight PhD students from HELIX, two originally from Brazil and doing their PhD (regular or sandwich - a "sandwich" PhD student spends approximately one year of his/her doctoral studies outside Brazil) in France, Marília Dias Vieira Braga and Paulo Gustavo Soares da Fonseca, one Chilean, Vicente Acuña, and five French, Élise Billoir, Claire Lemaitre, Ludovic Cottret, Pierre Peterlongo and Emmanuel Prestat. This group will be visiting the USP (8 days), but also the Federal University of Mato Grosso do Sul (6 days) where Said and Fabio, who are ex-PhD students of the USP group, hold now permanent positions as associate professors. The Federal University of Mato Grosso do Sul (UFMS) is located, as its name indicates, in the state of Mato Grosso do Sul, more precisely in its capital, Campo Grande.

Said was a sandwich student with HELIX in 2003-2004. He defended his PhD under the supervision of Carlos Eduardo Ferreira (Carlinhos) from the USP group in May 2005. We are now working, together also with Fabio, on some alignment and motif problems. These problems were discussed during both Said's and Fabio's visits to France and will be continued during the French group's visit to Campo Grande. The motif problem in particular may be related to the Steiner tree problem on which Fabio has worked during his PhD at the USP. More precisely, the problem concerns, given a set S of N strings, identifying a minimal set W of p <= N words, such that each has a substring in a distinct element of S that is at an edit distance at most k from an element of W. This problem is known to be hard in general, but we believe that it can be profitably formulated in terms of variants of the Steiner tree problem which could lead to computationally and, hopefully, also biologically interesting approximation resolutions. Carlinhos will also participate in this work, which should lead to a publication, as well as possibly Nalvo Franco de Almeida Junior from the UFMS, ex-PhD student of João Carlos Setubal then at the University of Campinas and who, with João Meidanis, pioneered the field of computational biology in Brazil.

Alair's visit to France in early September, although short, was very productive. Collaborative work with him is well advanced as concerns filtering techniques for identifying long repeats in whole genomes. This work is done with Pierre Peterlongo, and with another long-term Italian collaborator (Nadia Pisanti). A first paper has been submitted in 2006 ([6]) and a second is in preparation ([7]) and should be submitted at the end of 2006 (with also a student of Alair, Gustavo Akio Tominaga Sacomoto, as participant). The latter work has already led to an algorithm, Ed'Nimbus, that may be used through the web at this address. This work will be continued in 2007, notably in an extended collaboration with SYMBIOSE from the IRISA where Pierre Peterlongo is now postdoc, and who was one of the partners in the ARC IBN coordinated by HELIX. During his postdoc, Pierre will indeed keep working with filtering techniques for finding long repeats. His postdoc will be in collaboration with another group who has applied to become an INRIA team, SEQUOIA of Hélène Touzet and Grégory Kucherov. Pierre will work mostly with Grégory and Laurent Noé, associate professor at the University of Lille I. Finally, the work on filters for sequence analysis is also being conducted for its statistical aspects in collaboration with Frédérique Bassino from the University of Marne-la-Vallée where Pierre Peterlongo did his PhD under the co-supervision of Marie-France Sagot and Maxime Crochemore, and will be done in close relation with the PhD work of Marc Deloger (co-supervisors: Cristina Vieira from the LBBE and Marie-France Sagot) on the relation between repeats along a genome and function (for instance, regulation of gene expression).

Since 2005, Alair has also started participating in the work of Claire Lemaitre on the precise identification of breakpoints (points along a genome that were broken following a genomic rearrangement such as an inversion, transposition, translocation, duplication, deletion, fusion, or fission) and analysis of the sequences around the breakpoints. The latter is motivated by trying to find whether such regions have characteristics that would help understand the mechanisms underlying possibly each different type of rearrangement. Will also be involved in this work Augusto Fernandes Vellozo, who visited the HELIX team at the end of 2005 and who is already seeking for funds to do a one-year postdoc with HELIX starting in March or April 2007 (Augusto will defend his PhD under the supervision of Alair in February 2007).

During their visit to HELIX in December 2005, Cristina Gomes Fernandes and Yoshiko Wakabayashi started working with Marília on the problem of computing the inversion distance between two genomes in the presence of duplicated genes. Few methods exist to address this problem. The main known ones are the exemplar distance model of David Sankoff ("Genome rearrangement with gene families", Bioinformatics, 15:909-917, 1999) and David Bryant ("The complexity of calculating exemplar distances", in "Comparative Genomics", pages 207-212, 2000), the work of Guillaume Fertin (e.g. "Genomes containing Duplicates are Hard to compare" (extended abstract), LNCS Vol. 3992, pages 783--790, 2006) and the matching model of Tao Jiang and co-authors ("Assignment of orthologous genes via genome rearrangement", IEEE/ACM Trans. on Comp. Biology and Bioinformatics, 2:302--315, 2005). A preliminary text proving that one very constrained version of the problem is already hard was written in 2006 and the work will pick up again during the visit of part of the HELIX team to the USP. The USP team has already started discussing it during a local meeting among its members that involved also, besides Cristina and Yoshiko, Carlos Eduardo Ferreira, Said, Fabio and another professor of the UFMS, Marco Aurélio Stefanes. Clearly, rearrangement problems, by their very combinatorial nature, are quite attractive to people from the USP combinatorics group and will hopefully continue to be developed in common.

Cristina Gomes Fernandes has also been involved in the work on reaction motifs in metabolic networks of Vincent Lacroix, PhD student in the HELIX group. This has already led to a conference paper in 2005 ([4]). This work has been extended, in particular as regards its complexity characteristics, into a journal paper submitted and accepted in 2006 ([8]). One open problem remained that will be addressed in another paper in 2007. Still open also is the question of efficiently and pertinently clustering the motifs output according to some criterion (for instance, topology). HELIX hopes also to get the USP group, notably Cristina, interested in: 1. the problem of aligning whole metabolic graphs that Ludovic Cottret will have to address as part of his PhD in collaboration with another HELIX PhD student, Laurent Canet and in co-supervision by Hubert Charles from the laboratory BF2I of the INSA-Lyon, and 2. in the work of analysing the impact of genomic rearrangements on biochemical networks that is Vicente Acuña's PhD topic in collaboration with various other members of HELIX. Both involve hard algorithmic and combinatorial problems.

Other new collaborations that HELIX hopes to develop during the trip to Brazil of various of their members include also:

The first collaborations can be particularly interesting for Christian Gautier, Paulo Fonseca, Emmanuel Prestat and Marie-France while the second will concern mainly Christian and Élise Billoir. Élise will also visit a researcher at the FioCruz in Rio de Janeiro, Aloysio da S. Ferrão Filho de la Fiocruz, during the second week of the HELIX stay in Brazil. It is hoped that this will also lead to a collaboration. The possible collaboration with Chile will be developed through a visit to his country by Vicente Acuña (who did his Master with Alejandro Maass) who will stay in Latin America until Jan. 17. He will in particular interact with Eduardo Moreno, from the University of Chile. During their visit to Brazil, the French group will also meet João Meidanis, professor at the University of Campinas, and head of a start-up, Scylla.

As a final observation, the work on a rearrangement distance between phylogenetic trees of genomes done with Yoshiko Wakabayashi and Estela Maris Rodrigues ([2]) has been accepted for publication in 2006 in Theoretical Computer Science.


Reminder of references for 2005

[4] V. Lacroix, C. G. Fernandes, and M.-F. Sagot, Reaction motifs in metabolic networks, Proceedings of WABI 2005 (Workshop on Algorithms in BioInformatics) (R. Casadio and G. Myers, eds.), Lecture Notes in BioInformatics (LNBI), topical subseries of Lecture Notes in Computer Siences, vol. 3692, Springer-Verlag, 2005, pp. 178--191.
[5] R. Tavares, S. S. Adi, P. Blayo, and M.-F. Sagot, Utopia: an exact generic core algorithm for gene prediction using homology, submitted.


References for 2006

[6] P. Peterlongo, N. Pisanti, F. Boyer, A. P. do Lago, M.-F. Sagot, Lossless filter for multiple repetitions, submitted.
[7] P. Peterlongo, N. Pisanti, A. P. do Lago, M.-F. Sagot, Lossless Filter for Long Multiple Repetitions with Edit Distance, to be submitted (a preliminary version appears as a Technical Report from the University of Pisa).
[8] V. Lacroix, C. G. Fernandes, M.-F. Sagot, Motif Search in Graphs: Application to Metabolic Networks, accepted at IEEE/ACM Trans. on Comp. Biology and Bioinformatics, 2006.