Direction des Relations Internationales (DRI)

Programme INRIA "Equipes Associées"

I. DEFINITION

EQUIPE ASSOCIEE

SIMBIOSI
sélection
2009

Équipe INRIA: BAMBOO Organisme étranger partenaire : Free Amsterdam University Organisme étranger partenaire : University of Rome "La Sapienza"
Unité de recherche INRIA : Rhône-Alpes Grenoble
Thème INRIA : BIO A
Pays : The Netherlands Pays : Italy
 
 
Coordinateur français
Coordinateur étranger
Coordinateur étranger
Nom, prénom Sagot Marie-France Stougie Leen Marchetti-Spaccamela Alberto
Grade/statut DR2 Full professor Full professor
Organisme d'appartenance
(précisez le département et/ou le laboratoire)
INRIA Rhône-Alpes Grenoble Free University Amsterdam and CWI, Amsterdam Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza"
Adresse postale LBBE, Université Claude Bernard, 69622 - Villeurbanne cedex P.O. Box 94079, NL-1090 GB, Amsterdam, The Netherlands Via Ariosto 25, 00185 Roma, Italy
URL http://www.inrialpes.fr/helix/people/sagot/ http://www.win.tue.nl/~leen/ (web page at Eindhoven Technical University where L. Stougie will be until Nov. 2008) http://www.dis.uniroma1.it/~alberto/
Téléphone +33 (0)4 72 44 82 38 +31 20 5924045 +39 06 77274021
Télécopie +33 (0)4 72 43 13 88   +39 06 77274002
Courriel Marie-France.Sagot@inria.fr stougie@cwi.nl alberto@dis.uniroma1.it


La proposition en bref

Titre de la thématique de collaboration (en français et en anglais) :
Exploration mathématique et algorithmique de la symbiose
Mathematical and algorithmic investigation of symbiosis

Descriptif :
The scientific objectives of the proposal center around a mathematical and algorithmic investigation of close and long-term relations observed between different biological species, the so-called symbiotic relations that involve a symbiont and its host.
The huge variety in such types of relations is mirrored by a huge variety of genomic and biochemical landscapes inside the symbiont world, and at the interface between symbionts and hosts. The purpose of this proposal is to combinatorially explore those landscapes at the molecular level, that is at the level of the genome (rearrangements) and of two of the main types of biochemical networks that may be reconstructed from the sequenced genomes of symbionts and hosts. Such networks are the metabolic and protein-protein interaction (PPI) networks. The final objective is to try to relate the contours of the landscapes to the modus operandi of the symbiotic relation, thereby offering a hope of better understanding the latter, in particular its evolution and environmental impact.
Graph (tree) combinatorics and algorithmics underlie each of these problems, as well as issues related to random graph enumeration under certain models to improve confidence in the evolutionary and co-evolutionary scenarii inferred.

Présentation détaillée de l'Équipe Associée

1. Objectifs scientifiques de la proposition

The symbiosis issue is vast and complex. The project will focus on two questions concerning the evolution of symbionts, one at the genome level, namely the studies of rearrangements, and one at the biochemical network level and interface between genome and network. The evolution of symbionts may be largely dependent on the evolution of their hosts. We therefore address also the question of the evolution of the intimate relations themselves by studying the co-cladogenesis (co-speciation) of hosts and symbionts, and more generally their co-evolution, that is the mutual evolutionary influence they exert on each other.

On the methodological front, symbionts and the computational problems they raise have been the specific object of study of very few papers. A subgroup of the proteobacteria phylum, the gamma-proteobacteria (which includes Escherichia coli), has thus been used as illustration for a gene order and phylogenetic reconstruction [1], while phylogenetic co-evolution of host-parasites has been more extensively studied from a statistical and algorithmic point of view (see [2] and [3] for surveys) but remains frought with problems [4] and is able to address only simplified models because of the underlying combinatorics of the question. The group of the late Reinhardt Heinrich (Humboldt University, Berlin) has been the most prolific in developing methods for comparing the metabolic capabilities of different species of bacteria [5] and relating this to the bacteria's environment [6] but the group has developed computationally naive approaches that in particular have avoided to explicitely address such hard problems as the presence of numerous cycles in metabolic networks. The relation with evolutionary processes has also been little explored by the authors. Finally, we know of no current systematic algorithmic method specifically made for comparatively analysing the PPI networks of symbionts and hosts. A recent analysis of Buchnera aphidicola as compared to Escherichia coli [7] was performed using a method that is generic and questionable [8] because it is based on a definition of modules that is purely structural, uses a hard to justify modularity function and is not guaranteed to be optimal.

The present collaborative project is exploratory. We know there is great variety in the gene order and genome dynamics of different symbionts, a puzzling relationship between these and other features of a genome, a great variety in the characteristics of the symbionts biochemical networks that probably bear a complicated relation with both the genomes and the different environments (hosts) in which the species live, and a complex co-evolutionary history. The objective is to develop combinatorial methods that extensively investigate the variety of the landscapes observed given formally established models for genome and network evolution, compare them to one another and to what could be expected in the absence of environmental constraints and evolutionary forces.

The proposal is divided into three main parts described below.

Symbiont genome evolution and dynamics

The aim is to explore gene order dynamics by inferring gene rearrangement scenarii between different species of symbionts and by attempting to relate this to: 1. other features in the symbionts genome as well as to their environment and relation with the host, 2. random scenarii under certain models for randomness.

Gene order dynamics is mathematically modelled as distances between permutations of sets or multisets of gene identifiers under one or a set of operations that include principally reversal of a sequence of gene identifiers, but also displacement, deletion and duplication.

Numerous algorithms for computing a rearrangement scenario between different gene orders exist. As far as we know, only three, including one from some of the participants in this project [9,10], explore all possible optimal scenarii, and this only for the reversal permutation distance. There can be thousands or even millions of such scenarii which would seem to make them useless for any biological interpretation. There appears, however, to be structure in this huge set that has been very little explored yet, mainly because of the high complexity of the problem. How much biological information there is in this structure has, to the best of our knowledge, never been analysed.

Symbiont biochemical network evolution

Evolution is in general studied by comparative approaches. In the case of networks, many criteria could be used, from full network alignment - but this leads to hard computational problems - to the comparison of general network properties that ignore altogether topology and identify the networks to be compared with bags of proteins or of enzymes and metabolites (see [11] for metabolism), or with various structural indices such as clustering coefficient, betweeness centrality, average path length, diameter, concentration of subgraphs etc. (as examples, see [12,13,14]). Other measures have been used such as scope of compounds or its inverse problem [5,6]. Comparing flux distributions and regulation could represent a biologically richer approach in the case of metabolic networks but poses already the still largely unsolved problem of efficiently computing and then manipulating all possible or all optimal fluxes. The question of which method is best for comparing two networks is open, in particular when the purpose is to understand evolution and to elucidate the relation between genotype (genetic constitution of an individual, including here the constituents of its biochemical networks) and phenotype (describing any observed characteristic of an organism).

As concerns the evolution of symbiont networks more precisely, we know of very little work that has this as its specific object of study, although different analyses have been made on the evolution of metabolic networks in general [15,16,17] including ones that attempt to relate this to changes in the genomic landscape [18]. The most notable exception of a symbiotic study in the case of metabolism comes from the Hurst group [19]. This however makes use of a metabolic flux analysis model for Escherichia coli to predict the gene content of two organisms, in the case Buchnera aphidicola and Wigglesworthia glossinidia. The method is based on a greedy procedure which consists in removing a randomly chosen enzyme from the Escherichia coli network and calculating the impact of this deletion on the production of the biomass components that each of the two other symbiotic bacteria is known to output in normal conditions. Nothing guarantees that the order in which the enzymes are removed does not bias the outcome, and thus its biological interpretation. Furthermore, metabolic flux analysis models require precise knowledge of the nutrients that are available to the symbiont, an information that is in general not known, notably for symbionts living inside their hosts. The combinatorial scope of the investigation is also somewhat limited as other more readily available characteristics of a network could be used that would enable to apply the comparison to more species.

Whether we consider metabolic or PPI networks, we are dealing with node- and sometimes edge-labelled graphs (or bipartite graphs or hypergraphs in the case of metabolism) so topology only is not a fine enough measure when one of the main objectives is to study evolution. The information sustained by the labels, that bear a relation with the genomic text is also important. Whether the relation is independent or, inversely, dependent on its context, that is on the genes that are near along the genome or in the cellular space, is another open issue.

Symbiont-host co-cladogenesis and co-evolution

Co-cladogenesis (cladogenesis refers to the evolutionary process whereby one species splits into two or more species) and in general co-evolution of symbiont and host will be examined at the sequence level and, more speculatively, at the network level.

Sequence level

Among the questions that will be addressed concerning co-cladogenesis examined at the sequence level are: how often are symbionts horizontally transferred among branches of the host phylogenetic tree, how long do symbionts persist inside a given host following its invasion, and finally, what processes underlie this dynamic gain/loss equilibrium?

Current methods for answering to these questions are based on a phylogenetic approach where an evolutionary tree is built for symbiont and host separately and then compared to determine whether the two are identical (the two have indeed evolved together) or not. In the latter case, various events could have happened: the host has speciated (i.e. given risen to two new species) but not the symbiont (this is the "miss the boat" scenario), the symbiont has speciated but not the host or the symbiont has duplicated and one of the two went to infect another host (this is the "host-switching" scenario), the symbiont has gone extinct, or both host and symbiont have speciated. Other events may further complicate the co-evolutionary history: a host may live with various symbionts ("multi-infection"), symbionts may exchange genetic material further scrambling the evolutionary signal in either sense, etc. The latter events are, so far as we know, never taken into account in all existingcombinatorial or statistical methods.

Because of possible mistakes at various levels, for instance failure to detect the presence of a symbiont or mistake in the phylogenetic reconstruction, a difference between the two trees may also not mean a different host-symbiont evolution and although some methods consider this possibility, random models for co-evolution scenarii may be improved. Here again, combinatorial approaches may serve as important guides.

Network level

Rough as the measures for comparing two networks mentioned in the previous section may be for now, some could provide another looking-glass into the co-evolution of symbionts and hosts. This is a far more open issue of this project as fewer data are available to get at an accurate enough picture of the evolution of the interface between symbiont and host networks. The topic is however fully open and any new result in this area would thus be interesting.

References
[1] Blin G, Chauve C, Fertin G. Genes order and phylogenetic reconstruction: application to gamma-Proteobacteria. Comparative Genomics (RCG 2005), Lecture Notes in Bioinformatics, vol. 3678, pages 11-20, Springer-Verlag, 2005.
[2] Charleston MA, Perkins SL. Traversing the tangle: algorithms and applications for cophylogenetic studies. J Biomed Inform, 39:62-71, 2006.
[3] Stevens J. Computational aspects of host-parasite phylogenies. Brief Bioinform, 5:339-349, 2004.
[4] de Vienne DM, Giraud T, Shykoff JA. When can host shifts produce congruent host and parasite phylogenies? A simulation approach. J Evol Biol, 20:1428-1438, 2007.
[5] Handorf T, Ebenhõh O, Heinrich R. Expanding metabolic networks: scopes of compounds, robustness, and evolution. J Mol Evol, 61(4):498-512, 2005.
[6] Handorf T, Christian N, Ebenhõh O, Kahn D. An environmental perspective on metabolism. J Theor Biol, ahead of print, 2008.
[7] Tamames J, Moya A, Valencia A. Modular organization in the reductive evolution of protein-protein interaction networks. Genome Biol, 8(5):R94, 2007.
[8] Guimerà R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature, 433(7028):895-900, 2005.
[9] Braga MDV, Sagot M-F, Scornavacca C and Tannier E. The solution space of sorting by reversals. in International Symposium on Bioinformatics Research and Applications (ISBRA). Lecture Notes in Bioinformatics (LNBI), vol. 4463, pages 293-304, 2007.
[10] Braga MDV, Sagot M-F, Scornavacca C and Tannier E. Exploring The Solution Space of Sorting by Reversals With Experiments and an Application to Evolution, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5:348-356, 2008.
[11] Clemente JC, Satou K, Valiente G. Reconstruction of phylogenetic relationships from metabolic pathways based on the enzyme hierarchy and the gene ontology, Genome Inform,16:45-55,2005. [12] Parter M, Kashtan N, Alon U. Environmental variability and modularity of bacterial metabolic networks. BMC Evol Biol, 7:169, 2007.
[13] Wang Z, Zhu XG, Chen Y, Li Y, Hou J, Li Y, Liu L. Exploring photosynthesis evolution by comparative analysis of metabolic networks between chloroplasts and photosynthetic bacteria, BMC Genomics, 7:100, 2006.
[14] Zhu D, Qin ZS. Structural comparison of metabolic networks in selected single cell organisms. BMC Bioinformatics, 6:8, 2005.
[15] Parter M, Kashtan N, Alon U. Environmental variability and modularity of bacterial metabolic networks. BMC Evol Biol, 7:169, 2007.
[16] Wang Z, Zhu XG, Chen Y, Li Y, Hou J, Li Y, Liu L. Exploring photosynthesis evolution by comparative analysis of metabolic networks between chloroplasts and photosynthetic bacteria, BMC Genomics, 7:100, 2006.
[17] Zhu D, Qin ZS. Structural comparison of metabolic networks in selected single cell organisms. BMC Bioinformatics, 6:8, 2005.
[18] Mahadevan R, Lovley DR. The degree of redundancy in metabolic genes is linked to mode of metabolism. Biophys J, 94(4):1216-1220, 2008.
[19] Pál C, Papp B, Lercher MJ, Csermely P, Oliver SG, Hurst LD. Chance and necessity in the evolution of minimal metabolic networks. Nature. 440(7084):667-670, 2006.

2. Présentation des partenaires

Brief history of the collaboration

The collaboration between the INRIA team and the other two partner teams started with a meeting at the 4th Workshop on Algorithms in Bioinformatics - WABI 2004, in Bergen, Norway where Marie-France Sagot was an invited speaker and Alberto Marchetti-Spaccamela an attendee. The following year, the two met again at the same conference, together this time with Leen Stougie with whom Alberto has maintained a more than 25-years working collaboration.

The two were interested in the work of the, now former, PhD student of Marie-France, Vincent Lacroix, was presenting on motifs in metabolic networks. A collaboration then started, initially on the enumeration of elementary modes in metabolic networks, that is of minimal-sized sub-networks capable to function in steady state, that then extended into another problem, namely of enumerating all minimal sets of precursors for a given set of target chemical compounds. The collaboration got off well from the start and has been extremely fruitful, with two papers already in common [15,16], regular visits (2 to Amsterdam with a 3rd already planned for Nov. 2008, 2 to Rome, and some 3-4 to Lyon with again a new one already planned for Jan. 2009), and informal exchange of PhD students (Vincent Lacroix, Vicente Acuña, Ludovic Cottret, Flavio Chierichetti) and one Master student (Paulo Milreu, Federal University of Mato Grosso do Sul, Brazil; visit to France funded by the HELIX-USP Associated Team ArcoIris and a STIC AmSud project involving BAMBOO, two teams in Brazil and one in Chile).

Both Alberto and Leen were also indicated in two ANR projects as outside, non-funded collaborators. The ANRs concern one ANR Blanc project (Miri, 2009-2012) and another ANR-BBSRC (MetNet4SysBio, 2008-2010) which involve, resp. the French coordinator and her project-team alone (ANR Blanc), and the French cooordinator and funded partners that could not, by the rules of the ANR and ANR-BBSRC, be the two partners proposed for this Associated Team project. It is hoped Paulo Milreu will be able to come back to France to do his PhD in co-supervision between Marie-France Sagot and Leen Stougie.

The collaboration with Alberto Marchetti-Spaccamela and Leen Stougie has been extremely important for the BAMBOO team by enabling it to establish a close relationship with computer science and mathematical groups that are both very strong and have an extensive network of other collaborative research contacts. Because of its localisation inside a mainly biological lab, the team had felt somewhat scientifically cut off from its coordinator's and main group members' original academic background. In turn, the collaboration with an interdisciplinary team inside a biological lab has brought to the Dutch and Italian teams a whole set of new interesting algorithmic and combinatorial problems. The extension of the collaboration to other permanent investigators and students in the groups of each of the coordinators can only help to make this collaboration even more fruitful.

Although this does not stricto sensu concern science, it is worth observing that this collaboration has also been immensely agreeable. It certainly helps explain why it has been scientifically productive despite the fact that the topic around which the collaboration started (metabolic networks) was entirely new to the Dutch and Italian teams, that the latter was not at all at the time involved with bioinformatics while the first is being funded on a bioinformatics project only from this year on, and that all money for travelling among the teams was taken from other projects.

References for the on-going collaboration
[20] Acuña V, Chierichetti F, Lacroix V, Marchetti-Spaccamela A, Sagot M-F, and Stougie L. Modes and cuts in metabolic networks: Complexity and algorithms. accepted in BioSystems, 2008.
[21] Cottret L, Milreu P. V., Acuña V, Marchetti-Spaccamela A, Martinez F V, Sagot M-F and Stougie L. Enumerating precusor sets of metabolites in a metabolic network. WABI'08, Lecture Notes in Computer Science/Lecture Notes in BioInformatics, volume 5251, pages 233-244, 2008.

INRIA BAMBOO team

Researchers from the INRIA BAMBOO team belong to the INRIA, to the University Claude Bernard (Lyon 1), and to the INSA-Lyon. The team has a background mainly in the design and analysis of algorithms as well as in combinatorics and statistics for biology, and in bioinformatics. It is strongly interdisciplinary, gathering together people from computer science, mathematics and biology (both theoretical and experimental).

Team members
Marie-France Sagot, INRIA, responsible, CV is found below, Computational biology/Algorithmics/Combinatorics, Director of Research
Hubert Charles, BF2I Lab at INSA Lyon and BAMBOO, Biology/Bioinformatics, Associate professor
Christian Gautier, University Claude Bernard and BAMBOO, Statistics/Computational biology, Full professor
Yvan Rahbé, BF2I Lab at INSA Lyon and BAMBOO, Biology/Bioinformatics, Director of Research
Vincent Miele, CNRS and BAMBOO, Statistics/Computational biology, Research Engineer
Franck Picard, CNRS and BAMBOO, Statistics/Computational biology, Research Associate
Eric Tannier, INRIA and BAMBOO, Computational biology/Algorithmics, Research Associate
Alain Viari, INRIA Rhône-Alpes Grenoble, Computational biology, Director of Research
Paulo Fonseca, University Claude Bernard and BAMBOO, PostDoc
Vincent Lacroix, University Claude Bernard and BAMBOO, PostDoc
Igor Nor, CNRS, University Claude Bernard and BAMBOO, PostDoc
Leonor Palmeira, University Claude Bernard and BAMBOO, PostDoc
Amélie Véron, CNRS and BAMBOO, PostDoc
Augusto Vellozo, INSA-Lyon, University Claude Bernard and BAMBOO, PostDoc
Vicente Acuña, University Claude Bernard and BAMBOO, PhD student
Ludovic Cottret, University Claude Bernard and BAMBOO, PhD student
Yves-Pol Deniélou, University Joseph Fourier and BAMBOO, PhD student
Emmanuel Prestat, University Claude Bernard and BAMBOO, PhD student
Patrícia Simões, University Claude Bernard and BAMBOO, PhD student
Possibly two Brazilian PhD students who have applied or will apply for a scholarship to come to France, one as a "sandwich" and the other as full PhD

CV of Marie-France Sagot
Marie-France Sagot was born on April 21st, 1956, in São Paulo, Brazil; she obtained a Bachelor of Science degree in Computer Science in December 1991 at the University of São Paulo, Brazil, a PhD in Theoretical Computer Science in July 1996 at the University of Marne-la-Vallée, France and the Habilitation à Diriger les Recherches (HDR) in July 2000 from the same University. She was appointed Associate Researcher at the Pasteur Institute in Paris from 1997 to 2001, then Associate Researcher from 2001 to 2003 at the INRIA in Lyon, in the HELIX project-team. In 2003, she was promoted to Director of Research at the INRIA.
She has co-authored about 60 papers in journals, international conferences or as book chapters. Her current research interests concern the design and analysis of algorithms notably for biology.
She has co-founded one national and one international Conference (resp. JOBIM and the European Conference on Computational Biology - ECCB) and (co-)chaired national and international Conferences and Workshops (JOBIM 2000 - Montpellier; JOBIM 2004 - Montreal, Canada; ECCB 2003 - Paris; CompBioNets 2004 - Recife, Brazil; CompBioNets 2005 - Lyon; BSB 2007 - Rio de Janeiro, Brazil; ISMB-ECCB 2009 - Stockholm). She has served, resp. serves, in the Steering Committee of JOBIM and ECCB. She is Associate Editor in Lecture Notes in BioInformatics, Journal of Discrete Algorithms, BMC Bioinformatics, BMC Algorithms for Molecular Biology and IEEE/ACM Transactions on Computational Biology and Bioinformatics of which she will become the Editor-in-Chief starting in January 2009.
She is or has been the PI, for her team or general, of several national and international grants (Welcome Trust, Royal Society UK, Brazilian and Portuguese projects, ARCs INRIA, ACIs, ANRs of which two as sole funded team).
She founded and directed from 2004 to 2007 the first PhD Program on Computational Biology in Portugal, is a Visiting Researcher Fellow at King's College, London (since 2002) and at the INESC-ID, Instituto Superior Técnico, Lisbon (since 2004).
She has served as committee member in the Comité national de la recherche scientifique CID 44 (2004-2008), has served in various researchers selection committees for the INRIA and is currently serving in the 01 (Maths) section of the CoNRS.

Free University and CWI team

The majority of the CWI/VU Amsterdam team is mathematician or computer scientist. Steven Kelk, Gunnar Klau and Leen Stougie have been working for several years in the field of computational biology. In a few years time they have acquired a reputation for solving complicated mathematical optimization problems resulting from modelling biological problems. Rene Sitters, who has mainly worked on optimization problems related to logistics and telecommunication, will join the team on January 1, 2008.

Frank Bruggeman and Leen Stougie, with the assistance of Wilfred Röling, obtained a Dutch grant for studying "microbial ecosystems and multiple environment stoichiometric analyses", having significant overlap with the subject of our present research proposal. Brett Olivier and Steven Kelk are the PostDocs on the project and Jerien Klaver the PhD-student. The project starts on November 1, 2008. Within the Dutch project there is money for travelling and for receiving guests, some of which can be supplementary to the present project proposal. More PhD-students are expected to join.

Team members
Leen Stougie, Free University and CWI Amsterdam, responsible, CV is found below, Mathematician/Operations Researcher, Full Professor
Gunnar Klau, CWI Amsterdam, Mathematician/ Operations Researcher, Tenure track
Steven Kelk, CWI Amsterdam, Computer Scientist, PostDoc
Frank Bruggeman, Free University and CWI Amsterdam, Systems Biologist, PostDoc
Wilfred Röling, Free University Amsterdam, Micro-ecological biologist, Assistant Professor
Rene Sitters, Free University Amsterdam, Mathematician, Assistant Professor
Jerien Klaver, Free University Amsterdam, PhD-student

CV of Leen Stougie
Leen Stougie was born on July 20, 1956, in Oud-Beijerland, The Netherlands. He will hold a full professorship at the Department of Economics and Business Administration of the Free University in Amsterdam from November 1, 2008 on, being currently an associate professor at theDepartment of Mathematics and Computer Science of the Technical University Eindhoven. Next to that he will continue holding a researcher position at the CWI in Amsterdam, since 2008. Before this he worked full-time at CWI, and at the University of Amsterdam. He held several visiting professorships, among others at UC Berkeley, La Sapienza University of Rome, and Technical University of Berlin. He obtained a PhD-degree in May 1985 at Faculty of Economics at the Erasmus University Rotterdam on a thesis titled "Design and Analysis of Algorithms for Stochastic Integer Programming" under the supervision of Jan Karel Lenstra and Alexander Rinnooy Kan. 8 PhD-students graduated under his supervision. In 2006 he was awarded the Howard W. Kuhn Award, during the INFORMS meeting, Pittsburgh, Pennsylvania, USA. His research interests are mathematics on the interface of operations research, combinatorial optimization and probability theory. A common theme in his research is optimization under uncertainty: stochastic programming and on-line optimization. But he has recently made contributions to graph optimization problems related to telecommunication networks. Since 2004 he is involved in projects on computational biology: metabolic network analysis and the construction of phylogenetic networks.
He (co-)authored over 50 papers in journals, reviewed conference proceedings and books (list here).
He was in 2001 Coordinator of NWO-Special Year on Mathematical Biology, in 2001 and 2003 Organizer of the Winterschool on Mathematics and Biology at Wageningen in The Netherlands, and is since 2005 until 2009 Project leader of the BSIK/BRICKS-project AFM2, and from 2005 until 2009 Site leader of the European FET-project ARRIVAL.

La Sapienza University Team

Researchers from Sapienza belong to the Algorithm Engineering group at the Department of Computer and Systems Sciences of Sapienza, University of Rome that has a strong background in the design and analysis of algorithms for network applications, combinatorial optimisation, massive data sets, and in the analysis of the structure of large and complex networks.

The group's strong theoretical background is complemented by extensive experimental activities in the analysis of very large complex networks.

Team members
Alberto Marchetti-Spaccamela, University of Rome La Sapienza, responsible, CV is found below, Design and analysis of algorithms/Combinatorial optimisation, Full Professor
Giorgio Ausiello, University of Rome La Sapienza, Design and analysis of algorithms/Combinatorial optimisation, Full professor
Stefano Leonardi, University of Rome La Sapienza, Design and analysis of algorithms/Combinatorial optimisation, Full professor
Luca Becchetti, University of Rome La Sapienza, Design and analysis of algorithms/Combinatorial optimisation, Research Associate
Vincenzo Bonifaci, University of Rome La Sapienza, Design and analysis of algorithms/Combinatorial optimisation, PostDoc
Two PhD students

CV of Alberto Marchetti-Spaccamela
Alberto Marchetti-Spaccamela was born on September 10th, 1954, in La Spezia, Italy; he graduated in Electrical Engineering cum laude in October 1977 at the University of Rome. During 1982-1983 he was visiting scholar at University of California at Berkeley. In 1987 he has been appointed Full Professor at the University of L'Aquila and since 1991 he is at Sapienza University of Rome.
Alberto Marchetti-Spaccamela has (co)-authored about 100 papers in journals and international conferences. His current research interests concern the design and the analysis of algorithms, and their applications to computer networks, bioinformatics and high performance computing. He has co-authored a book on "Approximation Algorithms" (Springer Verlag 1999) and four textbooks for University classes (in Italian). He has chaired International Conferences and Workshops (WG 96, ICALP 97, WSDAAL 97, ICTCS 91, OLA 99, WAE 01, Hot TiNA 08) and will chair in 2009 track C of ICALP on "Foundations of networked computation"; he has been guest editor for special issues of Theoretical Computer Science, ACM Computing Surveys, Discrete Applied Mathematics, J. of Discrete Algorithmss; he has co-edited three volumes published by Springer Verlag and has served in the program committee of more than forty international conferences and workshops (e.g. 2007: MFCS - Prague, ICTCS - Rome; 2008: WG - Leicester, ALGOSENSORS - Reykjavik, WINE - Hong Kong, AAIM - Shanghai, TGC - Nice). He has served in the steering committee of ESA and is currently member of the WG steering committee. He is and has been PI for Sapienza of several European grants (current projects: AEOLUS, FRONTS, PANORAMA, Graal) and is in the advisory board of PERADA. He is a member of the National Committee for the International Olympiads in Informatics and he is current president of the Italian Chapter of the European Association for Theoretical Computer Science (EATCS).

3. Impact

The three teams in this collaboration are both complementary and similar. They are complementary in the sense that both the Italian and Dutch teams are specialised in mathematical modelling and the design and analysis of optimization and approximation algorithms while the French team is fully interdisciplinary, with a spectrum of scientific knowledge that is larger (including also statistics, biology, string algorithmics) but in most cases less expert. The teams are similar in that all three teams are composed mainly of computer scientists and mathematicians with furthermore a background more deeply rooted in discrete mathematics (algorithmics, graph theory, enumerative combinatorics). There is therefore already a common language which greatly facilitates communication and establishes strong common goals although the spectrum of scientific interests are more varied among the teams. The French team has a clear interest in having some impact also on biology, which is increasingly the goal of the Dutch team too, particularly now that it is extended to more people, and houses a project similar to the one of the proposal. This can only enhance the focus and add to the critical mass of the proposed project. The biological part of the team will most likely be involved mainly through seminar days to be organised once or twice per year at one of the three sites. The Italian team remains interested essentially in the algorithmic and combinatorial questions raised by biology of which they are however getting an increasingly sophisticated grasp.

The expected impact of this collaboration is therefore to 1. enrich all three teams with new expert computational and mathematical knowledge and problems, 2. pull together the bioinformatics forces of the French and Dutch teams, 3. enable a strong flow of Master, PhD and PostDoc students among all three teams, and finally 4. prepare the way for more narrow scientific and possibly institutional projects and administrative links between all three partners.

Indeed, the teams would like to be able to find some stronger administrative framework for their work together. The ideal would be to be able to establish an international INRIA team involving at least the coordinators and the main investigators of the non French teams.

II. PREVISIONS 2009

Programme de travail

Description du programme scientifique de travail

In the first year of the Associated Team, attention will concentrate mostly on the analysis of metabolic networks and co-phylogeny. Planned work on these two topics is developed further below knowing that, as always, which path is precisely followed may vary sometimes widely with the results obtained along the way.

Symbiont biochemical network evolution

In a recent book (Darwin, dessine-moi les hommes. Ed. Le Pommier, 2007), Claude Combes, an expert on symbiosis, wrote that "The constituents of life are being constantly renewed, it is the organisation, and it alone, which perpetuates itself". Part of this organisation resides in the networks of interactions existing between the genes, proteins, and small molecules that compose a living organism. It is now commonly accepted that the functioning and development of any organism is under the partial control of such networks.

The interactions may be direct (for instance, in the case of a physico-chemical contact between two proteins) or indirect (as when two enzymes catalyse the rate of two successive steps in a series of chemical reactions occurring within a cell). The term "network of interactions" is habitually and generically used to designate what are in fact at least three different biochemical processes: metabolism, gene regulation and signal transduction. We concentrate in this first year on metabolism that is the complete set of chemical reactions that occur in living cells and involves proteins called enzymes and small molecules called metabolites or compounds.

The amount and spread of data now becoming available by the high-throughput techniques developed in recent years have enabled to obtain increasingly bigger and more realistic networks. This has also allowed to introduce an evolutionary perspective into the study of such networks. Indeed, that organisation per se may perpetuate itself does not mean it undergoes no changes. As with genomic sequences, determining whether and how the different networks evolve, indeed how they may possibly co-evolve with the molecules whose interactions they model, is an essential issue. This is particularly true of symbiosis where the intimate relation of one organism with another may lead to a dramatic alteration of the genetic and biochemical capacities of both.

This part of the project will involve the following steps.

Investigate the literature on the topics of comparing metabolic networks of different organisms. Currently, this is done either by using alignment algorithms or, since the general problem of aligning graphs is hard, by using very general and synthetic graph indexes and by comparing those. The objective here is to compare different functional capacities, and the question one should keep in mind in examining previous work is how well the measure used relates to functional and biochemical capacity and to its evolution.

Investigate other means of measuring a network capacity that have not been used, or seldom, for comparing networks. These concern motifs and modules that could be seen as extensions of motifs with a more functional flavour but which remain ill-defined. It concerns also in the case of metabolic networks what has been called elementary modes [22]. Intuitively, these correspond to the smallest subnetworks that can function in steady state where by steady state we mean a state in which all compounds internal to the subnetwork that are produced are also consumed. Another measure of network capacity could be the number of biologically viable alternative paths to produce a given compound or biomass, that is, the degree of redundancy of the network.

For comparative measures, more classic or new, that could help provide some insight into the evolution of an organism's capacities, develop a comparative framework and the algorithms needed to perform the comparison in that framework. Implement and test the algorithms.

Develop methodology to investigate how the evolution of the interaction and metabolic capacities of the networks for different symbionts correlates with the evolution of their genome organisation, or where only partial data is available, of their gene contents. Some symbionts are known to suffer a sometimes dramatic loss of genes (enzymatic and in general) due to their close relation with their hosts, how does this loss map onto the metabolic networks for different types of symbiotic relationships? Are the effects more structural (lost genes are more often neighbours in the networks of non symbiotic organisms), or more functional, that is affecting the networks capacity in which case the lost genes need not be neighbours but could occupy other strategic positions instead, and in that case, which?

Symbiont-host co-cladogenesis and co-evolution

Combinatorially exploring the molecular landscape of symbionts at the sequence and network levels has for objective to understand how species that live in such close relationships with others evolve. However this requires understanding also how and why the symbiotic relationships are maintained or change through time, that is understanding the evolution of such intimate relations themselves. This is the ultimate goal of the present project. Joint symbiont and host evolution will, again, be studied at the level of the sequence and, more speculatively, at the level of the interface between the host and symbiont networks.

At the sequence level

The aim of this part is to develop co-cladogenesis inference methods in order to better understand how tight is the association between symbionts, more specifically endosymbionts that have also been called "influential passengers", and their hosts. This question has received so far only limited attention, although it is crucial to the assessment of the evolutionary consequences such passengers may have, and to an understanding of their long term evolutionary trajectories.

This part of the project will involve the following steps.

Investigate the literature on the topics of phylogenetic inference and co-cladogenesis analysis. This includes both statistical and combinatorial approaches.

Investigate the relation between co-cladogenesis analysis and other closely related topics such as gene and species tree comparison for which the literature is more extensive, and phylogenetic tree comparison where the literature is also vast and to which members of this project have contributed in the past [31,32]. It is clear that there are some common points as suggested in the Huelsenbeck's 2000 paper, but also important differences.

In discussion with the biologists (see below), carefully formalise the problem and the different simplifications that could be made, or inversely, should be avoided. Preliminary investigations indicate for instance that ignoring the possibility of multi-infections may be too gross a simplification but there could be constraints (such as concerning a maximum number of possible simultaneous infections) that would enable to put a bound on the complexity of the problem. Considering that the rate of loss and gain of infection is the same for every host taxonomic group is also a simplification. For example, rates of host switching might depend on geographic distance between hosts and this could be taken into account in the model.

Together with the biologists, carefully plan a series of simulated data that will later enable to evaluate the performance of the method(s) proposed.

Investigate new approaches that better address the problem of co-cladogenesis inference than currently existing methods, propose algorithms and investigate their complexity. This should include giving a quantification for the similarity between host and symbiont trees; establishing a good random model that would enable to determine whether the trees can be considered statistically similar or not; if they are not similar, providing possible scenarii to explain the differences. Implement the methods(s) proposed and test.

This part of the work will be able to rely on the expertise of the ``Génétique et Évolution des Interactions Hôtes-Parasites (GEIHP)'' team inside the Lab. of Biométrie and Biologie Évolutive, in particular Sylvain Charlat (CR1, CNRS) and on one PhD student, Patrícia Simões, with a biology-bioinformatics background to work the models with, help with planning the simulated data and be a critical user of the method(s) developed.

At the network level

Studying the co-evolution of host-symbiont networks may be in part an unreachable goal at the current time due to lack of data. The data is however coming in at great speed and what is available enables already to address a few initial questions while others may rapidly come within our reach in the next few years. In this last, more speculative part of the project, we shall concentrate our attention for now on the most intimate of all symbiont relationships, endosymbiosis, where the bacterium lives inside the cell of its hosts.

More precisely, we shall investigate the interface between host and symbiont metabolic networks, and initially address the following issues.

Develop methods for answering to the following two questions: 1. what are the metabolites that an endosymbiont is capable of producing, and 2. what metabolites does the endosymbiont require from its environment (in this case, the host) to produce one or a set of metabolites (called target(s)) that the host in turn may need in order to survive but is unable anymore to produce on its own. The latter are called precursors.

Investigate a formal framework for comparing the precursor sets from bacteria living in different symbiotic relations with their hosts and develop algorithmic approaches for doing the comparison within this framework.

References
[22] Schuster S, Fell DA and Dandekar T. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnology. 18:326-332, 2000.
[23] Charleston MA, Perkins SL. Traversing the tangle: algorithms and applications for cophylogenetic studies. J Biomed Inform, 39:62-71, 2006.
[24] Stevens J. Computational aspects of host-parasite phylogenies. Brief Bioinform, 5:339-349, 2004.
[25] Legendre P, Desdevises Y and Bazin E. A statistical test for host-parasite coevolution. Systematic Biology, 51(2): 217-234, 2002.
[26] Charleston, M. A. Jungles: A new solution to the host/parasite phylogeny reconciliation problem. Mathematical Biosciences, 149:191-223, 1998.
[27] Page RDM (Editor). Tangled Trees: Phylogeny, Cospeciation, and Coevolution, University of Chicago Press. 2002.
[28] Huelsenbeck JP, Rannala B, Larget B. A Bayesian framework for the analysis of cospeciation. Evolution, 54:352-364, 2000.
[29] Huelsenbeck JP, Rannala B, Yang Z. Statistical Tests of Host-Parasite Cospeciation Evolution, 51:410-419, 1997.
[30] de Vienne DM, Giraud T, Shykoff JA. When can host shifts produce congruent host and parasite phylogenies? A simulation approach. J Evol Biol, 20:1428-1438, 2007.
[31] Rodrigues EM, Sagot M-F, Wakabayashi Y. The maximum agreement forest problem: Approximation algorithms and computational experiments. Theor Comput Sci, 374(1-3): 91-110, 2007.
[32] Rodrigues EM, Sagot M-F, Wakabayashi Y. Some Approximation Results for the Maximum Agreement Forest Problem. in RANDOM-APPROX, pages 159-169, 2001.

 

Programme d'échanges avec budget prévisionnel

1. Echanges

All expenses will be for regular trips to Amsterdam, Rome or Lyon by the other partners of the Associated team. Each trip should be for approximately one week. At least one of the one-week-trips will be organised in the form of a small workshop open to non-partner members in the place where the meeting is chosen to take place.

 1. ESTIMATION DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE
Nombre de personnes
Coût estimé
Chercheurs confirmés 1 1 x 2 (trips) x 750
Post-doctorants
2 2 x 2 (trips) x 750
Doctorants 3 3 x 2 (trips) x 750
   Total
6 9000 euros

 

 2. ESTIMATION DES DÉPENSES EN INVITATIONS DES PARTENAIRES
Nombre de personnes
Coût estimé
Chercheurs confirmés 4 4 x 2 (trips) x 750
Post-doctorants
4 4 x 2 (trips) x 750
Doctorants 6 6 x 2 (trips) x 750
   Total
14 21000 euros

2. Cofinancement

The collaboration benefits from funds obtained through the ANR Blanc, project Miri, accepted in 2008 and due to last from 2009 to 2012. Although not partners in the project (BAMBOO is the sole partner of Miri), some funds were sollicited to invite Alberto Marchetti-Spaccamela and Leen Stougie, as well as to pay for trips to visit their labs. The value asked per year for Alberto and Leen amounted to 2 times 3000 euros (out of a total of 52000 euros over 4 years for visits, conference participations and publication costs), while the remaining 4000 euros may be taken from the ANR-BBSRC MetNet4SysBio project (out of a total of 20118 euros over 3 years for visits, conference participation and publication costs). Some further funds for travelling and for receiving guests, which can be supplementary to the present project proposal, may possibly be from a bioinformatics Dutch project involving members of the team at Amsterdam, including its coordinator (see Free University and CWI team description).

3. Demande budgétaire

Commentaires
Montant
A. Coût global de la proposition (total des tableaux 1 et 2 : invitations, missions, ...) 30000
B. Cofinancements utilisés (financements autres que Equipe Associée) 10000
Financement "Équipe Associée" demandé (A.-B.)
(maximum 20 K€)
20000