From the molecule to the cell:
development, confrontation and integration of formal models and methods of analysis
The extraordinary diversity observed at all levels of living organisms seems to rely not on a diversity of the elements composing such organisms (genes, metabolites, hormones, etc.) but rather on the extraordinary combinatorics of the possible spatial and temporal interactions among these elements, and between these elements, individually or collectively, and their environment.
Gaining insight into the complex network of such interactions requires adopting a view of an organism that is at once local and global, static and dynamic, going from the molecular level to a general picture of how a cell functions.
The aim of this project is to arrive, by means of a comparative approach based on publicly available data, at a better understanding of the diversity of the modes of evolution and functioning of organisms, both prokaryotes and eukaryotes. The project builds on the broad spectrum of skills gathered in a recently created team, BAOBAB-HELIX, and extends the work and the various methodological results that the members of the team have already obtained. We shall be interested in analysing each level separately, from the genome to the cell, but above all in studying the links among these different levels.
More precisely, there are essentially three biological questions we wish to address: 1. are there regularities, structural and functional, in the diversity that is observed, regularities that could provide evidence of a deeper organisation of living organisms; 2. can we identify these regularities in a systematic fashion and thus manage to distinguish an order in the complex network of the observed interactions; finally, 3. how has this network been set up in the course of evolution, to accomplish what functions, and could it have evolved in a different way?
This study will be conducted through the development of diverse models and methods of analysis (statistical, combinatorial, etc.). The models and methods will be systematically explored and confronted in an agnostic spirit, with the final objective of answering the three questions above. Methodological development and a better understanding of biological phenomena will thus proceed side by side. The expected results first concern each such process taken separately: 1. obtaining better mathematical models and algorithms for analysing the networks; and 2. answering both specific biological questions (hypothesis tests) and more general ones (exhaustive exploration of available data). Above all, the simultaneous confrontation of our modelling attempts with biological data should also allow us to get a better grasp of the structure of living systems: is this structure simple or simplifiable into some general principles, or is life made essentially of exceptions?
Overall objectives and context
We wish to address essentially three questions in this project, namely: 1. are there regularities, structural and functional, in the extraordinary diversity observed at all levels of living organisms, regularities that could represent evidence of a deeper organisation? 2. if these regularities exist, can we identify them in a systematic manner and thus manage to distinguish an order in the complex network of the detected interactions underlying such diversity? 3. how has this network been set up in the course of evolution, to accomplish what functions, and could it have evolved differently?
By regularity, we mean the conservation of some elements, at the level either of the genome or of the network of molecular interactions inside a cell. This conservation can be observed within a single organism (we then speak of approximate repetitions of parts of a genome or of a network of interactions), or among different organisms.
It seems clear that we can already answer the first question in the affirmative. Indeed, regularities have been found at different levels, going from the genome to the network of molecular interactions inside a cell (and further on, to morphology, but we shall not be concerned with this in the current project). Examples of regularities at the level of the genome are repetitions of motifs corresponding to regulatory binding sites upstream of genes, or operon structures in prokaryotic organisms. The notion of regularity has more recently been extended to the level of the network of interactions, where it remains less well defined. Intuitively, it corresponds to the existence of sub-networks that are "similar" to one another and appear at different places in the overall network. The notion of "conservation", and of searching for conserved elements, is common to the two levels, the genome and the network, and we therefore adopt the general expression "regularity" as a link between these two levels.
In relation to previous national or international work, various problems remain open, even at each of the levels considered separately.
The main problem when one wishes to detect regularities consists in determining which ones represent real biological phenomena, and which are just artifacts due to chance. This question has already been addressed, from a statistical point of view, in various ways. Besides the fact that a statistical point of view brings only an indirect and partial answer to the question, no currently available approach solves it in a completely satisfying manner.
In all cases, the difficulty is to define the null hypothesis against which the statistical significance of a regularity must be evaluated. Let us consider the example of possible regularities in the metabolic network of an organism. What random network must we consider in order to test the surprising, and thus potentially functional, character of a regularity in metabolism? For other networks (for instance, networks of gene interactions), topology is the only characteristic that is considered. However, topology appears to be an unsuitable criterion for expressing the notion of function in relation to metabolism (and possibly also in relation to the network of gene interactions). Similar examples may be given for other types of regularities, such as the order of genes along a genome. This apparently methodological problem requires biological thought to decide which characteristics to include in the random network so as to avoid detecting trivial regularities.
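The role played by the null hypothesis can be illustrated with a minimal sketch. It assumes one conventional, purely topological null model (random networks that keep the degree of every node, obtained by repeated edge swaps) and one simple regularity (triangles in an undirected network). Both choices, and all the function names, are illustrative assumptions and not the models of this project; as noted above, such a topological null may well be unsuitable for metabolism.

```python
import random
import statistics


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as unique edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomise a graph by edge swaps that keep every node's degree.

    Assumed null model: swap (a,b),(c,d) -> (a,d),(c,b), rejecting swaps
    that would create a self-loop or a duplicate edge.
    """
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if a == d or c == b:
            continue
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


def triangle_z_score(edges, n_random=200, seed=0):
    """Compare the observed triangle count with the assumed null model."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null_counts = [
        count_triangles(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mean = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z
```

Changing what the random networks are required to preserve (for instance, adding compound or reaction types instead of degrees alone) changes which regularities appear surprising, which is precisely why the choice of null model calls for biological rather than purely statistical reasoning.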
Answering this already represents a first step towards the second question we wish to address: identifying regularities in a systematic manner, first at each level separately, from the genome to the cell (through the networks that partly model it), and then across several levels simultaneously. As regards each level separately, previous work (some of it done by members of the current project) has concerned two types of regularities: 1. repetitions or motifs, possibly permuted (segments conserved by rearrangements), identifiable from the genomic sequence, and 2. topological motifs in gene or protein-protein interaction networks. The approaches developed to detect such regularities present various problems, the main ones being: 1. they lack biological realism in the model of the regularity to be searched for, with, for instance, little or no consideration for the cooperative aspect of regularities (as in the case of regulatory sequences) or for their function; 2. they are often biased towards the method (meaning that the specific interest of the method outweighs the interest of the characteristics searched for, even though no method has all the qualities required by the initial question); 3. they often adopt heuristic methods of detection that do not allow one to characterize precisely what is identified, and thus do not lead to a systematic search for regularities.
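The difference between a heuristic and a systematic search can be made concrete with a deliberately simplified sketch: the exhaustive enumeration of all exactly repeated words of a fixed length k in a sequence. The function name and parameters are hypothetical, and the restriction to exact repeats is a simplifying assumption; the regularities discussed above are approximate and possibly permuted, which this sketch does not handle.

```python
from collections import defaultdict


def repeated_kmers(sequence, k, min_count=2):
    """Return every word of length k occurring at least min_count times.

    The enumeration is exhaustive: each position of the sequence is visited,
    so the output is exactly the set of exact repeats of length k, together
    with the 0-based start positions of their occurrences.
    """
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) >= min_count}
```

Because the search space is fully specified (all words of length k), what is reported can be characterized precisely, unlike the output of a heuristic whose misses are unknown.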
The search for regularities at various levels simultaneously has, to the best of our knowledge, very rarely been attempted up to now. A few works try to relate, for instance, the order of genes along a genome to metabolism, but they remain few and are seldom exhaustive. As a consequence, we still lack essential pieces of information to answer the third of our questions in more than an anecdotal way, namely the question concerning the evolution of the complex network of interactions that leads to the diversity observed in living systems.
The general objectives of this project are first to: 1. improve and extend the existing methods, notably for the inference of regularities at each level, by exploring the various possible approaches (statistical, combinatorial, etc.) with the aim of bringing together the qualities of each; 2. develop systematic methods of inference of regularities for the cases rarely or never addressed in the literature, in particular regularities concerning information situated at various levels simultaneously. Solving the latter must initially go through a non-trivial step of representing the information from various types of networks (for instance, metabolic and genetic) in a single one. Here too, very few works exist on the integration of highly heterogeneous information (see, for instance, the work of B. Palsson).
The objectives and stakes in the longer term are, starting from the regularities detected, to arrive at an understanding of how the complex network of interactions has been set up during evolution, and of the cases, if any, in which a simpler structure exists. This problem is wide open. It leads to applications in bio-engineering, and thus in agronomy and medicine, but our primary interest is in obtaining a better identification of the function, local and global, of the different constituents of a cell.