Welcome to kibior
package introduction vignette!
As one of the hot topics in science, being able to make findable, accessible, interoperable and researchable our datasets (FAIR principles) brings openness, versionning and unlocks reproductibility. To support that, great projects such as biomaRt R package enable fast consumption and ease handling of massive validated data through a small R interface.
Even though main entities such as Ensembl or NBCI avail massive amounts of data, they do not provide a way to store data elsewhere, delegating data handling to research teams. During data analysis, this can be an issue since researchers often need to send intermediary subsets of analyzed data to collaborators. Moreover, it is pretty common now that, when a new database or dataset emerges, a web platform and an API are provided alongside it, allowing easier exploration and querying.
Multiplying the number of research teams in life-science worldwide with the ever-growing database and datasets publication on widely varying sub-columns results in an even greater number of ways to query heterogenous life-science data.
Here, we present an easy way for datasets manipulation and sharing throught decentralization
. Indeed, kibior
seeks to make available a search engine and distributed database system for sharing data easily through the use of Elasticsearch (ES) and Elasticsearch-based architectures such as Kibio.
It is a way to handle large datasets and unlock the possibility to: