Welcome to the kibior package introduction vignette!

1 General notions

Making our datasets findable, accessible, interoperable and reusable (the FAIR principles) is one of the hot topics in science: it brings openness and versioning, and unlocks reproducibility. To support this, projects such as the biomaRt R package enable fast consumption and easy handling of massive amounts of validated data through a small R interface.

Even though major providers such as Ensembl or NCBI make massive amounts of data available, they do not provide a way to store data elsewhere, leaving data handling to research teams. This can be an issue during data analysis, since researchers often need to send intermediate subsets of analyzed data to collaborators. Moreover, it is now common for a new database or dataset to be released alongside a web platform and an API, allowing easier exploration and querying.

Combine the number of life-science research teams worldwide with the ever-growing publication of databases and datasets across widely varying subfields, and the result is an even greater number of ways to query heterogeneous life-science data.

Here, we present an easy way to manipulate and share datasets through decentralization. kibior makes a search engine and distributed database system available for easy data sharing, using Elasticsearch (ES) and Elasticsearch-based architectures such as Kibio.

kibior offers a way to handle large datasets, making it possible to:

  • pull/download datasets from a local or remote instance of Elasticsearch,
  • filter, query and search in large amounts of data,
  • push/store datasets to a local or remote instance of Elasticsearch,
  • share datasets with collaborators around the world,
  • perform joins between R in-memory and ES-based datasets,
  • import and export datasets from and to files,
  • validate safe-state datasets during pipeline execution,
  • comply with FAIR-sharing requirements by allowing REST requests on data and metadata through the Elasticsearch API.