Welcome

This is the documentation of the data processing backbone of the Research Software Observatory, supporting large-scale monitoring of software FAIRness in the life sciences.

The pipeline consolidates and harmonizes metadata from multiple registries and repositories, enriches it with external information, and pre-computes the FAIRsoft indicators and other metrics displayed in the Software Observatory interface.

At a glance

Language: Python 3.10
Execution: CLI (rsetl)
Dependencies: pydantic, tenacity, pymongo, ... (see more)
Database: MongoDB
Main stages: Transformation → Enrichment (in parallel) → Integration → Evaluation
Enrichment sub-pipelines: SPDX · EDAM · Publications · Service availability
Maintained by: Spanish National Bioinformatics Institute
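
The stage order above can be illustrated with a small sketch. This is not the rsetl implementation: every function name, the record shape, and the placeholder metric are hypothetical, and it assumes that "in parallel" means the four enrichment sub-pipelines run concurrently with one another once transformation has finished.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labels for the four enrichment sub-pipelines listed above.
ENRICHERS = ("spdx", "edam", "publications", "service_availability")

def transform(raw):
    """Transformation: harmonise raw entries (here: lowercase the name)."""
    return [{"name": r["name"].lower(), "source": r["source"]} for r in raw]

def enrich(records, kind):
    """One enrichment sub-pipeline: returns {name: annotation} for one kind."""
    return {r["name"]: f"{kind}-annotation" for r in records}

def integrate(records, enrichments):
    """Integration: merge entries sharing a name and attach enrichments."""
    merged = {}
    for r in records:
        entry = merged.setdefault(r["name"], {"name": r["name"], "sources": []})
        entry["sources"].append(r["source"])
    for kind, annotations in enrichments.items():
        for name, value in annotations.items():
            merged[name][kind] = value
    return list(merged.values())

def evaluate(entries):
    """Evaluation: placeholder metric, not a real FAIRsoft indicator."""
    return {e["name"]: len(e["sources"]) for e in entries}

def run_pipeline(raw):
    records = transform(raw)
    # Enrichment sub-pipelines are independent, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {kind: pool.submit(enrich, records, kind) for kind in ENRICHERS}
        enrichments = {kind: f.result() for kind, f in futures.items()}
    return evaluate(integrate(records, enrichments))

# Two made-up entries for the same tool coming from two made-up sources.
raw = [{"name": "ToolX", "source": "registry_a"},
       {"name": "toolx", "source": "registry_b"}]
scores = run_pipeline(raw)
print(scores)
```

Integration waits for all enrichment results because it needs every annotation before merging; the real pipeline persists its data in MongoDB rather than passing lists in memory.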

Quickstart

Clone and install:

git clone https://github.com/inab/research-software-etl.git
cd research-software-etl
pip install -e .

The unified CLI command can run a single stage or the full workflow:

rsetl

Use rsetl --help or go to the CLI docs for more information.
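
To confirm from a script that the editable install put the command on the PATH, a minimal sketch such as the following may help. The helper name `rsetl_help` is hypothetical, and the only CLI flag it relies on is the `--help` mentioned above.

```python
import shutil
import subprocess

def rsetl_help(command="rsetl"):
    """Return the CLI help text, or None if the command is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, "--help"], capture_output=True, text=True)
    return result.stdout

help_text = rsetl_help()
if help_text is None:
    print("rsetl not found; is the right environment active?")
else:
    print(help_text)
```

Checking with `shutil.which` first avoids a `FileNotFoundError` when the package is not installed in the active environment.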

Next steps


Installation & Configuration