Installation & Configuration

Overview

The Research Software Observatory – Data Pipeline can be installed as a standalone Python package.
It includes the transformation, integration, and enrichment stages required to build the Observatory’s metadata database and precompute quality and FAIRness statistics for the UI.

Some stages call external services (APIs and model providers); make sure credentials are set before running.

Requirements

  • Python ≥ 3.10
  • MongoDB instance
  • Tokens to access the following services (depending on the stages you run):
Other services used

The following services are also accessed in some steps but require no credentials:


Install

git clone https://github.com/inab/research-software-etl.git
cd research-software-etl
pip install -e .

This will install the package in editable mode and expose the CLI command rsetl.


Environment variables

Before running the pipeline, export the following variables (or include them in a .env file):

MongoDB connection

MONGO_HOST=...
MONGO_PORT=...
MONGO_USER=...
MONGO_PWD=...
MONGO_AUTH_SRC=...
MONGO_DB=...
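
As an illustration of how these six variables fit together, the sketch below assembles them into a standard MongoDB connection URI. The helper name build_mongo_uri is hypothetical and this is not necessarily how the pipeline itself builds its connection; it only assumes that all six variables are set.

```python
from urllib.parse import quote_plus


def build_mongo_uri(env):
    """Assemble a MongoDB connection URI from the MONGO_* variables.

    Hypothetical helper for illustration; the pipeline may connect differently.
    `env` is any mapping, for example os.environ.
    """
    # Percent-encode the credentials so characters such as '@' or ':'
    # in a user name or password do not break the URI.
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PWD"])
    host = env["MONGO_HOST"]
    port = env["MONGO_PORT"]
    database = env["MONGO_DB"]
    auth_source = env["MONGO_AUTH_SRC"]
    return (
        f"mongodb://{user}:{password}@{host}:{port}/"
        f"{database}?authSource={auth_source}"
    )
```

Once the variables are exported, calling build_mongo_uri(os.environ) (after import os) returns a URI that can be passed to a MongoDB client; a missing variable raises KeyError, which makes an incomplete configuration easy to spot.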

API tokens

Used in disambiguation steps:

GITHUB_TOKEN=...
GITLAB_TOKEN=...
OPENROUTER_API_KEY=...
HUGGINGFACE_API_KEY=...
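
Before launching a stage, it can help to confirm that none of the variables above is missing or empty. The following is a minimal sketch of such a check; the names missing_variables, MONGO_VARS and API_TOKEN_VARS are hypothetical and not part of the package, and which tokens are actually needed depends on the stages you run.

```python
# Variable names taken from the lists above.
MONGO_VARS = (
    "MONGO_HOST",
    "MONGO_PORT",
    "MONGO_USER",
    "MONGO_PWD",
    "MONGO_AUTH_SRC",
    "MONGO_DB",
)
API_TOKEN_VARS = (
    "GITHUB_TOKEN",
    "GITLAB_TOKEN",
    "OPENROUTER_API_KEY",
    "HUGGINGFACE_API_KEY",
)


def missing_variables(env, names):
    """Return the names that are unset or blank in `env`, in the given order.

    Hypothetical helper for illustration; `env` is any mapping,
    for example os.environ.
    """
    return [name for name in names if not env.get(name, "").strip()]
```

For example, missing_variables(os.environ, MONGO_VARS + API_TOKEN_VARS) returns an empty list when everything is set, and otherwise the names still to be exported or added to the .env file.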

Verifying the installation

Run the following command to ensure the package is installed and the CLI entry point is available:

rsetl --help

You should see a description of the available arguments or stages.

To check connectivity with the database and the external APIs:

rsetl check-env

If your MongoDB instance is reachable and your tokens are valid, you should see something like this:

=== Research Software Observatory – Environment Check ===

✅ MongoDB                   connected (v8.0.13)
✅ Observatory API           reachable (200)
✅ Licenses API              reachable (200)
✅ Europe PMC                reachable (200)
✅ Semantic Scholar          reachable (200)
✅ Hugging Face API          reachable (200)
✅ OpenRouter API            reachable (200)
✅ GitHub API                reachable (200)
✅ GitLab API                reachable (200)

=== Summary ===
✅ Environment looks OK.


Documentation

This documentation is built using MkDocs with the Material for MkDocs theme.

To build or preview this documentation locally:

pip install mkdocs mkdocs-material pymdown-extensions
mkdocs serve

Then open http://127.0.0.1:8000/research-software-etl in your browser. More CLI options are described in the MkDocs documentation.


Next Steps