
Evaluate GitHub repositories

Info

This tutorial walks through each step of a programmatic FAIRsoft evaluation of one or more GitHub repositories.

Jupyter notebook

Download the Jupyter notebook for this tutorial here

Evaluation Workflow

In this workflow, metadata is first extracted automatically from a GitHub repository. It can then be reviewed, enriched, and used to compute FAIRsoft indicators.

GitHub repository URL
        ↓
1. Metadata extraction
        ↓
2. (Optional) metadata review and enrichment
        ↓
3. FAIRsoft evaluation
        ↓
Scores, logs and feedback
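
The workflow above can be sketched as a single function that chains the three steps. This is only an illustrative outline: `evaluate_repository` and its parameter names are hypothetical and not part of FAIRsoft. The extraction and evaluation steps are passed in as callables (for example `get_repository_metadata` and `compute_fairsoft` from this tutorial), so the sketch assumes nothing about the services themselves.

```python
from typing import Any, Callable, Optional

Metadata = Any  # the metadata type is kept opaque in this sketch


def evaluate_repository(
    owner: str,
    repo: str,
    extract: Callable[[str, str], Metadata],
    evaluate: Callable[[Metadata], Any],
    enrich: Optional[Callable[[Metadata], Metadata]] = None,
) -> Any:
    """Run the three workflow steps for one repository (hypothetical helper)."""
    # Step 1: metadata extraction
    metadata = extract(owner, repo)
    # Step 2 (optional): metadata review and enrichment
    if enrich is not None:
        metadata = enrich(metadata)
    # Step 3: FAIRsoft evaluation
    return evaluate(metadata)
```

Passing the steps as arguments keeps the optional enrichment step explicit and makes the chain easy to try out with stand-in functions before calling the real services.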

Requirements

To run this workflow, you will need:

  • Python 3.9+
  • The Python requests library. You can install it with:
    pip install requests
    
  • A GitHub personal access token with read permissions for repositories
  • One or more GitHub repository URLs

The GitHub Metadata API (the REST service used in step 1) uses the token to read repository information and to inspect the repository contents needed for metadata extraction.
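
Since the workflow starts from repository URLs while the extraction step takes an owner and a repository name, a small helper can bridge the two. The following is a minimal sketch: `parse_github_url` is a hypothetical helper, and reading the token from an environment variable named `GITHUB_TOKEN` is an assumption made here, not something the services require.

```python
import os
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str]:
    """Split a GitHub repository URL into (owner, repo)."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a GitHub URL: {url!r}")
    # Drop empty segments caused by leading or trailing slashes
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected https://github.com/<owner>/<repo>, got {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


# Assumption: the token is stored in an environment variable named GITHUB_TOKEN.
token = os.environ.get("GITHUB_TOKEN", "")
```

Keeping the token in the environment rather than in the notebook avoids committing it by accident.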

1. Extract repository metadata

Metadata is extracted by sending a POST request with the repository owner, repository name, and access token to the GitHub Metadata API.

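import requests

# GITHUB_METADATA_API (the service endpoint URL) and token (your GitHub
# personal access token) must be defined before calling this function.
def get_repository_metadata(owner, repo):
    """Request the extracted metadata for one GitHub repository."""
    payload = {
        "owner": owner,
        "repo": repo,
        "userToken": token,
        "prepare": False
    }

    response = requests.post(GITHUB_METADATA_API, json=payload)
    response.raise_for_status()  # raise on HTTP 4xx/5xx responses

    # The extracted metadata is returned under the "data" key
    return response.json()["data"]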

Note

The GitHub Metadata API is not GitHub’s official API. It is a higher-level service that extracts FAIR-relevant metadata from repositories.
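
Because the tutorial targets one or more repositories, it can help to run the extraction in a loop that does not stop at the first failure. The following is a minimal sketch: `extract_many` is a hypothetical helper, and the extraction function (for example `get_repository_metadata`) is passed in as an argument.

```python
from typing import Any, Callable, Iterable


def extract_many(
    repositories: Iterable[tuple[str, str]],
    extract: Callable[[str, str], Any],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Extract metadata for several repositories, collecting failures."""
    extracted: dict[str, Any] = {}
    failed: dict[str, str] = {}
    for owner, repo in repositories:
        key = f"{owner}/{repo}"
        try:
            extracted[key] = extract(owner, repo)
        except Exception as error:  # e.g. an HTTP error from raise_for_status()
            failed[key] = str(error)
    return extracted, failed
```

The first returned dictionary maps `owner/repo` to the extracted metadata. The second maps `owner/repo` to the error message for repositories that could not be processed.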

Metadata extraction under the hood

The GitHub Metadata API extracts information such as:

  • repository name and description
  • homepage and repository URLs
  • releases and version tags
  • license information
  • authors inferred from commit history
  • repository topics
  • publication information
  • documentation files and their types
  • citation metadata from CITATION.cff

The API combines data from:

  • the GitHub GraphQL API
  • repository contents
  • documentation directories such as docs/ and documentation/
  • standard project files like README, CONTRIBUTING, CHANGELOG, and CITATION.cff

2. Enrich metadata

Warning

The extracted metadata is often incomplete for a full FAIRsoft evaluation and may require manual enrichment.

Typical fields that may need enrichment include: type, webpage, dependencies, input, output, os, publication, download and test.

See the detailed guide on how to (manually) enrich metadata:

Metadata enrichment guide
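
As a rough illustration of the enrichment step, the sketch below reports which of the fields listed above are still empty and fills them from manually supplied values. It assumes the extracted metadata is a flat dictionary keyed by those field names. `missing_fields` and `enrich` are hypothetical helpers, and the expected format of each value is described in the enrichment guide, not here.

```python
# Fields this tutorial lists as commonly needing manual enrichment.
ENRICHABLE_FIELDS = (
    "type", "webpage", "dependencies", "input", "output",
    "os", "publication", "download", "test",
)


def missing_fields(metadata: dict) -> list[str]:
    """Return the enrichable fields that are absent or empty."""
    return [name for name in ENRICHABLE_FIELDS if not metadata.get(name)]


def enrich(metadata: dict, additions: dict) -> dict:
    """Return a copy of metadata with empty fields filled from additions."""
    enriched = dict(metadata)
    for name, value in additions.items():
        if not enriched.get(name):  # never overwrite extracted values
            enriched[name] = value
    return enriched
```

Working on a copy leaves the extracted metadata untouched, so the original and enriched versions can be compared or evaluated separately.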

3. Compute FAIRsoft indicators

Once metadata is ready, it can be sent to the Software Observatory evaluation REST API.

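import requests

# OBSERVATORY_API (the evaluation endpoint URL) must be defined before
# calling this function.
def compute_fairsoft(metadata):
    """Send metadata to the evaluation API and return the parsed response."""
    payload = {
        "tool_metadata": metadata,
        "prepare": False
    }

    response = requests.post(OBSERVATORY_API, json=payload)
    response.raise_for_status()  # raise on HTTP 4xx/5xx responses

    return response.json()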

The response contains the computed FAIRsoft indicators, the evaluation logs, and improvement suggestions.

Understanding the results

The evaluation response contains three main components:

Field     Description
result    FAIRsoft scores and indicator values
logs      Detailed explanation of each indicator
feedback  Suggestions for improving FAIRness
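
A minimal sketch of unpacking the response follows. It assumes the three components are top-level keys of the returned JSON object, named exactly as in the table. `split_response` is a hypothetical helper.

```python
EXPECTED_FIELDS = ("result", "logs", "feedback")


def split_response(response: dict):
    """Return (result, logs, feedback), failing loudly if one is missing."""
    missing = [name for name in EXPECTED_FIELDS if name not in response]
    if missing:
        raise KeyError(f"Response is missing expected field(s): {missing}")
    return tuple(response[name] for name in EXPECTED_FIELDS)
```

Checking for the keys up front gives a clearer error than a bare `KeyError` further down if the service returns an unexpected payload.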

Result

The result field includes the four main FAIRsoft dimensions:

F  Findability
A  Accessibility
I  Interoperability
R  Reusability

It also contains lower-level indicators such as:

F1, F2, F3
A1, A2, A3
I1, I2, I3
R1, R2, R3

and detailed checks:

F1_1
F1_2
A1_3
I3_1
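
To read the scores one dimension at a time, the entries can be grouped by their leading letter. The sketch below assumes `result` is a flat mapping whose keys follow the naming shown above (`F`, `F1`, `F1_1`, and so on). `group_by_dimension` is a hypothetical helper, and the actual value types are not specified here.

```python
import re

# Matches keys such as "F", "A1" or "I3_1": a dimension letter, optionally
# followed by an indicator number and an underscore-separated check number.
INDICATOR_KEY = re.compile(r"^([FAIR])(?:\d+(?:_\d+)?)?$")


def group_by_dimension(result: dict) -> dict:
    """Group indicator entries under their F, A, I or R dimension."""
    grouped = {letter: {} for letter in "FAIR"}
    for key, value in result.items():
        match = INDICATOR_KEY.match(key)
        if match:  # ignore keys that do not follow the naming pattern
            grouped[match.group(1)][key] = value
    return grouped
```

Keys that do not match the pattern are skipped rather than treated as errors, so any additional fields in `result` do not break the grouping.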

Logs

The logs field explains how each indicator was computed. Each entry describes:

  • what was checked
  • which metadata fields were used
  • whether the check passed or failed and why

Feedback

The feedback field provides human-readable improvement suggestions. For each FAIR principle it lists:

  • strengths
  • improvements
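
The feedback can be rendered as a short plain-text report. The sketch below assumes `feedback` maps each FAIR principle to an object with `strengths` and `improvements` lists. That shape is inferred from the description above and should be checked against a real response. `format_feedback` is a hypothetical helper.

```python
def format_feedback(feedback: dict) -> str:
    """Render feedback as plain text, one section per FAIR principle."""
    lines = []
    for principle, notes in feedback.items():
        lines.append(str(principle))
        for kind in ("strengths", "improvements"):
            # A missing list is treated as empty
            for item in notes.get(kind, []):
                lines.append(f"  [{kind}] {item}")
    return "\n".join(lines)
```

Calling `print(format_feedback(feedback))` then lists, for each principle, its strengths followed by its suggested improvements.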