Git-sourced Pipelines (experimental)

Introduction

Git can be used as a source-control system for your pipelines. This offers portability, easy versioning of pipelines, and easy integration of existing public pipelines. You configure where the pipeline is located on GitHub and which tag or commit-id of that pipeline to use, and you provide credentials if it is a private repository.

To use Git-sourced pipelines, you need to:

  1. Create Git credentials with your personal access token (only needed if you want to use a private repository).

  2. Configure a pipeline in ICA to access the Git-based pipeline and provide the front-end configuration.

Git Credentials (private repository)

To access private pipelines stored on Git, you need to enter the Git credentials. This is done at System Settings > Credentials > Create > Git Credential.

Fill out the following fields:

  • Name: Provide a name to easily identify your Git credentials.

  • Git url: This is fixed to https://github.com

  • Personal Access Token: Your personal access token. Use an existing one or generate a new one on GitHub.

  • Git username: This field appears after you enter the personal access token and is extracted from your token.

A link is provided to create your personal access token on github.com in case you do not have a token.

The link to the personal access token on GitHub.com will open a pop-up window, so if it does not open, you might have a pop-up blocker active.

You can share the credentials you own with other users of your tenant. To do so, select your credentials at System Settings > Credentials and choose Share.

Personal Access Token

Personal access tokens work in the same way as OAuth access tokens. You can create a new personal access token at https://github.com/settings/tokens/.

  • The personal access token must have scope:repo (meaning all elements in the repo category must be selected).

  • If you set an expiration date, all pipelines which use this personal access token will stop working on that date. For this reason, it is advised not to set an expiration date.

Select Generate token at the bottom of the screen to get your personal access token. The token will only be presented once, so copy it over to a secure location after generating it.
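
Before storing a token in ICA, it can help to confirm that it really carries the repo scope. The sketch below is not part of ICA. It assumes a classic personal access token, for which the GitHub REST API reports the granted scopes in the X-OAuth-Scopes response header (fine-grained tokens do not report scopes this way). The function names are hypothetical and chosen for illustration only.

```python
import urllib.request


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes value such as 'repo, read:org' into a set."""
    if not header_value:
        return set()
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def has_repo_scope(header_value):
    """True only if the full 'repo' scope is present, not just a sub-scope."""
    return "repo" in parse_scopes(header_value)


def fetch_scopes_header(token):
    """Query api.github.com with the token and return its scopes header.

    Needs network access; urllib raises HTTPError (401) for an invalid token.
    """
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

A call such as has_repo_scope(fetch_scopes_header(my_token)) would then indicate whether the token is likely to work for a private repository; my_token is a placeholder for your own token and should never be committed to source control.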

Creating the pipeline in ICA

  1. Create a new pipeline at Projects > your_project > flow > Pipelines > Create > Nextflow/CWL Code > from Git.

  2. Fill out the fields to access the pipeline. During configuration, a test will be performed to see if the repository can be reached with the provided credentials. If this is not the case, you can still save the entered values, so you are not blocked from creating a configuration if the repository is unavailable.

Repository url

The url where the pipeline is located (for example https://github.com/nf-core/demo). Do not use a / as the last character of the path, as this results in a failed pipeline with the error Remote resource not found.

Git credential

Only needed for private repositories. This is your previously configured Git credential.

Main file path

For Nextflow, if main.nf is not in the root folder, edit the main file path to point to the main.nf file; otherwise keep main.nf. For CWL, the same applies to the workflow.cwl file.

Config file path (Nextflow)

If you are configuring a Nextflow pipeline, this is usually nextflow.config, located in the root folder. Edit this field to match your file and folder.

Version

Use the tag or long commit-id for the version. ICA will automatically convert the tag to fill out the commit-id.

  3. Select Next to proceed to the pipeline definition screen.

  4. The details of your configured GitHub pipeline are now visible on the Details tab, where you can change the name, provide a description, update the Nextflow version, or edit any other details as needed.

  5. Proceed to the Inputform files tab and edit the inputForm.json file to match the inputs needed by your pipeline. See the example for more information.

  6. Once all details are updated, save your pipeline. Analyses with the pipeline can now be executed in the same way as with regular pipelines.

To minimise the risk of triggering API-rate limiting on GitHub, the validations above are not performed when configuring pipelines via the API.
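
Because these validations are skipped in that case, a small local pre-check of the repository url and version can catch the two mistakes described above (a trailing slash and a short commit-id). The following is a minimal sketch under stated assumptions, not ICA's own validation: the function names are hypothetical, and telling a tag apart from a shortened commit-id is only a heuristic, since a tag may itself consist of hexadecimal characters.

```python
import re
from urllib.parse import urlsplit

FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")      # long commit-id
SHORT_SHA = re.compile(r"[0-9a-fA-F]{7,39}")   # looks like an abbreviated commit-id


def normalize_repository_url(url):
    """Strip surrounding whitespace and trailing slashes, then check the host."""
    cleaned = url.strip().rstrip("/")
    parts = urlsplit(cleaned)
    if parts.scheme != "https" or parts.netloc.lower() != "github.com":
        raise ValueError(f"expected an https://github.com url, got {url!r}")
    return cleaned


def classify_version(version):
    """Return 'commit', 'short-commit' (not supported) or 'tag'.

    Heuristic only: an all-hex tag of 7 or more characters is
    reported as 'short-commit' and needs a manual look.
    """
    value = version.strip()
    if not value:
        raise ValueError("version must not be empty")
    if FULL_SHA.fullmatch(value):
        return "commit"
    if SHORT_SHA.fullmatch(value):
        return "short-commit"
    return "tag"
```

For instance, classify_version("1.0.1") reports a tag, while a 7-character hexadecimal value is flagged as a short commit-id that should be replaced by the long form.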

Practical Example

As a basic example, we'll configure the nf-core demo pipeline from https://github.com/nf-core/demo/ to run in ICA.

  1. Create a new pipeline at Projects > your_project > flow > Pipelines > Create > Nextflow > from Git.

  2. Fill out the following details:

    • Repository url: https://github.com/nf-core/demo (Tip: do not use a slash or trailing space at the end)

    • Git credential: This is a public repository, so we don't need a token and this should be left blank

    • Main file path: main.nf is in the root folder, so we keep this as main.nf

    • Config file path: nextflow.config is in the root folder, so we need to enter nextflow.config

    • Version: In this example, we use the tags to identify the version we want. Enter 1.0.1. When you enter this version in ICA, it will look up the commit-id for that tag. Note that the commit-id is the long version; the short 7-character version is not supported.

  3. Select Next to proceed to the pipeline definition screen. You can update this information or add more details later, but for now we only need to set the Nextflow version to the latest one and edit the inputForm.json file so we can select a file as input. Proceed to the Inputform files tab to edit the inputForm.json file.

{
  "fields": [
    {
      "id": "input",
      "label": "Input Samplesheet (FASTQ file list)",
      "type": "data",
      "dataFilter": {
        "dataFormat": [
          "CSV"
        ],
        "dataType": "file"
      }
    },
    {
      "id": "outdir",
      "type": "textbox",
      "label": "Output Dir",
      "minValues": 1,
      "maxValues": 1,
      "disabled": true,
      "value": "out/"
    }
  ]
}
  4. As per the instructions on the nf-core demo page, create a samplesheet.csv file on your local machine with the following contents:

sample,fastq_1,fastq_2
SAMPLE1_PE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz
SAMPLE2_PE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R2.fastq.gz
SAMPLE3_SE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,
SAMPLE3_SE,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,
  5. In ICA, navigate to Projects > your_project > Data and upload the created samplesheet.csv file to your project.

  6. Now you are ready to run your analysis. Navigate to Projects > your_project > flow > Pipelines.

  7. Select the created pipeline (the default name will be nf-core/demo) and choose Start analysis at the top of the screen.

  8. Select your samplesheet.csv input file and a storage size (Tip: select 3XSmall to minimize costs).

  9. Select Start analysis and then wait for the analysis to complete.

If your analysis fails with Module path must start with / or ./ prefix -- Offending module: plugin/nf-schema, then the Nextflow version is not set to the latest one. Edit it at Projects > your_project > flow > Pipelines > your_pipeline > Details.
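
The two files edited in this example, inputForm.json and samplesheet.csv, can be sanity-checked locally before uploading them. The sketch below is an illustration under assumptions, not an official ICA or nf-core validator: it only checks the keys and columns that appear in the example above (id, label and type for form fields; sample, fastq_1 and fastq_2 for the samplesheet), it assumes that fastq_1 is mandatory while fastq_2 may stay empty for single-end samples, and the function names are hypothetical.

```python
import csv
import io
import json


def check_input_form(form_text):
    """Return a list of problems found in an inputForm.json document."""
    form = json.loads(form_text)
    fields = form.get("fields") if isinstance(form, dict) else None
    if not isinstance(fields, list) or not fields:
        return ["'fields' must be a non-empty list"]
    problems = []
    seen_ids = set()
    for index, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"field {index}: must be an object")
            continue
        for key in ("id", "label", "type"):
            if key not in field:
                problems.append(f"field {index}: missing '{key}'")
        field_id = field.get("id")
        if field_id is not None:
            if field_id in seen_ids:
                problems.append(f"field {index}: duplicate id '{field_id}'")
            seen_ids.add(field_id)
    return problems


def check_samplesheet(csv_text):
    """Return a list of problems found in a samplesheet with the example's columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["sample", "fastq_1", "fastq_2"]:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2, right after the header.
    for line_number, row in enumerate(reader, start=2):
        if not row["sample"]:
            problems.append(f"line {line_number}: empty sample name")
        if not row["fastq_1"]:
            problems.append(f"line {line_number}: fastq_1 is required")
    return problems
```

Both functions return an empty list when nothing suspicious is found, so they can be called on the file contents (for example open("samplesheet.csv").read()) before the upload step.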

The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Restrictions

The following restrictions apply to Git-sourced pipelines.

  • Front-end files such as JSON forms are not read from GitHub; they need to be in ICA.

  • Only JSON-based Nextflow/CWL pipelines are supported, not XML-based pipelines.

  • Only GitHub.com is supported as repository location.

  • GitHub Enterprise is not supported.

  • API is not supported.

  • Git branches are not supported; only a tag or commit-id can be used for version control.
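
Since a branch name cannot be used as the version, it can be handy to look up which tags a repository offers and which long commit-id each tag points to. One way to do this locally is the standard git ls-remote --tags command. The sketch below parses its output; it is not how ICA resolves tags, the function names are hypothetical, and it assumes that git is installed and the repository is reachable from your machine.

```python
import subprocess


def parse_ls_remote_tags(output):
    """Map tag name -> long commit-id from `git ls-remote --tags` output.

    Annotated tags are listed twice: once with the id of the tag object and
    once as '<name>^{}' with the id of the commit. The commit id wins.
    """
    direct, peeled = {}, {}
    for line in output.splitlines():
        sha, separator, ref = line.partition("\t")
        if not separator or not ref.startswith("refs/tags/"):
            continue
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            peeled[name[:-3]] = sha
        else:
            direct[name] = sha
    return {**direct, **peeled}


def list_remote_tags(repository_url):
    """Run git against the remote repository (needs git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repository_url],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ls_remote_tags(result.stdout)
```

Calling list_remote_tags("https://github.com/nf-core/demo") would return a dictionary from which the 40-character commit-id of a given tag can be copied into the Version field.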
