# Post Processing

A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs\_Intermediates and Results folders, which makes it adaptable to specific requirements.

```
Note - The Post-Processing feature is available only in the ICA environment.
```

This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.

## Key Features

* Customizability: Easily adaptable to different post-processing requirements.
* Reusability: Can be used in multiple pipelines, reducing development effort.
* Data transformation: Can be used to transform or modify output data in various ways.

## What You Need

1. A config file containing the Post-Processing parameters and their values
2. A bash script that implements the desired functionality
3. Any other custom resources/files required by the bash script
4. A Docker container with the dependencies needed to run the bash script

## Process

1. Upload and configure [Custom Docker](https://help.ica.illumina.com/home/h-dockerrepository#importing-a-private-image-tools--bench-images)
2. Modify the config file: set **postProcessing\_container** to the uploaded container.
3. Upload all the required files (config, script, resources) to a project directory, e.g., custom-resources, in ICA using the icav2 client.
4. Configure the ICA Web-UI on the **'Start Analysis'** page:
   1. Enable post-processing by setting it to 'true'.
   2. Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above.
   3. Add 'Custom Resources Directory' and set it to the custom-resources directory above.
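
The upload in step 3 can be sketched as a dry run that only prints the commands. This is an illustration under assumptions: the project name, file names, and remote folder are placeholders, `print_upload_commands` is a hypothetical helper, and the `icav2 projects enter` / `icav2 projectdata upload` syntax should be confirmed with `--help` for your client version before running anything.

```sh
#!/usr/bin/env bash
# Dry-run sketch: prints the icav2 commands instead of executing them.
# Assumption: project name, file names, and remote folder are placeholders.

print_upload_commands() {
    local project="$1" remote_dir="$2" file
    shift 2
    # Select the target project first.
    echo "icav2 projects enter \"${project}\""
    # One upload command per local file.
    for file in "$@"; do
        echo "icav2 projectdata upload \"${file}\" \"${remote_dir}\""
    done
}

print_upload_commands "my-project" "/custom-resources/" \
    post-processing.config bam2cram.sh genome.fa genome.fa.fai
```

Review the printed commands, then run them manually once the paths match your project.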

## Config File - \<file-name>.config

```sh

postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'

```

## Configurable Parameters in Config file

| Parameter                        | Description                                                 |
| -------------------------------- | ----------------------------------------------------------- |
| postProcessing\_container        | Docker container URI; the container must be uploaded to ICA |
| postProcessing\_cpusMemoryConfig | Compute option to use; allowed values are listed below      |
| postProcessing\_shellScript      | File name of the shell script                               |

## Allowed values for **postProcessing\_cpusMemoryConfig** in the config file

| Value                                | Description            |
| ------------------------------------ | ---------------------- |
| single\_threaded\_low\_mem (default) | CPUs: 2, Mem(GB): 8    |
| single\_threaded\_medium\_mem        | CPUs: 4, Mem(GB): 16   |
| single\_threaded\_high\_mem          | CPUs: 8, Mem(GB): 32   |
| multi\_threaded\_low\_mem            | CPUs: 16, Mem(GB): 64  |
| multi\_threaded\_medium\_mem         | CPUs: 32, Mem(GB): 128 |
| multi\_threaded\_high\_mem           | CPUs: 64, Mem(GB): 128 |
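
A typo in the compute option is easy to miss until the analysis has started, so a local check before uploading the config file may help. The sketch below is a hypothetical helper, not part of the component; it assumes config lines have the `key = 'value'` shape shown in the sample config file and copies the allowed values from the table above.

```sh
#!/usr/bin/env bash
# Hypothetical pre-upload check; not part of the Post-Processing component.

# Values copied from the table of allowed compute options.
ALLOWED_COMPUTE_OPTIONS="single_threaded_low_mem single_threaded_medium_mem single_threaded_high_mem multi_threaded_low_mem multi_threaded_medium_mem multi_threaded_high_mem"

# Print the value of KEY from the first line shaped like: KEY = 'value'
read_config_value() {
    local key="$1" file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$file" | head -n 1
}

# Succeed only if the argument is one of the allowed compute options.
is_valid_compute_option() {
    local candidate="$1" option
    for option in $ALLOWED_COMPUTE_OPTIONS; do
        [ "$candidate" = "$option" ] && return 0
    done
    return 1
}
```

For example, `is_valid_compute_option "$(read_config_value postProcessing_cpusMemoryConfig my.config)"` returns a non-zero status when the value in the placeholder file `my.config` is missing or not in the table.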

## Post-Processing: Sample Script (bam2cram.sh)

A Post-Processing bash script is a [Nextflow Template](https://www.nextflow.io/docs/latest/process.html#template), so it has access to the paths and variables defined in the parent **Nextflow Process**. In particular, the script can access the `${params.analysisDir}/Results` and `${params.analysisDir}/Logs_Intermediates` directories and their subdirectories. Because Nextflow resolves `${...}` placeholders before the script runs, bash variables must be escaped with a backslash (for example `\$counter`), as in the sample below. Output files generated by the script must be stored in the `${params.postProcessing.stepName}` directory.

Note: For BAM to CRAM conversion, genome.fa and its .fai index file must be uploaded to the custom resources directory.

```sh

#========================================================#
# This is a SAMPLE Script only for illustration purpose  #
# Modify it, according to your specific Use Case         #
#========================================================#

#must create this folder to save output files
mkdir -p "${params.postProcessing.stepName}"

cd "${params.postProcessing.stepName}"

#BAMs are located in the 'Results' folder of the analysis directory
resultsdir="${params.analysisDir}/Results"
#this file must be uploaded to the custom resources directory
genomefa="${params.customResourceDir}/genome.fa"

sleep_interval=30 # seconds
max_attempts=3

#set sample ids
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")

for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        #quote paths and keep only the first match so a duplicate BAM does not break the command
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam" | head -n 1)
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep \$sleep_interval
        else
            #process and break
            filename=\$(basename -s .bam "\$bam_file")
            samtools view -C -T "\$genomefa" -o "./\${filename}.cram" "\$bam_file"
            break
        fi
    done
done

exit 0
```
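
Since the conversion depends on the reference files in the custom resources directory, a pre-flight check can fail early with a clear message. The sketch below is a hypothetical helper written as plain bash so it can be tried outside the pipeline; it assumes the index is named genome.fa.fai, and inside a Nextflow template every bash `$` would need the `\$` escaping used in the sample script.

```sh
#!/usr/bin/env bash
# Hypothetical pre-flight check; plain bash, not a Nextflow template.
# Assumption: the FASTA index sits next to genome.fa as genome.fa.fai.

check_reference_files() {
    local resource_dir="$1" missing=0 name
    for name in genome.fa genome.fa.fai; do
        # -s is true only for files that exist and are not empty.
        if [ ! -s "${resource_dir}/${name}" ]; then
            echo "ERROR: ${resource_dir}/${name} is missing or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_reference_files "/path/to/custom-resources" || exit 1` before the conversion loop stops the script before any sample is processed without a usable reference.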

