Post Processing
A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to specific requirements.
Note: The Post-Processing feature is available only in the ICA environment.
This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own post-processing logic, all configured through parameters. Externalized process scripts allow containerized processes to run without changes to the pipeline itself.
Key Features
- Customizability: Easily adaptable to different post-processing requirements. 
- Reusability: Can be used in multiple pipelines, reducing development effort. 
- Data transformation: Can be used to transform or modify output data in various ways. 
What You Need
- A config file containing the Post-Processing parameters and their values
- A bash script that implements the desired functionality
- Any other custom resources/files required by the bash script
- A Docker container with the dependencies needed to run the bash script
Process
- Upload and configure the custom Docker container
- Modify the config file: set postProcessing_container to the uploaded container
- Upload all required files (config, script, resources) to a project directory in ICA, e.g., custom-resources, using the icav2 client
- Configure the 'Start Analysis' page in the ICA web UI:
  - Enable post-processing by setting it to 'true'
  - Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory
  - Add 'Custom Resources Directory' and set it to the custom-resources directory
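Before the upload step, it can help to confirm that the local staging folder actually holds every file the post-processing step will need. The following is a minimal sketch, not part of the component: `check_staging` is a hypothetical helper, the file names are placeholders, and the `icav2` invocation in the closing comment is an assumption that should be checked against the client's own help output.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check (not part of the component).

# check_staging DIR FILE...
# Prints one line per missing file; returns 1 if anything is missing, else 0.
check_staging() {
    local dir="$1"; shift
    local missing=0 f
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "MISSING: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (placeholder file names; upload command is an assumption):
#   check_staging custom-resources my.config bam2cram.sh genome.fa genome.fa.fai \
#     && icav2 projectdata upload custom-resources /custom-resources/
```

Running the check first avoids starting an analysis that fails late because a script or reference file never reached the custom-resources directory.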
 
Config File - <file-name>.config
postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'
Configurable Parameters in Config File
- postProcessing_container: Docker container URI. The container must be uploaded to ICA.
- postProcessing_cpusMemoryConfig: Compute option to use. Allowed values are listed below.
- postProcessing_shellScript: File name of the shell script.
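As a rough illustration of how these parameters could be checked before upload, the sketch below reads single-quoted values from a config file of the form shown above. `get_param` and `validate_config` are hypothetical helpers. Treating postProcessing_container and postProcessing_shellScript as required is an assumption; postProcessing_cpusMemoryConfig is left optional only because the allowed-values list marks a default for it.

```shell
#!/usr/bin/env bash
# Hypothetical config checker (not part of the component).

# get_param FILE KEY
# Prints the single-quoted value assigned to KEY, or nothing if KEY is absent.
get_param() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# validate_config FILE
# Assumption: container and shell script are required; the compute option
# may be omitted because a default exists.
validate_config() {
    local file="$1" key status=0
    for key in postProcessing_container postProcessing_shellScript; do
        if [ -z "$(get_param "$file" "$key")" ]; then
            echo "ERROR: $key is not set in $file"
            status=1
        fi
    done
    return "$status"
}

# Example usage (placeholder file name):
#   validate_config my.config && echo "script: $(get_param my.config postProcessing_shellScript)"
```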
Allowed values for postProcessing_cpusMemoryConfig in the config file
- single_threaded_low_mem (default): CPUs: 2, Mem(GB): 8
- single_threaded_medium_mem: CPUs: 4, Mem(GB): 16
- single_threaded_high_mem: CPUs: 8, Mem(GB): 32
- multi_threaded_low_mem: CPUs: 16, Mem(GB): 64
- multi_threaded_medium_mem: CPUs: 32, Mem(GB): 128
- multi_threaded_high_mem: CPUs: 64, Mem(GB): 128
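The allowed values above can be mirrored in a small lookup, which is handy for catching a misspelled option before an analysis is started. This is a sketch only: `compute_resources` is a hypothetical helper that restates the list above and is not how the component itself resolves the option.

```shell
#!/usr/bin/env bash
# Hypothetical lookup mirroring the allowed values listed above.

# compute_resources [OPTION]
# Prints "CPUS MEM_GB" for a known option (default: single_threaded_low_mem);
# returns 1 and writes to stderr for an unknown option.
compute_resources() {
    case "${1:-single_threaded_low_mem}" in
        single_threaded_low_mem)    echo "2 8" ;;
        single_threaded_medium_mem) echo "4 16" ;;
        single_threaded_high_mem)   echo "8 32" ;;
        multi_threaded_low_mem)     echo "16 64" ;;
        multi_threaded_medium_mem)  echo "32 128" ;;
        multi_threaded_high_mem)    echo "64 128" ;;
        *) echo "unknown option: $1" >&2; return 1 ;;
    esac
}

# Example usage:
#   read -r cpus mem_gb <<< "$(compute_resources multi_threaded_low_mem)"
#   echo "CPUs=$cpus Mem(GB)=$mem_gb"
```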
Post-Processing: Sample Script (bam2cram.sh)
A post-processing bash script is a Nextflow template, which has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access directories such as ${params.analysisDir}/Results and ${params.analysisDir}/Logs_Intermediates. Output files must be stored in the ${params.postProcessing.stepName} directory. Because Nextflow interpolates ${...} expressions in a template, bash's own variables are escaped as \$ in the script.
Note: For BAM to CRAM conversion, genome.fa and its .fai index file must be uploaded to the custom resources directory.
#========================================================#
# This is a SAMPLE Script only for illustration purpose  #
# Modify it, according to your specific Use Case         #
#========================================================#
# Output files must be written to this folder, so create it first
mkdir -p "${params.postProcessing.stepName}"
cd "${params.postProcessing.stepName}"
# BAMs are located in the analysis 'Results' folder
resultsdir="${params.analysisDir}/Results"
# genome.fa (and its .fai index) must be uploaded to the custom resources directory
genomefa="${params.customResourceDir}/genome.fa"
sleep_interval=30 # seconds
max_attempts=3
# Set sample IDs
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")
for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        # Give up on this sample once all attempts are used
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        # Use the first match in case the BAM appears in more than one subfolder
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam" | head -n 1)
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep \$sleep_interval
        else
            # Convert to CRAM, then stop retrying for this sample
            filename=\$(basename "\$bam_file" .bam)
            samtools view -C -T "\$genomefa" -o "./\${filename}.cram" "\$bam_file"
            break
        fi
    done
done
exit 0
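Because the script is a Nextflow template, it cannot be run directly in bash: the `${params...}` placeholders and the `\$` escapes have to be resolved first. For a quick local dry run, a rough approximation of that rendering can be done with sed. The sketch below is an assumption-laden stand-in for what Nextflow does, not its actual mechanism: `render_template` is a hypothetical helper, it handles only the three placeholders used in the sample script, and it assumes the substituted paths contain no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Hypothetical local renderer for the sample template (approximation only).

# render_template TEMPLATE ANALYSIS_DIR RESOURCE_DIR STEP_NAME
# Writes the rendered script to stdout:
#   1. substitutes the three ${params...} placeholders used in the sample
#   2. turns the escaped \$ back into a plain $ for bash
render_template() {
    sed -e "s|\${params.analysisDir}|$2|g" \
        -e "s|\${params.customResourceDir}|$3|g" \
        -e "s|\${params.postProcessing.stepName}|$4|g" \
        -e 's|\\\$|$|g' "$1"
}

# Example usage (placeholder paths):
#   render_template bam2cram.sh /data/analysis /data/custom-resources postprocessing > rendered.sh
#   bash -n rendered.sh   # syntax check only; running it requires samtools and the input files
```

Checking the rendered output with `bash -n` catches quoting and escaping mistakes before the script is uploaded to ICA.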
