Post Processing
A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to pipeline-specific requirements.
This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.
Customizability: Easily adaptable to different post-processing requirements.
Reusability: Can be used in multiple pipelines, reducing development effort.
Data transformation: Can be used to transform or modify output data in various ways.
A config file that contains the Post-Processing parameters and their values
A bash script that implements the desired functionality
Any other custom resources/files required by the bash script
A Docker container with the dependencies needed to run the bash script
Upload and configure
Modify the config file: set postProcessing_container to the uploaded container.
Upload all the required files (config, script, resources) to a project directory, e.g., custom-resources, in ICA using the icav2 client.
Configure the ICA Web-UI on the 'Start Analysis' page:
Enable postprocessing by setting it to 'true'.
Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above.
Add 'Custom Resources Directory' and set it to the custom-resources directory above.
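Before uploading, it can help to confirm that the local staging directory actually contains the config file and the script it names. The following is a minimal sketch, not part of the component itself: the helper name `check_custom_resources` is hypothetical, and it assumes the config file holds simple `name = value` lines, which this page does not specify.

```shell
# Hypothetical pre-upload check for a local staging directory.
# Assumption: the config stores parameters as `name = value` lines.
check_custom_resources() {
  dir="$1"
  config="$dir/$2"

  # The config file itself must be present.
  [ -f "$config" ] || { echo "missing config: $config" >&2; return 1; }

  # These two parameters have no documented default, so require them.
  for key in postProcessing_container postProcessing_shellScript; do
    grep -q "$key" "$config" || { echo "config lacks $key" >&2; return 1; }
  done

  # Extract the script file name and strip spaces and quotes around it.
  script_name=$(grep 'postProcessing_shellScript' "$config" \
    | head -n 1 | cut -d= -f2 | tr -d " '\"")

  # The named script must sit next to the config so both get uploaded.
  [ -f "$dir/$script_name" ] || { echo "missing script: $dir/$script_name" >&2; return 1; }

  echo "ok: $2 and $script_name found in $dir"
}
```

For example, `check_custom_resources ./custom-resources post_processing.config` (both names are placeholders) returns a non-zero status and prints a message to stderr if either file is missing.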
postProcessing_container: Docker container URI. The container must be present in (uploaded to) ICA.
postProcessing_cpusMemoryConfig: Compute option to use. Allowed values are listed below.
postProcessing_shellScript: File name of the shell script.
single_threaded_low_mem (default): CPUs: 2, Mem(GB): 8
single_threaded_medium_mem: CPUs: 4, Mem(GB): 16
single_threaded_high_mem: CPUs: 8, Mem(GB): 32
multi_threaded_low_mem: CPUs: 16, Mem(GB): 64
multi_threaded_medium_mem: CPUs: 32, Mem(GB): 128
multi_threaded_high_mem: CPUs: 64, Mem(GB): 128
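The compute options listed above can be expressed as a small lookup, which is handy for validating a postProcessing_cpusMemoryConfig value before starting an analysis. This is an illustrative sketch only: the function name `resolve_compute_option` is hypothetical, and the component's own internal mapping may be implemented differently.

```shell
# Hypothetical lookup: print "<cpus> <mem_gb>" for a compute option.
# Values are copied from the list above; an empty argument falls back
# to the documented default, single_threaded_low_mem.
resolve_compute_option() {
  option="${1:-single_threaded_low_mem}"
  case "$option" in
    single_threaded_low_mem)    echo "2 8" ;;
    single_threaded_medium_mem) echo "4 16" ;;
    single_threaded_high_mem)   echo "8 32" ;;
    multi_threaded_low_mem)     echo "16 64" ;;
    multi_threaded_medium_mem)  echo "32 128" ;;
    multi_threaded_high_mem)    echo "64 128" ;;
    *) echo "unknown compute option: $option" >&2; return 1 ;;
  esac
}

resolve_compute_option multi_threaded_low_mem   # prints: 16 64
```

An unrecognized value returns a non-zero status, so a typo in the config can be caught before the run is submitted.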
A Post-Processing bash script has access to the paths and variables defined in the parent Nextflow process. In our case, the following directories and their subdirectories can be accessed from the bash script: {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates. Output files generated by the script should be stored in the {params.postProcessing.stepName} directory.
Note: For BAM to CRAM conversion, the genome.fa and .fai files must be uploaded to the custom resources directory.
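As a rough illustration of the layout described above, the sketch below reads files from the Results folder and writes its output into the step directory. It is an assumption-laden example, not a supplied script: `ANALYSIS_DIR` and `STEP_NAME` are hypothetical shell stand-ins for {params.analysisDir} and {params.postProcessing.stepName}, and the function and manifest file names are invented for the example.

```shell
# Hypothetical stand-ins for values the parent Nextflow process provides:
#   ANALYSIS_DIR ~ {params.analysisDir}
#   STEP_NAME    ~ {params.postProcessing.stepName}
ANALYSIS_DIR="${ANALYSIS_DIR:-$PWD/analysis}"
STEP_NAME="${STEP_NAME:-PostProcessing}"

# Write a tab-separated manifest (relative path, size in bytes) of every
# file under <analysis dir>/Results into the given output directory.
write_results_manifest() {
  results_dir="$1/Results"
  out_dir="$2"
  [ -d "$results_dir" ] || { echo "no Results folder in $1" >&2; return 1; }

  mkdir -p "$out_dir" || return 1
  manifest="$out_dir/results_manifest.tsv"
  : > "$manifest"   # start from an empty manifest

  find "$results_dir" -type f | sort | while IFS= read -r f; do
    # Strip the Results prefix so paths are relative to that folder.
    printf '%s\t%s\n' "${f#"$results_dir"/}" "$(wc -c < "$f" | tr -d ' ')" >> "$manifest"
  done
}

# Run only when the input folder exists, so sourcing the file is harmless.
if [ -d "$ANALYSIS_DIR/Results" ]; then
  write_results_manifest "$ANALYSIS_DIR" "$STEP_NAME"
fi
```

The same shape (read from Results or Logs_Intermediates, write only into the step directory) applies to heavier tasks; any extra inputs such as reference files would come from the custom resources directory.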