Counting Algorithm

FASTQ files from Illumina miRNA sequencing are uploaded to either BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA), where the DRAGEN miRNA app processes the reads in the following steps:

Calibration of miRBase entries
miRNA entries in the miRBase mature database whose sequences are identical or nearly identical are calibrated manually: each overlapping set is merged into a single combined entry. For instance, because the sequence of hsa-miR-151b is entirely contained within hsa-miR-151a-5p, the two are reported as one entry, hsa-miR-151b/151a-5p.
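The containment rule behind the hsa-miR-151b example can be illustrated with a short Python sketch. It is not the app's procedure: the calibration described above is manual and also covers nearly identical sequences, whereas this sketch only flags exact containment. The helper names `find_contained_pairs` and `combined_name`, the prefix-stripping naming rule, and the `toy-miR-*` sequences are hypothetical placeholders, not miRBase records.

```python
# Sketch only: flags exact sequence containment between mature miRNA entries and
# builds a combined "contained/containing" name. The real calibration is manual
# and also merges nearly identical sequences; names and sequences are hypothetical.


def find_contained_pairs(entries: dict[str, str]) -> list[tuple[str, str]]:
    """Return (contained, containing) name pairs; identical sequences appear once."""
    pairs = []
    for short_name, short_seq in entries.items():
        for long_name, long_seq in entries.items():
            if short_name == long_name or short_seq not in long_seq:
                continue
            # Strictly shorter means contained; equal length means identical,
            # so keep only one ordering of the pair (alphabetical by name).
            if len(short_seq) < len(long_seq) or short_name < long_name:
                pairs.append((short_name, long_name))
    return pairs


def combined_name(names: list[str]) -> str:
    """Join names with '/', dropping the shared '<species>-miR-' prefix after the first."""
    first = names[0]
    idx = first.find("miR-")
    prefix = first[: idx + len("miR-")] if idx != -1 else ""
    rest = [n[len(prefix):] if prefix and n.startswith(prefix) else n for n in names[1:]]
    return "/".join([first, *rest])


if __name__ == "__main__":
    toy_entries = {  # hypothetical sequences, not miRBase data
        "toy-miR-1b": "UAGCUUAUCAGACUGA",
        "toy-miR-1a-5p": "UAGCUUAUCAGACUGAUGU",
        "toy-miR-2": "GGCAAGAUGCUGGCAUAGCU",
    }
    for contained, containing in find_contained_pairs(toy_entries):
        print(combined_name([contained, containing]))  # toy-miR-1b/1a-5p
```

Applied to the names from the example above, `combined_name(["hsa-miR-151b", "hsa-miR-151a-5p"])` yields `hsa-miR-151b/151a-5p`.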

Adapter and quality trimming
Reads are processed with cutadapt to remove the 3′ adapter (AACTGTAGGCACCATCAAT) and low-quality bases. Reads in which no adapter sequence is found are tallied separately as no_adapter_reads.
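A minimal Python sketch of this step follows, assuming cutadapt's documented processing order (3′ quality trimming before adapter removal) and the BWA-style quality-trimming rule its documentation describes. It simplifies cutadapt considerably: adapter matching here is exact, while cutadapt also tolerates mismatches and indels. The quality cutoff of 20, Phred+33 encoding, minimum partial-adapter overlap of 3, and all function names are assumptions, not DRAGEN settings.

```python
# Sketch of 3' quality trimming followed by 3' adapter detection and removal.
# Assumptions (not DRAGEN settings): exact adapter matching only, Phred+33
# qualities, quality cutoff 20, minimum partial-adapter overlap of 3.
from typing import Optional

ADAPTER = "AACTGTAGGCACCATCAAT"  # 3' adapter given in the documentation


def quality_trim_index(quals: str, cutoff: int = 20, base: int = 33) -> int:
    """BWA-style 3' trimming: return the index at which to cut the read."""
    running, best, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - (ord(quals[i]) - base)
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def find_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> Optional[int]:
    """Return where the adapter starts in seq, or None if no adapter is found."""
    full = seq.find(adapter)
    if full != -1:
        return full
    # Otherwise look for an adapter prefix running off the 3' end, longest first.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return None


def trim_reads(reads: list[tuple[str, str]]) -> tuple[list[str], int]:
    """Trim (sequence, qualities) pairs; return kept 5' parts and no_adapter_reads."""
    kept, no_adapter_reads = [], 0
    for seq, quals in reads:
        seq = seq[: quality_trim_index(quals)]
        start = find_adapter(seq)
        if start is None:
            no_adapter_reads += 1
        else:
            kept.append(seq[:start])
    return kept, no_adapter_reads


if __name__ == "__main__":
    good = "TAGCTTATCAGACTGA" + ADAPTER + "ACGTACGTACGT"  # hypothetical read
    reads = [(good, "I" * len(good)), ("G" * 20, "I" * 20)]
    print(trim_reads(reads))  # (['TAGCTTATCAGACTGA'], 1)
```

The sketch keeps only the bases 5′ of the adapter; any sequence after the adapter is simply discarded here.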

Insert and UMI identification
After trimming, the insert and UMI sequences are extracted from each read. Reads whose insert is shorter than 16 bp are discarded and counted as too_short_reads; reads whose UMI is shorter than 10 bp are discarded and counted as UMI_defective_reads.
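The filtering rules can be sketched in Python as below. The read layout is an assumption: the sketch treats the bases before the adapter as the insert and the bases immediately after it as the UMI, truncated to a hypothetical `UMI_LEN` of 12. Checking insert length before UMI length (which decides how a read failing both is tallied) is also an assumption; only the 16 bp and 10 bp thresholds and the counter names come from the text.

```python
# Sketch: split a read into insert and UMI around the 3' adapter, then apply the
# documented length filters. The read layout (insert + adapter + UMI), UMI_LEN,
# and the order of the two checks are assumptions for illustration.
from collections import Counter
from typing import Optional

ADAPTER = "AACTGTAGGCACCATCAAT"
MIN_INSERT = 16  # inserts shorter than this -> too_short_reads
MIN_UMI = 10     # UMIs shorter than this -> UMI_defective_reads
UMI_LEN = 12     # hypothetical number of post-adapter bases kept as the UMI


def classify(seq: str) -> tuple[str, Optional[tuple[str, str]]]:
    """Return (category, (insert, umi)); the pair is None for discarded reads."""
    start = seq.find(ADAPTER)
    if start == -1:
        return "no_adapter_reads", None
    insert = seq[:start]
    umi_start = start + len(ADAPTER)
    umi = seq[umi_start : umi_start + UMI_LEN]
    if len(insert) < MIN_INSERT:
        return "too_short_reads", None
    if len(umi) < MIN_UMI:
        return "UMI_defective_reads", None
    return "kept", (insert, umi)


def filter_reads(reads: list[str]) -> tuple[list[tuple[str, str]], Counter]:
    """Classify every read; return kept (insert, umi) pairs and per-category tallies."""
    kept, tallies = [], Counter()
    for seq in reads:
        category, parts = classify(seq)
        tallies[category] += 1
        if parts is not None:
            kept.append(parts)
    return kept, tallies


if __name__ == "__main__":
    insert = "TAGCTTATCAGACTGA"  # hypothetical 16 bp insert
    reads = [
        insert + ADAPTER + "ACGTACGTACGT",       # kept
        insert[:15] + ADAPTER + "ACGTACGTACGT",  # insert too short
        insert + ADAPTER + "ACGTAC",             # UMI too short
    ]
    kept, tallies = filter_reads(reads)
    print(kept, dict(tallies))
```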

Insert sequence alignment
Insert sequences from all samples in a submitted job are pooled into a single set of unique sequences, which is annotated with bowtie (bowtie-bio.sourceforge.net) using a sequential alignment strategy. Alignments proceed in the following order:
- Perfect-match alignment to miRBase mature
- Alignment to miRBase hairpin
- Alignment to noncoding RNA, mRNA, and otherRNA
- Secondary alignment to miRBase mature, allowing up to two mismatches
At each step, only sequences that remain unmapped are passed to the next step. Read counts are reported per RNA category (e.g., miRNA_Reads, hairpin_Reads, piRNA_Reads, tRNA_Reads, rRNA_Reads, mRNA_Reads). miRNA references are taken from miRBase (v21 or v22), and piRNA references from piRNABank.
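The order-dependent cascade can be illustrated with a small Python sketch. A naive ungapped mismatch search stands in for bowtie here; it is not bowtie's algorithm or interface. The stage labels, toy references, and the zero-mismatch allowance for the hairpin and other-RNA stages are hypothetical; only the stage order, the two-mismatch secondary pass, and the pooling of unique sequences across samples come from the text.

```python
# Sketch of a sequential annotation cascade over pooled unique insert sequences.
# A naive ungapped mismatch search stands in for bowtie; stage labels, references,
# and mismatch allowances other than the documented 2 are hypothetical.
from collections import Counter, defaultdict
from typing import Optional


def aligns(read: str, ref: str, max_mm: int) -> bool:
    """True if read matches some window of ref with at most max_mm mismatches."""
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset : offset + len(read)]
        if sum(a != b for a, b in zip(read, window)) <= max_mm:
            return True
    return False


def first_hit(read: str, refs: dict[str, str], max_mm: int) -> Optional[str]:
    """Return the first reference name the read aligns to, or None."""
    return next((name for name, seq in refs.items() if aligns(read, seq, max_mm)), None)


def annotate(
    samples: dict[str, list[str]],
    stages: list[tuple[str, dict[str, str], int]],
):
    """Run stages in order on unique sequences; return annotations and per-sample counts."""
    remaining = sorted({s for reads in samples.values() for s in reads})
    annotation = {}
    for label, refs, max_mm in stages:
        unmapped = []
        for seq in remaining:
            hit = first_hit(seq, refs, max_mm)
            if hit is None:
                unmapped.append(seq)  # only unmapped sequences reach the next stage
            else:
                annotation[seq] = (label, hit)
        remaining = unmapped
    # Each read inherits the category of its unique sequence.
    counts = defaultdict(Counter)
    for sample, reads in samples.items():
        for seq in reads:
            counts[sample][annotation.get(seq, ("unmapped", None))[0]] += 1
    return annotation, counts


if __name__ == "__main__":
    mature = {"toy-miR-1": "TAGCTTATCAGACTGA"}
    hairpin = {"toy-mir-1": "CCTAGCTTATCAGACTGATGTTGACTGCC"}
    other = {"toy-tRNA": "GCATTGGTGGTTCAGTGGTAGAATTCTCGC"}
    stages = [
        ("miRNA", mature, 0),
        ("hairpin", hairpin, 0),
        ("otherRNA", other, 0),
        ("miRNA_secondary", mature, 2),
    ]
    samples = {"S1": ["TAGCTTATCAGACTGA", "TAGCTTATCAGACTGA", "TGATGTTGACTG",
                      "TAGCATATCAGACTGA", "A" * 16]}
    annotation, counts = annotate(samples, stages)
    print(dict(counts["S1"]))
    # {'miRNA': 2, 'hairpin': 1, 'miRNA_secondary': 1, 'unmapped': 1}
```

Because the mature sequence also lies inside its hairpin, running the perfect-match mature stage first is what keeps exact miRNA reads from being labeled as hairpin reads.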
For human, mouse, and rat, a species-specific miRBase mature database is used; in a final step, sequences that remain unmapped are aligned to the genome (human: GRCh38, mouse: GRCm38, rat: Rnor_6.0) to identify potential novel miRNAs. For all other species, a comprehensive miRBase mature database is used and no genome alignment is performed.
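The species-dependent choice of references amounts to a small lookup, sketched here in Python. `ReferencePlan`, `plan_for`, and the descriptive database labels are hypothetical illustrations, not DRAGEN configuration keys or file names; only the three species and their genome builds come from the text.

```python
# Hypothetical sketch of choosing references by species; the labels are
# descriptive strings, not DRAGEN configuration values or file names.
from dataclasses import dataclass
from typing import Optional

# Genome builds used for the final novel-miRNA alignment step (from the text).
GENOME_BUILDS = {"human": "GRCh38", "mouse": "GRCm38", "rat": "Rnor_6.0"}


@dataclass(frozen=True)
class ReferencePlan:
    mature_db: str         # which miRBase mature database to align against
    genome: Optional[str]  # genome build for leftover sequences, or None


def plan_for(species: str) -> ReferencePlan:
    """Return the hypothetical reference plan for a species name."""
    key = species.strip().lower()
    if key in GENOME_BUILDS:
        return ReferencePlan(f"miRBase mature ({key}-specific)", GENOME_BUILDS[key])
    # Any other species: comprehensive mature database, no genome alignment step.
    return ReferencePlan("miRBase mature (comprehensive)", None)


if __name__ == "__main__":
    print(plan_for("Mouse"))
    print(plan_for("zebrafish"))
```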

Counting reads and unique molecules
For each sample, all reads assigned to a given miRNA or piRNA ID are counted, and their UMIs are aggregated to calculate unique molecule counts. Results are reported in the following sheets:
- miRNA_piRNA sheet: read counts and UMI counts for miRNAs and piRNAs
- tRNA and otherRNA sheets: results for tRNAs and other RNAs
- notCharacterized_mappable sheet: reads and clustered UMIs aligned to the genome in the final step (human, mouse, rat only)
- notCharacterized_notMappable sheet: tally of all reads that remain unmapped
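Per-sample tallying can be sketched in Python as follows. The input format (one sample, feature ID, UMI triple per assigned read) is an assumption, and the text does not specify how UMIs are aggregated, so the sketch reports both a plain distinct-UMI count and a count after a simple greedy clustering that merges UMIs within one mismatch of a more abundant UMI. That clustering rule is a swapped-in illustration, not necessarily the app's method, and the `toy-miR-1` ID is hypothetical.

```python
# Sketch: per-sample read and UMI counts per feature ID. The input layout and
# the greedy one-mismatch UMI clustering are illustrative assumptions.
from collections import Counter, defaultdict


def within(a: str, b: str, max_dist: int) -> bool:
    """True if equal-length UMIs differ at no more than max_dist positions."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist


def cluster_umis(umi_counts: Counter, max_dist: int = 1) -> int:
    """Greedy clustering: abundant UMIs seed clusters; close UMIs join an existing seed."""
    seeds: list[str] = []
    for umi, _ in umi_counts.most_common():
        if not any(within(umi, seed, max_dist) for seed in seeds):
            seeds.append(umi)
    return len(seeds)


def count_features(assignments):
    """assignments: iterable of (sample, feature_id, umi), one entry per assigned read."""
    umis = defaultdict(Counter)
    for sample, feature, umi in assignments:
        umis[(sample, feature)][umi] += 1
    return {
        key: {
            "reads": sum(c.values()),
            "umis_distinct": len(c),
            "umis_clustered": cluster_umis(c),
        }
        for key, c in umis.items()
    }


if __name__ == "__main__":
    reads = [
        ("S1", "toy-miR-1", "AAAAAAAAAA"),
        ("S1", "toy-miR-1", "AAAAAAAAAA"),
        ("S1", "toy-miR-1", "AAAAAAAAAT"),  # one mismatch from the UMI above
        ("S1", "toy-miR-1", "CCCCCCCCCC"),
        ("S2", "toy-miR-1", "AAAAAAAAAA"),
    ]
    for key, row in count_features(reads).items():
        print(key, row)
```

Here sample S1 has 4 reads carrying 3 distinct UMIs, which collapse to 2 clusters because `AAAAAAAAAT` is one mismatch away from the more abundant `AAAAAAAAAA`.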
