Counting Algorithm

Illumina miRNA sequencing FASTQ files are uploaded to either BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA), where the DRAGEN miRNA app processes the reads through the following steps:

  1. Calibration of miRBase entries: For miRNA entries with identical or nearly identical sequences in the miRBase mature database, manual calibration is performed. A combined entry is generated for each set of overlapping miRNAs. For instance, the sequence of hsa-miR-151b is entirely contained within hsa-miR-151a-5p; therefore, the resulting entry is reported as hsa-miR-151b/151a-5p.
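The containment case described in step 1 can be illustrated with a minimal sketch. The calibration in the app is described as manual, so this automated version is only an illustration. The function names, the naming rule for combined entries (inferred from the single hsa-miR-151b/151a-5p example), and the toy sequences are all assumptions, not part of the app.

```python
def combined_name(contained, container):
    """Build a combined entry name such as 'hsa-miR-151b/151a-5p'.

    Assumption: the shared '<species>-miR-' prefix is dropped from the
    second name; otherwise the full second name is kept.
    """
    prefix = contained.split("-miR-")[0] + "-miR-"
    suffix = container[len(prefix):] if container.startswith(prefix) else container
    return f"{contained}/{suffix}"


def merge_contained(entries):
    """Merge pairs of entries where one sequence is contained in the other.

    entries: dict mapping miRNA name -> sequence.
    Returns a new dict; each merged pair keeps the longer sequence.
    """
    merged = dict(entries)
    for short_name, short_seq in entries.items():
        for long_name, long_seq in entries.items():
            if short_name == long_name:
                continue
            both_present = short_name in merged and long_name in merged
            if short_seq in long_seq and both_present:
                del merged[short_name]
                del merged[long_name]
                merged[combined_name(short_name, long_name)] = long_seq
    return merged


# Toy sequences (placeholders, not the real miRBase sequences).
toy = {
    "hsa-miR-151b": "ACGTACGTACGTACGTAC",
    "hsa-miR-151a-5p": "ACGTACGTACGTACGTACGTA",
    "hsa-miR-21-5p": "TTTTGGGGCCCCAAAATTTTGG",
}
calibrated = merge_contained(toy)
```

This sketch only handles pairwise containment; sets of three or more overlapping entries, or "nearly identical" sequences, would need additional rules that the source does not specify.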

  2. Adapter and quality trimming: Reads are processed with cutadapt to remove the 3′ adapter (AACTGTAGGCACCATCAAT) and low-quality bases. Reads lacking the adapter sequence are tallied separately as no_adapter_reads.
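To make the adapter step concrete, the sketch below splits each read at the first exact occurrence of the 3′ adapter and tallies reads without it. It is a simplified stand-in, not cutadapt itself: cutadapt also tolerates mismatches and trims low-quality bases, neither of which is modeled here. The function names are hypothetical.

```python
ADAPTER = "AACTGTAGGCACCATCAAT"  # 3' adapter sequence given in step 2


def trim_3p_adapter(read, adapter=ADAPTER):
    """Split a read at the first exact adapter match.

    Returns (sequence_before_adapter, sequence_after_adapter),
    or None when the adapter is not found.
    """
    pos = read.find(adapter)
    if pos == -1:
        return None
    return read[:pos], read[pos + len(adapter):]


def trim_reads(reads):
    """Trim every read; count those lacking the adapter."""
    trimmed = []
    no_adapter_reads = 0
    for read in reads:
        result = trim_3p_adapter(read)
        if result is None:
            no_adapter_reads += 1
        else:
            trimmed.append(result)
    return trimmed, no_adapter_reads


# Toy reads: the first contains the adapter, the second does not.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "ACGTACGTACGT",
    "TGAGGTAGTAGGTTGTATAGTT",
]
trimmed, no_adapter_reads = trim_reads(reads)
```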

  3. Insert and UMI identification: After trimming, insert and UMI sequences are extracted. Reads with inserts shorter than 16 bp (too_short_reads) or UMIs shorter than 10 bp (UMI_defective_reads) are discarded.
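The length filters in step 3 can be sketched as follows. The thresholds and counter names come from the text; the function names, the "kept" label, and the order in which the two checks are applied (insert length first) are assumptions, since the source does not say how a read failing both checks is tallied.

```python
from collections import Counter

MIN_INSERT_LEN = 16  # inserts shorter than this are too_short_reads
MIN_UMI_LEN = 10     # UMIs shorter than this are UMI_defective_reads


def classify(insert, umi):
    """Return the tally category for one (insert, UMI) pair."""
    if len(insert) < MIN_INSERT_LEN:
        return "too_short_reads"
    if len(umi) < MIN_UMI_LEN:
        return "UMI_defective_reads"
    return "kept"


def filter_pairs(pairs):
    """Keep passing (insert, UMI) pairs and tally every category."""
    kept = []
    tally = Counter()
    for insert, umi in pairs:
        category = classify(insert, umi)
        tally[category] += 1
        if category == "kept":
            kept.append((insert, umi))
    return kept, tally


pairs = [
    ("TGAGGTAGTAGGTTGTATAGTT", "ACGTACGTACGT"),  # passes both checks
    ("TGAGGTAGTAGG", "ACGTACGTACGT"),            # insert is 12 bp
    ("TGAGGTAGTAGGTTGTATAGTT", "ACGTAC"),        # UMI is 6 bp
]
kept, tally = filter_pairs(pairs)
```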

  4. Insert sequence alignment: A unique sequence set is generated across all samples within a submitted job. Insert sequences are annotated using a sequential alignment strategy with bowtie (bowtie-bio.sourceforge.net). Alignments proceed in the following order:

    • Perfect match to miRBase mature

    • miRBase hairpin

    • Noncoding RNA, mRNA, otherRNA

    • Secondary alignment to miRBase mature (allowing up to two mismatches)

    At each step, only sequences that remain unmapped are passed forward to the next step. Read counts are reported per RNA category (e.g., miRNA_Reads, hairpin_Reads, piRNA_Reads, tRNA_Reads, rRNA_Reads, mRNA_Reads). miRNAs are annotated against miRBase (v21 or v22) and piRNAs against piRNABank.
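The sequential strategy of step 4 can be sketched as a cascade in which each stage sees only what earlier stages failed to map. This is a minimal illustration, not the app's implementation: a simple equal-length mismatch count stands in for bowtie, and the stage labels, function names, and toy reference sequences are all hypothetical.

```python
def unique_sequences(samples):
    """Collect the unique insert sequences across all samples in a job.

    samples: dict mapping sample name -> list of insert sequences.
    """
    return sorted(set().union(*samples.values()))


def mismatch_matcher(reference, max_mismatches):
    """Stand-in for an aligner: equal-length comparison with a mismatch cap."""
    def match(seq):
        for name, ref in reference.items():
            if len(ref) != len(seq):
                continue
            if sum(a != b for a, b in zip(ref, seq)) <= max_mismatches:
                return name
        return None
    return match


def annotate_sequentially(sequences, stages):
    """Run stages in order, forwarding only unmapped sequences.

    stages: ordered list of (label, matcher) pairs.
    Returns (annotations, still_unmapped).
    """
    annotations = {}
    unmapped = list(sequences)
    for label, matcher in stages:
        remaining = []
        for seq in unmapped:
            hit = matcher(seq)
            if hit is None:
                remaining.append(seq)
            else:
                annotations[seq] = (label, hit)
        unmapped = remaining
    return annotations, unmapped


# Toy references (placeholders, not real database entries).
mature = {"miR-A": "ACGTACGTACGTACGTACGT"}
hairpin = {"hairpin-B": "GGGGCCCCGGGGCCCCGGGG"}
other = {"tRNA-C": "TTTTAAAATTTTAAAATTTT"}

stages = [
    ("mature_perfect", mismatch_matcher(mature, 0)),
    ("hairpin", mismatch_matcher(hairpin, 0)),
    ("other_rna", mismatch_matcher(other, 0)),
    ("mature_mismatch", mismatch_matcher(mature, 2)),
]

samples = {
    "s1": ["ACGTACGTACGTACGTACGT", "GGGGCCCCGGGGCCCCGGGG"],
    "s2": ["ACGTACGTACGTACGTACAA", "CCCCCCCCCCCCCCCCCCCC", "ACGTACGTACGTACGTACGT"],
}
annotations, unmapped = annotate_sequentially(unique_sequences(samples), stages)
```

Because the secondary mature alignment runs last, a sequence with two mismatches to a mature miRNA is only assigned there if it matched nothing in the hairpin or other-RNA stages first.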

    For human, mouse, and rat, a species-specific miRBase mature database is used, followed by genome alignment of remaining sequences to identify potential novel miRNAs (human: GRCh38, mouse: GRCm38, rat: Rnor_6.0). For all other species, a comprehensive miRBase mature database is applied.
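The species rule above amounts to a small lookup, sketched here under assumed names. The genome builds are those listed in the text; the dictionary keys, the function name, and the returned labels are hypothetical.

```python
# Genome builds used for the final alignment step, per the text above.
GENOME_BY_SPECIES = {
    "human": "GRCh38",
    "mouse": "GRCm38",
    "rat": "Rnor_6.0",
}


def reference_plan(species):
    """Pick the mature database type and, if available, the genome build."""
    genome = GENOME_BY_SPECIES.get(species.lower())
    if genome is None:
        # All other species: comprehensive database, no genome alignment.
        return {"mature_db": "comprehensive", "genome": None}
    return {"mature_db": "species-specific", "genome": genome}
```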

  5. Counting reads and unique molecules: For each sample, all reads assigned to a given miRNA or piRNA ID are tallied, and UMIs are aggregated to calculate unique molecule counts. Results are reported as follows:

    • miR_piRNA sheet: read counts and UMI counts for miRNAs and piRNAs

    • tRNA and otherRNA sheets: results for tRNAs and other RNAs

    • notCharacterized_mappable sheet: reads and clustered UMIs aligned to the genome in the final step (human, mouse, rat only)

    • notCharacterized_notMappable: tally of all remaining unmapped reads
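The tallying in step 5 can be sketched as counting reads and distinct UMIs per sample and RNA ID. This is a minimal illustration with hypothetical names. It treats two UMIs as the same molecule only when they are identical; the app may cluster similar UMIs instead (the notCharacterized_mappable sheet mentions clustered UMIs), and the source does not describe that method.

```python
from collections import defaultdict


def count_reads_and_umis(records):
    """Tally reads and distinct UMIs per (sample, RNA ID).

    records: iterable of (sample, rna_id, umi), one tuple per annotated read.
    Returns a dict mapping (sample, rna_id) -> (read_count, umi_count).
    """
    reads = defaultdict(int)
    umis = defaultdict(set)
    for sample, rna_id, umi in records:
        reads[(sample, rna_id)] += 1
        umis[(sample, rna_id)].add(umi)
    return {key: (reads[key], len(umis[key])) for key in reads}


# Toy records: three reads for miR-A, two of which share a UMI.
records = [
    ("s1", "miR-A", "AAAAAAAAAAAA"),
    ("s1", "miR-A", "AAAAAAAAAAAA"),
    ("s1", "miR-A", "CCCCCCCCCCCC"),
    ("s1", "piR-B", "GGGGGGGGGGGG"),
]
counts = count_reads_and_umis(records)
```

In this toy input, miR-A has three reads but only two distinct UMIs, so its unique molecule count is lower than its read count.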
