Counting Algorithm
Illumina miRNA sequencing FASTQ files are uploaded to either BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA), where the DRAGEN miRNA app processes the reads through the following steps:
Calibration of miRBase entries: For miRNA entries with identical or nearly identical sequences in the miRBase mature database, manual calibration is performed. A combined entry is generated for each overlapping miRNA set. For instance, the sequence of hsa-miR-151b is entirely contained within hsa-miR-151a-5p; therefore, the resulting entry is reported as hsa-miR-151b/151a-5p.
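The containment check behind a combined entry can be illustrated with a minimal Python sketch. The calibration itself is described above as manual, so this is only an automated approximation under stated assumptions: the function name `merge_contained_entries`, the toy entry names, and the toy sequences are hypothetical, and the combined label simply joins full names with "/" rather than reproducing the shortened form (hsa-miR-151b/151a-5p) used in the report.

```python
def merge_contained_entries(mature):
    """Group entries whose sequence is identical to, or contained in, a longer entry.

    `mature` maps entry name -> sequence. Returns combined label -> longest sequence.
    Hypothetical helper for illustration; not the app's calibration procedure.
    """
    # Visit the longest sequences first so a container is seen before what it contains.
    items = sorted(mature.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    groups = []  # each element: (container_sequence, [names, longest first])
    for name, seq in items:
        for container_seq, names in groups:
            if seq in container_seq:
                names.append(name)
                break
        else:
            groups.append((seq, [name]))
    # Label each group with the shortest entry first, joined by "/".
    return {"/".join(reversed(names)): seq for seq, names in groups}


# Toy data (made-up names and sequences, not real miRBase records).
toy_mature = {
    "toy-miR-1a-5p": "ACGTACGTACGTACGTACGTA",
    "toy-miR-1b": "ACGTACGTACGTACGTAC",
    "toy-miR-2": "TTTTGGGGCCCCAAAATTTT",
}
combined = merge_contained_entries(toy_mature)
```

Here `toy-miR-1b` is fully contained in `toy-miR-1a-5p`, so the two collapse into a single combined entry while `toy-miR-2` stays on its own.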
Adapter and quality trimming: Reads are processed with cutadapt to remove the 3′ adapter (AACTGTAGGCACCATCAAT) and low-quality bases. Reads lacking the adapter sequence are tallied separately as no_adapter_reads.
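To make the adapter step concrete, here is a deliberately simplified Python stand-in. It is not how cutadapt works internally: cutadapt tolerates sequencing errors and partial adapters and also trims by quality, whereas this sketch only searches for an exact copy of the adapter. The function name `split_by_adapter` is hypothetical.

```python
ADAPTER = "AACTGTAGGCACCATCAAT"  # 3' adapter sequence given in the text


def split_by_adapter(reads, adapter=ADAPTER):
    """Separate reads into adapter-trimmed sequences and reads with no adapter.

    Simplified illustration: exact adapter match only, no quality trimming.
    """
    trimmed = []
    no_adapter = []
    for read in reads:
        position = read.find(adapter)
        if position == -1:
            no_adapter.append(read)  # would contribute to no_adapter_reads
        else:
            trimmed.append(read[:position])  # keep the bases 5' of the adapter
    return trimmed, no_adapter


reads = ["ACGTACGTACGTACGTAC" + ADAPTER + "GGCC", "ACGTACGTACGT"]
trimmed, no_adapter = split_by_adapter(reads)
no_adapter_reads = len(no_adapter)
```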
Insert and UMI identification: After trimming, insert and UMI sequences are extracted. Reads with inserts shorter than 16 bp (too_short_reads) or UMIs shorter than 10 bp (UMI_defective_reads) are discarded.
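The two length filters can be sketched in Python as follows. The thresholds and category names come from the text; everything else is an assumption made for illustration, including the function names, the input format (insert and UMI already separated), and the order of the checks (insert length first).

```python
from collections import Counter

MIN_INSERT_LEN = 16  # inserts shorter than this count as too_short_reads
MIN_UMI_LEN = 10     # UMIs shorter than this count as UMI_defective_reads


def classify(insert, umi):
    """Label one (insert, UMI) pair; checking the insert first is an assumption."""
    if len(insert) < MIN_INSERT_LEN:
        return "too_short_reads"
    if len(umi) < MIN_UMI_LEN:
        return "UMI_defective_reads"
    return "kept"


def filter_reads(pairs):
    """Return the pairs that pass both filters plus a tally per category."""
    tallies = Counter()
    kept = []
    for insert, umi in pairs:
        label = classify(insert, umi)
        tallies[label] += 1
        if label == "kept":
            kept.append((insert, umi))
    return kept, tallies


pairs = [
    ("ACGTACGTACGTACGTAC", "AACCGGTTAACC"),  # passes both filters
    ("ACGTACGT", "AACCGGTTAACC"),            # insert too short
    ("ACGTACGTACGTACGTAC", "AACC"),          # UMI too short
]
kept, tallies = filter_reads(pairs)
```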
Insert sequence alignment: A unique sequence set is generated across all samples within a submitted job. Insert sequences are annotated using a sequential alignment strategy with bowtie (bowtie-bio.sourceforge.net). Alignments proceed in the following order:
Perfect match to miRBase mature
miRBase hairpin
Noncoding RNA, mRNA, otherRNA
Secondary alignment to miRBase mature (allowing up to two mismatches)
At each step, only unmapped sequences are passed forward. Read counts are reported per RNA category (e.g., miRNA_Reads, hairpin_Reads, piRNA_Reads, tRNA_Reads, rRNA_Reads, mRNA_Reads). miRBase is used for miRNAs (v21 or v22), while piRNABank is referenced for piRNAs.
For human, mouse, and rat, a species-specific miRBase mature database is used, followed by genome alignment of remaining sequences to identify potential novel miRNAs (human: GRCh38, mouse: GRCm38, rat: Rnor_6.0). For all other species, a comprehensive miRBase mature database is applied.
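The sequential strategy in this step, where each stage sees only what earlier stages left unmapped, can be sketched in Python. This is an illustration of the cascade logic only: the matchers below are plain string comparisons standing in for bowtie alignments, and all function names, stage labels, and toy sequences are hypothetical.

```python
def exact_stage(reference):
    """Matcher that accepts only sequences identical to a reference entry."""
    by_sequence = {seq: name for name, seq in reference.items()}
    return lambda query: by_sequence.get(query)


def contained_stage(reference):
    """Matcher that accepts sequences found inside a longer reference entry."""
    def match(query):
        for name, seq in reference.items():
            if query in seq:
                return name
        return None
    return match


def mismatch_stage(reference, max_mismatches=2):
    """Matcher that tolerates substitutions against same-length entries."""
    def match(query):
        for name, seq in reference.items():
            if len(seq) != len(query):
                continue
            mismatches = sum(a != b for a, b in zip(query, seq))
            if mismatches <= max_mismatches:
                return name
        return None
    return match


def annotate(sequences, stages):
    """Run stages in order, passing only unmapped sequences to the next stage."""
    remaining = list(dict.fromkeys(sequences))  # unique sequences, order kept
    annotations = {}
    for label, match in stages:
        unmapped = []
        for query in remaining:
            hit = match(query)
            if hit is None:
                unmapped.append(query)
            else:
                annotations[query] = (label, hit)
        remaining = unmapped
    return annotations, remaining
```

A caller would pass the stages in the documented order, for example a perfect-match stage on mature sequences, then hairpin, then other RNA references, then a two-mismatch stage on mature sequences. Whatever `annotate` returns as `remaining` corresponds to the sequences that, for human, mouse, and rat, would go on to genome alignment.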
Counting reads and unique molecules: For each sample, all reads assigned to a given miRNA or piRNA ID are tallied, and UMIs are aggregated to calculate unique molecule counts. Results are reported as follows:
miR_piRNA sheet: read counts and UMI counts for miRNAs and piRNAs
tRNA and otherRNA sheets: results for tRNAs and other RNAs
notCharacterized_mappable sheet: reads and clustered UMIs aligned to the genome in the final step (human, mouse, rat only)
notCharacterized_notMappable: tally of all remaining unmapped reads
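The difference between a read count and a unique molecule count can be shown with a short Python sketch. It assumes each annotated read is available as an (ID, UMI) pair and that a unique molecule is simply a distinct UMI string per ID; any UMI clustering or error correction the app may apply is not modeled. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def count_reads_and_umis(records):
    """Return {rna_id: (read_count, umi_count)} for one sample.

    `records` is an iterable of (rna_id, umi) pairs, one per annotated read.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for rna_id, umi in records:
        read_counts[rna_id] += 1    # every read counts once
        umis_seen[rna_id].add(umi)  # repeated UMIs collapse to one molecule
    return {
        rna_id: (read_counts[rna_id], len(umis_seen[rna_id]))
        for rna_id in read_counts
    }


records = [
    ("toy-miR-1", "AACCGGTTAACC"),
    ("toy-miR-1", "AACCGGTTAACC"),  # same UMI again: likely a PCR duplicate
    ("toy-miR-1", "TTGGCCAATTGG"),
    ("toy-piR-7", "GGGGAAAACCCC"),
]
counts = count_reads_and_umis(records)
```

In this toy input, `toy-miR-1` has three reads but only two distinct UMIs, so its unique molecule count is lower than its read count.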