# Counting Algorithm

Illumina miRNA sequencing FASTQ files are uploaded to either BSSH or ICA, where the DRAGEN miRNA app processes the reads through the following steps:

1. **Calibration of miRBase entries**\
   For miRNA entries with identical or nearly identical sequences in the miRBase mature database, manual calibration is performed. A combined entry is generated for each overlapping miRNA set. For instance, the sequence of *hsa-miR-151b* is entirely contained within *hsa-miR-151a-5p*; therefore, the resulting entry is reported as *hsa-miR-151b/151a-5p*.
2. **Adapter and quality trimming**\
   Reads are processed with *cutadapt* ([documentation](https://cutadapt.readthedocs.io/en/stable/guide.html)) to remove 3′ adapters (AACTGTAGGCACCATCAAT) and low-quality bases. Reads lacking adapter sequences are separately tallied as *no\_adapter\_reads*.
3. **Insert and UMI identification**\
   After trimming, insert and UMI sequences are extracted. Reads with inserts shorter than 16 bp (*too\_short\_reads*) or UMIs shorter than 10 bp (*UMI\_defective\_reads*) are discarded.
4. **Insert sequence alignment**\
   Insert sequences are pooled across all samples in a submitted job and reduced to a set of unique sequences. These are annotated using a sequential alignment strategy with *bowtie* ([bowtie-bio.sourceforge.net](http://bowtie-bio.sourceforge.net)). Alignments proceed in the following order:

   * Perfect match to miRBase mature
   * miRBase hairpin
   * Noncoding RNA, mRNA, otherRNA
   * Secondary alignment to miRBase mature (allowing up to two mismatches)

   Only sequences that remain unmapped after a step are passed to the next one. Read counts are reported per RNA category (e.g., *miRNA\_Reads*, *hairpin\_Reads*, *piRNA\_Reads*, *tRNA\_Reads*, *rRNA\_Reads*, *mRNA\_Reads*). miRNAs are annotated against miRBase (v21 or v22) and piRNAs against piRNABank.

   For human, mouse, and rat, a species-specific miRBase mature database is used, followed by genome alignment of remaining sequences to identify potential novel miRNAs (human: GRCh38, mouse: GRCm38, rat: Rnor\_6.0). For all other species, a comprehensive miRBase mature database is applied.
5. **Counting reads and unique molecules**\
   For each sample, all reads assigned to a given miRNA or piRNA ID are tallied, and UMIs are aggregated to calculate unique molecule counts. Results are reported as follows:
   * *miRNA\_piRNA* sheet: read counts and UMI counts for miRNAs and piRNAs
   * *tRNA* and *otherRNA* sheets: results for tRNAs and other RNAs
   * *notCharacterized\_mappable* sheet: reads and clustered UMIs aligned to the genome in the final step (human, mouse, rat only)
   * *notCharacterized\_notMappable* sheet: tally of all remaining unmapped reads
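
The filtering in step 3 and the pass-forward logic in step 4 can be illustrated with a minimal Python sketch. This is not the app's implementation: the function names, the stage labels, and the toy reference dictionaries are hypothetical, exact dictionary lookups stand in for *bowtie* alignments, and the order in which the two length checks are applied is an assumption.

```python
MIN_INSERT_LEN = 16  # inserts shorter than this count as too_short_reads
MIN_UMI_LEN = 10     # UMIs shorter than this count as UMI_defective_reads


def classify_read(insert, umi):
    """Return the step 3 category for one trimmed read.

    Assumption: the insert length is checked before the UMI length.
    """
    if len(insert) < MIN_INSERT_LEN:
        return "too_short_reads"
    if len(umi) < MIN_UMI_LEN:
        return "UMI_defective_reads"
    return "pass"


def annotate_sequentially(sequences, stages):
    """Annotate unique sequences stage by stage.

    `stages` is an ordered list of (label, lookup) pairs, where lookup(seq)
    returns a reference ID or None. Only sequences that a stage fails to
    annotate are handed to the next stage.
    """
    annotations = {}
    remaining = list(dict.fromkeys(sequences))  # unique sequences, order kept
    for label, lookup in stages:
        still_unmapped = []
        for seq in remaining:
            hit = lookup(seq)
            if hit is None:
                still_unmapped.append(seq)
            else:
                annotations[seq] = (label, hit)
        remaining = still_unmapped
    return annotations, remaining


# Toy references (hypothetical sequences and IDs, not taken from miRBase).
toy_mature = {"ACGTACGTACGTACGTAC": "toy-miR-1"}
toy_hairpin = {"TTTTGGGGCCCCAAAATTTT": "toy-hairpin-1"}

stages = [
    ("mature_perfect", toy_mature.get),
    ("hairpin", toy_hairpin.get),
]

inserts = [
    "ACGTACGTACGTACGTAC",
    "TTTTGGGGCCCCAAAATTTT",
    "GGGGGGGGGGGGGGGGGG",
    "ACGTACGTACGTACGTAC",  # duplicate, collapsed before annotation
]
annotations, unmapped = annotate_sequentially(inserts, stages)
```

In this sketch, `unmapped` holds whatever no stage could annotate, which corresponds to the sequences that would move on to the later alignment steps.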

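The difference between read counts and unique molecule counts in step 5 can be sketched as follows. The function name and the record layout are hypothetical, and the sketch treats two UMIs as the same molecule only when their sequences are identical; the app may aggregate or cluster UMIs differently.

```python
from collections import defaultdict


def count_reads_and_umis(records):
    """Tally reads and distinct UMIs per (sample, RNA ID).

    `records` is an iterable of (sample, rna_id, umi) tuples, one per
    annotated read. Returns {(sample, rna_id): (read_count, umi_count)}.
    Assumption: UMIs are compared by exact string identity.
    """
    read_counts = defaultdict(int)
    umi_sets = defaultdict(set)
    for sample, rna_id, umi in records:
        key = (sample, rna_id)
        read_counts[key] += 1
        umi_sets[key].add(umi)
    return {key: (read_counts[key], len(umi_sets[key])) for key in read_counts}


# Toy input: three reads in sample S1, two of which share a UMI.
records = [
    ("S1", "hsa-miR-151b/151a-5p", "AAAAAAAAAAAA"),
    ("S1", "hsa-miR-151b/151a-5p", "AAAAAAAAAAAA"),
    ("S1", "hsa-miR-151b/151a-5p", "CCCCCCCCCCCC"),
    ("S2", "hsa-miR-151b/151a-5p", "AAAAAAAAAAAA"),
]
counts = count_reads_and_umis(records)
```

Here the two S1 reads that share a UMI contribute two reads but only one unique molecule, which is why the UMI count for an ID can never exceed its read count.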