# HTSeq

* [Configurable options](#configurable-options)
* [Output](#output)
* [References](#references)

HTSeq is a set of tools for processing high-throughput sequencing data<sup>1</sup>. In Partek Flow, we have implemented the htseq-count script from HTSeq to quantify aligned reads against an annotation model.

The input for HTSeq is an *Aligned reads* data node and a *Gene/Feature annotation file*.

To run HTSeq:

* Click an **Aligned reads** data node
* Click the **Quantification** section of the toolbox
* Click **HTSeq**

Please note that HTSeq has not been optimized for performance and can take a very long time to run compared with [Quantify to annotation model (Partek E/M)](https://help.connected.illumina.com/partek/partek-flow/user-manual/task-menu/quantification/quantify-to-annotation-model-partek-em) on the same data.

## Configurable options

HTSeq includes basic options (Figure 1) and advanced options accessible by clicking **Configure** (Figure 2).

### Basic Options

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-9f4bd9eb40676c75c15307063891c9d26eaf978c%2Fimage2019-4-15%2013_46_16.png?alt=media" alt="Figure 1. Basic HTSeq options"></div>

#### Annotation file

The annotation file contains the features the aligned reads will be quantified to. For more information about adding an annotation model, please see [Adding an Annotation Model](https://help.connected.illumina.com/partek/partek-flow/user-manual/settings/components/library-file-management/adding-an-annotation-model).

#### Strand specificity

Depending on the library preparation method, information about the strand of the original transcript may be faithfully preserved or lost. This setting controls whether HTSeq considers strand during quantification. Consult the user manual for your library preparation method if you are unsure whether and how it preserves strand information.

If set to *no*, a read is considered to be overlapping a feature regardless of whether it maps to the same or opposite strand as the feature.

If set to *yes*, a single-end read must map to the same strand as the feature. For paired-end reads, the first read must map to the same strand as the feature and the second read to the opposite strand.

If set to *reverse*, a single-end read must map to the opposite strand from the feature. For paired-end reads, the first read must map to the opposite strand from the feature and the second read to the same strand.
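The three strand settings can be sketched as a small compatibility check. This is an illustrative sketch only, not the htseq-count source code; the function name `strand_compatible` and its parameters are hypothetical.

```python
def strand_compatible(read_strand, feature_strand, setting, is_second_mate=False):
    """Return True if a read may overlap a feature under the given strand setting.

    Hypothetical helper for illustration; strands are "+" or "-".
    """
    if setting not in ("no", "yes", "reverse"):
        raise ValueError(f"unknown strand setting: {setting!r}")
    if setting == "no":
        # Strand is ignored entirely.
        return True
    # The second read of a pair is expected on the opposite strand from the
    # first read, so flip it before comparing with the feature.
    if is_second_mate:
        read_strand = "-" if read_strand == "+" else "+"
    same = read_strand == feature_strand
    return same if setting == "yes" else not same


# A single-end read on "+" against a feature on "+":
print(strand_compatible("+", "+", "yes"))      # True
print(strand_compatible("+", "+", "reverse"))  # False
```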

#### Include features with no counts

By default, only features (e.g., genes) with one or more aligned reads will be included in the output. If this option is selected, all features from the annotation model, including those without any matching aligned reads, will be included.

### Advanced Options

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-541e13f07d2258fdccdbf568a77f786ea95ad4f7%2Fimage2019-4-15%2014_52_42.png?alt=media" alt="Figure 2. Advanced options for HTSeq"></div>

#### Min qual

HTSeq will skip reads with an alignment quality lower than the value specified here. The default is 10.
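As a minimal sketch of this filter, assuming each read carries a numeric alignment quality (the helper name `passes_min_qual` and the example reads are hypothetical):

```python
def passes_min_qual(alignment_quality, min_qual=10):
    """Keep a read only if its alignment quality is at least min_qual."""
    return alignment_quality >= min_qual


# Hypothetical (read name, alignment quality) pairs.
reads = [("r1", 60), ("r2", 9), ("r3", 10)]
kept = [name for name, quality in reads if passes_min_qual(quality)]
print(kept)  # ['r1', 'r3']
```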

#### Overlap mode

This option determines how HTSeq handles reads that partially overlap features. The default is *union*.

If set to *union*, any read that is at least partially overlapped by a feature will be assigned to that feature. Assignment is non-exclusive if multiple features overlap the read.

If set to *intersection-strict*, only reads that are fully overlapped by a feature will be assigned to that feature. Assignment is non-exclusive if multiple features fully overlap the read.

If set to *intersection-nonempty*, reads are assigned to the feature that has the greatest overlap. Assignment is non-exclusive if multiple features overlap by the same amount.
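The htseq-count documentation defines these modes in terms of the set of features covering each position of a read. The sketch below follows that per-position model; it is an assumption that Partek Flow behaves identically, and the function name `assign_features` and the input layout are hypothetical.

```python
def assign_features(position_sets, mode="union"):
    """Return the set of features a read is assigned to.

    position_sets: one set per read position, holding the features that
    cover that position (an empty set means no feature covers it).
    """
    if mode == "union":
        # Every feature that covers at least one position of the read.
        return set().union(*position_sets)
    if mode == "intersection-strict":
        # Only features that cover every position of the read.
        return set.intersection(*position_sets) if position_sets else set()
    if mode == "intersection-nonempty":
        # Features that cover every position covered by any feature.
        nonempty = [s for s in position_sets if s]
        return set.intersection(*nonempty) if nonempty else set()
    raise ValueError(f"unknown overlap mode: {mode!r}")


# A read whose first part lies in gene A only, whose middle lies in both
# A and B, and whose end lies outside any feature.
read = [{"A"}, {"A", "B"}, set()]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, sorted(assign_features(read, mode)))
```

In this example *union* assigns the read to both A and B, *intersection-strict* assigns it to no feature, and *intersection-nonempty* assigns it to A alone.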

#### Nonunique mode

This option determines how HTSeq counts reads that are assigned to more than one feature. The default is *none*.

If set to *none*, reads that are assigned to more than one feature are not counted for any feature.

If set to *all*, reads are counted for all features they are assigned to.
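The difference between the two settings can be illustrated with a short counting sketch. The function `count_reads` and the example assignments are hypothetical and only mirror the description above.

```python
from collections import Counter


def count_reads(assignments, nonunique="none"):
    """Tally reads per feature.

    assignments: one set of feature names per read (possibly empty).
    """
    if nonunique not in ("none", "all"):
        raise ValueError(f"unknown nonunique mode: {nonunique!r}")
    counts = Counter()
    for features in assignments:
        if len(features) == 1:
            # Uniquely assigned reads are always counted.
            counts.update(features)
        elif len(features) > 1 and nonunique == "all":
            # Ambiguous reads count once for every assigned feature.
            counts.update(features)
    return counts


# Three reads: one unique to A, one shared by A and B, one unassigned.
assignments = [{"A"}, {"A", "B"}, set()]
print(dict(count_reads(assignments, "none")))  # {'A': 1}
print(sorted(count_reads(assignments, "all").items()))  # [('A', 2), ('B', 1)]
```

Note that with *all*, the sum of the feature counts can exceed the number of reads, because an ambiguous read contributes to several features.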

## Output

HTSeq outputs a *Gene counts* data node (Figure 3). There is no task report.

<div align="left"><img src="https://1384254481-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FJVEESmJAPppJ3ijFq5aR%2Fuploads%2Fgit-blob-112ac48abd72513831e35f723fad55124c9dd52a%2Fimage2019-4-15%2014_55_34.png?alt=media" alt="Figure 3. HTSeq output"></div>

The *ribosomal reads %* column, present when the data is downloaded, is calculated by matching gene symbols against a list of 89 L and S ribosomal genes taken from [HGNC](https://www.genenames.org/). The value is (the sum of counts across these genes / the total count) \* 100, which gives the percentage of ribosomal counts. For more information, see [Single cell QA/QC](https://help.partek.illumina.com/partek-flow/user-manual/task-menu/qa-qc/single-cell-qa-qc).
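As a worked example of this calculation, the sketch below computes the percentage for one sample. The function name `ribosomal_percent`, the counts, and the two-gene set are illustrative assumptions; the actual list contains 89 genes and is not reproduced here.

```python
def ribosomal_percent(counts, ribosomal_symbols):
    """Return ribosomal counts as a percentage of the total count."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for samples with no counts.
        return 0.0
    ribosomal = sum(c for gene, c in counts.items() if gene in ribosomal_symbols)
    return 100.0 * ribosomal / total


# Hypothetical per-gene counts for one sample.
counts = {"RPL3": 30, "RPS6": 20, "ACTB": 150}
# Placeholder subset standing in for the full ribosomal gene list.
ribosomal_symbols = {"RPL3", "RPS6"}
print(ribosomal_percent(counts, ribosomal_symbols))  # 25.0
```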

## References

1. Simon Anders, Paul Theodor Pyl, Wolfgang Huber. HTSeq — A Python framework to work with high-throughput sequencing data. Bioinformatics (2014), in print, online at doi:10.1093/bioinformatics/btu638

## Additional Assistance

If you need additional assistance, please visit [our support page](http://www.partek.com/support) to submit a help ticket or find phone numbers for regional support.
