Filtering

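Partek Flow offers several ways to subset your data for downstream analyses. You can filter data using the following tasks:

Filter features
Filter groups (samples or cells)
Filter barcodes
Split by attribute
Downsample Cells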

Filter features

Noise reduction filter
Statistics based filter
Feature metadata filter
Feature list filter
Filter features task report

A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative or not, and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options.

The Filter features task can be invoked from any counts or single cell data node. The Noise reduction and Statistics based filters take each feature and perform the specified calculation across all of the samples or cells. The filter is applied to the values in the selected data node, and the output is a filtered version of the input data node.

In the task dialog, select the filter option to activate the filter type and configure the filter, then click Finish to run.

Noise reduction filter

The Noise reduction filter lets you exclude features that meet basic criteria (Figure 1).

Descriptive statistics you can choose are:

Coefficient of variation: the standard deviation divided by the mean of the feature
Geometric mean: the nth root of the product of the feature's n values, where n is the number of samples or cells
Maximum: the highest value of a feature
Mean: the average value of a feature
Median: the middle value of a feature when its values are sorted
Minimum: lowest value of a feature
Range: the difference between the highest and lowest values of a feature
Std dev.: the square root of the variance
Sum: total value of the feature
Variance: the average of the squared differences from the mean
Dispersion: variance divided by mean of the feature

For each of these you can choose to exclude features that are:

<: less than
<=: less than or equal to
==: equal to
>: greater than
>=: greater than or equal to

The threshold is set using the text box. The input must be a number; it can be an integer or decimal, positive or negative.

If you select value, you can also choose a percentage of samples or cells that must meet the criteria for the feature to be excluded (Figure 2).
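
As a rough illustration of the logic described in this section (not Partek Flow's implementation), the Python sketch below computes a descriptive statistic for each feature across its samples or cells and excludes features whose statistic meets the chosen comparison. The names `noise_reduction_filter`, `value_filter`, and `counts` are hypothetical. Variance is computed as the population variance to match the definition given above, and the value option is assumed to exclude a feature when at least the given percentage of its values meet the criterion.

```python
import operator
import statistics

# Comparison operators offered by the filter, keyed by their symbols.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}

# A few of the descriptive statistics, each computed over one feature's values.
STATISTICS = {
    "maximum": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "minimum": min,
    "range": lambda v: max(v) - min(v),
    "sum": sum,
    "variance": statistics.pvariance,
    "std dev": statistics.pstdev,
}


def noise_reduction_filter(counts, statistic, op, threshold):
    """Return the features that are NOT excluded.

    counts maps a feature name to its list of values (one per sample or cell).
    A feature is excluded when `statistic(values) <op> threshold` is true.
    """
    stat = STATISTICS[statistic]
    compare = OPERATORS[op]
    return {
        feature: values
        for feature, values in counts.items()
        if not compare(stat(values), threshold)
    }


def value_filter(counts, op, threshold, min_percent):
    """Exclude a feature when at least min_percent of its values meet the
    criterion (the "at least" reading is an assumption)."""
    compare = OPERATORS[op]
    kept = {}
    for feature, values in counts.items():
        n_meeting = sum(1 for v in values if compare(v, threshold))
        if 100 * n_meeting / len(values) < min_percent:
            kept[feature] = values
    return kept


# Hypothetical toy data: three genes measured in four samples.
counts = {
    "GeneA": [0, 0, 1, 0],
    "GeneB": [10, 12, 9, 11],
    "GeneC": [0, 0, 0, 0],
}

# Exclude features whose maximum is <= 1.
kept = noise_reduction_filter(counts, "maximum", "<=", 1)
print(sorted(kept))  # ['GeneB']

# Exclude features whose value is <= 0 in 100% of the samples.
kept_by_value = value_filter(counts, "<=", 0, 100)
print(sorted(kept_by_value))  # ['GeneA', 'GeneB']
```

Note that the two options behave differently on GeneA: its maximum of 1 triggers the statistic-based exclusion, while the value option keeps it because only 75% of its values are zero.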

Statistics based filter

The Statistics based filter lets you include a number or percentile of genes based on descriptive statistics (Figure 3).

Select Counts to specify a number of top features to include or select Percentiles to specify the top percentile of features to include.

Descriptive statistics you can choose are:

Coefficient of variation
Geometric mean
Maximum
Mean
Median
Minimum
Range
Standard deviation (std dev)
Sum
Variance
Dispersion
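
The selection described above can be sketched in Python as ranking features by a statistic and keeping the top ones. This is only an illustration under stated assumptions: the function name `top_features` is hypothetical, features are ranked from highest to lowest, and the percentile option is rounded up to a whole number of features, which may differ from how Partek Flow handles ties and rounding.

```python
import math
import statistics


def top_features(counts, statistic, n=None, percentile=None):
    """Keep the top features ranked by a descriptive statistic.

    Give either n (a number of features) or percentile (the top X percent).
    """
    if (n is None) == (percentile is None):
        raise ValueError("specify exactly one of n or percentile")
    # Rank feature names from the highest statistic to the lowest.
    ranked = sorted(counts, key=lambda f: statistic(counts[f]), reverse=True)
    if percentile is not None:
        # Assumption: round up to a whole number of features.
        n = math.ceil(len(ranked) * percentile / 100)
    return ranked[:n]


# Hypothetical toy data: four genes measured in four samples.
counts = {
    "GeneA": [1, 1, 1, 1],    # variance 0
    "GeneB": [0, 10, 0, 10],  # variance 25
    "GeneC": [4, 6, 4, 6],    # variance 1
    "GeneD": [0, 4, 0, 4],    # variance 4
}

top_two = top_features(counts, statistics.pvariance, n=2)
top_quarter = top_features(counts, statistics.pvariance, percentile=25)
print(top_two)      # ['GeneB', 'GeneD']
print(top_quarter)  # ['GeneB']
```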

Feature metadata filter

If the data is linked to a feature (gene) annotation, different fields in the annotation can be used to filter, e.g. genomic location information or gene biotype information (Figure 4).

You can combine criteria on different annotation fields using logical operations.

Feature list filter

You can filter features based on a feature list (Figure 5).

If you have added feature lists in Partek Flow using the List management feature, the Saved list option will be available. Otherwise, you can specify a Manual list by typing in the Filter criteria box.

If you choose Saved list, the drop-down list will display all the feature lists added in List management. If you choose Manual list, you can type the feature IDs or names in the box, one feature per row.

You can choose to include or exclude features in any list that you have added.

Use the Feature identifier option to choose which identifier from your annotation matches the values in the feature list.
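
Conceptually, a list filter is a membership test on the chosen identifier. The short Python sketch below illustrates this with a manual list entered one feature per row. The function `filter_by_list`, the field names `gene_id` and `gene_name`, and the identifiers themselves are hypothetical and are not part of Partek Flow.

```python
def filter_by_list(features, manual_list, identifier, mode="include"):
    """Include or exclude the features whose identifier appears in the list.

    manual_list is text with one feature ID or name per row.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    # Ignore blank rows and surrounding whitespace.
    listed = {row.strip() for row in manual_list.splitlines() if row.strip()}
    keep_if_listed = mode == "include"
    return [f for f in features if (f[identifier] in listed) == keep_if_listed]


# Hypothetical annotation with two identifier fields per feature.
features = [
    {"gene_id": "ID_0001", "gene_name": "GeneA"},
    {"gene_id": "ID_0002", "gene_name": "GeneB"},
    {"gene_id": "ID_0003", "gene_name": "GeneC"},
]
manual_list = "GeneA\nGeneC\n"

included = filter_by_list(features, manual_list, "gene_name", "include")
excluded = filter_by_list(features, manual_list, "gene_name", "exclude")
print([f["gene_name"] for f in included])  # ['GeneA', 'GeneC']
print([f["gene_name"] for f in excluded])  # ['GeneB']
```

If the list holds names but the identifier is set to `gene_id`, nothing matches, which is why the identifier must correspond to the values in the list.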

Filter features task report

The filter features task report lists the filter criteria, reports distribution statistics for the remaining features, and indicates the number and percentage of features that passed the filter (Figure 6).

If the input was a count matrix data node, a sample box plot and sample histograms are provided to show the distribution of features after filtering. These plots are not available if the input was a single cell counts data node.

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Filter groups (samples or cells)

Configuring a filter
Configuring multiple filters
Filter groups task report

Filter samples or cells in order to perform downstream analysis on a subset of data.

To filter groups, click a count matrix or single cell counts data node, click the Filtering section of the toolbox, and choose to Filter samples (bulk data) or Filter cells (single cell data).

The dialog lets you build a series of filters based on sample or cell attributes.

Click Finish to apply the filter. If no samples or cells would pass the filter criteria, a warning message will appear and the task will not run.

Configuring a filter

The first drop-down menu allows you to choose to include or exclude based on the specified criteria.

The second drop-down menu allows you to choose any categorical or numeric attribute to use for the filter criteria.

If the attribute is categorical, the third drop-down menu includes in and not in as options. A fourth drop-down menu allows you to search and choose from the levels of the selected attribute (Figure 1).

If the attribute is numeric, the third drop-down menu includes:

<: less than
<=: less than or equal to
==: equal to
>: greater than
>=: greater than or equal to

The threshold is set using the text box (Figure 2). The input must be a number; it can be an integer or decimal, positive or negative.
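
A single filter as configured above amounts to four choices: include or exclude, an attribute, an operator, and a value or set of levels. The Python sketch below illustrates that structure. The names `matches` and `apply_filter` and the sample attributes are hypothetical, and this is not Partek Flow's implementation.

```python
import operator

# Operators available for numeric attributes.
NUMERIC_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(sample, attribute, op, value):
    """True if the sample's attribute satisfies the criterion."""
    actual = sample[attribute]
    if op == "in":          # categorical: value is a set of levels
        return actual in value
    if op == "not in":
        return actual not in value
    return NUMERIC_OPS[op](actual, value)  # numeric: value is a threshold


def apply_filter(samples, mode, attribute, op, value):
    """mode is 'include' (keep matching samples) or 'exclude' (drop them)."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    keep_when_matching = mode == "include"
    return [
        s for s in samples
        if matches(s, attribute, op, value) == keep_when_matching
    ]


# Hypothetical sample attributes.
samples = [
    {"name": "S1", "tissue": "liver", "age": 34},
    {"name": "S2", "tissue": "kidney", "age": 51},
    {"name": "S3", "tissue": "liver", "age": 67},
]

liver = apply_filter(samples, "include", "tissue", "in", {"liver"})
under_50 = apply_filter(samples, "exclude", "age", ">=", 50)
print([s["name"] for s in liver])     # ['S1', 'S3']
print([s["name"] for s in under_50])  # ['S1']
```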

Configuring multiple filters

Using the OR and AND options, you can combine multiple filters.

When combining multiple filters all set to Include:

With AND, all statements must be true for the sample to meet the filter criteria.

With OR, if any statement is true, the sample will meet the filter criteria.

When combining multiple filters all set to Exclude:

With AND, if any statement is true, the sample will meet the filter criteria and be excluded.

With OR, all statements must be true for the sample to meet the filter criteria and be excluded.
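
The Include and Exclude rules above mirror each other in the way De Morgan's laws predict. The Python sketch below shows one reading that reproduces them. It assumes that each Include filter keeps a sample when its statement is true, that each Exclude filter keeps a sample when its statement is false, and that the per-filter verdicts are then combined with AND or OR. The function `keep_sample` is hypothetical.

```python
def keep_sample(statements, mode, combine):
    """Decide whether a sample is kept.

    statements: one boolean per filter, True if the sample satisfies
                that filter's statement.
    mode:       'include' or 'exclude' (the same for every filter).
    combine:    'AND' or 'OR'.
    """
    if mode == "include":
        verdicts = list(statements)               # keep when the statement is true
    else:
        verdicts = [not s for s in statements]    # keep when the statement is false
    return all(verdicts) if combine == "AND" else any(verdicts)


# A sample for which the first statement is true and the second is false.
mixed = [True, False]

print(keep_sample(mixed, "include", "AND"))  # False: not all statements are true
print(keep_sample(mixed, "include", "OR"))   # True: one statement is true
print(keep_sample(mixed, "exclude", "AND"))  # False: one true statement removes it
print(keep_sample(mixed, "exclude", "OR"))   # True: removed only if all are true
```

Under this reading, Exclude with AND removes a sample as soon as any statement is true, and Exclude with OR removes it only when every statement is true, which matches the rules listed above.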

Filter groups task report

The filter groups task report lists the filter criteria and reports feature distribution statistics for the remaining samples (Figure 3).

If the input was a count matrix data node, the percentage of samples remaining after the filter is listed and charts are provided to show the breakdown of samples by categorical attributes before and after filtering (Figure 4).

If the input was a single cell counts data node, a second table displays the details from each sample based on the filter criteria (Figure 5).

If the input was a classified groups single cell counts data node, the cell count table includes a breakdown by classification and a bar chart is provided to show the number of cells from each classification remaining after filtering (Figure 6).

Filter barcodes

In droplet-based single cell isolation and library prep methods, each droplet is labeled by a unique nucleotide barcode. Because not all droplets will contain cells, it is important to filter out nucleotide barcodes that correspond to empty droplets prior to downstream analysis.

In Partek Flow, you can filter barcodes interactively using a knee plot, either after UMI deduplication or after quantification. Alternatively, you can filter using preset options in the Filter barcodes task.

To invoke Filter barcodes:

Click a Deduplicated reads data node
Click the Filtering section of the toolbox
Click Filter barcodes

Configuring Filter barcodes

You can choose to filter barcodes using three or four options, depending on whether you have already run a Filter barcodes task for your samples in the project (Figure 1).

Automatic

The automatic filter threshold is set for each sample individually. It picks the cutoff between cells and empty droplets by identifying where the UMI content per barcode drops precipitously when moving in descending order from the barcode with the highest number of UMIs.
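
The exact method Partek Flow uses to find this cutoff is not described here. As a minimal sketch of the general idea, the Python example below ranks barcodes by descending UMI count and places the cutoff at the largest drop on a log scale between consecutive barcodes. The heuristic and the function name `knee_cutoff` are assumptions made for illustration only.

```python
import math


def knee_cutoff(umi_counts):
    """Return how many top-ranked barcodes to keep for one sample.

    Heuristic (an assumption, not Partek Flow's algorithm): cut at the
    largest drop in log10(UMI count) between consecutive barcodes when
    they are ranked by descending UMI count.
    """
    ranked = sorted((c for c in umi_counts if c > 0), reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    drops = [
        math.log10(ranked[i]) - math.log10(ranked[i + 1])
        for i in range(len(ranked) - 1)
    ]
    # The cutoff sits just after the barcode that precedes the steepest drop.
    return drops.index(max(drops)) + 1


# Hypothetical UMI counts per barcode: four cells, then empty droplets.
umi_counts = [5200, 60, 4800, 45, 4500, 30, 4100, 12, 0]
n_cells = knee_cutoff(umi_counts)
print(n_cells)  # 4
```

Because the cutoff is computed from each sample's own ranked counts, running it per sample gives a sample-specific threshold, as described above.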

Number of cells

Set the number of cells per sample to include. This is set for all samples; if set to 100, the top 100 barcodes by total UMI count for each sample will be retained.

Percent of reads in cell

Set the percent of reads in cells per sample to include. The number of barcodes included will be set to match the specified percent of reads in cells for each sample. Barcodes are included starting with the barcode with the highest number of total UMIs and proceeding in descending order of total UMIs per barcode until the specified percent of reads has been met or exceeded.
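
The Number of cells and Percent of reads in cell options can both be expressed as walking down the barcodes in descending order of total UMIs. The Python sketch below illustrates this for a single sample. The function names and barcode sequences are hypothetical, and for simplicity the same per-barcode count is used both for ranking and as the read total, which is an assumption.

```python
def top_n_barcodes(counts_per_barcode, n):
    """Keep the n barcodes with the highest counts."""
    ranked = sorted(counts_per_barcode, key=counts_per_barcode.get, reverse=True)
    return ranked[:n]


def barcodes_for_percent(counts_per_barcode, percent):
    """Add barcodes in descending order of count until the kept barcodes
    account for at least `percent` of the total."""
    target = sum(counts_per_barcode.values()) * percent / 100
    ranked = sorted(counts_per_barcode, key=counts_per_barcode.get, reverse=True)
    kept, running = [], 0
    for barcode in ranked:
        if running >= target:  # percent already met or exceeded
            break
        kept.append(barcode)
        running += counts_per_barcode[barcode]
    return kept


# Hypothetical counts for one sample (total = 1000).
counts_per_barcode = {
    "AAAC": 500,
    "AAAG": 300,
    "AACT": 150,
    "ACGT": 40,
    "AGGT": 10,
}

top_two = top_n_barcodes(counts_per_barcode, 2)
ninety_percent = barcodes_for_percent(counts_per_barcode, 90)
print(top_two)         # ['AAAC', 'AAAG']
print(ninety_percent)  # ['AAAC', 'AAAG', 'AACT']
```

In the second call, the first two barcodes cover only 80% of the total, so a third is added and the result covers 95%, which exceeds the requested 90%.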

Previous filter

If you have already run a Filter barcodes task for your samples in the project, the Previous filter option will be available. This option lets you filter to the same cell barcodes that were included by the previous filter. This option is particularly useful for CITE-Seq data, where antibody barcodes and gene expression data must be processed separately, but you will want to analyze the same cell barcodes in downstream steps.

Selecting Previous filter opens a table with information about the previous barcode filters in your project (Figure 2).

To help you identify which previous filter you want to apply, the color of the task node on the Analyses tab, the number of cell barcodes retained (summed for all samples), and the time/date the previous filter task was submitted are included in the table. To view the number of cells and percentage of reads in cells for each sample in a previous filter, mouse over the button (Figure 3).

Use the radio buttons in the first column to pick which filter you want to use.

Output of Filter barcodes

After configuring the task, click Finish to run.

Filter barcodes produces a Filtered reads data node (Figure 4). Filter barcodes does not have a task report.

Split by attribute

The Split by attribute task splits a data node into separate data nodes based on the groups in a categorical attribute, so that each output data node includes only the samples or cells from one group. It is a more efficient way to filter your data if you plan to perform downstream analysis separately on every group in an attribute.

Click the data node and select Split by attribute from the Filtering section of the task menu (Figure 1).

Select the attribute to split the data on. In this example, the data is split according to the Age attribute (Figure 2).

The result of the Split by attribute task is two separate data nodes, each containing the samples from one age group (Figure 3).
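
Splitting by attribute is equivalent to grouping samples by the levels of a categorical attribute, which replaces one filter per group. A minimal Python sketch, using a hypothetical `split_by_attribute` function and made-up sample names and Age levels:

```python
from collections import defaultdict


def split_by_attribute(samples, attribute):
    """Return one list of samples per level of a categorical attribute."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[attribute]].append(sample)
    return dict(groups)


# Hypothetical samples with a two-level Age attribute.
samples = [
    {"name": "S1", "Age": "young"},
    {"name": "S2", "Age": "old"},
    {"name": "S3", "Age": "young"},
    {"name": "S4", "Age": "old"},
]

nodes = split_by_attribute(samples, "Age")
for level, members in nodes.items():
    print(level, [s["name"] for s in members])
```

Each key of `nodes` plays the role of one output data node, so an attribute with two levels yields two groups.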

Downsample Cells

The Downsample Cells task randomly downsamples the number of cells in a single cell data set. It can be used to reduce a large single cell data set to a smaller, more manageable size for quick analysis. Another use case is a project with multiple samples that each have a different number of cells: Downsample Cells can randomly select an equal number of cells from every sample in the project.

By default, the number of cells to select is set to the number of cells in the sample with the fewest cells, and that many cells are selected from each of the other samples. You can change this default to any preferred number. If the number you enter is greater than the number of cells in one or more samples, those samples are not downsampled and all of their cells are returned. If the number is greater than the number of cells in every sample, none of the samples are downsampled.

To run the task, first click a single cell counts data node. Then go to the Filtering section of the toolbox and select the Downsample cells task (Figure 1).

Clicking the Downsample cells task opens a dialog in which the number of cells to keep is preset to the minimum number of cells in any sample in the project. In the example shown, the smallest sample contained 2658 cells, so this value is used as the default. Click Finish to run the task (Figure 2).
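
The downsampling rules described in this section can be sketched in Python as follows. This is an illustration rather than Partek Flow's implementation. The function `downsample_cells`, its `seed` parameter (added here only to make the random selection repeatable), and the sample and cell names are hypothetical.

```python
import random


def downsample_cells(cells_by_sample, n_cells=None, seed=0):
    """Randomly select up to n_cells cells from each sample.

    If n_cells is not given, it defaults to the size of the smallest sample.
    Samples that already have n_cells cells or fewer are returned unchanged.
    """
    if n_cells is None:
        n_cells = min(len(cells) for cells in cells_by_sample.values())
    rng = random.Random(seed)
    result = {}
    for sample, cells in cells_by_sample.items():
        if len(cells) <= n_cells:
            result[sample] = list(cells)  # not downsampled
        else:
            result[sample] = rng.sample(cells, n_cells)
    return result


# Hypothetical project: three samples with different numbers of cells.
cells_by_sample = {
    "SampleA": [f"A_cell{i}" for i in range(10)],
    "SampleB": [f"B_cell{i}" for i in range(6)],
    "SampleC": [f"C_cell{i}" for i in range(8)],
}

default_result = downsample_cells(cells_by_sample)
custom_result = downsample_cells(cells_by_sample, n_cells=7)
print({s: len(c) for s, c in default_result.items()})
print({s: len(c) for s, c in custom_result.items()})
```

With the default setting every sample ends up with 6 cells, the size of the smallest sample. With `n_cells=7`, SampleB keeps all 6 of its cells because it has fewer cells than requested.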

