Filtering

Partek Flow lets you filter and subsample your data before further downstream analysis. The available filtering tasks are:

  • Filter features

  • Filter groups (samples or cells)

  • Filter barcodes

  • Split by attribute

  • Downsample Cells

Filter groups (samples or cells)

  • Configuring a filter

  • Configuring multiple filters

  • Filter groups task report

Filter samples or cells in order to perform downstream analysis on a subset of data.

To filter groups, click a count matrix or single cell counts data node, click the Filtering section of the toolbox, and choose to Filter samples (bulk data) or Filter cells (single cell data).

The dialog lets you build a series of filters based on sample or cell attributes.

Click Finish to apply the filter. If no samples or cells pass the filter criteria, a warning message appears and the task does not run.

Configuring a filter

The first drop-down menu allows you to choose to include or exclude based on the specified criteria.

The second drop-down menu allows you to choose any categorical or numeric attribute to use for the filter criteria.

If the attribute is categorical, the third drop-down menu includes in and not in as options. A fourth drop-down menu allows you to search and choose from the levels of the selected attribute (Figure 1).

If the attribute is numeric, the third drop-down menu includes:

  • <: less than

  • <=: less than or equal to

  • ==: equal to

  • >: greater than

  • >=: greater than or equal to

The threshold is set using the text box (Figure 2). The input must be a number; it can be an integer or decimal, positive or negative.
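
The logic of a single filter can be sketched in a few lines of Python. This is an illustration only, not Partek Flow code: the function names, the record layout, and the example attributes (`tissue`, `age`) are hypothetical.

```python
import operator

# Hypothetical operator table mirroring the numeric comparison choices.
NUMERIC_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(record, attribute, op, value):
    """Return True if one sample or cell record satisfies the criterion."""
    actual = record[attribute]
    if op == "in":
        return actual in value
    if op == "not in":
        return actual not in value
    return NUMERIC_OPS[op](actual, value)


def apply_filter(records, mode, attribute, op, value):
    """Keep matching records for mode 'include', drop them for 'exclude'."""
    keep_matches = mode == "include"
    return [r for r in records if matches(r, attribute, op, value) == keep_matches]


# Made-up sample attributes for illustration.
samples = [
    {"name": "S1", "tissue": "liver", "age": 34},
    {"name": "S2", "tissue": "brain", "age": 61},
    {"name": "S3", "tissue": "liver", "age": 58},
]

liver_only = apply_filter(samples, "include", "tissue", "in", {"liver"})
under_60 = apply_filter(samples, "exclude", "age", ">=", 60)
print([s["name"] for s in liver_only])  # ['S1', 'S3']
print([s["name"] for s in under_60])    # ['S1', 'S3']
```

Here the categorical filter includes samples whose tissue is in the chosen levels, and the numeric filter excludes samples whose age is at or above the threshold.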

Configuring multiple filters

Using the OR and AND options, you can combine multiple filters.

When combining multiple filters all set to Include:

With AND, all statements must be true for the sample to meet the filter criteria.

With OR, if any statement is true, the sample will meet the filter criteria.

When combining multiple filters all set to Exclude:

With AND, if any statement is true, the sample will meet the filter criteria and be excluded.

With OR, all statements must be true for the sample to meet the filter criteria and be excluded.
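
The four combinations can be checked with a small Python sketch. It assumes that each Exclude filter passes a sample when its statement is false, and that the per-filter results are then combined with AND or OR. This is one reading of the rules above, not a description of Partek Flow internals, and the function name `passes` is hypothetical.

```python
def passes(statements, mode, combine):
    """Decide whether a sample passes a set of combined filters.

    statements: list of booleans, one per filter criterion for this sample.
    mode: 'include' or 'exclude' (all filters share the same mode).
    combine: 'AND' or 'OR'.
    """
    if mode == "include":
        per_filter = statements
    else:
        # Assumption: an Exclude filter passes a sample when its statement is false.
        per_filter = [not s for s in statements]
    return all(per_filter) if combine == "AND" else any(per_filter)


# A sample for which the first statement is true and the second is false.
sample = [True, False]

print(passes(sample, "include", "AND"))  # False: not all statements are true
print(passes(sample, "include", "OR"))   # True: at least one statement is true
print(passes(sample, "exclude", "AND"))  # False: one true statement excludes it
print(passes(sample, "exclude", "OR"))   # True: excluded only if all are true
```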

Filter groups task report

The filter groups task report lists the filter criteria and reports feature distribution statistics for the remaining samples (Figure 3).

If the input was a count matrix data node, the percentage of samples remaining after the filter is listed and charts are provided to show the breakdown of samples by categorical attributes before and after filtering (Figure 4).

If the input was a single cell counts data node, a second table displays the details from each sample based on the filtered criteria (Figure 5).

If the input was a classified groups single cell counts data node, the cell count table includes a breakdown by classification and a bar chart is provided to show the number of cells from each classification remaining after filtering (Figure 6).

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

Figure 1. Filtering by a categorical attribute
Figure 2. Filtering groups by a numeric attribute
Figure 3. Filter samples task report - feature distribution table
Figure 4. Filter samples report - feature distribution table
Figure 5. Filter cells - cell classification table
Figure 6. Filter cells - classification bar chart

Filter features

  • Noise reduction filter

  • Statistics based filter

  • Feature metadata filter

  • Feature list filter

  • Filter features task report

    A common task in bulk and single-cell RNA-Seq analysis is to filter the data to include only informative genes. Because there is no gold standard for what makes a gene informative, and ideal gene filtering criteria depend on your experimental design and research question, Partek Flow has a wide variety of flexible filtering options.

    The Filter features task can be invoked from any counts or single cell data node. The Noise reduction and Statistics based filters take each feature and perform the specified calculation across all of the samples or cells. The filter is applied to the values in the selected data node, and the output is a filtered version of the input data node.

    In the task dialog, select the filter option to activate the filter type and configure the filter, then click Finish to run.

    Noise reduction filter

    The Noise reduction filter lets you exclude features that meet basic criteria (Figure 1).

    Figure 1. Noise reduction filter

    Descriptive statistics you can choose are:

    • Coefficient of variation: the standard deviation divided by the mean of the feature

    • Geometric mean: the nth root of the product of the n values of a feature, where n is the number of samples or cells

    • Maximum: the highest value of a feature

    • Mean: the average value of a feature

    • Median: the middle value of a feature

    • Minimum: the lowest value of a feature

    • Range: the difference between the highest and lowest values of a feature

    • Std dev.: the square root of the variance

    • Sum: total value of the feature

    • Variance: the average of the squared differences from the mean

    • Dispersion: variance divided by mean of the feature

    For each of these you can choose to exclude features that are:

    • <: less than

    • <=: less than or equal to

    • ==: equal to

    • >: greater than

    • >=: greater than or equal to

    The threshold is set using the text box. The input must be a number; it can be an integer or decimal, positive or negative.

    If you select value, you can also choose a percentage of samples or cells that must meet the criteria for the feature to be excluded (Figure 2).

    Figure 2. Selecting value exposes additional options
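
    The noise reduction criteria can be illustrated with a short Python sketch. It is not Partek Flow code: the function names, the dictionary-of-lists matrix layout, the use of population variance, and the reading of the percentage option as "at least this percent of values" are all assumptions made for illustration. The geometric mean is left out to keep the sketch short.

```python
import operator
import statistics

COMPARE = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
           ">": operator.gt, ">=": operator.ge}

# Assumption: population variance and standard deviation are used.
STATS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "minimum": min,
    "maximum": max,
    "range": lambda v: max(v) - min(v),
    "sum": sum,
    "variance": statistics.pvariance,
    "std_dev": statistics.pstdev,
    "coefficient_of_variation": lambda v: statistics.pstdev(v) / statistics.fmean(v),
    "dispersion": lambda v: statistics.pvariance(v) / statistics.fmean(v),
}


def noise_reduction(matrix, stat, op, threshold):
    """Drop features whose per-feature statistic meets the exclusion criterion."""
    return {
        feature: values
        for feature, values in matrix.items()
        if not COMPARE[op](STATS[stat](values), threshold)
    }


def exclude_by_value(matrix, op, threshold, percent):
    """Drop features where at least `percent` of the values meet the criterion."""
    kept = {}
    for feature, values in matrix.items():
        hits = sum(COMPARE[op](v, threshold) for v in values)
        if 100.0 * hits / len(values) < percent:
            kept[feature] = values
    return kept


# Made-up counts: one list of values (samples or cells) per feature.
counts = {
    "GeneA": [0, 0, 1, 0],
    "GeneB": [12, 30, 25, 18],
    "GeneC": [0, 5, 0, 7],
}

print(list(noise_reduction(counts, "maximum", "<=", 1)))  # ['GeneB', 'GeneC']
print(list(exclude_by_value(counts, "==", 0, 50)))        # ['GeneB']
```

    The first call excludes features whose maximum is at most 1. The second excludes features whose value equals 0 in at least 50% of the samples or cells.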

    Statistics based filter

    The Statistics based filter lets you include a number or percentile of genes based on descriptive statistics (Figure 3).

    Figure 3. Statistics based filter

    Select Counts to specify a number of top features to include or select Percentiles to specify the top percentile of features to include.

    Descriptive statistics you can choose are:

    • Coefficient of variation

    • Geometric mean

    • Maximum

    • Mean

    • Median

    • Minimum

    • Range

    • Standard deviation (std dev)

    • Sum

    • Variance

    • Dispersion
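
    As a rough illustration of the Counts and Percentiles options, the following Python sketch ranks features by a chosen statistic and keeps the top ones. The function name, the matrix layout, and the rounding rule for percentiles (rounded up) are assumptions, not documented Partek Flow behavior.

```python
import math
import statistics


def top_features(matrix, stat_fn, count=None, percentile=None):
    """Return the names of the top features ranked by a per-feature statistic.

    Give either `count` (number of features to keep) or `percentile`
    (keep the top N percent of features, rounded up; an assumption).
    """
    ranked = sorted(matrix, key=lambda f: stat_fn(matrix[f]), reverse=True)
    if count is None:
        count = math.ceil(len(ranked) * percentile / 100)
    return ranked[:count]


# Made-up counts: one list of values (samples or cells) per feature.
counts = {
    "GeneA": [1, 1, 1, 1],
    "GeneB": [0, 10, 0, 10],
    "GeneC": [2, 4, 2, 4],
    "GeneD": [5, 5, 6, 5],
}

by_variance = top_features(counts, statistics.pvariance, count=2)
by_mean = top_features(counts, statistics.fmean, percentile=25)
print(by_variance)  # ['GeneB', 'GeneC']
print(by_mean)      # ['GeneD']
```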

    Feature metadata filter

    If the data is linked to a feature (gene) annotation, fields in the annotation, such as genomic location or gene biotype, can be used to filter features (Figure 4).

    Figure 4. Filter features based on feature annotation fields

    You can combine criteria based on different annotation fields using logical operations.

    Feature list filter

    You can filter features based on a feature list (Figure 5).

    Figure 5. Feature list filter

    If you have added feature lists in Partek Flow using the List management feature, the Saved list option will be available. Otherwise, you can specify a Manual list by typing in the Filter criteria box.

    If you choose Saved list, the drop-down list will display all the feature lists added in List management. If you choose Manual list, you can type the feature IDs or names in the box, one feature per row.

    You can choose to include or exclude features in any list that you have added.

    Use the Feature identifier option to choose which identifier from your annotation matches the values in the feature list.
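
    The effect of a manual list and the Feature identifier option can be sketched as follows. The function name, the record layout, and the field names `gene_id` and `gene_name` are hypothetical. The point is only that the list entries are matched against one chosen identifier field.

```python
def filter_by_list(features, feature_list, identifier, mode="include"):
    """Include or exclude features whose chosen identifier appears in the list."""
    wanted = set(feature_list)
    keep_listed = mode == "include"
    return [f for f in features if (f[identifier] in wanted) == keep_listed]


# Example annotation records with two possible identifier fields.
features = [
    {"gene_id": "ENSG00000141510", "gene_name": "TP53"},
    {"gene_id": "ENSG00000012048", "gene_name": "BRCA1"},
    {"gene_id": "ENSG00000111640", "gene_name": "GAPDH"},
]

# A manual list is entered one feature per row.
manual_list = "TP53\nGAPDH".splitlines()

included = filter_by_list(features, manual_list, identifier="gene_name")
excluded = filter_by_list(features, ["ENSG00000141510"], identifier="gene_id",
                          mode="exclude")
print([f["gene_name"] for f in included])  # ['TP53', 'GAPDH']
print([f["gene_name"] for f in excluded])  # ['BRCA1', 'GAPDH']
```

    If the identifier does not match the kind of value in the list (for example, gene names in the list but `gene_id` selected), nothing in the list is found, which is why the identifier choice matters.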

    Filter features task report

    The filter features task report lists the filter criteria, reports distribution statistics for the remaining features, and indicates the number and percentage of features that passed the filter (Figure 6).

    Figure 6. Feature filter report table

    If the input was a count matrix data node, a sample box plot and sample histograms are provided to show the distribution of features after filtering. These plots are not available if the input was a single cell counts data node.

    Figure 7. Filter features report figures

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.


    Split by attribute

    The Split by attribute task splits a data node into separate data nodes based on the groups in a categorical attribute. Each resulting data node includes only the samples or cells from one group. It is a more efficient way to filter your data if you plan to perform downstream analysis separately on every group in an attribute.

    Click the data node and select Split by attribute from the Filtering section of the task menu (Figure 1).

    Figure 1. Single-click the data node to be split (red) and select the Split by attribute task (red rectangle) from the Filtering menu

    Select the attribute to split the data on. In this case, data will be split according to the Age attribute (Figure 2).

    Figure 2. Splitting data into >110 and <110 groups of the Age category. Click Finish

    The result of the Split by attribute task is two separate data nodes, each containing samples from one age group (Figure 3).

    Figure 3. Count data node split into two data nodes, >110 and <110 (red rectangle)
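
    Conceptually, the task groups samples or cells by the levels of the chosen attribute. A minimal Python sketch of that grouping follows. The function name and the sample records are made up, and the Age levels follow the example above.

```python
from collections import defaultdict


def split_by_attribute(records, attribute):
    """Group records into one list per level of a categorical attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    return dict(groups)


# Made-up samples with a two-level Age attribute.
samples = [
    {"name": "S1", "Age": ">110"},
    {"name": "S2", "Age": "<110"},
    {"name": "S3", "Age": ">110"},
]

nodes = split_by_attribute(samples, "Age")
for level, members in nodes.items():
    print(level, [m["name"] for m in members])
# >110 ['S1', 'S3']
# <110 ['S2']
```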

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Downsample Cells

    The Downsample Cells task randomly downsamples the number of cells in a single cell data set. It can be used to reduce a large single cell data set to a smaller, more manageable size for quick analysis. Another use case is a project with multiple samples that each have a different number of cells: Downsample Cells can randomly select an equal number of cells from every sample in the project. By default, the number of cells selected from each sample is the number of cells in the sample with the fewest cells. You can change this default to a preferred number. If the number you choose is greater than the number of cells in one or more samples, those samples are not downsampled and all of their cells are returned. If the number is greater than the number of cells in every sample, none of the samples is downsampled.
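
    The selection rule described above can be sketched in Python. This is an illustration under stated assumptions, not the Partek Flow implementation: the function name, the `seed` parameter, and the cell naming are hypothetical, and Python's `random.sample` stands in for whatever random selection the task actually uses.

```python
import random


def downsample_cells(cells_by_sample, n_cells=None, seed=0):
    """Randomly keep up to n_cells cells per sample.

    With n_cells=None, the size of the smallest sample is used as the default.
    Samples with n_cells or fewer cells are returned unchanged.
    """
    if n_cells is None:
        n_cells = min(len(cells) for cells in cells_by_sample.values())
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    result = {}
    for sample, cells in cells_by_sample.items():
        if n_cells >= len(cells):
            result[sample] = list(cells)
        else:
            result[sample] = rng.sample(cells, n_cells)
    return result


# Made-up project with three samples of different sizes.
project = {
    "SampleA": [f"A_cell{i}" for i in range(5)],
    "SampleB": [f"B_cell{i}" for i in range(3)],
    "SampleC": [f"C_cell{i}" for i in range(8)],
}

default_sizes = {s: len(c) for s, c in downsample_cells(project).items()}
custom_sizes = {s: len(c) for s, c in downsample_cells(project, n_cells=4).items()}
print(default_sizes)  # {'SampleA': 3, 'SampleB': 3, 'SampleC': 3}
print(custom_sizes)   # {'SampleA': 4, 'SampleB': 3, 'SampleC': 4}
```

    In the second call, SampleB has fewer than 4 cells, so all of its cells are returned.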

    To run a downsample task, first click a single cell counts data node. Go to the Filtering section and select the Downsample cells task (Figure 1).

    Figure 1. Single-click the data node to be downsampled (red) and select the Downsample cells task (red rectangle) from the Filtering menu

    Clicking the Downsample cells task opens a dialog with the number of cells to be downsampled set to the minimum number of cells in any sample in the project. In the figure below, the minimum number of cells in any sample was 2658, and this is used as the default setting. Click Finish to run the task (Figure 2).

    Figure 2. Downsample cell number to 2658 for each sample in project

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Filter barcodes

    • Configuring Filter barcodes

    • Output of Filter barcodes

    In droplet-based single cell isolation and library prep methods, each droplet is labeled by a unique nucleotide barcode. Because not all droplets will contain cells, it is important to filter out nucleotide barcodes that correspond to empty droplets prior to downstream analysis.

    In Partek Flow, you can filter barcodes interactively using a knee plot after UMI deduplication in the UMI deduplication task report or after quantification in the Cell barcode QA/QC task report. Alternatively, you can filter using preset options in the Filter barcodes task.

    To invoke Filter barcodes:

    • Click a Deduplicated reads data node

    • Click the Filtering section of the toolbox

    • Click Filter barcodes

    Configuring Filter barcodes

    You can choose to filter barcodes using three or four options, depending on whether you have already run a Filter barcodes task for your samples in the project (Figure 1).

    Automatic

    The automatic filter threshold is set for each sample individually. It picks the cutoff between cells and empty droplets by identifying where the UMI content per barcode drops precipitously when moving in descending order from the barcode with the highest number of UMIs.

    Number of cells

    Set the number of cells per sample to include. This is set for all samples; if set to 100, the top 100 barcodes by total UMI count for each sample will be retained.

    Percent of reads in cell

    Set the percent of reads in cells per sample to include. The number of barcodes included will be set to match the specified percent of reads in cells for each sample. Barcodes are included starting with the barcode with the highest number of total UMIs and proceeding in descending order of total UMIs per barcode until the specified percent of reads has been met or exceeded.
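
    The Number of cells and Percent of reads in cell options both rank barcodes by total UMI count. The Python sketch below illustrates that ranking for one sample. The function names and barcode strings are made up, and the per-barcode UMI total is used as a stand-in for the read count when accumulating the percentage, which is an assumption.

```python
def top_n_barcodes(umi_counts, n_cells):
    """Keep the n_cells barcodes with the highest total UMI count."""
    ranked = sorted(umi_counts, key=umi_counts.get, reverse=True)
    return ranked[:n_cells]


def barcodes_by_percent(umi_counts, percent):
    """Add barcodes in descending UMI order until the target share is met or exceeded."""
    ranked = sorted(umi_counts, key=umi_counts.get, reverse=True)
    target = sum(umi_counts.values()) * percent / 100
    kept, running = [], 0
    for barcode in ranked:
        if running >= target:
            break
        kept.append(barcode)
        running += umi_counts[barcode]
    return kept


# Made-up UMI totals per barcode for one sample (1000 in total).
umis = {"AAAC": 500, "CCGT": 300, "GGTA": 150, "TTAG": 40, "ACGT": 10}

top_two = top_n_barcodes(umis, 2)
ninety_percent = barcodes_by_percent(umis, 90)
print(top_two)         # ['AAAC', 'CCGT']
print(ninety_percent)  # ['AAAC', 'CCGT', 'GGTA']
```

    With a 90% target, the first two barcodes reach only 80%, so the third is added and the running total (95%) exceeds the target.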

    Previous filter

    If you have already run a Filter barcodes task for your samples in the project, the Previous filter option will be available. This option lets you filter to the same cell barcodes that were included by the previous filter. This option is particularly useful for CITE-Seq data, where antibody barcodes and gene expression data must be processed separately, but you will want to analyze the same cell barcodes in downstream steps.

    Selecting Previous filter opens a table with information about the previous barcode filters in your project (Figure 2).

    To help you identify which previous filter you want to apply, the color of the task node on the Analyses tab, the number of cell barcodes retained (summed for all samples), and the time/date the previous filter task was submitted are included in the table. To view the number of cells and percentage of reads in cells for each sample in a previous filter, mouse over the button (Figure 3).

    Use the radio buttons in the first column to pick which filter you want to use.

    Output of Filter barcodes

    After configuring the task, click Finish to run.

    Filter barcodes produces a Filtered reads data node (Figure 4). Filter barcodes does not have a task report.

    Additional Assistance

    If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.

    Figure 1. Configuring Filter barcodes
    Figure 2. Previous filter options
    Figure 3. Checking details of a previous filter
    Figure 4. Filter barcodes produces a Filtered reads data node