Cohorts

Introduction to Cohorts

ICA Cohorts is a cohort analysis tool integrated with Illumina Connected Analytics (ICA). ICA Cohorts combines subject- and sample-level metadata, such as phenotypes, diseases, demographics, and biometrics, with molecular data stored in ICA to perform tertiary analyses on selected subsets of individuals.

Overview Video

This video is an overview of Illumina Connected Analytics. It walks through a Multi-Omics Cancer workflow that can be found here:

Features At-a-glance

Intuitive UI for selecting subjects and samples to analyze and compare: deep phenotypical and clinical metadata, and molecular features including germline variants, somatic mutations, and gene expression.
Comprehensive, harmonized data model exposed to ICA Base and ICA Bench users for custom analyses.
Run analyses in ICA Base and ICA Bench and upload final results back into Cohorts for visualization.

Functionality

Walk-throughs

Public Data Sets

ICA Cohorts contains a variety of freely available data sets covering different disease areas and sequencing technologies. For a list of currently available data, see Public Data Sets.

Create a Cohort

ICA Cohorts lets you create a research cohort of subjects and associated samples based on the following criteria:

Project:
- Include subjects that are part of any ICA Project that you own or that is shared with you.

Import New Samples

ICA Cohorts can pull any molecular data available in an ICA Project, as well as additional sample- and subject-level metadata information such as demographics, biometrics, sequencing technology, phenotypes, and diseases.

To import a new data set, select Import Jobs from the left navigation tab underneath Cohorts, and click the Import Files button. The Import Files button is also available under the Data Sets left navigation item.

The Data Set menu item is used to view imported data sets and information. The Import Jobs menu item is used to check the status of data set imports.

Confirm that the project shown is the ICA Project that contains the molecular data you would like to add to ICA Cohorts.

Choose a data type among
- Germline variants
- Somatic mutations

Search Spinner behavior in input jobs table

Enter a search term and press Enter.
The search spinner will appear while the results are loading.

All VCF types, specifically those from DRAGEN, can be ingested using the Germline variants selection. Cohorts will distinguish the variant types that it is ingesting. If Cohorts cannot determine the variant file type, it will default to ingesting the file as small variants.

As an alternative to VCFs, you can select Nirvana JSON files for DNA variants: small variants, structural variants, and copy number variation.

The maximum number of files that can be part of a single manual ingestion batch is 1000.
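If you select files manually and have more than 1000 of them, the list has to be split across several import jobs. The sketch below shows one way to do that split before you start; the helper name and the example file names are hypothetical and not part of ICA Cohorts.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_BATCH = 1000  # limit stated above for a manual ingestion batch


def split_into_batches(
    files: Sequence[str], batch_size: int = MAX_FILES_PER_BATCH
) -> Iterator[List[str]]:
    """Yield consecutive slices of `files`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    vcfs = [f"sample_{i:04d}.vcf.gz" for i in range(2350)]
    print([len(batch) for batch in split_into_batches(vcfs)])
```

Each yielded list can then be selected as one import job.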

Alternatively, users can choose a single folder, and ICA Cohorts will identify all ingestible files within that folder and its sub-folders. In this scenario, Cohorts will select the molecular data files matching the samples listed in the metadata sheet, which is the next step in the import process.

Users have the option to ingest either VCF files or Nirvana JSON files for any given batch, regardless of the chosen ingestion method.

The sample identifiers used in the VCF columns need to match the sample identifiers used in the subject/sample metadata files. Likewise, if you are starting from JSON files containing variant- and gene-level annotations provided by ILMN Nirvana, the samples listed in the header need to match the metadata files.
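A quick pre-import check can catch identifier mismatches early. The following is a minimal sketch for VCF input only: it reads the sample names from the `#CHROM` header line (in VCF, sample columns start at column 10) and compares them with a list of identifiers taken from your metadata sheet. How you load the metadata identifiers is left open, and both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def vcf_sample_ids(lines: Iterable[str]) -> List[str]:
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM .. FORMAT); samples start at column 10.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data records without seeing a #CHROM line
    return []


def unmatched_samples(
    vcf_ids: Iterable[str], metadata_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (IDs only in the VCF, IDs only in the metadata), both sorted."""
    vcf_set, meta_set = set(vcf_ids), set(metadata_ids)
    return sorted(vcf_set - meta_set), sorted(meta_set - vcf_set)
```

For a compressed file, the lines can be supplied with `gzip.open(path, "rt")`; for an uncompressed file, a plain `open(path)` is enough.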

Variant file formats

ICA Cohorts supports VCF files formatted according to VCF v4.2 and v4.3 specifications. VCF files require at least one of the following header rows to identify the genome build:

##reference=file://... --- needs to contain a reference to hg38/GRCh38 in the file path or name (numerical value is sufficient)
##contig=<ID=chr1,length=248956422> --- for hg38/GRCh38
##DRAGENCommandLine= ... --ht-reference

ICA Cohorts accepts VCFs aligned to hg38/GRCh38 and hg19/GRCh37. If your data uses hg19/GRCh37 coordinates, Cohorts will convert these to hg38/GRCh38 during the ingestion process [see Reference 1]. Harmonizing data to one genome build facilitates searches across different private, shared, and public projects when building and analyzing a cohort. If your data contains a mixture of samples mapped to hg38 and hg19, please ingest these in separate batches, as each import job into Cohorts is limited to one genome build.
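To sort files into per-build batches before import, you can inspect the header rows listed above yourself. The sketch below is an illustration only and is not the logic Cohorts applies internally: it assumes the chr1 contig length distinguishes the builds (248956422 for hg38/GRCh38 as shown above, 249250621 for hg19/GRCh37), and that the `##reference` value or the `--ht-reference` path contains "38", "hg19", or "GRCh37". The function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# chr1 lengths per genome build; used to interpret ##contig header rows.
CHR1_LENGTHS = {"248956422": "GRCh38", "249250621": "GRCh37"}
CONTIG_CHR1 = re.compile(r"ID=(?:chr)?1,.*?length=(\d+)")
HT_REFERENCE = re.compile(r"--ht-reference[=\s]+(\S+)")


def _build_from_text(text: str) -> Optional[str]:
    """Guess the build from a reference path or name (assumed keywords)."""
    text = text.lower()
    if "38" in text:
        return "GRCh38"
    if "hg19" in text or "grch37" in text:
        return "GRCh37"
    return None


def guess_genome_build(header_lines: Iterable[str]) -> Optional[str]:
    """Return "GRCh38", "GRCh37", or None if no header row is conclusive."""
    for line in header_lines:
        line = line.strip()
        build = None
        if line.startswith("##contig="):
            match = CONTIG_CHR1.search(line)
            if match:
                build = CHR1_LENGTHS.get(match.group(1))
        elif line.startswith("##reference="):
            build = _build_from_text(line.split("=", 1)[1])
        elif line.startswith("##DRAGENCommandLine="):
            match = HT_REFERENCE.search(line)
            if match:
                build = _build_from_text(match.group(1))
        if build:
            return build
    return None
```

Files for which the function returns `None` are worth checking by hand before you assign them to a batch.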

As an alternative to VCFs, ICA Cohorts accepts the JSON output of Illumina Nirvana for hg38/GRCh38-aligned data for small germline variants and somatic mutations, copy number variations, and other structural variants.

RNAseq file format

ICA Cohorts can process gene- and transcript-level quantification files produced by the Illumina DRAGEN RNA pipeline. The file naming convention needs to match .quant.genes.sf for gene-level TPM and .quant.sf for transcript-level TPM (transcripts per million).

Please also see the online documentation for the DRAGEN RNA pipeline for more information on output file formats.
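The naming convention can be checked before import. This is a minimal sketch that relies only on the two suffixes given above; the function name and the example file names are hypothetical.

```python
from typing import Optional


def quantification_level(file_name: str) -> Optional[str]:
    """Classify a DRAGEN RNA quantification file by its suffix."""
    if file_name.endswith(".quant.genes.sf"):
        return "gene"
    if file_name.endswith(".quant.sf"):
        return "transcript"
    return None  # not a quantification file by this naming convention


if __name__ == "__main__":
    for name in ("S1.quant.genes.sf", "S1.quant.sf", "S1.vcf.gz"):
        print(name, quantification_level(name))
```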

GWAS file format

ICA Cohorts currently supports upload of SNV-level GWAS results produced by and saved as CSV files.

Metadata and File Types

Note: When annotating large sets of samples with molecular data, expect the annotation process to take over 20 minutes per whole-genome batch of samples. You will receive two e-mail notifications: one when your ingestion starts and one when it completes, whether successfully or with a failure.

As an alternative to ICA Cohorts' metadata file format, you can provide files formatted according to the OMOP Common Data Model (CDM) 5.4. Cohorts currently ingests data for these OMOP 5.4 tables, formatted as tab-delimited files:

PERSON (mandatory),
CONCEPT (mandatory if any of the following is provided),
CONDITION_OCCURRENCE (optional),

Additional files such as measurement and observation will be supported in a subsequent release of Cohorts.

Note that Cohorts requires that all such files do not deviate from the OMOP CDM 5.4 standard. Depending on your implementation, you may have to adjust file formatting to be OMOP CDM 5.4-compatible.
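The table requirements listed above can be expressed as a small pre-flight check. The sketch below covers only the three tables named in this section; the set of optional tables that trigger the CONCEPT requirement is an assumption to extend as needed, and the function name is hypothetical.

```python
from typing import Iterable, List

MANDATORY_TABLES = {"PERSON"}
# Optional tables that make CONCEPT mandatory; only the one named above is listed.
TABLES_REQUIRING_CONCEPT = {"CONDITION_OCCURRENCE"}


def check_omop_tables(provided: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the set of tables looks complete."""
    tables = {name.upper() for name in provided}
    problems = [
        f"missing mandatory table: {name}"
        for name in sorted(MANDATORY_TABLES - tables)
    ]
    dependents = sorted(tables & TABLES_REQUIRING_CONCEPT)
    if dependents and "CONCEPT" not in tables:
        problems.append(
            "CONCEPT is required when providing: " + ", ".join(dependents)
        )
    return problems
```

This checks only which tables are present, not whether their columns conform to OMOP CDM 5.4.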

References

[1] VcfMapper: https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/main_vcfmapper.py

[2] crossMap: https://crossmap.sourceforge.net/

[3] liftOver: https://genome.ucsc.edu/cgi-bin/hgLiftOver

[4] Chain files:

Prepare Metadata Sheets

In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:

subject:
- demographics such as age, sex, ancestry;

Precomputed GWAS and PheWAS

The GWAS and PheWAS tabs in ICA Cohorts allow you to visualize precomputed analysis results for phenotypes/diseases and genes, respectively. Note that these do not reflect the subjects that are part of the cohort that you created.

ICA Cohorts currently hosts GWAS and PheWAS analysis results for approximately 150 quantitative phenotypes (such as "LDL direct" and "sitting height") and about 700 diseases.

Visualize Results from Precomputed Genome-Wide Association Studies (GWAS)

Navigate to the GWAS tab and start looking for phenotypes and diseases in the search box. Cohorts will suggest the best matches against any partial input ("cancer") you provide. After selecting a phenotype/disease, Cohorts will render a Manhattan plot, by default collapsed to gene level and organized by their respective position in each chromosome.

Circles in the Manhattan plot indicate binary traits, potential associations between genes and diseases. Triangles indicate quantitative phenotypes with regression Beta different from zero, and point up or down to depict positive or negative correlation, respectively.

Hovering over a circle/triangle will display the following information:

gene symbol
variant group (see below)
P-value, both raw and FDR-corrected
number of carriers of variants of the given type

For gene-level results, Cohorts distinguishes five different classes of variants: protein truncating; deleterious; missense; missense with a high ILMN PrimateAI score (indicating likely damaging variants); and synonymous variants. You can limit results to any one of these five classes, or select All to display all results together.

Deleterious variants (del): the union of all protein-truncating variants (PTVs, defined below), pathogenic missense variants with a PrimateAI score greater than a gene-specific threshold, and variants with a SpliceAI score greater than 0.2.
Protein-truncating variants (ptv): variant consequences matching any of stop_gained, stop_lost, frameshift_variant, splice_donor_variant
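The two definitions above can be read as set logic. The following sketch applies them to a single variant, assuming consequences are given as Sequence Ontology terms (with `missense_variant` for missense variants) and that the gene-specific PrimateAI threshold is supplied by the caller. The function names and parameters are hypothetical, and only the four PTV consequences listed above are included.

```python
from typing import Optional, Set

# Consequences listed above for protein-truncating variants.
PTV_CONSEQUENCES = {
    "stop_gained",
    "stop_lost",
    "frameshift_variant",
    "splice_donor_variant",
}
SPLICEAI_CUTOFF = 0.2  # "greater than 0.2" per the definition above


def is_ptv(consequences: Set[str]) -> bool:
    """True if any consequence is in the protein-truncating list."""
    return bool(consequences & PTV_CONSEQUENCES)


def is_deleterious(
    consequences: Set[str],
    primate_ai: Optional[float] = None,
    primate_ai_threshold: Optional[float] = None,
    splice_ai: Optional[float] = None,
) -> bool:
    """Union of PTVs, high-PrimateAI missense variants, and high-SpliceAI variants."""
    if is_ptv(consequences):
        return True
    if (
        "missense_variant" in consequences
        and primate_ai is not None
        and primate_ai_threshold is not None
        and primate_ai > primate_ai_threshold
    ):
        return True
    return splice_ai is not None and splice_ai > SPLICEAI_CUTOFF
```

Because the deleterious class is a union, every PTV is also counted as deleterious.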

To zoom in to a particular chromosome, click the chromosome name underneath the plot, or select the chromosome from the drop down box, which defaults to Whole genome.

Visualize Results from Precomputed Phenome-Wide Association Studies (PheWAS)

To browse PheWAS analysis results by gene, navigate to the PheWAS tab and enter a gene of interest into the search box. The resulting Manhattan plot will show phenotypes and diseases organized into a number of categories, such as "Diseases of the nervous system" and "Neoplasms". Click on the name of a category, shown underneath the plot, to display only those phenotypes/diseases, or select a category from the drop down, which defaults to All.

Cohort Analysis

From the Cohorts menu in the left hand navigation, select a cohort created in Create Cohort to begin a cohort analysis.

Query Details

The query details can be accessed by clicking the triangle next to Show Query Details. The query details displays the selections used to create a cohort. The selections can be edited by clicking the pencil icon in the top right.

Charts

Charts will be open by default. If not, click Show Charts.
Use the gear icon in the top-right to change viewable chart settings.
There are four charts available to view summary counts of attributes within a cohort as histogram plots.

Single Subject Timeline View:

Display time-stamped events and observations for a single subject on a timeline. The timeline view is available only for subjects that have time-series data.
The following attributes are displayed in the timeline view:
- Diagnosed and Self-Reported Diseases:
  - Start and end dates
  - Progression vs. remission
- Medication and Other Treatments:
  - Prescribed and self-medicated
  - Start date, end date, and dosage at every time point
The timeline uses age (at diagnosis, at event, at measurement) as the x-axis and attribute name as the y-axis. If the birthdate is not recorded for a subject, you can switch to Date to visualize the data.

Measurement Section: A summary of measurements (without values) is displayed under the section titled "Measurements and Laboratory Values Available." Users can click a link to access the Timeline View for detailed results.

Drug Section: The "Drug Name" section lists drug names without repeating the header "Drug Name" for each entry.

Subjects

By default, the Subjects tab is displayed.
The Subjects tab with a list of all subjects matching your criteria is displayed below Charts with a link to each Subject by ID and other high-level information. By clicking a subject ID, you will be brought to the data collected at the Subject level.

To Exclude specific subjects from subsequent analysis, such as marker frequencies or gene-level aggregated views, you can uncheck the box at the beginning of each row in the subject list. You will then be prompted to save any exclusion(s).

You can Export the list of subjects either to your ICA Project's data folder or to your local disk as a TSV file for subsequent use. Any export will omit subjects that you excluded after you saved those changes. For more information, see Subject Export for Analysis in ICA Bench at the bottom of this page.

Remove a Subject

Specific subjects can be removed from a Cohort.
Select the Subjects tab.
By default, all subjects in the Cohort are checked.

Structural variant aggregation: Marker Frequency analysis

For each individual cohort, display a table of all observed SVs that overlap with a given gene.

Marker Frequency

Click the Marker Frequency tab, then click the Gene Expression tab.
Down-regulated genes are displayed in blue and Up-regulated genes are displayed in red.
A frequency in the Cohort is displayed and the Matching number/Total is also displayed in the chart.

Genes

You are brought to the Gene tab under the Gene Summary sub-tab.
Select a Gene by typing the gene name into the Search Genes text box.
A Gene Summary

Correlation

For every correlation, subjects contained in each count can be viewed by selecting the count on the bubble or the count on the X-axis and Y-axis.

Clinical vs. Clinical Attribute Comparison – Bubble Plot

Click the Correlation Tab.
In X-axis category, select Clinical.
In X-axis Attribute

Molecular vs. Molecular Attribute Comparison – Bubble Plot

To see a breakdown of Somatic Mutations vs. RNA Expression levels perform the following steps:

Note this comparison is for a Cancer case.

Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute

Clinical vs. Molecular Attribute Comparison – Bubble Plot

Note this comparison is for a Cancer case.

Click the Correlation Tab.
In X-axis category, select Somatic.
In X-axis Attribute

Molecular Breakdown

Click the Molecular Breakdown Tab.
In Enter a clinical Attribute, select a clinical attribute.
In Enter a gene, select a gene by typing a gene name.

Note: for each of the aforementioned bubble plots, you can view the list of subjects by following the link under each subject count associated with an individual bubble or axis label. This will take you to the list of subjects view, see above.

CNV

If there is Copy Number Variant data in the cohort:

Click the CNV tab.
A graph will show the CNV Sample Percentage on the Y-axis and Chromosomes on the X-axis.
Any value above Zero is a copy number gain, and any value below Zero is a copy number loss.
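To make the sign convention concrete, the sketch below computes the percentage of samples with a gain (plotted above zero) and with a loss (plotted below zero). Summarizing per whole chromosome is an assumption made for brevity; how Cohorts bins positions along each chromosome is not described here. The input format and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cnv_sample_percentages(
    calls: Iterable[Tuple[str, str, str]], total_samples: int
) -> Dict[str, Tuple[float, float]]:
    """Map chromosome -> (gain %, loss %), with losses reported as negative values.

    `calls` holds (sample_id, chromosome, kind) tuples, kind being "gain" or "loss".
    Each sample is counted at most once per chromosome and kind.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    gains = defaultdict(set)
    losses = defaultdict(set)
    for sample_id, chromosome, kind in calls:
        if kind == "gain":
            gains[chromosome].add(sample_id)
        elif kind == "loss":
            losses[chromosome].add(sample_id)
        else:
            raise ValueError(f"unknown CNV kind: {kind!r}")
    result = {}
    for chromosome in sorted(set(gains) | set(losses)):
        gain_pct = 100.0 * len(gains[chromosome]) / total_samples
        loss_pct = 0.0 - 100.0 * len(losses[chromosome]) / total_samples
        result[chromosome] = (gain_pct, loss_pct)
    return result
```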

Subject Export for Analysis in ICA Bench

ICA allows for integrated analysis in a computation workspace. You can export your cohort definitions and, in combination with molecular data in your ICA Project Data, perform, for example, a GWAS analysis.

Confirm the VCF data for your analysis is in ICA Project Data.
From within your ICA Project, start a Bench Workspace. See the Bench documentation for more details.
Navigate back to ICA Cohorts.

Compare Cohorts

You can compare up to four previously created individual cohorts, to view differences in variants and mutations, RNA expression, copy number variation, and distribution of clinical attributes. Once comparisons are created, they are saved in the Comparisons left-navigation tab of the Cohorts module.

Create a comparison view

Select Cohorts from the left-navigation panel.
Select 2 to 4 cohorts that you have already created. If you have not created any cohorts, see the Create a Cohort documentation.
Click Compare Cohorts in the right-navigation panel.
Note you are now in the Comparisons left-navigation tab of the Cohorts module.
In the Charts Section, if the COHORTS item is not displayed, click the gear icon in the top right and add Cohorts as the first attribute and click Save.
The COHORTS item in the charts panel will provide a count of subjects in each cohort and act as a legend for color representation throughout comparison screens.
For each clinical attribute category, a bar chart is displayed. Use the gear icon to select attributes to display in the charts panel.

You can share a comparison with other team members in the same ICA Project. Please refer to the section on "Sharing a Cohort" on "Create a Cohort" for details on sharing, unsharing, deleting, and archiving, which are analogous for sharing comparisons.

Attribute Comparison

Select the Attributes tab
Attribute categories are listed and can be expanded using the down-arrows next to the category names. The categories available are based on cohorts selected. Categories and attributes are part of the ICA Cohorts metadata template that map to each Subject.
For example, use the drop-down arrow next to Vital status to view sub-categories and frequencies across selected cohorts.

Variants Comparison

Select the Genes tab
Search for a gene of interest using its HUGO/HGNC gene symbol
Variants and mutations will be displayed as one needle plot for each cohort that is part of the comparison (see in this online help for more details)

Survival Summary

Select the Survival Summary tab.
Attribute categories are listed and can be expanded using the down-arrows next to the category names.
Select the drop-down arrow for Therapeutic interventions.

Survival Comparison

Click Survival Comparison tab.
A Kaplan-Meier Curve is rendered based on each cohort.
The P-value displayed at the top of Survival Comparison indicates whether there is a statistically significant difference between the survival probabilities over time of any pair of cohorts (CI=0.95).

When comparing two cohorts, the P-Value is shown above the two survival curves. For three or four cohorts, P-Values are shown as a pair-wise heatmap, comparing each cohort to every other cohort.
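For readers who want to reproduce the curves outside of Cohorts, the sketch below implements the standard Kaplan-Meier product-limit estimator and enumerates the cohort pairs that a pairwise heatmap would cover. It does not compute the P-values themselves, because the statistical test Cohorts uses is not stated here. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with an observed event.

    durations: follow-up time per subject.
    events: True if the event was observed, False if the subject was censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for time in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == time and e)
        leaving = sum(1 for d in durations if d == time)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


def cohort_pairs(names: Sequence[str]) -> List[Tuple[str, str]]:
    """All unordered pairs of cohorts, one per heatmap cell."""
    return list(combinations(names, 2))
```

Two cohorts give a single pair, three cohorts give three pairs, and four cohorts give six pairs.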

Marker Frequency Comparison

Select the Marker Frequency tab.
Select either Gene expression (default), Somatic mutation, or Copy number variation

Correlation Comparison

Select the Correlation tab.
Similar to the single-cohort view (Cohort Analysis | Correlation), choose two clinical attributes and/or genes to compare.
Depending on the available data types for the two selections (categorical and/or continuous), Cohorts will display a bubble plot, violin plot, or scatter plot.
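One plausible reading of how the plot type follows from the two data types is sketched below. The mapping (two categorical selections give a bubble plot, one of each gives a violin plot, two continuous selections give a scatter plot) follows common plotting practice and is an assumption, not a documented rule of Cohorts. The function name is hypothetical.

```python
def choose_plot(x_type: str, y_type: str) -> str:
    """Pick a plot type from the data types of the two selections (assumed mapping)."""
    kinds = {x_type, y_type}
    if not kinds <= {"categorical", "continuous"}:
        raise ValueError("data types must be 'categorical' or 'continuous'")
    if kinds == {"categorical"}:
        return "bubble"
    if kinds == {"continuous"}:
        return "scatter"
    return "violin"  # one categorical and one continuous selection
```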

Oncology Walk-through

This walk-through is intended to represent a typical workflow when building and studying a cohort of oncology cases.

Create a Cancer Cohort and View Subject Details

Click the Create Cohort button.
Select the following studies to add to your cohort:
1. TCGA – BRCA – Breast Invasive Carcinoma
2. TCGA – Ovarian Serous Cystadenocarcinoma
Add a Cohort Name = TCGA Breast and Ovarian_1472
Click on Apply.
Expand Show query details to see the study makeup of your cohort.
Charts will be open by default. If not, click Show charts
Use the gear icon in the top-right to change viewable chart settings.
Tip: Disease Type, Histological Diagnosis, Technology, and Overall Survival have interesting data about this cohort.
The Subject tab with all Subjects list is displayed below Charts with a link to each Subject by ID and other high-level information, like Data Types measured and reported. By clicking a subject ID, you will be brought to the data collected at the Subject level.
Search for subject TCGA-E2-A14Y and view the data about this Subject.
Click the TCGA-E2-A14Y Subject ID link to view clinical data for this Subject that was imported via the metadata.tsv file on ingest.
Note: the Subject is a 35-year-old female with vital status and other phenotypes that feed into the Subject attribute selection criteria when creating or editing cohorts.
Click X to close the Subject details.
Click Hide charts to free up screen space for the interactive views.

Data Analysis, Multi-Omic Biomarker Discovery, and Interpretation

Click the Marker Frequency tab, then click the Somatic Mutation tab.
Review the gene list and mutation frequencies.
Note that PIK3CA has a high rate of mutation in the Cohort (ranked 2nd with 33% mutation frequency in 326 of the 987 Subjects that have Somatic Mutation data in this cohort).
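The 33% figure follows directly from the two counts quoted above. A short check, with a hypothetical helper name:

```python
def mutation_frequency(matching: int, total: int) -> float:
    """Fraction of subjects with data that carry a mutation in the gene."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matching <= total:
        raise ValueError("matching must be between 0 and total")
    return matching / total


if __name__ == "__main__":
    # 326 of the 987 subjects with somatic mutation data carry a PIK3CA mutation.
    print(f"{mutation_frequency(326, 987):.0%}")  # prints 33%
```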

Rare Genetic Disorders Walk-through

Cohorts Walk-through: Rare Genetic Disorders

This walk-through is meant to represent a typical workflow when building and studying a cohort of rare genetic disorder cases.

Create a new Project to track your study:

Log in to ICA.
Navigate to Projects
Create a new project using the New Project button.

Create and Review a Rare Disease Cohort

Navigate to the ICA Cohorts module by clicking Cohorts in the left navigation panel.
Click the Create Cohort button.
Enter a name for your cohort, such as Rare Disease + 1kGP, at the top, to the left of the pencil icon.

Analyze Your Rare Disease Cohort Data

A recent GWAS publication identified 10 risk genes for intellectual disability (ID) and autism. Our task is to evaluate them in ICA Cohorts: TTN, PKHD1, ANKRD11, ARID1B, ASXL3, SCN2A, FHL1, KMT2A, DDX3X, SYNGAP1.
First let’s Hide charts for more visual space.
Click the Genes tab where you need to query a gene to see and interact with results.

Public Data Sets

ICA Cohorts comes front-loaded with a variety of publicly accessible data sets, covering multiple disease areas and also including healthy individuals.

Data set

Samples

Diseases/Phenotypes

Reference

Cohorts Data in ICA Base

ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.

ICA Cohorts Base Tables

After you ingest data into your project, selected phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.

Post ingestion, data will be represented in Base.
Select BASE from the ICA left-navigation and click Query.
Under the New Query window, a list of tables is displayed. Expand the Shared Database for Project <your project name>.
Cohorts tables will be displayed.
To preview the table and fields click each view listed.
Clicking any of these views then selecting PREVIEW on the right-hand side will show you a preview of the data in the tables.

If your ingestion includes Somatic variants, there will be two molecular tables: ANNOTATED_SOMATIC_MUTATIONS and ANNOTATED_VARIANTS. All ingestions will include a PHENOTYPE table.

The PHENOTYPE table includes a harmonized set that is collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. Sample information is also displayed in this table, if applicable. Sample information drives the annotation process if molecular data is included in the ingestion. That data is stored in the PHENOTYPE table.

Phenotype Data

Field Name

Type

Description

Sample Information

Field Name

Type

Description

Sample Attribute

This table is an entity-attribute-value table of supplied sample data matching the attributes that Cohorts accepts.

Field Name

Type

Description

Study Information

Field Name

Type

Description

Subject

Field

Type

Description

Subject Attribute

This table is an entity-attribute-value table of supplied subject data matching the attributes that Cohorts accepts.

Field

Type

Description

Disease

Field

Type

Description

Drug Exposure

Field

Type

Description

Measurement

Field

Type

Description

Procedure

Field

Type

Description

Annotated Variants

This table will be available for all projects with ingested molecular data

Annotated Somatic Mutations

This table will only be available for data sets with ingested Somatic molecular data.

Annotated Copy Number Variants

This table will only be available for data sets with ingested CNV molecular data.

Annotated Structural Variants

This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.

Raw RNAseq data tables for genes and transcripts

These tables will only be available for data sets with ingested RNAseq molecular data.

Table for gene quantification results:

The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.

Differential expression tables for genes and transcripts

These tables will only be available for data sets with ingested RNAseq molecular data.

Table for differential gene expression results:

The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.


Introduction to Cohorts

Overview Video

Features At-a-glance

Functionality

Walk-throughs

Public Data Sets

Import New Samples

Variant file formats

RNAseq file format

GWAS file format

Metadata and File Types

References

Visualize Results from Precomputed Genome-Wide Association Studies (GWAS)

Visualize Results from Precomputed Phenome-Wide Association Studies (PheWAS)

Cohort Analysis

Query Details

Charts

Single Subject Timeline View:

Subjects

Remove a Subject

Structural variant aggregation: Marker Frequency analysis

Marker Frequency

Genes

Correlation

Clinical vs. Clinical Attribute Comparison – Bubble Plot

Molecular vs. Molecular Attribute Comparison – Bubble Plot

Clinical vs. Molecular Attribute Comparison – Bubble Plot

Molecular Breakdown

CNV

Subject Export for Analysis in ICA Bench

Create a comparison view

Attribute Comparison

Variants Comparison

Survival Summary

Survival Comparison

Marker Frequency Comparison

Correlation Comparison

Create a Cancer Cohort and View Subject Details

Data Analysis, Multi-Omic Biomarker Discovery, and Interpretation

Cohorts Walk-through: Rare Genetic Disorders

Login and Create a new ICA Project

Create and Review a Rare Disease Cohort

Analyze Your Rare Disease Cohort Data

Create a Cancer Cohort and View Subject Details

Data Analysis, Multi-Omic Biomarker Discovery, and Interpretation

Cohorts Walk-through: Rare Genetic Disorders

Login and Create a new ICA Project

Create and Review a Rare Disease Cohort

Analyze Your Rare Disease Cohort Data

Create a comparison view

Attribute Comparison

Variants Comparison

Survival Summary

Survival Comparison

Marker Frequency Comparison

Correlation Comparison

Introduction to Cohorts

Overview Video

Features At-a-glance

Functionality

Walk-throughs

Public Data Sets

Visualize Results from Precomputed Genome-Wide Association Studies (GWAS)

Visualize Results from Precomputed Phenome-Wide Association Studies (PheWAS)

Cohort Analysis

Query Details

Charts

Single Subject Timeline View:

Subjects

Remove a Subject

Structural variant aggregation: Marker Frequency analysis

Marker Frequency

Genes

Correlation

Clinical vs. Clinical Attribute Comparison – Bubble Plot

Molecular vs. Molecular Attribute Comparison – Bubble Plot

Clinical vs. Molecular Attribute Comparison – Bubble Plot

Molecular Breakdown

CNV

Subject Export for Analysis in ICA Bench