# Cohorts Data in ICA Base

ICA Cohorts data can be viewed in an ICA Project Base instance as a *shared database*. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See [Base](https://help.ica.illumina.com/project/p-base) for more information on enabling this feature in your ICA Project.

## ICA Cohorts Base Tables

After you ingest data into your project, selected phenotypic and molecular data are available to view in Base. See [Cohorts Import](https://help.ica.illumina.com/project/p-cohorts/cohorts-import) for instructions on importing data sets into Cohorts.

1. After ingestion, the data is represented in Base.
2. Select `BASE` from the ICA left-navigation and click `Query`.
3. In the New Query window, a list of tables is displayed. Expand `Shared Database for Project \<your project name\>`.
4. The Cohorts tables are displayed.
5. To preview a table and its fields, click the corresponding *view*.
6. Click any of these views and then select `PREVIEW` on the right-hand side to see a preview of the data in the table.

{% hint style="info" %}
If your ingestion includes Somatic variants, there will be two molecular tables: *ANNOTATED\_SOMATIC\_MUTATIONS* and *ANNOTATED\_VARIANTS*. All ingestions will include a *PHENOTYPE* table.
{% endhint %}

{% hint style="info" %}
The *PHENOTYPE* table includes a harmonized set that is collected across all data ingestions and is not representative of all data ingested for the Subject or Sample. If applicable, sample information is also stored in this table; it drives the annotation process when molecular data is included in the ingestion.
{% endhint %}

## Phenotype Data

| Field Name             | Type    | Description                                                |
| ---------------------- | ------- | ---------------------------------------------------------- |
| SAMPLE\_BARCODE        | STRING  | Sample Identifier                                          |
| SUBJECTID              | STRING  | Identifier for Subject entity                              |
| STUDY                  | STRING  | Study designation                                          |
| AGE                    | NUMERIC | Age in years                                               |
| SEX                    | STRING  | Sex field to drive annotation                              |
| POPULATION             | STRING  | Population Designation for 1000 Genomes Project            |
| SUPERPOPULATION        | STRING  | Superpopulation Designation from 1000 Genomes Project      |
| RACE                   | STRING  | Race according to NIH standard                             |
| CONDITION\_ONTOLOGIES  | VARIANT | Diagnosis Ontology Source                                  |
| CONDITION\_IDS         | VARIANT | Diagnosis Concept Ids                                      |
| CONDITIONS             | VARIANT | Diagnosis Names                                            |
| HARMONIZED\_CONDITIONS | VARIANT | Diagnosis High-level concept to drive UI                   |
| LIBRARYTYPE            | STRING  | Sequencing technology                                      |
| ANALYTE                | STRING  | Substance sequenced                                        |
| TISSUE                 | STRING  | Tissue source                                              |
| TUMOR\_OR\_NORMAL      | STRING  | Tumor designation for somatic                              |
| GENOMEBUILD            | STRING  | Genome Build to drive annotations - hg38 only              |
| SAMPLE\_BARCODE\_VCF   | STRING  | Sample ID from VCF                                         |
| AFFECTED\_STATUS       | NUMERIC | Affected, Unaffected, or Unknown for Family Based Analysis |
| FAMILY\_RELATIONSHIP   | STRING  | Relationship designation for Family Based Analysis         |

## Sample Information

| Field Name      | Type   | Description                                |
| --------------- | ------ | ------------------------------------------ |
| SAMPLE\_BARCODE | STRING | Original sample barcode used in VCF column |
| SUBJECTID       | STRING | Original identifier for the subject record |
| DATATYPE        | ARRAY  | The categorization of molecular data       |
| TECHNOLOGY      | ARRAY  | The sequencing method                      |
| CREATEDATE      | DATE   | Date and time of record creation           |
| LASTUPDATEDATE  | DATE   | Date and time of last update of record     |

## Sample Attribute

This table is an entity-attribute-value table of supplied sample data that matches the attributes accepted by Cohorts.

| Field Name       | Type    | Description                                |
| ---------------- | ------- | ------------------------------------------ |
| SAMPLE\_BARCODE  | STRING  | Original sample barcode used in VCF column |
| SUBJECTID        | STRING  | Original identifier for the subject record |
| ATTRIBUTE\_NAME  | STRING  | Cohorts meta-data driven field name        |
| ATTRIBUTE\_VALUE | VARIANT | List of values entered for the field       |

## Study Information

| Field Name     | Type   | Description                     |
| -------------- | ------ | ------------------------------- |
| NAME           | STRING | Study name                      |
| CREATEDATE     | DATE   | Date and time of study creation |
| LASTUPDATEDATE | DATE   | Date and time of record update  |

## Subject

| Field          | Type   | Description                                 |
| -------------- | ------ | ------------------------------------------- |
| SUBJECTID      | STRING | Original identifier for the subject record  |
| AGE            | FLOAT  | Age entered on subject record if applicable |
| SEX            | STRING | -                                           |
| ETHNICITY      | STRING | -                                           |
| STUDY          | STRING | Study subject belongs to                    |
| CREATEDATE     | DATE   | Date and time of record creation            |
| LASTUPDATEDATE | DATE   | Date and time of record update              |

## Subject Attribute

This table is an entity-attribute-value table of supplied subject data that matches the attributes accepted by Cohorts.

| Field            | Type    | Description                                |
| ---------------- | ------- | ------------------------------------------ |
| SUBJECTID        | STRING  | Original identifier for the subject record |
| ATTRIBUTE\_NAME  | STRING  | Cohorts meta-data driven field name        |
| ATTRIBUTE\_VALUE | VARIANT | List of values entered for the field       |
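
Because the attribute tables store one row per attribute, it is often convenient to pivot them into one record per subject (or per sample) after export. The following is a minimal sketch, assuming the rows have already been fetched as dictionaries keyed by the column names above; the attribute names and values shown are hypothetical.

```python
from collections import defaultdict


def pivot_attributes(rows, key="SUBJECTID"):
    """Group entity-attribute-value rows into one dict per entity.

    `key` is the identifier column: "SUBJECTID" for subject attributes,
    or "SAMPLE_BARCODE" for sample attributes.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[row[key]][row["ATTRIBUTE_NAME"]] = row["ATTRIBUTE_VALUE"]
    return dict(wide)


# Hypothetical rows shaped like the Subject Attribute columns.
rows = [
    {"SUBJECTID": "S1", "ATTRIBUTE_NAME": "smoking_status", "ATTRIBUTE_VALUE": ["never"]},
    {"SUBJECTID": "S1", "ATTRIBUTE_NAME": "bmi", "ATTRIBUTE_VALUE": [27.4]},
    {"SUBJECTID": "S2", "ATTRIBUTE_NAME": "smoking_status", "ATTRIBUTE_VALUE": ["current"]},
]

subjects = pivot_attributes(rows)
print(subjects["S1"])
```

If the same attribute appears in more than one row for an entity, this sketch keeps the last value seen.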

## Disease

<table><thead><tr><th>Field</th><th width="208">Type</th><th>Description</th></tr></thead><tbody><tr><td>SUBJECTID</td><td>STRING</td><td>Original identifier for the subject record</td></tr><tr><td>TERM</td><td>STRING</td><td>Code for disease term</td></tr><tr><td>OCCURRENCES</td><td>STRING</td><td>List of occurrence related data</td></tr></tbody></table>

## Drug Exposure

| Field       | Type   | Description                                      |
| ----------- | ------ | ------------------------------------------------ |
| SUBJECTID   | STRING | Original identifier for the subject record       |
| TERM        | STRING | Code for drug term                               |
| OCCURRENCES | STRING | List of occurrence related data of drug exposure |

## Measurement

| Field       | Type   | Description                                                       |
| ----------- | ------ | ----------------------------------------------------------------- |
| SUBJECTID   | STRING | Original identifier for the subject record                        |
| TERM        | STRING | Code for measurement term                                         |
| OCCURRENCES | STRING | List of occurrences and values related to lab or measurement data |

## Procedure

| Field       | Type   | Description                                           |
| ----------- | ------ | ----------------------------------------------------- |
| SUBJECTID   | STRING | Original identifier for the subject record            |
| TERM        | STRING | Code for procedure term                               |
| OCCURRENCES | STRING | List of occurrences and values related to procedure data |

## Annotated Variants

This table will be available for all projects with ingested molecular data.

| **Field Name**  | **Type** | **Description**                                                                |
| --------------- | -------- | ------------------------------------------------------------------------------ |
| SAMPLE\_BARCODE | STRING   | Original sample barcode used in VCF column                                     |
| STUDY           | STRING   | Study designation                                                              |
| GENOMEBUILD     | STRING   | Only hg38 is supported                                                         |
| CHROMOSOME      | STRING   | Chromosome without 'chr' prefix                                                |
| CHROMOSOMEID    | NUMERIC  | Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt                                        |
| DBSNP           | STRING   | dbSNP Identifiers                                                              |
| VARIANT\_KEY    | STRING   | Variant ID in the form "1:12345678:12345678:C"                                 |
| NIRVANA\_VID    | STRING   | Broad Institute VID: "1-12345678-A-C"                                          |
| VARIANT\_TYPE   | STRING   | Description of Variant Type (e.g. SNV, Deletion, Insertion)                    |
| VARIANT\_CALL   | NUMERIC  | 1=germline, 2=somatic                                                          |
| DENOVO          | BOOLEAN  | true / false                                                                   |
| GENOTYPE        | STRING   | "G\|T"                                                                         |
| READ\_DEPTH     | NUMERIC  | Sequencing read depth                                                          |
| ALLELE\_COUNT   | NUMERIC  | Counts of each alternate allele for each site across all samples               |
| ALLELE\_DEPTH   | STRING   | Unfiltered count of reads that support a given allele for an individual sample |
| FILTERS         | STRING   | Filter field from VCF. If all filters pass, field is PASS                      |
| ZYGOSITY        | NUMERIC  | 0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt                        |
| GENEMODEL       | NUMERIC  | 1=Ensembl, 2=RefSeq                                                            |
| GENE\_HGNC      | STRING   | HUGO/HGNC gene symbol                                                          |
| GENE\_ID        | STRING   | Ensembl gene ID ("ENSG00001234")                                               |
| GID             | NUMERIC  | NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID              |
| TRANSCRIPT\_ID  | STRING   | Ensembl ENST or RefSeq NM\_                                                    |
| CANONICAL       | STRING   | Transcript designated 'canonical' by source                                    |
| CONSEQUENCE     | STRING   | missense, stop gained, intronic, etc.                                          |
| HGVSC           | STRING   | The HGVS coding sequence name                                                  |
| HGVSP           | STRING   | The HGVS protein sequence name                                                 |
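
Several fields in this table are numeric codes. The sketch below shows one way to translate them into labels after exporting rows, using only the code values listed in the table. It assumes rows are available as dictionaries keyed by field name, and that `NIRVANA_VID` follows the `chr-pos-ref-alt` layout of the example value; the helper names are illustrative only.

```python
# Code tables copied from the field descriptions in the table.
CHROMOSOME_BY_ID = {**{i: str(i) for i in range(1, 23)}, 23: "X", 24: "Y", 25: "Mt"}
ZYGOSITY = {0: "hom ref", 1: "het ref/alt", 2: "hom alt", 4: "hemi alt"}
VARIANT_CALL = {1: "germline", 2: "somatic"}
GENEMODEL = {1: "Ensembl", 2: "RefSeq"}


def parse_nirvana_vid(vid):
    """Split a VID such as "1-12345678-A-C" into its four parts.

    Assumes the chr-pos-ref-alt layout; other VID shapes are not handled.
    """
    chrom, pos, ref, alt = vid.split("-", 3)
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def decode_variant(row):
    """Return a copy of the row with numeric codes replaced by labels."""
    decoded = dict(row)
    decoded["CHROMOSOMEID"] = CHROMOSOME_BY_ID[row["CHROMOSOMEID"]]
    decoded["ZYGOSITY"] = ZYGOSITY[row["ZYGOSITY"]]
    decoded["VARIANT_CALL"] = VARIANT_CALL[row["VARIANT_CALL"]]
    decoded["GENEMODEL"] = GENEMODEL[row["GENEMODEL"]]
    return decoded


# Hypothetical row containing only the coded fields.
example = {"CHROMOSOMEID": 23, "ZYGOSITY": 4, "VARIANT_CALL": 1, "GENEMODEL": 2}
decoded = decode_variant(example)
print(decoded)
print(parse_nirvana_vid("1-12345678-A-C"))
```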

## Annotated Somatic Mutations

This table will only be available for data sets with ingested *Somatic* molecular data.

| **Field Name**  | **Type** | **Description**                                                                            |
| --------------- | -------- | ------------------------------------------------------------------------------------------ |
| SAMPLE\_BARCODE | STRING   | Original sample barcode, used in VCF column                                                |
| SUBJECTID       | STRING   | Identifier for Subject entity                                                              |
| STUDY           | STRING   | Study designation                                                                          |
| GENOMEBUILD     | STRING   | Only hg38 is supported                                                                     |
| CHROMOSOME      | STRING   | Chromosome without 'chr' prefix                                                            |
| DBSNP           | NUMERIC  | dbSNP Identifiers                                                                          |
| VARIANT\_KEY    | STRING   | Variant ID in the form "1:12345678:12345678:C"                                             |
| MUTATION\_TYPE  | NUMERIC  | Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant |
| VARIANT\_CALL   | NUMERIC  | 1=germline, 2=somatic                                                                      |
| GENOTYPE        | STRING   | "G\|T"                                                                                     |
| REF\_ALLELE     | STRING   | Reference allele                                                                           |
| ALLELE1         | STRING   | First allele call in the tumor sample                                                      |
| ALLELE2         | STRING   | Second allele call in the tumor sample                                                     |
| GENEMODEL       | NUMERIC  | 1=Ensembl, 2=RefSeq                                                                        |
| GENE\_HGNC      | STRING   | HUGO/HGNC gene symbol                                                                      |
| GENE\_ID        | STRING   | Ensembl gene ID ("ENSG00001234")                                                           |
| TRANSCRIPT\_ID  | STRING   | Ensembl ENST or RefSeq NM\_                                                                |
| CANONICAL       | BOOLEAN  | Transcript designated 'canonical' by source                                                |
| CONSEQUENCE     | STRING   | missense, stop gained, intronic, etc.                                                      |
| HGVSP           | STRING   | HGVS nomenclature for AA change: p.Pro72Ala                                                |

## Annotated Copy Number Variants

This table will only be available for data sets with ingested *CNV* molecular data.

| **Field Name**       | **Type** | **Description**                                                                     |
| -------------------- | -------- | ----------------------------------------------------------------------------------- |
| SAMPLE\_BARCODE      | STRING   | Sample barcode used in the original VCF                                             |
| GENOMEBUILD          | STRING   | Genome build, always 'hg38'                                                         |
| NIRVANA\_VID         | STRING   | Variant ID of the form 'chr-pos-ref-alt'                                            |
| CHRID                | STRING   | Chromosome without 'chr' prefix                                                     |
| CID                  | NUMERIC  | Numerical representation of the chromosome, X=23, Y=24, Mt=25                       |
| GENE\_ID             | STRING   | NCBI or Ensembl gene identifier                                                     |
| GID                  | NUMERIC  | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix        |
| START\_POS           | NUMERIC  | First affected position on the chromosome                                           |
| STOP\_POS            | NUMERIC  | Last affected position on the chromosome                                            |
| VARIANT\_TYPE        | NUMERIC  | 1 = copy number gain, -1 = copy number loss                                         |
| COPY\_NUMBER         | NUMERIC  | Observed copy number                                                                |
| COPY\_NUMBER\_CHANGE | NUMERIC  | Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline |
| SEGMENT\_VALUE       | NUMERIC  | Average FC for the identified chromosomal segment                                   |
| PROBE\_COUNT         | NUMERIC  | Probes confirming the CNV (arrays only)                                             |
| REFERENCE            | NUMERIC  | Baseline taken from normal samples (1) or averaged disease tissue (2)               |
| GENE\_HGNC           | STRING   | HUGO/HGNC gene symbol                                                               |
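
The `GID` and `COPY_NUMBER_CHANGE` descriptions can be made concrete with a short sketch. It assumes that the fold-change is a plain ratio of the observed copy number to the baseline (not a log ratio), which the table does not state explicitly, and that Ensembl gene IDs may carry a version suffix. The function names are illustrative.

```python
def gene_id_to_gid(gene_id):
    """Numeric part of a gene ID.

    For Ensembl IDs the 'ENSG' prefix and leading zeros are removed;
    NCBI IDs are assumed to be numeric already.
    """
    core = gene_id.split(".")[0]  # drop a version suffix such as ".17", if present
    if core.upper().startswith("ENSG"):
        core = core[4:]
    return int(core)


def fold_change(copy_number, baseline=2):
    """Copy-number fold-change relative to a baseline of 2 (diploid) or 1 (haploid).

    Assumption: a plain ratio, observed / baseline.
    """
    if baseline not in (1, 2):
        raise ValueError("baseline must be 1 (haploid) or 2 (diploid)")
    return copy_number / baseline


def variant_type(copy_number, baseline=2):
    """1 for a copy number gain, -1 for a loss, None when equal to the baseline."""
    if copy_number > baseline:
        return 1
    if copy_number < baseline:
        return -1
    return None


print(gene_id_to_gid("ENSG00000141510"))   # Ensembl-style ID
print(fold_change(3), variant_type(3))     # three copies on a diploid baseline
print(fold_change(0, baseline=1), variant_type(0, baseline=1))
```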

## Annotated Structural Variants

This table will only be available for data sets with ingested *SV* molecular data. Note that ICA Cohorts stores copy number variants in a separate table.

| **Field Name**       | **Type**  | **Description**                                                                                                                                                           |
| -------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| SAMPLE\_BARCODE      | STRING    | Sample barcode used in the original VCF                                                                                                                                   |
| GENOMEBUILD          | STRING    | Genome build, always 'hg38'                                                                                                                                               |
| NIRVANA\_VID         | STRING    | Variant ID of the form 'chr-pos-ref-alt'                                                                                                                                  |
| CHRID                | STRING    | Chromosome without 'chr' prefix                                                                                                                                           |
| CID                  | NUMERIC   | Numerical representation of the chromosome, X=23, Y=24, Mt=25                                                                                                             |
| BEGIN                | NUMERIC   | First affected position on the chromosome                                                                                                                                 |
| END                  | NUMERIC   | Last affected position on the chromosome                                                                                                                                  |
| BAND                 | STRING    | Chromosomal band                                                                                                                                                          |
| QUALIITY             | NUMERIC   | Quality from the original VCF                                                                                                                                             |
| FILTERS              | ARRAY     | Filters from the original VCF                                                                                                                                             |
| VARIANT\_TYPE        | STRING    | Insertion, deletion, indel, tandem\_duplication, translocation\_breakend, inversion ("INV"), short tandem repeat ("STR2")                                                 |
| VARIANT\_TYPE\_ID    | NUMERIC   | 21=insertion, 22=deletion, 23=indel, 24=tandem\_duplication, 25=translocation\_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2")                            |
| CIPOS                | ARRAY     | Confidence interval around first position                                                                                                                                 |
| CIEND                | ARRAY     | Confidence interval around last position                                                                                                                                  |
| SVLENGTH             | NUMERIC   | Overall size of the structural variant                                                                                                                                    |
| BONDCHR              | STRING    | For translocations, the other affected chromosome                                                                                                                         |
| BONDCID              | NUMERIC   | For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25                                                                                   |
| BONDPOS              | STRING    | For translocations, positions on the other affected chromosome                                                                                                            |
| BONDORDER            | NUMERIC   | 3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' of the other chromosome fragment  |
| GENOTYPE             | STRING    | Called genotype from the VCF                                                                                                                                              |
| GENOTYPE\_QUALITY    | NUMERIC   | Genotype call quality                                                                                                                                                     |
| READCOUNTSSPLIT      | ARRAY     | Read counts                                                                                                                                                               |
| READCOUNTSPAIRED     | ARRAY     | Read counts, paired end                                                                                                                                                   |
| REGULATORYREGIONID   | STRING    | Ensembl ID for the affected regulatory region                                                                                                                             |
| REGULATORYREGIONTYPE | STRING    | Type of the regulatory region                                                                                                                                             |
| CONSEQUENCE          | ARRAY     | Variant consequence according to SequenceOntology                                                                                                                         |
| TRANSCRIPTID         | STRING    | Ensembl or RefSeq transcript identifier                                                                                                                                   |
| TRANSCRIPTBIOTYPE    | STRING    | Biotype of the transcript                                                                                                                                                 |
| INTRONS              | STRING    | Count of impacted introns out of the total number of introns, specified as "M/N"                                                                                          |
| GENEID               | STRING    | Ensembl or RefSeq gene identifier                                                                                                                                         |
| GENEHGNC             | STRING    | HUGO/HGNC gene symbol                                                                                                                                                     |
| ISCANONICAL          | BOOLEAN   | Is the transcript ID the canonical one according to Ensembl?                                                                                                              |
| PROTEINID            | STRING    | RefSeq or Ensembl protein ID                                                                                                                                              |
| SOURCEID             | NUMERICAL | Gene model: 1=Ensembl, 2=RefSeq                                                                                                                                           |
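
As a small illustration of the coded fields in this table, the sketch below maps `VARIANT_TYPE_ID` to a label and parses the `INTRONS` value. The ID-to-label mapping is copied from the table; the label spellings for 26 and 27 are paraphrased, and the helper names are hypothetical.

```python
# Mapping copied from the VARIANT_TYPE_ID description.
SV_TYPE_BY_ID = {
    21: "insertion",
    22: "deletion",
    23: "indel",
    24: "tandem_duplication",
    25: "translocation_breakend",
    26: "inversion",
    27: "short tandem repeat",
}


def parse_introns(value):
    """Parse an INTRONS value of the form "M/N" into (impacted, total)."""
    impacted, total = (int(part) for part in value.split("/"))
    return impacted, total


def impacted_fraction(value):
    """Fraction of introns impacted; returns 0.0 when the transcript has no introns."""
    impacted, total = parse_introns(value)
    return impacted / total if total else 0.0


print(SV_TYPE_BY_ID[25])
print(parse_introns("2/10"), impacted_fraction("2/10"))
```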

## Raw RNAseq data tables for genes and transcripts

These tables will only be available for data sets with ingested *RNAseq* molecular data.

Table for gene quantification results:

| **Field Name**    | **Type**  | **Description**                                                                   |
| ----------------- | --------- | --------------------------------------------------------------------------------- |
| GENOMEBUILD       | STRING    | Genome build, always 'hg38'                                                       |
| STUDY\_NAME       | STRING    | Study designation                                                                 |
| SAMPLE\_BARCODE   | STRING    | Sample barcode used in the original VCF                                           |
| LABEL             | STRING    | Group label specified during import: Case or Control, Tumor or Normal, etc.       |
| GENE\_ID          | STRING    | Ensembl or RefSeq gene identifier                                                 |
| GID               | NUMERIC   | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix      |
| GENE\_HGNC        | STRING    | HUGO/HGNC gene symbol                                                             |
| SOURCE            | STRING    | Gene model: 1=Ensembl, 2=RefSeq                                                   |
| TPM               | NUMERICAL | Transcripts per million                                                           |
| LENGTH            | NUMERICAL | The length of the gene in base pairs.                                             |
| EFFECTIVE\_LENGTH | NUMERICAL | The length as accessible to RNA-seq, accounting for insert-size and edge effects. |
| NUM\_READS        | NUMERICAL | The estimated number of reads from the gene. The values are not normalized.       |

The corresponding transcript table uses TRANSCRIPT\_ID instead of GENE\_ID and GENE\_HGNC.
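
For orientation, the `TPM`, `NUM_READS`, and `EFFECTIVE_LENGTH` columns are related by the standard transcripts-per-million definition: each feature's read count is divided by its effective length, and the resulting rates are scaled so that they sum to one million within a sample. The sketch below assumes that the stored `TPM` values follow this standard definition, which this page does not state explicitly; the input numbers are hypothetical.

```python
def tpm(num_reads, effective_lengths):
    """Transcripts per million for one sample.

    num_reads and effective_lengths are parallel lists with one entry per
    gene (or transcript). Effective lengths must be greater than zero.
    """
    rates = [reads / length for reads, length in zip(num_reads, effective_lengths)]
    total = sum(rates)
    return [rate / total * 1_000_000 for rate in rates]


# Hypothetical values for three genes in a single sample.
reads = [100, 200, 300]
lengths = [1000, 2000, 1000]
values = tpm(reads, lengths)
print(values)
```

Because the values are normalized per sample, the TPM values of all features in a sample sum to one million, which is a quick sanity check on exported data.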

## Differential expression tables for genes and transcripts

These tables will only be available for data sets with ingested *RNAseq* molecular data.

Table for differential gene expression results:

| **Field Name**       | **Type**  | **Description**                                                              |
| -------------------- | --------- | ---------------------------------------------------------------------------- |
| GENOMEBUILD          | STRING    | Genome build, always 'hg38'                                                  |
| STUDY\_NAME          | STRING    | Study designation                                                            |
| SAMPLE\_BARCODE      | STRING    | Sample barcode used in the original VCF                                      |
| CASE\_LABEL          | STRING    | Label used for cases                                                         |
| GENE\_ID             | STRING    | Ensembl or RefSeq gene identifier                                            |
| GID                  | NUMERIC   | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix |
| GENE\_HGNC           | STRING    | HUGO/HGNC gene symbol                                                        |
| SOURCE               | STRING    | Gene model: 1=Ensembl, 2=RefSeq                                              |
| BASEMEAN             | NUMERICAL |                                                                              |
| FC                   | NUMERICAL | Fold-change                                                                  |
| LFC                  | NUMERICAL | Log of the fold-change                                                       |
| LFCSE                | NUMERICAL | Standard error for log fold-change                                           |
| PVALUE               | NUMERICAL | P-value                                                                      |
| CONTROL\_SAMPLECOUNT | NUMERICAL | Number of samples used as control                                            |
| CONTROL\_LABEL       | NUMERICAL | Label used for controls                                                      |

The corresponding transcript table uses TRANSCRIPT\_ID instead of GENE\_ID and GENE\_HGNC.
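
The `FC` and `LFC` columns carry the same information on different scales. The sketch below converts between them and filters rows by `PVALUE` and `LFC`. It assumes a base-2 logarithm, which is common for differential expression results but is not stated on this page; the thresholds and example rows are hypothetical.

```python
def fold_change_from_lfc(lfc, base=2.0):
    """Convert a log fold-change back to a fold-change (assumes log base 2 by default)."""
    return base ** lfc


def significant(rows, max_pvalue=0.05, min_abs_lfc=1.0):
    """Keep rows whose p-value and absolute log fold-change pass the thresholds."""
    return [
        row for row in rows
        if row["PVALUE"] <= max_pvalue and abs(row["LFC"]) >= min_abs_lfc
    ]


# Hypothetical rows using column names from the table.
rows = [
    {"GENE_HGNC": "GENE_A", "LFC": 2.0, "PVALUE": 0.001},
    {"GENE_HGNC": "GENE_B", "LFC": -1.5, "PVALUE": 0.01},
    {"GENE_HGNC": "GENE_C", "LFC": 0.3, "PVALUE": 0.0001},  # small effect
    {"GENE_HGNC": "GENE_D", "LFC": 3.0, "PVALUE": 0.2},     # not significant
]

hits = [row["GENE_HGNC"] for row in significant(rows)]
print(hits)
print(fold_change_from_lfc(1.0), fold_change_from_lfc(-1.0))
```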
