Cohorts Data in ICA Base
ICA Cohorts data can be viewed in an ICA Project Base instance as a shared database. A shared database in ICA Base operates as a database view. To use this feature, enable Base for your project prior to starting any ICA Cohorts ingestions. See Base for more information on enabling this feature in your ICA Project.
ICA Cohorts Base Tables
After you ingest data into your project, selected phenotypic and molecular data are available to view in Base. See Cohorts Import for instructions on importing data sets into Cohorts.
- After ingestion, the data is represented in Base.
- Select BASE from the ICA left-navigation and click Query.
- In the New Query window, a list of tables is displayed. Expand Shared Database for Project \<your project name\>.
- The Cohorts tables are displayed.
- To preview a table and its fields, click the corresponding view.
- Click any of these views, then select PREVIEW on the right-hand side to see a preview of the data in the table.
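The preview steps above can also be approximated with a query typed into the New Query window. The following is a minimal Python sketch that builds such a statement. It assumes the query window accepts standard SQL with a `LIMIT` clause, and the view name `ANNOTATED_VARIANTS` is a hypothetical placeholder for whichever name appears under the shared database in your project.

```python
import re


def build_preview_query(view_name: str, limit: int = 100) -> str:
    """Return a SELECT statement that previews the first rows of a view."""
    # Accept only plain identifiers so the name can be embedded safely.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", view_name):
        raise ValueError(f"unexpected view name: {view_name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {view_name} LIMIT {limit}"


# ANNOTATED_VARIANTS is a hypothetical view name used for illustration.
print(build_preview_query("ANNOTATED_VARIANTS", 10))
```

Paste the printed statement into the query editor, replacing the view name with one listed in your project.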
Phenotype Data
| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Sample Identifier |
| SUBJECTID | STRING | Identifier for Subject entity |
| STUDY | STRING | Study designation |
| AGE | NUMERIC | Age in years |
| SEX | STRING | Sex field to drive annotation |
| POPULATION | STRING | Population Designation for 1000 Genomes Project |
| SUPERPOPULATION | STRING | Superpopulation Designation from 1000 Genomes Project |
| RACE | STRING | Race according to NIH standard |
| CONDITION_ONTOLOGIES | VARIANT | Diagnosis Ontology Source |
| CONDITION_IDS | VARIANT | Diagnosis Concept Ids |
| CONDITIONS | VARIANT | Diagnosis Names |
| HARMONIZED_CONDITIONS | VARIANT | Diagnosis High-level concept to drive UI |
| LIBRARYTYPE | STRING | Sequencing technology |
| ANALYTE | STRING | Substance sequenced |
| TISSUE | STRING | Tissue source |
| TUMOR_OR_NORMAL | STRING | Tumor designation for somatic |
| GENOMEBUILD | STRING | Genome Build to drive annotations - hg38 only |
| SAMPLE_BARCODE_VCF | STRING | Sample ID from VCF |
| AFFECTED_STATUS | NUMERIC | Affected, Unaffected, or Unknown for Family Based Analysis |
| FAMILY_RELATIONSHIP | STRING | Relationship designation for Family Based Analysis |
Sample Information
| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Original sample barcode used in VCF column |
| SUBJECTID | STRING | Original identifier for the subject record |
| DATATYPE | ARRAY | The categorization of molecular data |
| TECHNOLOGY | ARRAY | The sequencing method |
| CREATEDATE | DATE | Date and time of record creation |
| LASTUPDATEDATE | DATE | Date and time of last update of record |
Sample Attribute
This table is an entity-attribute-value table of supplied sample data that matches the attributes accepted by Cohorts.

| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Original sample barcode used in VCF column |
| SUBJECTID | STRING | Original identifier for the subject record |
| ATTRIBUTE_NAME | STRING | Cohorts meta-data driven field name |
| ATTRIBUTE_VALUE | VARIANT | List of values entered for the field |
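Because an entity-attribute-value table stores one row per sample and attribute, it is often easier to work with after pivoting it into one record per sample. The sketch below shows one way to do that in Python on rows that have already been fetched. The attribute names and values are made up for illustration, and it assumes each ATTRIBUTE_VALUE arrives as a Python list.

```python
from collections import defaultdict


def pivot_attributes(rows):
    """Group entity-attribute-value rows into one dict per SAMPLE_BARCODE."""
    samples = defaultdict(dict)
    for row in rows:
        samples[row["SAMPLE_BARCODE"]][row["ATTRIBUTE_NAME"]] = row["ATTRIBUTE_VALUE"]
    return dict(samples)


# Made-up rows: the attribute names and values are illustrative only.
rows = [
    {"SAMPLE_BARCODE": "S1", "SUBJECTID": "P1",
     "ATTRIBUTE_NAME": "tissue", "ATTRIBUTE_VALUE": ["blood"]},
    {"SAMPLE_BARCODE": "S1", "SUBJECTID": "P1",
     "ATTRIBUTE_NAME": "analyte", "ATTRIBUTE_VALUE": ["DNA"]},
    {"SAMPLE_BARCODE": "S2", "SUBJECTID": "P2",
     "ATTRIBUTE_NAME": "tissue", "ATTRIBUTE_VALUE": ["liver"]},
]

pivoted = pivot_attributes(rows)
print(pivoted["S1"])
```

The same approach applies to any table with ATTRIBUTE_NAME and ATTRIBUTE_VALUE columns if you change the grouping key.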
Study Information
| Field Name | Type | Description |
| --- | --- | --- |
| NAME | STRING | Study name |
| CREATEDATE | DATE | Date and time of study creation |
| LASTUPDATEDATE | DATE | Date and time of record update |
Subject
| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| AGE | FLOAT | Age entered on subject record if applicable |
| SEX | STRING | - |
| ETHNICITY | STRING | - |
| STUDY | STRING | Study subject belongs to |
| CREATEDATE | DATE | Date and time of record creation |
| LASTUPDATEDATE | DATE | Date and time of record update |
Subject Attribute
This table is an entity-attribute-value table of supplied subject data that matches the attributes accepted by Cohorts.

| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| ATTRIBUTE_NAME | STRING | Cohorts meta-data driven field name |
| ATTRIBUTE_VALUE | VARIANT | List of values entered for the field |
Disease
| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| TERM | STRING | Code for disease term |
| OCCURRENCES | STRING | List of occurrence related data |
Drug Exposure
| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| TERM | STRING | Code for drug term |
| OCCURRENCES | STRING | List of occurrence related data of drug exposure |
Measurement
| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| TERM | STRING | Code for measurement term |
| OCCURRENCES | STRING | List of occurrences and values related to lab or measurement data |
Procedure
| Field Name | Type | Description |
| --- | --- | --- |
| SUBJECTID | STRING | Original identifier for the subject record |
| TERM | STRING | Code for procedure term |
| OCCURRENCES | STRING | List of occurrences and values related to procedure data |
Annotated Variants
This table will be available for all projects with ingested molecular data.

| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Original sample barcode used in VCF column |
| STUDY | STRING | Study designation |
| GENOMEBUILD | STRING | Only hg38 is supported |
| CHROMOSOME | STRING | Chromosome without 'chr' prefix |
| CHROMOSOMEID | NUMERIC | Chromosome ID: 1..22, 23=X, 24=Y, 25=Mt |
| DBSNP | STRING | dbSNP Identifiers |
| VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C" |
| NIRVANA_VID | STRING | Broad Institute VID: "1-12345678-A-C" |
| VARIANT_TYPE | STRING | Description of Variant Type (e.g. SNV, Deletion, Insertion) |
| VARIANT_CALL | NUMERIC | 1=germline, 2=somatic |
| DENOVO | BOOLEAN | true / false |
| GENOTYPE | STRING | "G\|T" |
| READ_DEPTH | NUMERIC | Sequencing read depth |
| ALLELE_COUNT | NUMERIC | Counts of each alternate allele for each site across all samples |
| ALLELE_DEPTH | STRING | Unfiltered count of reads that support a given allele for an individual sample |
| FILTERS | STRING | Filter field from VCF. If all filters pass, field is PASS |
| ZYGOSITY | NUMERIC | 0 = hom ref, 1 = het ref/alt, 2 = hom alt, 4 = hemi alt |
| GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq |
| GENE_HGNC | STRING | HUGO/HGNC gene symbol |
| GENE_ID | STRING | Ensembl gene ID ("ENSG00001234") |
| GID | NUMERIC | NCBI Entrez Gene ID (RefSeq) or numerical part of Ensembl ENSG ID |
| TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ |
| CANONICAL | STRING | Transcript designated 'canonical' by source |
| CONSEQUENCE | STRING | missense, stop gained, intronic, etc. |
| HGVSC | STRING | The HGVS coding sequence name |
| HGVSP | STRING | The HGVS protein sequence name |
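Several Annotated Variants fields are numeric codes, so a small lookup layer can make query results easier to read. The following Python sketch translates the codes listed in the field descriptions into labels and splits a VARIANT_KEY into its parts. The part names `start`, `end`, and `allele` are an assumption based on the example value "1:12345678:12345678:C", and the function names are hypothetical.

```python
# Lookup tables copied from the field descriptions for this table.
CHROMOSOME_BY_ID = {**{i: str(i) for i in range(1, 23)}, 23: "X", 24: "Y", 25: "Mt"}
ZYGOSITY = {0: "hom ref", 1: "het ref/alt", 2: "hom alt", 4: "hemi alt"}
VARIANT_CALL = {1: "germline", 2: "somatic"}
GENEMODEL = {1: "Ensembl", 2: "RefSeq"}


def parse_variant_key(key):
    """Split a VARIANT_KEY such as "1:12345678:12345678:C" into four parts.

    The meaning of each part (chromosome, start, end, allele) is assumed.
    """
    chromosome, start, end, allele = key.split(":")
    return {"chromosome": chromosome, "start": int(start),
            "end": int(end), "allele": allele}


def decode_variant_row(row):
    """Return a copy of a result row with the numeric codes replaced by labels."""
    decoded = dict(row)
    decoded["CHROMOSOMEID"] = CHROMOSOME_BY_ID[row["CHROMOSOMEID"]]
    decoded["ZYGOSITY"] = ZYGOSITY[row["ZYGOSITY"]]
    decoded["VARIANT_CALL"] = VARIANT_CALL[row["VARIANT_CALL"]]
    decoded["GENEMODEL"] = GENEMODEL[row["GENEMODEL"]]
    return decoded


# Made-up row for illustration.
example = {"CHROMOSOMEID": 23, "ZYGOSITY": 4, "VARIANT_CALL": 1, "GENEMODEL": 2}
print(decode_variant_row(example))
print(parse_variant_key("1:12345678:12345678:C"))
```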
Annotated Somatic Mutations
This table will only be available for data sets with ingested Somatic molecular data.

| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Original sample barcode, used in VCF column |
| SUBJECTID | STRING | Identifier for Subject entity |
| STUDY | STRING | Study designation |
| GENOMEBUILD | STRING | Only hg38 is supported |
| CHROMOSOME | STRING | Chromosome without 'chr' prefix |
| DBSNP | NUMERIC | dbSNP Identifiers |
| VARIANT_KEY | STRING | Variant ID in the form "1:12345678:12345678:C" |
| MUTATION_TYPE | NUMERIC | Rank of consequences by expected impact: 0 = Protein Truncating to 40 = Intergenic Variant |
| VARIANT_CALL | NUMERIC | 1=germline, 2=somatic |
| GENOTYPE | STRING | "G\|T" |
| REF_ALLELE | STRING | Reference allele |
| ALLELE1 | STRING | First allele call in the tumor sample |
| ALLELE2 | STRING | Second allele call in the tumor sample |
| GENEMODEL | NUMERIC | 1=Ensembl, 2=RefSeq |
| GENE_HGNC | STRING | HUGO/HGNC gene symbol |
| GENE_ID | STRING | Ensembl gene ID ("ENSG00001234") |
| TRANSCRIPT_ID | STRING | Ensembl ENST or RefSeq NM_ |
| CANONICAL | BOOLEAN | Transcript designated 'canonical' by source |
| CONSEQUENCE | STRING | missense, stop gained, intronic, etc. |
| HGVSP | STRING | HGVS nomenclature for AA change: p.Pro72Ala |
Annotated Copy Number Variants
This table will only be available for data sets with ingested CNV molecular data.

| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Sample barcode used in the original VCF |
| GENOMEBUILD | STRING | Genome build, always 'hg38' |
| NIRVANA_VID | STRING | Variant ID of the form 'chr-pos-ref-alt' |
| CHRID | STRING | Chromosome without 'chr' prefix |
| CID | NUMERIC | Numerical representation of the chromosome, X=23, Y=24, Mt=25 |
| GENE_ID | STRING | NCBI or Ensembl gene identifier |
| GID | NUMERIC | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix |
| START_POS | NUMERIC | First affected position on the chromosome |
| STOP_POS | NUMERIC | Last affected position on the chromosome |
| VARIANT_TYPE | NUMERIC | 1 = copy number gain, -1 = copy number loss |
| COPY_NUMBER | NUMERIC | Observed copy number |
| COPY_NUMBER_CHANGE | NUMERIC | Fold-change of copy number, assuming 2 for diploid and 1 for haploid as the baseline |
| SEGMENT_VALUE | NUMERIC | Average FC for the identified chromosomal segment |
| PROBE_COUNT | NUMERIC | Probes confirming the CNV (arrays only) |
| REFERENCE | NUMERIC | Baseline taken from normal samples (1) or averaged disease tissue (2) |
| GENE_HGNC | STRING | HUGO/HGNC gene symbol |
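The relationship between COPY_NUMBER, COPY_NUMBER_CHANGE, and VARIANT_TYPE can be illustrated with a short calculation. The Python sketch below assumes the fold-change is the plain ratio of the observed copy number to the baseline (2 for diploid, 1 for haploid). The source does not state whether the stored value is a plain or a log-scaled ratio, so treat this as an illustration rather than the exact computation used by Cohorts. The function names, and the return value 0 for "no change", are hypothetical.

```python
def copy_number_fold_change(copy_number, haploid=False):
    """Ratio of the observed copy number to the assumed baseline."""
    baseline = 1 if haploid else 2
    return copy_number / baseline


def cnv_direction(copy_number, haploid=False):
    """Return 1 for a gain and -1 for a loss, matching the VARIANT_TYPE codes.

    Returns 0 when the copy number equals the baseline (not a documented code).
    """
    baseline = 1 if haploid else 2
    if copy_number > baseline:
        return 1
    if copy_number < baseline:
        return -1
    return 0


print(copy_number_fold_change(3))             # three copies on a diploid baseline
print(cnv_direction(0, haploid=True))         # loss on a haploid baseline
```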
Annotated Structural Variants
This table will only be available for data sets with ingested SV molecular data. Note that ICA Cohorts stores copy number variants in a separate table.

| Field Name | Type | Description |
| --- | --- | --- |
| SAMPLE_BARCODE | STRING | Sample barcode used in the original VCF |
| GENOMEBUILD | STRING | Genome build, always 'hg38' |
| NIRVANA_VID | STRING | Variant ID of the form 'chr-pos-ref-alt' |
| CHRID | STRING | Chromosome without 'chr' prefix |
| CID | NUMERIC | Numerical representation of the chromosome, X=23, Y=24, Mt=25 |
| BEGIN | NUMERIC | First affected position on the chromosome |
| END | NUMERIC | Last affected position on the chromosome |
| BAND | STRING | Chromosomal band |
| QUALIITY | NUMERIC | Quality from the original VCF |
| FILTERS | ARRAY | Filters from the original VCF |
| VARIANT_TYPE | STRING | Insertion, deletion, indel, tandem_duplication, translocation_breakend, inversion ("INV"), short tandem repeat ("STR2") |
| VARIANT_TYPE_ID | NUMERIC | 21=insertion, 22=deletion, 23=indel, 24=tandem_duplication, 25=translocation_breakend, 26=inversion ("INV"), 27=short tandem repeat ("STR2") |
| CIPOS | ARRAY | Confidence interval around first position |
| CIEND | ARRAY | Confidence interval around last position |
| SVLENGTH | NUMERIC | Overall size of the structural variant |
| BONDCHR | STRING | For translocations, the other affected chromosome |
| BONDCID | NUMERIC | For translocations, the other affected chromosome as a numeric value, X=23, Y=24, Mt=25 |
| BONDPOS | STRING | For translocations, positions on the other affected chromosome |
| BONDORDER | NUMERIC | 3 or 5: Whether this fragment (the current variant/VID) "receives" the other chromosome's fragment on its 3' end, or attaches to the 5' of the other chromosome fragment |
| GENOTYPE | STRING | Called genotype from the VCF |
| GENOTYPE_QUALITY | NUMERIC | Genotype call quality |
| READCOUNTSSPLIT | ARRAY | Read counts |
| READCOUNTSPAIRED | ARRAY | Read counts, paired end |
| REGULATORYREGIONID | STRING | Ensembl ID for the affected regulatory region |
| REGULATORYREGIONTYPE | STRING | Type of the regulatory region |
| CONSEQUENCE | ARRAY | Variant consequence according to SequenceOntology |
| TRANSCRIPTID | STRING | Ensembl or RefSeq transcript identifier |
| TRANSCRIPTBIOTYPE | STRING | Biotype of the transcript |
| INTRONS | STRING | Count of impacted introns out of the total number of introns, specified as "M/N" |
| GENEID | STRING | Ensembl or RefSeq gene identifier |
| GENEHGNC | STRING | HUGO/HGNC gene symbol |
| ISCANONICAL | BOOLEAN | Is the transcript ID the canonical one according to Ensembl? |
| PROTEINID | STRING | RefSeq or Ensembl protein ID |
| SOURCEID | NUMERICAL | Gene model: 1=Ensembl, 2=RefSeq |
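Two of the Annotated Structural Variants fields use compact encodings that are easy to unpack after a query: VARIANT_TYPE_ID and INTRONS. The Python sketch below maps the type IDs to the names given in the field description and splits an "M/N" intron value into two integers. The function name and the sample value "2/7" are hypothetical.

```python
# Mapping copied from the VARIANT_TYPE_ID field description.
SV_TYPE_BY_ID = {
    21: "insertion",
    22: "deletion",
    23: "indel",
    24: "tandem_duplication",
    25: "translocation_breakend",
    26: "inversion",
    27: "short tandem repeat",
}


def parse_introns(value):
    """Parse an INTRONS value "M/N" into (impacted, total) as integers."""
    impacted, total = value.split("/")
    return int(impacted), int(total)


print(SV_TYPE_BY_ID[25])
print(parse_introns("2/7"))  # made-up value for illustration
```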
Raw RNAseq data tables for genes and transcripts
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for gene quantification results:

| Field Name | Type | Description |
| --- | --- | --- |
| GENOMEBUILD | STRING | Genome build, always 'hg38' |
| STUDY_NAME | STRING | Study designation |
| SAMPLE_BARCODE | STRING | Sample barcode used in the original VCF |
| LABEL | STRING | Group label specified during import: Case or Control, Tumor or Normal, etc. |
| GENE_ID | STRING | Ensembl or RefSeq gene identifier |
| GID | NUMERIC | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix |
| GENE_HGNC | STRING | HUGO/HGNC gene symbol |
| SOURCE | STRING | Gene model: 1=Ensembl, 2=RefSeq |
| TPM | NUMERICAL | Transcripts per million |
| LENGTH | NUMERICAL | The length of the gene in base pairs. |
| EFFECTIVE_LENGTH | NUMERICAL | The length as accessible to RNA-seq, accounting for insert-size and edge effects. |
| NUM_READS | NUMERICAL | The estimated number of reads from the gene. The values are not normalized. |
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
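To show how the TPM, EFFECTIVE_LENGTH, and NUM_READS columns relate, the Python sketch below applies the standard definition of transcripts per million: divide each read count by its effective length, then scale the resulting rates so they sum to one million. The source does not state how the stored TPM values were produced, so this illustrates the general definition rather than the Cohorts pipeline. The function name and input numbers are made up.

```python
def tpm(num_reads, effective_lengths):
    """Compute transcripts per million from read counts and effective lengths."""
    # Reads per base of effective length for each gene or transcript.
    rates = [reads / length for reads, length in zip(num_reads, effective_lengths)]
    total = sum(rates)
    # Scale so that the values across one sample sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Made-up counts and lengths for three genes in one sample.
values = tpm([100, 200, 300], [1000, 1000, 2000])
print([round(v, 1) for v in values])
```

Because TPM values are normalized within a sample, they are comparable across genes in that sample, whereas NUM_READS values are not.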
Differential expression tables for genes and transcripts
These tables will only be available for data sets with ingested RNAseq molecular data.
Table for differential gene expression results:

| Field Name | Type | Description |
| --- | --- | --- |
| GENOMEBUILD | STRING | Genome build, always 'hg38' |
| STUDY_NAME | STRING | Study designation |
| SAMPLE_BARCODE | STRING | Sample barcode used in the original VCF |
| CASE_LABEL | STRING | Study designation |
| GENE_ID | STRING | Ensembl or RefSeq gene identifier |
| GID | NUMERIC | Numerical part of the gene ID; for Ensembl, we remove the 'ENSG000..' prefix |
| GENE_HGNC | STRING | HUGO/HGNC gene symbol |
| SOURCE | STRING | Gene model: 1=Ensembl, 2=RefSeq |
| BASEMEAN | NUMERICAL | - |
| FC | NUMERICAL | Fold-change |
| LFC | NUMERICAL | Log of the fold-change |
| LFCSE | NUMERICAL | Standard error for log fold-change |
| PVALUE | NUMERICAL | P-value |
| CONTROL_SAMPLECOUNT | NUMERICAL | Number of samples used as control |
| CONTROL_LABEL | NUMERICAL | Label used for controls |
The corresponding transcript table uses TRANSCRIPT_ID instead of GENE_ID and GENE_HGNC.
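A common use of the differential expression columns is to keep only genes with a small PVALUE and a large absolute LFC. The Python sketch below does this on rows that have already been fetched. It assumes LFC is a base-2 logarithm of FC, which the source does not state. The thresholds, function names, and example values are illustrative only.

```python
import math


def lfc_from_fc(fold_change):
    """Log fold-change, assuming a base-2 logarithm."""
    return math.log2(fold_change)


def significant_genes(rows, max_pvalue=0.05, min_abs_lfc=1.0):
    """Return GENE_HGNC symbols passing both the p-value and LFC thresholds."""
    return [
        row["GENE_HGNC"]
        for row in rows
        if row["PVALUE"] <= max_pvalue and abs(row["LFC"]) >= min_abs_lfc
    ]


# Made-up result rows for illustration.
rows = [
    {"GENE_HGNC": "TP53", "FC": 4.0, "LFC": 2.0, "PVALUE": 0.001},
    {"GENE_HGNC": "BRCA1", "FC": 1.1, "LFC": 0.14, "PVALUE": 0.6},
    {"GENE_HGNC": "EGFR", "FC": 0.25, "LFC": -2.0, "PVALUE": 0.01},
]

print(significant_genes(rows))
print(lfc_from_fc(4.0))
```

Taking the absolute value of LFC keeps both up-regulated and down-regulated genes.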
