Prepare Metadata Sheets
In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:
subject:
demographics such as age, sex, ancestry;
phenotypes and diseases;
biometrics such as body height, body mass index, etc.;
pathological classification, tumor stages, etc.;
family and patient medical history;
sample:
sample type such as FFPE;
tissue type;
sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.
You can use these attributes to define the cases and/or controls that you want to include.
During import, you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the Import files page in the ICA Cohorts UI.
A metadata sheet will need to contain at least these four columns per row:
Subject ID - identifier referring to individuals; use the column header "SubjectID".
Sample ID - identifier for a sample; use the column header "SampleID". Sample IDs need to match the corresponding column header in the VCF/GVCF files. Each subject can have multiple samples; specify each sample in its own row with the same SubjectID.
Biological sex - can be "Female (XX)", "Female"; "Male (XY)", "Male"; "X (Turner's)"; "XXY (Klinefelter)"; "XYY"; "XXXY" or "Not provided". Use the column header "DM_Sex" (demographics).
Sequencing technology - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
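The four required columns can be checked locally before upload. The following is a minimal sketch, not part of ICA Cohorts: the function name validate_metadata_sheet and the subject and sample IDs are hypothetical, and it assumes that values must match the strings listed above exactly (including case). Additional attribute columns are ignored.

```python
import csv
import io

# Required column headers, as listed above.
REQUIRED_COLUMNS = ["SubjectID", "SampleID", "DM_Sex", "TC"]

# Accepted values, copied from the column descriptions above.
# Assumption: matching is exact and case-sensitive.
ALLOWED_SEX = {
    "Female (XX)", "Female", "Male (XY)", "Male",
    "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY", "Not provided",
}
ALLOWED_TC = {
    "Whole genome sequencing", "Whole exome sequencing",
    "Targeted sequencing panels", "RNA-seq",
}


def validate_metadata_sheet(tsv_text):
    """Return a list of problems found in a tab-delimited metadata sheet."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        values = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        for column, value in values.items():
            if not value:
                problems.append(f"row {row_number}: empty {column}")
        if values["DM_Sex"] and values["DM_Sex"] not in ALLOWED_SEX:
            problems.append(f"row {row_number}: unsupported DM_Sex value")
        if values["TC"] and values["TC"] not in ALLOWED_TC:
            problems.append(f"row {row_number}: unsupported TC value")
    return problems


# Hypothetical sheet: one subject with two samples, one with a single sample.
example = (
    "SubjectID\tSampleID\tDM_Sex\tTC\n"
    "SUBJ-001\tSAMPLE-001A\tFemale (XX)\tWhole genome sequencing\n"
    "SUBJ-001\tSAMPLE-001B\tFemale (XX)\tRNA-seq\n"
    "SUBJ-002\tSAMPLE-002A\tNot provided\tWhole exome sequencing\n"
)
print(validate_metadata_sheet(example))  # prints []
```

The check does not compare SampleID values against the VCF/GVCF headers; that comparison would need the variant files themselves.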
A description of all attributes and data types currently supported by ICA Cohorts can be found here:
You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas (TCGA) and their publicly available clinical attributes, here:
A list of concepts and diagnoses covering all public data subjects, which can help you navigate the new concept code browser for diagnoses, can be found here: