# Prepare Metadata Sheets

In ICA Cohorts, metadata describe any subjects and samples imported into the system in terms of attributes, including:

* subject:
  * demographics such as age, sex, ancestry;
  * phenotypes and diseases;
  * biometrics such as body height, body mass index, etc.;
  * pathological classification, tumor stages, etc.;
  * family and patient medical history;
* sample:
  * sample type such as FFPE;
  * tissue type;
  * sequencing technology: whole genome DNA-sequencing, RNAseq, single-cell RNAseq, among others.

You can use these attributes while [creating a cohort](https://help.ica.illumina.com/project/p-cohorts/cohorts-create) to define the cases and/or controls that you want to include.

During [import](https://help.ica.illumina.com/project/p-cohorts/cohorts-import), you will be asked to upload a metadata sheet as a tab-delimited (TSV) file. An example sheet is available for download on the **Import files** page in the ICA Cohorts UI.

A metadata sheet must contain at least these four columns, with a value in each row:

* **Subject ID** - identifier referring to individuals; use the column header "SubjectID".
* **Sample ID** - identifier for a sample. Sample IDs need to match the corresponding column header in the VCF/GVCF files. Each subject can have multiple samples; specify each sample in its own row, repeating the same **SubjectID**. Use the column header "SampleID".
* **Biological sex** - can be "Female (XX)", "Female", "Male (XY)", "Male", "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY", or "Not provided". Use the column header "DM\_Sex" (demographics).
* **Sequencing technology** - can be "Whole genome sequencing", "Whole exome sequencing", "Targeted sequencing panels", or "RNA-seq"; use the column header "TC" (technology).
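
Before uploading, it can help to check a sheet locally against the four required columns. The following is a minimal sketch in Python, not part of ICA Cohorts: the function name `validate_metadata_sheet`, the example subject and sample IDs, and the case-sensitive comparison are all assumptions made for illustration. The allowed values are copied from the list above; the importer itself may accept or reject values differently.

```python
import csv
import io

# Column headers and allowed values as documented above.
REQUIRED_COLUMNS = ["SubjectID", "SampleID", "DM_Sex", "TC"]
ALLOWED_SEX = {
    "Female (XX)", "Female", "Male (XY)", "Male",
    "X (Turner's)", "XXY (Klinefelter)", "XYY", "XXXY", "Not provided",
}
ALLOWED_TECHNOLOGY = {
    "Whole genome sequencing", "Whole exome sequencing",
    "Targeted sequencing panels", "RNA-seq",
}


def validate_metadata_sheet(handle):
    """Return a list of problems found in a tab-delimited metadata sheet.

    Hypothetical helper: matching is exact and case-sensitive, which is an
    assumption and may be stricter than the actual import.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        values = {c: (row[c] or "").strip() for c in REQUIRED_COLUMNS}
        for column, value in values.items():
            if not value:
                problems.append(f"row {line_number}: empty {column}")
        sex = values["DM_Sex"]
        if sex and sex not in ALLOWED_SEX:
            problems.append(
                f"row {line_number}: DM_Sex '{sex}' is not one of the documented values"
            )
        technology = values["TC"]
        if technology and technology not in ALLOWED_TECHNOLOGY:
            problems.append(
                f"row {line_number}: TC '{technology}' is not one of the documented values"
            )
    return problems


# Made-up example: one subject with two samples (two rows, same SubjectID)
# and a second subject whose DM_Sex value is not in the documented list.
example = (
    "SubjectID\tSampleID\tDM_Sex\tTC\n"
    "SUBJ-001\tSAMPLE-001A\tFemale\tWhole genome sequencing\n"
    "SUBJ-001\tSAMPLE-001B\tFemale\tRNA-seq\n"
    "SUBJ-002\tSAMPLE-002A\tmale\tWhole exome sequencing\n"
)

for problem in validate_metadata_sheet(io.StringIO(example)):
    print(problem)
```

To check a file on disk, pass an open handle instead, for example `open("metadata.tsv", newline="", encoding="utf-8")`, where the file name is a placeholder.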

A description of all attributes and data types currently supported by ICA Cohorts can be found here: [ICA\_Cohorts\_Supported\_Attributes.xlsx](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/ICA_Cohorts_Supported_Attributes.xlsx)

You can download an example of a metadata sheet, which contains some samples from The Cancer Genome Atlas ([TCGA](https://www.cancer.gov/ccg/research/genome-sequencing/tcga)) and their publicly available clinical attributes, here: [ICA\_Cohorts\_Example\_Metadata.tsv](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/ICA_Cohorts_Example_Metadata.tsv)

To help you navigate the new concept code browser for diagnoses, a list of concepts and diagnoses that covers all public data subjects can be found here: [PublicData\_AllConditionsSummarized.xlsx](https://stratus-documentation-us-east-1-public.s3.amazonaws.com/downloads/cohorts/PublicData_AllConditionsSummarized.xlsx)
