CSV format requirements

General CSV format requirements

The following are the general format requirements for a CSV file used to create multiple cases:

  1. The file must have a .csv extension.

  2. The file must contain a [Data] header.

  3. The row after the [Data] header must contain the field names identifying the data in each column. Column names are case-sensitive.

  4. Each row after the column-name row represents one sample.

  5. Each column represents a data field.

  6. There must be no empty rows between the [Data] header and the last sample row.

  7. A file can contain at most 50 cases.

  8. In versions before 34.0, cells should not contain commas. Consider replacing commas with semicolons.
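The general requirements above can be checked locally before uploading. The following Python sketch is a hypothetical pre-upload check, not part of the product: `read_batch_csv`, its parameters, and its messages are illustrative, and it assumes that each distinct Family Id counts as one case for the 50-case limit.

```python
import csv
import io

MAX_CASES_PER_FILE = 50  # requirement 7


def _is_blank(line):
    # A row holding only commas and/or whitespace counts as empty.
    return not line.replace(",", "").strip()


def read_batch_csv(text, legacy_version=False):
    """Parse a batch-case CSV and check the general format requirements.

    Returns (records, errors): one {column name: value} dict per sample row,
    plus a list of rule violations. legacy_version=True enables the
    no-commas-in-cells check for versions before 34.0.
    """
    lines = text.splitlines()
    # Requirement 2: locate the [Data] header (exports may pad it with commas).
    start = next((i for i, line in enumerate(lines)
                  if line.strip().rstrip(",") == "[Data]"), None)
    if start is None:
        return [], ["missing [Data] header"]
    body = lines[start + 1:]
    while body and _is_blank(body[-1]):  # trailing blank lines are harmless
        body.pop()
    if not body:
        return [], ["no column-name row after [Data]"]

    errors = []
    # Requirement 6: no empty rows between [Data] and the last sample row.
    for lineno, line in enumerate(body, start=start + 2):
        if _is_blank(line):
            errors.append(f"line {lineno}: empty row")

    # Requirements 3-5: first row holds column names, each later row is a sample.
    rows = [row for row in csv.reader(io.StringIO("\n".join(body)))
            if any(cell.strip() for cell in row)]
    columns, samples = rows[0], rows[1:]
    records = [dict(zip(columns, row)) for row in samples]

    # Requirement 8 (versions before 34.0): no commas inside cells.
    if legacy_version:
        for n, row in enumerate(samples, start=1):
            if any("," in cell for cell in row):
                errors.append(f"sample row {n}: cell contains a comma")

    # Requirement 7, assuming each distinct Family Id is one case.
    cases = {rec.get("Family Id", "").strip() for rec in records}
    if len(cases) > MAX_CASES_PER_FILE:
        errors.append(f"{len(cases)} cases exceed the limit of {MAX_CASES_PER_FILE}")
    return records, errors
```

Errors are collected rather than raised, so a single pass can report every problem in the file at once.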


CSV schema

1. Mandatory fields

These fields must always be present in the sample table.

  1. Case Type;

  2. Family Id;

  3. Phenotypes OR Phenotypes Id.
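A minimal sketch of the mandatory-column check, assuming the header row has already been parsed into a list of column names (for example, by a CSV reader). The function and constant names are hypothetical. Names are compared exactly because column names are case-sensitive.

```python
MANDATORY_COLUMNS = ("Case Type", "Family Id")
PHENOTYPE_COLUMNS = ("Phenotypes", "Phenotypes Id")  # at least one is required


def missing_mandatory_columns(columns):
    """Return the mandatory columns absent from a parsed header row.

    Names are compared exactly, since column names are case-sensitive.
    """
    present = set(columns)
    missing = [name for name in MANDATORY_COLUMNS if name not in present]
    if not present & set(PHENOTYPE_COLUMNS):
        missing.append("Phenotypes OR Phenotypes Id")
    return missing


print(missing_mandatory_columns(["case type", "Family Id"]))
# ['Case Type', 'Phenotypes OR Phenotypes Id']
```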

2. Conditionally mandatory fields

If these fields are left empty, an empty sample is created.

  1. BioSample Name;

  2. Files Names;

  3. Storage Provider Id.

This field is mandatory if Files Names is empty:

  1. Sample Type.

This field is required if the "auto" option is used for Files Names (only relevant for BSSH):

  1. Default Project.
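The conditional rules above, together with the Storage Provider Id rule from the validation table ("Required if Files Names is not empty"), can be expressed as a per-row check. This is a sketch under stated assumptions: the row is a dict of column name to cell value, the function name and messages are hypothetical, and an empty BioSample Name or Files Names is reported as a warning (an empty sample is created) rather than an error.

```python
def conditional_issues(row):
    """Check one sample row ({column name: value} dict) against the
    conditionally mandatory rules. Returns (errors, warnings)."""
    def value(name):
        return (row.get(name) or "").strip()

    errors, warnings = [], []
    files = value("Files Names")
    if not value("BioSample Name") or not files:
        warnings.append("an empty sample will be created")
    if not files:
        if not value("Sample Type"):
            errors.append("Sample Type is required when Files Names is empty")
    else:
        if not value("Storage Provider Id"):
            errors.append("Storage Provider Id is required when Files Names is set")
        if files == "auto" and not value("Default Project"):
            errors.append('Default Project is required when Files Names is "auto"')
    return errors, warnings
```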

3. Optional fields

The sample table may include these supported optional columns.

  1. Boost Genes;

  2. Clinical Notes;

  3. Date Of Birth;

  4. Due Date;

  5. Execute now;

  6. Sex (labeled Gender in versions before 33.0);

  7. Gene List Id;

  8. Kit Id;

  9. Label Id;

  10. Opt In;

  11. Relation;

  12. Selected Preset;

  13. Visualization Files.

4. Custom fields

The sample table may contain custom columns for any additional information relevant to your workflow.

Each custom field must be assigned a unique name without spaces. Data from custom columns is saved per case under the Additional information section of Case Info.

Note: In cases with more than one sample, custom fields are only recognized and added to case information if their values appear within the same table row where the Relation field is equal to "proband".

Custom field examples:

Institution
  Expected input: Free text
  Field details: Custom
  Example: GenoMed Solutions

Sample_Received_Date
  Expected input: Free text
  Field details: Custom
  Example: 24-02-2022

Sample_Type
  Expected input: Free text
  Field details: Custom
  Example: Amniotic Fluid
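The custom-field rules can be sketched as follows: column names are checked for spaces and uniqueness, and only values from the proband row of each Family Id are kept, as the note above describes. The set of standard column names is compiled from this page (listing both Sex and Gender). Treating every other column as custom, and matching Relation case-insensitively, are assumptions of this sketch; the function names are hypothetical.

```python
# Standard column names documented on this page; "Sex" and "Gender" are both
# listed because the column was renamed in version 33.0.
STANDARD_COLUMNS = {
    "BioSample Name", "Boost Genes", "Case Type", "Clinical Notes",
    "Date Of Birth", "Default Project", "Due Date", "Execute now",
    "Family Id", "Files Names", "Sex", "Gender", "Gene List Id", "Kit Id",
    "Label Id", "Opt In", "Phenotypes", "Phenotypes Id", "Relation",
    "Sample Type", "Selected Preset", "Storage Provider Id",
    "Visualization Files",
}


def custom_column_problems(columns):
    """Flag custom column names that contain spaces or are not unique."""
    custom = [c for c in columns if c not in STANDARD_COLUMNS]
    problems = [f"{c!r} contains a space" for c in custom if " " in c]
    problems += [f"{c!r} is not unique" for c in sorted(set(custom))
                 if custom.count(c) > 1]
    return problems


def custom_fields_by_family(records):
    """Map each Family Id to the custom values found on its proband row."""
    result = {}
    for row in records:
        # An empty Relation defaults to "proband" (case-insensitive match assumed).
        relation = (row.get("Relation") or "").strip().lower() or "proband"
        if relation != "proband":
            continue  # custom values on other rows are not added to the case
        family = (row.get("Family Id") or "").strip()
        result[family] = {name: value for name, value in row.items()
                          if name not in STANDARD_COLUMNS and value and value.strip()}
    return result
```

Note that "Sample_Type" in the examples is a custom column, distinct from the standard "Sample Type" field.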


Batch case .csv file validation rules

Mandatory, conditionally mandatory, and optional fields must be filled in according to the following rules.

Each entry below gives the field (column) name, followed by its expected input, field details, and an example.

BioSample Name
  Expected input: Free text
  Field details: Conditionally mandatory. An empty sample will be created if the field is left blank.
  Example: NA24385

Boost Genes
  Expected input: "TRUE" or "FALSE"
  Field details: Optional. Indicates whether the Boost genes mode will be used. "TRUE" means that variants in the targeted genes will receive upgraded scores during prioritization by the AI Shortlist algorithm. Default value is "FALSE". Only considered for proband.
  Example: TRUE

Case Type
  Expected input: "Whole Genome", "Exome", "Custom Panel", or a custom case type
  Field details: Mandatory. Only considered for proband.
  Example: Whole Genome

Clinical Notes
  Expected input: Free text
  Field details: Optional
  Example: A 14-year-old boy with a visual acuity of 20/200 in both eyes in whom hearing loss was first noted at 5 years of age on routine screening; audiometry revealed sensorineural hearing loss.

Date Of Birth
  Expected input: Date in "YYYY-MM-DD" format
  Field details: Optional
  Example: 2013-01-22

Default Project
  Expected input: Free text
  Field details: Conditionally mandatory. Must be filled in if the "auto" option is used for Files Names (only relevant for BSSH).
  Example: GIAB

Due Date
  Expected input: Date in "YYYY-MM-DD" format
  Field details: Optional
  Example: 2023-05-03

Execute now
  Expected input: "TRUE" or "FALSE"
  Field details: Optional. Default value is "TRUE". Use "FALSE" if you don’t want the case to run when the file is uploaded. Only considered for proband.
  Example: FALSE

Family Id
  Expected input: Free text
  Field details: Mandatory
  Example: RM8392

Files Names
  Expected input: a semicolon-separated list of paths, without spaces, to .fastq, .fastq.gz, .vcf, .vcf.gz, .bam, or .cram files; "existing"; or "auto"
  Field details: Conditionally mandatory. An empty sample will be created if the field is left blank. The "existing" option automatically locates FASTQ files based on the BioSample Name. Note: If data files for an existing case were sourced from the customer’s external bucket and later removed, attempting to create a case from those files will result in an error. With the "auto" option, BSSH users can automatically locate FASTQ files based on the BioSample Name and Default Project provided. When using BSSH without the "auto" option, make sure your file paths follow the required BSSH file path format described below.
  Example: /GIAB_cases/1/NA24385.dragen.hard-filtered.gvcf.gz;/QA_cases/Other/NA24385.dragen.cnv.vcf.gz;/QA_cases/Other/NA24385.dragen.repeats.vcf;

Sex / Gender*
  Expected input: "F", "M", or "U"
  Field details: Optional. Default value is "U".
  Example: M
  *The field is labeled Sex in versions 33.0 and later, and Gender in older versions.

Gene List Id
  Expected input: Integer
  Field details: Optional. Must be the ID of a previously defined Gene List. Only considered for proband.
  Example: 12345

Kit Id
  Expected input: Integer
  Field details: Optional. Must be the ID of a previously defined Kit. Only considered for proband.
  Example: 23456

Label Id
  Expected input: Integer
  Field details: Optional. Must be the ID of a previously defined Case Label. Only considered for proband.
  Example: 34567

Opt In
  Expected input: "TRUE" or "FALSE"
  Field details: Optional. Indicates whether the case subject consented to the extended sharing of data with your network(s). Default value is "TRUE".
  Example: FALSE

Phenotypes
  Expected input: a semicolon-separated list of HPO phenotype terms, or "Unaffected" for non-affected family members
  Field details: Mandatory for the proband sample if Phenotypes Id is empty. The list must contain fewer than 100 terms. Non-HPO terms can be included if Phenotypes Id is empty.
  Example: Abnormal pupillary function;Orthotopic os odontoideum;

Phenotypes Id
  Expected input: a semicolon-separated list of HPO phenotype IDs
  Field details: Mandatory for the proband sample if Phenotypes is empty. The list must contain fewer than 100 IDs.
  Example: HP:0007686;HP:0025375;

Relation
  Expected input: "proband", "mother", "father", or "sibling"
  Field details: Optional. Default value is "proband". The values "proband", "father", and "mother" can each be used only once per Family Id. Each Family Id requires exactly one sample with Relation "proband".
  Example: Mother

Sample Type
  Expected input: "FASTQ" or "VCF"
  Field details: Conditionally mandatory. Required if Files Names is empty. Only considered for proband.
  Example: FASTQ

Selected Preset
  Expected input: Free text (a Preset name) or "Default"
  Field details: Optional. Must be the name of a previously defined Preset. If set to "Default", the default Preset is applied. If left empty, no Preset is applied.
  Example: High quality candidates

Storage Provider Id
  Expected input: Integer
  Field details: Conditionally mandatory. Required if Files Names is not empty. Must be one of the configured storage provider IDs.
  Example: 208

Visualization Files
  Expected input: a semicolon-separated list of paths to .bam or .cram alignment files; in version 34.0 and later, also .tn.bw, .baf.bw, and .roh.bed files
  Field details: Optional
  Example: /giab_project/NA24385.bam
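The per-field rules above can be sketched as two checks: `value_errors` validates the format of a single row, and `family_errors` applies the Relation rules across all rows that share a Family Id. Both function names are hypothetical. Some readings are assumptions: "fewer than 100" is enforced as a strict limit on list entries, Relation values are compared case-insensitively (the values are listed in lowercase, but the example shows "Mother"), and HPO IDs are expected in the HP:NNNNNNN form used in the examples.

```python
import re
from collections import Counter
from datetime import date

BOOLEAN_FIELDS = ("Boost Genes", "Execute now", "Opt In")
DATE_FIELDS = ("Date Of Birth", "Due Date")
INTEGER_FIELDS = ("Gene List Id", "Kit Id", "Label Id", "Storage Provider Id")
ENUM_FIELDS = {
    "Sex": {"F", "M", "U"},
    "Gender": {"F", "M", "U"},  # column name before version 33.0
    "Sample Type": {"FASTQ", "VCF"},
}
RELATIONS = {"proband", "mother", "father", "sibling"}
HPO_ID = re.compile(r"HP:\d{7}")
MAX_PHENOTYPES = 100  # lists must have fewer than this many entries


def split_list(value):
    """Split a semicolon-separated cell, ignoring a trailing ';'."""
    return [item.strip() for item in value.split(";") if item.strip()]


def relation_of(row):
    # Empty Relation defaults to "proband"; case-insensitive matching is assumed.
    return (row.get("Relation") or "").strip().lower() or "proband"


def is_iso_date(text):
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        date.fromisoformat(text)  # rejects dates such as 2023-02-30
    except ValueError:
        return False
    return True


def value_errors(row):
    """Return format errors for one sample row ({column name: value} dict)."""
    def value(name):
        return (row.get(name) or "").strip()

    errors = []
    for name in BOOLEAN_FIELDS:
        if value(name) and value(name) not in ("TRUE", "FALSE"):
            errors.append(f"{name}: expected TRUE or FALSE")
    for name in DATE_FIELDS:
        if value(name) and not is_iso_date(value(name)):
            errors.append(f"{name}: expected YYYY-MM-DD")
    for name in INTEGER_FIELDS:
        if value(name) and not re.fullmatch(r"[0-9]+", value(name)):
            errors.append(f"{name}: expected an integer")
    for name, allowed in ENUM_FIELDS.items():
        if value(name) and value(name) not in allowed:
            errors.append(f"{name}: expected one of {sorted(allowed)}")
    if relation_of(row) not in RELATIONS:
        errors.append(f"Relation: unknown value {value('Relation')!r}")

    terms = split_list(value("Phenotypes"))
    ids = split_list(value("Phenotypes Id"))
    for name, items in (("Phenotypes", terms), ("Phenotypes Id", ids)):
        if len(items) >= MAX_PHENOTYPES:
            errors.append(f"{name}: must list fewer than {MAX_PHENOTYPES} entries")
    bad_ids = [i for i in ids if not HPO_ID.fullmatch(i)]
    if bad_ids:
        errors.append(f"Phenotypes Id: not HPO IDs: {bad_ids}")
    if relation_of(row) == "proband" and not (terms or ids):
        errors.append("proband needs Phenotypes or Phenotypes Id")
    return errors


def family_errors(records):
    """Apply the Relation rules across all rows that share a Family Id."""
    per_family = {}
    for row in records:
        family = (row.get("Family Id") or "").strip()
        per_family.setdefault(family, Counter())[relation_of(row)] += 1
    errors = []
    for family, counts in per_family.items():
        if counts["proband"] != 1:
            errors.append(f"{family}: expected exactly one proband, found {counts['proband']}")
        for role in ("mother", "father"):
            if counts[role] > 1:
                errors.append(f"{family}: {role} used {counts[role]} times")
    return errors
```

Free-text fields (Case Type, Clinical Notes, Selected Preset) are not checked here, and IDs such as Kit Id are only checked for integer format, not against the previously defined objects they must refer to.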

*Required BSSH file path format:

For BSSH, file paths must use the actual (numeric) IDs. For example, use:

/projects/3824821/appresults/2319318/files/119675608

instead of aliases such as:

/projects/ABC_DEF_2022-12-22_DEv395/appresults/ABC-GM58342-def/files/ABC-GM58342-def.hard-filtered.vcf.gz
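To catch alias-based paths before upload, a Files Names value can be split on semicolons and each path matched against the numeric pattern shown above. This is a sketch: the function name is hypothetical, and the `bssh` flag stands in for however you know that a row targets BSSH storage (for example, from its Storage Provider Id).

```python
import re

# BSSH file paths must use numeric IDs, e.g.
# /projects/3824821/appresults/2319318/files/119675608
BSSH_NUMERIC_PATH = re.compile(r"/projects/\d+/appresults/\d+/files/\d+")


def files_names_errors(value, bssh=False):
    """Check one Files Names cell.

    bssh is a hypothetical flag meaning the row's storage provider is BSSH;
    how you determine that (e.g. from Storage Provider Id) is up to you.
    """
    value = value.strip()
    if value in ("", "existing", "auto"):
        return []  # blank, or a keyword option resolved by the platform
    errors = []
    for path in (p.strip() for p in value.split(";")):
        if not path:
            continue  # tolerate a trailing ";" as in the documented example
        if " " in path:
            errors.append(f"{path}: paths must not contain spaces")
        if bssh and not BSSH_NUMERIC_PATH.fullmatch(path):
            errors.append(f"{path}: use numeric BSSH IDs, not aliases")
    return errors
```

The file-extension list is deliberately not enforced in this sketch, because numeric BSSH file IDs carry no extension.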
