CSV format requirements
General CSV format requirements
The following are the general format requirements for a CSV file used to create multiple cases:
The file must have a .csv extension.
The file must contain a [Data] header.
The row after the [Data] header must contain the field names that identify the data in each column. Column names are case-sensitive.
The row after the column name header and each subsequent row represents a sample.
Each column represents a data field.
There must be no empty rows between the [Data] header and the last sample row.
A file can contain at most 50 cases.
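The structural requirements above can be checked before upload. The following is a minimal sketch, not the product's own validator: the function name `check_batch_csv` and the error messages are hypothetical, and it assumes the [Data] marker sits in the first column of its row. The 50-case limit is left out because this section does not spell out how sample rows map to cases.

```python
import csv
import io
from pathlib import Path


def check_batch_csv(filename: str, text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if Path(filename).suffix != ".csv":
        problems.append("file must have a .csv extension")

    rows = list(csv.reader(io.StringIO(text)))
    # Locate the [Data] header row (assumed to be in the first column).
    data_idx = next(
        (i for i, row in enumerate(rows) if row and row[0].strip() == "[Data]"),
        None,
    )
    if data_idx is None:
        problems.append("missing [Data] header")
        return problems

    body = rows[data_idx + 1:]
    # Blank rows after the last sample row are ignored; blank rows before it are errors.
    while body and not any(cell.strip() for cell in body[-1]):
        body.pop()
    if not body:
        problems.append("missing column-name row after [Data]")
        return problems
    if any(not any(cell.strip() for cell in row) for row in body):
        problems.append("empty row between [Data] and the last sample row")
    return problems
```

Calling `check_batch_csv("batch.csv", text)` on a well-formed file returns an empty list; each violated requirement adds one message.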
CSV schema
1. Mandatory fields
These fields must always be present in the sample table.
Case Type;
Family Id;
Phenotypes OR Phenotypes Id.
2. Conditionally mandatory fields
Leaving these fields empty creates an empty sample.
BioSample Name;
Files Names;
Storage Provider Id;
This field is mandatory if Files Names is empty:
Sample Type.
This field is required if the "auto" option is used for Files Names (only relevant for BSSH):
Default Project.
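The mandatory and conditionally mandatory rules above can be sketched as two small checks. This is an illustrative sketch only: the function names are hypothetical, and it assumes the "auto" option is entered as the literal value `auto` in Files Names (matched case-insensitively here). Column names are compared exactly because they are case-sensitive.

```python
MANDATORY = ["Case Type", "Family Id"]
PHENOTYPE_COLUMNS = ["Phenotypes", "Phenotypes Id"]  # at least one must be present


def check_columns(header: list[str]) -> list[str]:
    """Check that the mandatory columns exist (exact, case-sensitive match)."""
    problems = [
        f"missing mandatory column: {name}" for name in MANDATORY if name not in header
    ]
    if not any(name in header for name in PHENOTYPE_COLUMNS):
        problems.append("missing mandatory column: Phenotypes or Phenotypes Id")
    return problems


def check_row(row: dict[str, str]) -> list[str]:
    """Check the two conditional rules for one sample row."""
    problems = []
    files = row.get("Files Names", "").strip()
    # Sample Type becomes mandatory when Files Names is empty.
    if not files and not row.get("Sample Type", "").strip():
        problems.append("Sample Type is required when Files Names is empty")
    # Assumption: the "auto" option is written as the literal value "auto".
    if files.lower() == "auto" and not row.get("Default Project", "").strip():
        problems.append("Default Project is required when Files Names is 'auto'")
    return problems
```

Rows read with `csv.DictReader` can be passed to `check_row` directly, since each row is a dictionary keyed by column name.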
3. Optional fields
The sample table may include these supported optional columns.
Boost Genes
Clinical Notes
Date Of Birth
Due Date
Execute now
Gender (see Handling cases with unknown sex)
Gene List Id
Kit Id
Intersect Bed Id (38.0+)
Label Id
Opt In
Relation
Selected Preset
Visualization Files
4. Custom fields
The sample table may contain custom columns to suit your specific needs and include any relevant information that is important for your workflow.
Each custom field must be assigned a unique name without spaces. Data from custom columns is saved per case under the Additional information section of Case Info.
Custom field examples:
Institution | Free text | Custom | GenoMed Solutions
Sample_Received_Date | Free text | Custom | 24-02-2022
Sample_Type | Free text | Custom | Amniotic Fluid
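The naming rule for custom fields (unique, no spaces) is easy to verify before upload. A minimal sketch, assuming the custom column names have already been separated from the supported columns; `check_custom_names` is a hypothetical helper name:

```python
def check_custom_names(custom_columns: list[str]) -> list[str]:
    """Report custom column names that contain spaces or are repeated."""
    problems = []
    seen = set()
    for name in custom_columns:
        if " " in name:
            problems.append(f"custom field name contains a space: {name!r}")
        if name in seen:
            problems.append(f"duplicate custom field name: {name!r}")
        seen.add(name)
    return problems
```

The three example names above (Institution, Sample_Received_Date, Sample_Type) pass this check, whereas a custom column named "Sample Received Date" would be reported.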
Batch case .csv file validation rules
Mandatory (highlighted in red), Conditionally mandatory (highlighted in orange), and Optional fields should be filled in according to the following rules.
Handling cases with unknown sex
When a user assigns "Unknown" sex to a sample, the system assumes "Female". If the genetic sex is actually male, this affects CNV interpretation on the sex chromosomes:
Chromosome X: CN = 2 is considered reference (REF) for a female genome, so CNVs with two copies are hidden by default. This may cause chromosome X duplications to be missed.
Chromosome Y: CN = 0 is considered reference (REF) for a female genome, so CNVs with zero copies are hidden by default. This may cause chromosome Y deletions to be missed.
To include these variants in the analysis, enable the Include Reference Homozygosity and No Coverage Calls toggle in Workbench & Pipeline Settings.
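The effect of the "Female" assumption can be illustrated with a small model. This is a simplified sketch, not the system's actual filtering logic: the function names are hypothetical, the female reference values come from the text above, and the male reference values (one copy each of X and Y, ignoring pseudoautosomal regions) are an assumption added only for contrast.

```python
# Reference copy number (CN) per sex chromosome.
# Female values are taken from the text; male values are an assumption.
REFERENCE_CN = {
    "Female": {"X": 2, "Y": 0},
    "Male": {"X": 1, "Y": 1},
}


def assumed_sex(assigned_sex: str) -> str:
    """'Unknown' is treated as 'Female'; other values are used as given."""
    return "Female" if assigned_sex == "Unknown" else assigned_sex


def hidden_by_default(assigned_sex: str, chromosome: str, copy_number: int) -> bool:
    """A call whose CN equals the reference for the assumed sex is hidden."""
    return copy_number == REFERENCE_CN[assumed_sex(assigned_sex)][chromosome]
```

In this model, a chromosome X call with CN = 2 is hidden for a sample assigned "Unknown" but shown for one assigned "Male", which is how a duplication in a genetically male sample can be missed.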
Required BSSH file path format:
For BSSH, it is necessary to use the actual names (numbers):
/projects/3824821/appresults/2319318/files/119675608
instead of aliases:
/projects/ABC_DEF_2022-12-22_DEv395/appresults/ABC-GM58342-def/files/ABC-GM58342-def.hard-filtered.vcf.gz
Human-readable path for BSSH files in batch CSV
In version 37, we introduced an enhancement to the batch upload process that allows you to provide a human-readable path for BSSH files in your batch CSV.
Validations
When a batch CSV includes a human-readable path, the system performs the following validations for paths in BSSH storage:
Single File in the Path:
If the provided path contains exactly one file or dataset, the batch upload proceeds successfully.
Two Files in the Path:
If the path contains two files with the same name (for example, two pairs of FASTQ files in a dataset), the system will:
Select the dataset marked as QCPassed.
Fail the batch upload if both datasets are marked as QCPassed, as this indicates conflicting data.
More Than Two Files in the Path:
If the path contains more than two files or datasets, the system fails the batch upload, as the path is considered ambiguous or invalid.
Error Scenarios
Multiple QCPassed Datasets: If two datasets in the same path are marked as QCPassed, the batch upload will fail with a descriptive error indicating the conflict.
Excessive Files in the Path: If more than two files are found for the provided path, the batch upload will fail, instructing the user to provide a more specific or valid path.
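The selection rules above can be summarized as a short decision function. This is a sketch of the described behavior, not the actual implementation: `Dataset`, `BatchUploadError`, and `select_dataset` are hypothetical names. Two cases are not covered by the text (no match at all, and two matches with neither marked QCPassed); failing in those cases is an assumption.

```python
from dataclasses import dataclass


class BatchUploadError(Exception):
    """Raised when a human-readable path cannot be resolved to one dataset."""


@dataclass
class Dataset:
    name: str
    qc_passed: bool


def select_dataset(matches: list[Dataset]) -> Dataset:
    """Pick the dataset to use for a path, following the validation rules."""
    if len(matches) == 1:
        return matches[0]
    if len(matches) == 2:
        passed = [d for d in matches if d.qc_passed]
        if len(passed) == 1:
            return passed[0]
        if len(passed) == 2:
            raise BatchUploadError("conflict: both datasets are marked QCPassed")
        # Not covered by the text; failing here is an assumption.
        raise BatchUploadError("neither dataset is marked QCPassed")
    if not matches:
        # Not covered by the text; failing here is an assumption.
        raise BatchUploadError("no file or dataset found for the path")
    raise BatchUploadError("ambiguous path: more than two files or datasets found")
```

For two matches where only one has `qc_passed=True`, `select_dataset` returns that dataset; every other multi-match combination raises `BatchUploadError`.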
Benefits
Enables customers to use intuitive, human-readable paths in their workflows.
Automatically handles dataset selection based on quality control status.
