Creating multiple cases

Batch case upload from platform

If you're comfortable with scripting and API usage, you can use those methods to upload multiple cases at once. Otherwise, there is a user-friendly alternative: importing a CSV file directly through the user interface.

Follow the steps below.

Caution: Refreshing or leaving the page, exiting the Add new case tab, or a power failure on your computer before a batch case upload has completed will result in loss of the case creation progress.

1. Prepare a CSV file

CSV (Comma-Separated Values) is a simple file format used to store data in tabular form. A row represents a sample, and a column represents a data field.

Start by downloading a CSV template from the Add new case page set to Batch mode (see step 2). The template contains an example line and both mandatory and non-mandatory fields. Fill the file with your data according to CSV format requirements.
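If you generate the file from a script rather than editing the template by hand, the layout can be sketched as follows. This is a minimal sketch, not the platform's own tooling: it assumes the [Data] header sits on its own line directly above the column names, and the function name build_batch_csv and the selected columns are illustrative. The downloaded template remains the authoritative layout.

```python
import csv
import io

# Column names taken from the CSV schema on this page; the subset is illustrative.
COLUMNS = ["Family Id", "Case Type", "Relation", "BioSample Name", "Phenotypes Id"]


def build_batch_csv(rows):
    """Return CSV text: a [Data] line, a column-name line, then one line per sample.

    Assumption: [Data] occupies its own line directly above the column names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["[Data]"])
    writer.writerow(COLUMNS)
    for row in rows:
        # Missing keys become empty cells.
        writer.writerow([row.get(name, "") for name in COLUMNS])
    return buffer.getvalue()


# Example values reuse the examples given in the field table on this page.
samples = [
    {
        "Family Id": "RM8392",
        "Case Type": "Whole Genome",
        "Relation": "proband",
        "BioSample Name": "NA24385",
        "Phenotypes Id": "HP:0007686;HP:0025375;",
    },
]
csv_text = build_batch_csv(samples)
print(csv_text)
```

Save the resulting text with a .csv extension before uploading it.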

2. Upload a CSV file

  1. Click on the + New case button on the top navigation panel.

  2. Click on the Switch to batch button in the top right corner. You'll be directed to the Select file page of the Batch upload flow. Note: Here you can download a CSV template in the valid format.

  3. Drag and drop a CSV file into the box or upload it from the file explorer. Wait for file upload and validation to finish.

3. Review file validation results

After validation is complete, you will be directed to the Batch validation page. It shows the validation results for you to review:

  • File name

  • Number of rows in the file

  • Number of cases to be created

  • Number of errors found

  • Status message

    • If no errors were detected, a success message will be displayed.

    • If any errors were detected, an error message will be displayed.

      You will be given the option to download a file with error details to help you diagnose and correct any issues with the data. Once you've corrected the CSV file, reupload it.

4. Create cases

  1. Click on Create. A progress bar will appear on the right as the cases are created (Cases creation page).

  2. If the cases have been created successfully, the Cases summary page will display the total number of cases that were created.

  3. If there were any errors during the batch case creation process, the Cases summary page will display a table indicating the number of cases that were successfully created and the number of cases that failed.

You will have the option to download a CSV file containing two additional columns: Errors and Case ID. The Errors column contains the error messages for samples where case creation failed, and the Case ID column contains the Case ID for the lines where case creation succeeded.

API/batch upload limitations

  • When using the API or batch upload, note that applying multiple gene lists can inadvertently exceed a combined limit of 10,000 genes across panels. The platform may not provide an explicit error message in such cases. Plan gene-panel combinations carefully.

  • Combining gene lists at case creation is available via the UI only and cannot be performed through API/batch upload.

  • API/batch upload cannot add phenotypes for an unaffected parent.

  • JSON files cannot be uploaded via API/batch upload.
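Because the platform may not report an error when the gene limit is exceeded, it can help to count genes before applying several lists. The sketch below is an illustration under assumptions: the text does not say whether genes shared between lists count once or once per list, so both figures are reported and the check conservatively uses the larger one. The function name gene_list_summary and the list names are hypothetical.

```python
GENE_LIMIT = 10_000  # combined limit quoted in the limitations above


def gene_list_summary(gene_lists, limit=GENE_LIMIT):
    """Summarise the combined size of several gene lists.

    gene_lists maps a (hypothetical) list name to an iterable of gene symbols.
    'total' counts a gene once per list it appears in; 'unique' counts it once overall.
    Assumption: the check uses 'total', the more conservative of the two figures.
    """
    total = sum(len(set(genes)) for genes in gene_lists.values())
    unique = len(set().union(*gene_lists.values()))
    return {"total": total, "unique": unique, "over_limit": total > limit}


small = gene_list_summary({
    "panel_a": ["BRCA1", "BRCA2"],
    "panel_b": ["BRCA2", "TP53"],
})
print(small)
```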


Batch case upload via CLI

Prerequisites

• Download and install the Node.js platform from https://nodejs.org/en/download
  Minimum version required: 16
  Upgrade existing installation: nvm install --lts

Batch upload via CLI (Command Line Interface)

1. Download the batch case create script. Replace my-domain with your Emedgene domain.
   Illumina cloud: my-domain.emg.illumina.com
   Legacy Emedgene cloud: my-domain.emedgene.com

    curl https://my-domain.emg.illumina.com/v2/js/batchCasesCreator.js --output batchCasesCreator.js

2. Download the CSV template file.

    node batchCasesCreator.js saveTemplateFile

3. Edit the downloaded batchCases.csv file. See CSV format requirements for more details.

4. Execute the batch cases creator as a JavaScript script using the command below. Replace my-domain with your Emedgene domain and my-email with your user email. A prompt for your Emedgene password will appear; enter the password and press Enter.

    node batchCasesCreator.js create -h https://my-domain.emg.illumina.com -c batchCases.csv -u my-email -l

   • In case of validation errors in the input CSV, an output CSV called batchCases_results.csv will be created in the same location with detailed error results.

   • -l will create a log file in the same location.

More information can be found by running

    node batchCasesCreator.js --help
    node batchCasesCreator.js create --help
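After a run that reports validation errors, batchCases_results.csv can be scanned with a short script. The sketch below rests on two assumptions that this page does not confirm for the CLI output: that the results file keeps the [Data] line followed by the column names, and that error messages are stored in a column named Errors (as described for the file downloadable from the UI). The function name failed_rows and the sample content are illustrative.

```python
import csv
import io


def failed_rows(results_text):
    """Return (sample_row_number, error_message) for rows with a non-empty Errors cell.

    Assumptions: a [Data] line precedes the column names, and errors are
    stored in a column called "Errors".
    """
    lines = results_text.splitlines()
    data_line = next(
        (i for i, line in enumerate(lines) if line.strip().startswith("[Data]")),
        None,
    )
    if data_line is None:
        raise ValueError("no [Data] line found")
    reader = csv.DictReader(io.StringIO("\n".join(lines[data_line + 1:])))
    failures = []
    for number, row in enumerate(reader, start=1):
        message = (row.get("Errors") or "").strip()
        if message:
            failures.append((number, message))
    return failures


# Illustrative content; real files are read with open("batchCases_results.csv", newline="").
sample_results = (
    "[Data]\n"
    "Family Id,Case Type,Errors,Case ID\n"
    "FAM1,Exome,,1001\n"
    "FAM2,Exome,Missing proband,\n"
)
for number, message in failed_rows(sample_results):
    print(f"sample row {number}: {message}")
```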

CSV format requirements

General CSV format requirements

The following are the general format requirements for a CSV file used to create multiple cases:

1. The file must have a .csv extension.

2. The file must contain a [Data] header.

3. The row after the [Data] header must include the field names identifying the data in each column. The column names are case-sensitive.

4. The row after the column-name header and each subsequent row represent a sample.

5. Each column represents a data field.

6. There must be no empty rows between the [Data] header and the last sample row.

7. The number of cases per file cannot be greater than 50.
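The general requirements above can be checked locally before uploading. The following is a rough sketch, not the platform's validator. It assumes that the [Data] header is the first cell of its own row and that one case corresponds to one distinct Family Id value; neither detail is spelled out on this page. The function name check_general_format is hypothetical.

```python
import csv
import io

MAX_CASES_PER_FILE = 50


def _is_blank(row):
    return not any(cell.strip() for cell in row)


def check_general_format(filename, text):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if not filename.endswith(".csv"):
        problems.append("file must have a .csv extension")

    rows = list(csv.reader(io.StringIO(text)))
    data_index = next(
        (i for i, row in enumerate(rows) if row and row[0].strip() == "[Data]"),
        None,
    )
    if data_index is None:
        problems.append("missing [Data] header")
        return problems

    body = rows[data_index + 1:]
    while body and _is_blank(body[-1]):
        body.pop()  # blank lines after the last sample row are ignored
    if not body:
        problems.append("missing column-name row after [Data]")
        return problems
    if any(_is_blank(row) for row in body):
        problems.append("empty row between [Data] and the last sample row")

    header = body[0]
    samples = [row for row in body[1:] if not _is_blank(row)]
    # Assumption: one case per distinct Family Id (exact, case-sensitive column name).
    if "Family Id" in header:
        column = header.index("Family Id")
        families = {row[column].strip() for row in samples if column < len(row)}
        families.discard("")
        if len(families) > MAX_CASES_PER_FILE:
            problems.append(
                f"{len(families)} cases found; the limit is {MAX_CASES_PER_FILE}"
            )
    return problems


print(check_general_format("batch.csv", "[Data]\nFamily Id,Case Type\nF1,Exome\n"))
```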


CSV schema

1. Mandatory fields

Must be present in the sample table at all times.

1. Case Type

2. Family Id

3. Phenotypes OR Phenotypes Id

2. Conditionally mandatory fields

If these fields are left empty, an empty sample will be created.

1. BioSample Name

2. Files Names

3. Storage Provider Id

This field is mandatory if Files Names is empty:

1. Sample Type

This field is required if the "auto" option is used for Files Names (only relevant for BSSH):

1. Default Project
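The mandatory and conditionally mandatory rules can be expressed as a per-row check. This sketch is one reading of the rules on this page, not the platform's implementation: it treats Case Type, Phenotypes and Sample Type as proband-only requirements (the field details mark them as considered or mandatory for the proband), and it treats an empty Relation as "proband", the documented default. The function name check_sample_row is hypothetical.

```python
def check_sample_row(row):
    """Check one sample row (a dict of column name -> cell text) and return a list of problems."""

    def value(name):
        return (row.get(name) or "").strip()

    problems = []
    # An empty Relation defaults to "proband" according to the field details.
    is_proband = (value("Relation") or "proband").lower() == "proband"
    files = value("Files Names")

    if not value("Family Id"):
        problems.append("Family Id is mandatory")
    if is_proband and not value("Case Type"):
        problems.append("Case Type is mandatory for the proband")
    if is_proband and not (value("Phenotypes") or value("Phenotypes Id")):
        problems.append("Phenotypes or Phenotypes Id is mandatory for the proband")
    if is_proband and not files and not value("Sample Type"):
        problems.append("Sample Type is required when Files Names is empty")
    if files and not value("Storage Provider Id"):
        problems.append("Storage Provider Id is required when Files Names is filled in")
    if files == "auto" and not value("Default Project"):
        problems.append('Default Project is required when Files Names is "auto"')
    return problems


incomplete = check_sample_row({"Family Id": "RM8392", "Relation": "proband"})
print(incomplete)
```

A row with empty BioSample Name and Files Names is not reported here, because the page states that it results in an empty sample rather than an error.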

3. Optional fields

The sample table may include these supported optional columns.

1. Boost Genes

2. Clinical Notes

3. Date Of Birth

4. Due Date

5. Execute now

6. Gender. See the important note on handling a proband sample with unknown sex.

7. Gene List Id

8. Kit Id

9. Intersect Bed Id (38.0+)

10. Label Id

11. Opt In

12. Relation

13. Selected Preset

14. Visualization Files

4. Custom fields

The sample table may contain custom columns to suit your specific needs and include any relevant information that is important for your workflow.

Each custom field must be assigned a unique name without spaces. Data from custom columns is saved per case under the Additional information section of Case Info.

Note: In cases with more than one sample, custom fields are only recognized and added to case information if their values appear within the same table row where the Relation field is equal to "proband".

Custom field examples:

| Field (column) name | Expected input | Field details | Example |
| --- | --- | --- | --- |
| Institution | Free text | Custom | GenoMed Solutions |
| Sample_Received_Date | Free text | Custom | 24-02-2022 |
| Sample_Type | Free text | Custom | Amniotic Fluid |

Batch case .csv file validation rules

Mandatory (highlighted in red), Conditionally mandatory (highlighted in orange), and Optional fields should be filled in according to the following rules.

| Field (column) name | Expected input | Field details | Example |
| --- | --- | --- | --- |
| BioSample Name | Free text | Conditionally mandatory. An empty sample will be created if the field is left blank. | NA24385 |
| Boost Genes | 1. "TRUE" 2. "FALSE" | Optional. Indicates whether the Boost genes mode will be used. "TRUE" means that variants in the targeted genes will receive upgraded scores during prioritization by the AI Shortlist algorithm. Default value is "FALSE". Only considered for proband. | TRUE |
| Case Type | 1. "Whole Genome" 2. "Exome" 3. "Custom Panel" 4. "Array" 5. Custom case type | Mandatory. Only considered for proband. | Whole Genome |
| Clinical Notes | Free text | Optional | A 14-year-old boy with a visual acuity of 20/200 in both eyes in whom hearing loss was first noted at 5 years of age on routine screening; audiometry revealed sensorineural hearing loss. |
| Date Of Birth | Date "YYYY-MM-DD" | Optional | 2013-01-22 |
| Default Project | Free text | Conditionally mandatory. Must be filled in if the "auto" option is used for Files Names (only relevant for BSSH). | GIAB |
| Due Date | Date "YYYY-MM-DD" | Optional | 2023-05-03 |
| Execute now | 1. "TRUE" 2. "FALSE" | Optional. Default value is "TRUE". Use "FALSE" if you don't want to run the case upon uploading the file. Only considered for proband. | FALSE |
| Family Id | Free text | Mandatory | RM8392 |
| Files Names | 1. Semicolon-separated list of paths to .fastq, .fastq.gz, .vcf, .vcf.gz, .bam, .cram, .gt_sample_summary.json, .annotated_cyto.json files without spaces 2. "existing" 3. "auto" (BSSH) | Conditionally mandatory. An empty sample will be created if the field is left blank. The "existing" option automatically locates FASTQ files based on the BioSample Name. Note: If data files for an existing case were sourced from the customer's external bucket and later removed, attempting to create a case from those files will result in an error. Learn about the current limitation for CRAM file input. With the "auto" option, BSSH users can automatically locate FASTQ files based on the BioSample Name and Default Project provided. When using BSSH without the "auto" option, ensure that your file path is formatted correctly. | /GIAB_cases/1/NA24385.dragen.hard-filtered.gvcf.gz;/QA_cases/Other/NA24385.dragen.cnv.vcf.gz;/QA_cases/Other/NA24385.dragen.repeats.vcf; |
| Gender | 1. "F" 2. "M" 3. "U" | Optional. Default value is "U". See the important note on handling a proband sample with unknown sex. | M |
| Gene List Id | Integer | Optional. Must be the id of a previously defined Gene List. Only considered for proband. | 12345 |
| Kit Id | Integer | Optional. <38.0: ID of a Region of interest BED. 38.0+: ID of a Coverage BED. Must be the id of a previously defined kit. Only considered for proband. | 23456 |
| Intersect Bed Id (38.0+) | Integer | Optional. ID of a Region of interest BED. Must be the id of a previously defined kit. Only considered for proband. | 78957 |
| Label Id | Integer | Optional. Must be the id of a previously defined Case Label. Only considered for proband. | 34567 |
| Opt In | 1. "TRUE" 2. "FALSE" | Optional. Indicates whether the case subject consented to the extended sharing of data with your network(s). Default value is "TRUE". | FALSE |
| Phenotypes | 1. Semicolon-separated list of HPO phenotype terms 2. "Unaffected" is used for non-affected family members. | Mandatory for proband sample if Phenotypes Id is empty. List must be under 100. It is possible to include non-HPO terms if Phenotypes Id is empty. | Abnormal pupillary function;Orthotopic os odontoideum; |
| Phenotypes Id | Semicolon-separated list of HPO phenotype IDs | Mandatory for proband sample if Phenotypes is empty. List must be under 100. | HP:0007686;HP:0025375; |
| Relation | 1. "proband" 2. "mother" 3. "father" 4. "sibling" | Optional. Default value is "proband". Values "proband", "father", "mother" can be used only once per Family ID. One sample with Relation "proband" is required per Family ID. | Mother |
| Sample Type | 1. "FASTQ" 2. "VCF" | Conditionally mandatory. Required if Files Names is empty. Only considered for proband. | FASTQ |
| Selected Preset | 1. Free text 2. "Default" | Optional. Must be the name of a previously defined Preset. If set to "Default", the default Preset will be applied. If left empty, no Preset will be applied. | High quality candidates |
| Storage Provider Id | Integer | Conditionally mandatory. Required if Files Names is not empty. Must be from the configured storage provider ID list. | 208 |
| Visualization Files | Semicolon-separated list of paths to sequence alignment data files of extension .bam, .cram; .tn.bw, .baf.bw, .roh.bed, .lrr.bedgraph, .baf.bedgraph | Optional | /giab_project/NA24385.bam |

Handling a proband sample with unknown sex

When a sample is user-assigned "Unknown" sex, the system assumes "Female". This affects CNV interpretation on sex chromosomes in case the genetic sex is actually male:

• Chromosome X: CN = 2 is considered reference (REF) for a female genome, so CNVs with two copies are hidden by default. This may cause chromosome X duplications to be missed.

• Chromosome Y: CN = 0 is considered reference (REF) for a female genome, so CNVs with zero copies are hidden by default. This may cause chromosome Y deletions to be missed.

To include these variants in the analysis, enable the Include Reference Homozygosity and No Coverage Calls toggle in Workbench & Pipeline Settings.

Required BSSH file path format:

For BSSH, it is necessary to use the actual names (numbers):

    /projects/3824821/appresults/2319318/files/119675608

instead of aliases

    /projects/ABC_DEF_2022-12-22_DEv395/appresults/ABC-GM58342-def/files/ABC-GM58342-def.hard-filtered.vcf.gz

Human-readable path for BSSH files in batch CSV

In version 37, we introduced an enhancement to the batch upload process that allows you to provide a human-readable path in your batch CSV for BSSH files.

Validations

When a batch CSV includes a human-readable path, the system performs the following validations for paths in BSSH storage:

1. Single File in the Path:

  • If the provided path contains exactly one file or dataset, the batch upload proceeds successfully.

2. Two Files in the Path:

  • If the path contains two files with the same name (for example, two pairs of FASTQs in a dataset), the system will:

    • Select the dataset marked as QCPassed.

    • Fail the batch upload if both datasets are marked as QCPassed, as this indicates conflicting data.

3. More Than Two Files in the Path:

  • If the path contains more than two files or datasets, the system fails the batch upload, as the path is considered ambiguous or invalid.

Error Scenarios

• Multiple QCPassed Datasets: If two datasets in the same path are marked as QCPassed, the batch upload will fail with a descriptive error indicating the conflict.

• Excessive Files in the Path: If more than two files are found for the provided path, the batch upload will fail, instructing the user to provide a more specific or valid path.

Benefits

• Enables customers to use intuitive, human-readable paths in their workflows.

• Automatically handles dataset selection based on quality control status.
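The dataset selection rules for human-readable BSSH paths can be summarised as a small decision function. This is a sketch of the rules as described on this page, not the platform's code: the Dataset class, the AmbiguousPathError exception and the function name are hypothetical. The page does not say what happens when two datasets are found and neither is marked QCPassed, or when none is found; the sketch treats both as errors and labels that as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str        # illustrative identifier of a file or dataset found under the path
    qc_passed: bool  # True when the dataset is marked QCPassed


class AmbiguousPathError(ValueError):
    """Raised when a path does not resolve to exactly one usable dataset."""


def select_dataset(candidates):
    """Pick the dataset to use for a human-readable path, following the rules above."""
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) == 2:
        passed = [dataset for dataset in candidates if dataset.qc_passed]
        if len(passed) == 1:
            return passed[0]
        if len(passed) == 2:
            raise AmbiguousPathError("both datasets are marked QCPassed")
        # Assumption: this case is not described in the text; treated as an error here.
        raise AmbiguousPathError("neither dataset is marked QCPassed")
    # More than two candidates fail per the rules above; zero candidates is an assumption.
    raise AmbiguousPathError(
        f"expected one or two datasets, found {len(candidates)}; provide a more specific path"
    )


chosen = select_dataset([Dataset("run_1", qc_passed=False), Dataset("run_2", qc_passed=True)])
print(chosen.name)
```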