CSV format requirements for variant upload

General CSV format requirements

A CSV file used for variant upload to Curate must meet the following requirements:

  • File extension: .csv.

  • The first row contains column headers.

  • Each column represents a data field.

  • Column headers are case-sensitive and must exactly match this specification or a template.

  • Each row after the header represents a variant.

  • There are no empty rows between the header row and the first variant row, or between variant rows.

  • Maximum file size: 10 MB.

  • Maximum number of variants: 5000.

  • Supported variant types:

    • SNV (SNV, indel)

    • CNV (DEL, DUP)

  • SNV and CNV rows may be included in the same file if all mandatory fields are present (see Field summary).
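
The file-level requirements above can be checked before upload. The following is a minimal sketch, not Curate's own validation: the function name `check_csv_file` is hypothetical, UTF-8 encoding is assumed because the encoding is not stated here, "10 MB" is read as 10 × 1024 × 1024 bytes, and blank rows at the very end of the file are tolerated.

```python
import csv
import io

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_VARIANTS = 5000


def is_blank(row: list[str]) -> bool:
    """A row is blank when it has no cells or only whitespace cells."""
    return not any(cell.strip() for cell in row)


def check_csv_file(filename: str, data: bytes) -> list[str]:
    """Return the problems found; an empty list means the basic checks passed."""
    problems = []
    if not filename.endswith(".csv"):
        problems.append("file extension is not .csv")
    if len(data) > MAX_FILE_BYTES:
        problems.append("file is larger than 10 MB")

    # Assumption: the file is UTF-8, with or without a byte order mark.
    text = data.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows or is_blank(rows[0]):
        problems.append("first row does not contain column headers")
        return problems

    variant_rows = rows[1:]
    # Assumption: blank rows after the last variant are harmless, so drop them.
    while variant_rows and is_blank(variant_rows[-1]):
        variant_rows.pop()

    if any(is_blank(row) for row in variant_rows):
        problems.append("empty row found after the header")
    if len(variant_rows) > MAX_VARIANTS:
        problems.append("more than 5000 variants")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.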

CSV file templates

You can use the templates below as a starting point.

Field summary

Field (column name) | SNV       | CNV
------------------- | --------- | ---------
Vartype             | Mandatory | Mandatory
Chromosome          | Mandatory | Mandatory
Position            | Mandatory | Mandatory
REF                 | Mandatory | Mandatory
ALT                 | Mandatory | N/A
End                 | N/A       | Mandatory
Overlap             | N/A       | Optional
Pathogenicity       | Optional  | Optional
Disease             | Optional  | Optional
Interpretation      | Optional  | Optional
Notes               | Optional  | Optional
Transcript          | Optional  | N/A
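
The summary above implies that, in a file mixing both variant types, an SNV row leaves End empty and a CNV row leaves ALT empty. As a minimal sketch of that per-row check (the helper `missing_mandatory` is a hypothetical name, and the row is assumed to be a dictionary such as one produced by Python's `csv.DictReader`):

```python
MANDATORY = {
    "SNV": ["Vartype", "Chromosome", "Position", "REF", "ALT"],
    "CNV": ["Vartype", "Chromosome", "Position", "REF", "End"],
}
CNV_VARTYPES = {"DEL", "DUP"}


def missing_mandatory(row: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty for this row's type."""
    vartype = (row.get("Vartype") or "").strip()
    if vartype == "SNV":
        kind = "SNV"
    elif vartype in CNV_VARTYPES:
        kind = "CNV"
    else:
        # Without a recognised Vartype the mandatory set cannot be determined.
        return ["Vartype"]
    return [f for f in MANDATORY[kind] if not (row.get(f) or "").strip()]
```

For example, a DEL row with an empty End cell yields `["End"]`, while an SNV row with an empty End cell yields no missing fields.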

Access-controlled fields

Some fields require specific user permissions to update.

Access-controlled fields include:

  • Pathogenicity

  • Disease

  • Interpretation

  • Notes

  • Transcript

If you do not have permission to update a field:

  • Curate ignores the field

  • The variant is still created

  • The upload summary in the last step of multiple variant upload marks the variant as Partial Success and lists the ignored fields
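
The behaviour described above can be modelled as follows. This is an illustrative sketch only, not Curate's implementation: the function `apply_permissions`, the `allowed` set, and the "Success" status for rows with nothing ignored are assumptions made for the example. Only the Partial Success status is named on this page.

```python
ACCESS_CONTROLLED = ["Pathogenicity", "Disease", "Interpretation", "Notes", "Transcript"]


def apply_permissions(row: dict, allowed: set) -> tuple[dict, str, list[str]]:
    """Drop access-controlled fields the user may not update.

    Returns the fields that are kept, a status string, and the ignored fields.
    """
    ignored = [
        field
        for field in ACCESS_CONTROLLED
        if (row.get(field) or "").strip() and field not in allowed
    ]
    kept = {k: v for k, v in row.items() if k not in ignored}
    # "Success" is a hypothetical label for the case where nothing was ignored.
    status = "Partial Success" if ignored else "Success"
    return kept, status, ignored
```

In this model, an empty access-controlled cell is never reported as ignored, because there is nothing to update.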

Batch variant CSV file validation rules

Only fields listed on this page are processed. Any unsupported fields included in the CSV file are ignored.

See the tables below for the fields supported for each variant type. Each field is marked as mandatory or optional.
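
To preview which columns would be processed, a row can be reduced to the supported fields. This is a minimal sketch with a hypothetical helper name, `keep_supported`. The match is case-sensitive, in line with the header requirement, so a column named `vartype` counts as unsupported.

```python
SUPPORTED_FIELDS = {
    "Vartype", "Chromosome", "Position", "REF", "ALT", "End", "Overlap",
    "Pathogenicity", "Disease", "Interpretation", "Notes", "Transcript",
}


def keep_supported(row: dict) -> dict:
    """Return a copy of the row containing only the supported columns."""
    return {name: value for name, value in row.items() if name in SUPPORTED_FIELDS}
```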

Single nucleotide variants

Fill in the fields for an SNV according to the rules in Table 1.

Table 1. Validation rules for SNV fields

Field (column) name | Field details                                           | Expected input                                             | Example
------------------- | ------------------------------------------------------- | ---------------------------------------------------------- | -----------------
Vartype             | Mandatory.                                              | SNV. Applicable for both SNV and indel.                    | SNV
Chromosome          | Mandatory.                                              | 1-22, X, Y, M                                              | 3
Position            | Mandatory.                                              | Integer                                                    | 123000
REF                 | Mandatory.                                              | A, T, G, C, or a string of these letters                   | G
ALT                 | Mandatory.                                              | A, T, G, C, or a string of these letters                   | GACT
Pathogenicity       | Optional. Access-controlled.                            | Pathogenic, Likely pathogenic, VUS, Likely benign, Benign  | Likely pathogenic
Disease             | Optional. Access-controlled. OMIM ID.                   | Integer                                                    | 113705
Interpretation      | Optional. Access-controlled.                            | Free text under 65000 characters                           | Free text
Notes               | Optional. Access-controlled.                            | Free text under 65000 characters                           | Free text
Transcript          | Optional. Access-controlled. NCBI RefSeq transcript ID. | NCBI RefSeq transcript ID                                  | NM_000183.3
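
The rules in Table 1 can be approximated in a pre-upload check. The sketch below is an illustration under stated assumptions, not Curate's validator: `validate_snv` is a hypothetical name, Pathogenicity values are compared case-sensitively, Position is assumed to be a non-negative integer, and the Transcript format is not checked because its exact rules are not given here.

```python
import re

CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}
BASES = re.compile(r"[ATGC]+")
INTEGER = re.compile(r"[0-9]+")
PATHOGENICITY = {"Pathogenic", "Likely pathogenic", "VUS", "Likely benign", "Benign"}
MAX_TEXT_LENGTH = 65000  # free text must be under this many characters


def validate_snv(row: dict) -> list[str]:
    """Return the names of fields that break the Table 1 rules."""

    def value(field: str) -> str:
        return (row.get(field) or "").strip()

    errors = []
    if value("Vartype") != "SNV":
        errors.append("Vartype")
    if value("Chromosome") not in CHROMOSOMES:
        errors.append("Chromosome")
    if not INTEGER.fullmatch(value("Position")):
        errors.append("Position")
    for field in ("REF", "ALT"):
        if not BASES.fullmatch(value(field)):
            errors.append(field)
    # Optional fields are checked only when they are filled in.
    if value("Pathogenicity") and value("Pathogenicity") not in PATHOGENICITY:
        errors.append("Pathogenicity")
    if value("Disease") and not INTEGER.fullmatch(value("Disease")):
        errors.append("Disease")
    for field in ("Interpretation", "Notes"):
        if len(value(field)) >= MAX_TEXT_LENGTH:
            errors.append(field)
    return errors
```

Applied to the example values from Table 1, the function returns an empty list.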

Copy number variants

Fill in the fields for a CNV according to the rules in Table 2.

Table 2. Validation rules for CNV fields

Field (column) name | Field details                                              | Expected input                                             | Example
------------------- | ---------------------------------------------------------- | ---------------------------------------------------------- | -----------------
Vartype             | Mandatory.                                                 | DEL, DUP                                                   | DEL
Chromosome          | Mandatory.                                                 | Integer                                                    | 3
Position            | Mandatory.                                                 | Integer                                                    | 123000
End                 | Mandatory.                                                 | Integer greater than Position                              | 124000
REF                 | Mandatory.                                                 | A, T, G, C                                                 | G
Overlap             | Optional. If absent or empty, the default value 70 is used. | Integer from 0 to 100                                      | 80
Pathogenicity       | Optional. Access-controlled.                               | Pathogenic, Likely pathogenic, VUS, Likely benign, Benign  | Likely pathogenic
Disease             | Optional. Access-controlled. OMIM ID.                      | Integer                                                    | 113705
Interpretation      | Optional. Access-controlled.                               | Free text under 65000 characters                           | Free text
Notes               | Optional. Access-controlled.                               | Free text under 65000 characters                           | Free text
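
Two CNV rules differ from the SNV rules: End must be greater than Position, and an absent or empty Overlap falls back to 70. The sketch below illustrates only these two rules. The name `check_cnv_extent` is hypothetical, and integers are assumed to be non-negative.

```python
import re

INTEGER = re.compile(r"[0-9]+")
DEFAULT_OVERLAP = 70


def check_cnv_extent(row: dict) -> tuple:
    """Return (overlap, errors) for the Position, End, and Overlap fields.

    overlap is the value that applies to the row, or None when Overlap is invalid.
    """

    def value(field: str) -> str:
        return (row.get(field) or "").strip()

    errors = []
    position, end = value("Position"), value("End")
    if not INTEGER.fullmatch(position):
        errors.append("Position")
    if not INTEGER.fullmatch(end):
        errors.append("End")
    elif "Position" not in errors and int(end) <= int(position):
        # End must be strictly greater than Position.
        errors.append("End")

    overlap_text = value("Overlap")
    if not overlap_text:
        overlap = DEFAULT_OVERLAP  # absent or empty: the fallback value applies
    elif INTEGER.fullmatch(overlap_text) and int(overlap_text) <= 100:
        overlap = int(overlap_text)
    else:
        overlap = None
        errors.append("Overlap")
    return overlap, errors
```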
