# CSV format requirements for variant upload

## General CSV format requirements

A CSV[^1] file used for variant upload to **Curate** must meet the following requirements:

* File extension: .csv.
* The first row contains column headers.
* Each column represents a data field.
* Column headers are case-sensitive and must exactly match this specification or a template.
* Each row after the header represents a variant.
* There are no empty rows between the header row and the variant rows, or between variant rows.
* Maximum file size: 10 MB.
* Maximum number of variants: 5000.
* Supported variant types:
  * **SNV** (SNV, indel)
  * **CNV** (DEL, DUP)
* SNV and CNV rows may be included in the same file if all mandatory fields are present (see [Field summary](#field-summary)).
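
As an illustration, the file-level requirements above can be checked before upload. The following is a minimal sketch, not part of **Curate**: the function name `check_file_level` is hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and the file is assumed to be UTF-8 encoded.

```python
import csv
import io

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_VARIANTS = 5000


def check_file_level(filename: str, data: bytes) -> list:
    """Return a list of file-level problems; an empty list means none were found."""
    problems = []
    if not filename.endswith(".csv"):
        problems.append("file extension is not .csv")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 10 MB")

    # Assumption: UTF-8 text; "utf-8-sig" also tolerates a leading byte order mark.
    text = data.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text, newline="")))

    if not rows or not any(cell.strip() for cell in rows[0]):
        problems.append("first row does not contain column headers")
        return problems

    variant_rows = rows[1:]
    if len(variant_rows) > MAX_VARIANTS:
        problems.append("more than 5000 variant rows")

    # Row numbers are 1-based and count the header as row 1.
    # Note: a blank line at the very end of the file is flagged too,
    # which is stricter than the requirement as written.
    for number, row in enumerate(variant_rows, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number} is empty")
    return problems
```

For example, `check_file_level("variants.csv", open("variants.csv", "rb").read())` returns an empty list when none of these problems are detected. It does not check column names or field values.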

## CSV file templates

You can use the templates below as a starting point.

{% columns %}
{% column %}
{% file src="/files/U3pHX2OvyZ5t62dqxi6Y" %}
{% endcolumn %}

{% column %}
{% file src="/files/n7BAgk8f2aXui96NozFE" %}
{% endcolumn %}
{% endcolumns %}

## Field summary

| Field (column name) | SNV       | CNV       |
| ------------------- | --------- | --------- |
| `Vartype`           | Mandatory | Mandatory |
| `Chromosome`        | Mandatory | Mandatory |
| `Position`          | Mandatory | Mandatory |
| `REF`               | Mandatory | Mandatory |
| `ALT`               | Mandatory | N/A       |
| `End`               | N/A       | Mandatory |
| `Overlap`           | N/A       | Optional  |
| `Pathogenicity`     | Optional  | Optional  |
| `Disease`           | Optional  | Optional  |
| `Interpretation`    | Optional  | Optional  |
| `Notes`             | Optional  | Optional  |
| `Transcript`        | Optional  | N/A       |
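
The mandatory columns in the table can be expressed as a lookup keyed by variant type. This is a sketch under stated assumptions: the helper `missing_mandatory` is hypothetical, `Vartype` values are assumed to be matched exactly as written (`SNV`, `DEL`, `DUP`), and a cell that is empty or contains only whitespace is treated as missing.

```python
# Mandatory columns per variant type, taken from the field summary table.
MANDATORY = {
    "SNV": ["Vartype", "Chromosome", "Position", "REF", "ALT"],
    "CNV": ["Vartype", "Chromosome", "Position", "REF", "End"],
}
CNV_VARTYPES = {"DEL", "DUP"}


def missing_mandatory(row: dict) -> list:
    """Return the mandatory column names that are absent or empty in one row."""
    vartype = (row.get("Vartype") or "").strip()
    if vartype == "SNV":
        kind = "SNV"
    elif vartype in CNV_VARTYPES:
        kind = "CNV"
    else:
        # Without a recognized Vartype the required set cannot be determined.
        return ["Vartype"]
    return [name for name in MANDATORY[kind] if not (row.get(name) or "").strip()]
```

Each `row` here is a dictionary such as those produced by Python's `csv.DictReader`, so a file that mixes SNV and CNV rows can be checked one row at a time.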

## Access-controlled fields

Some fields require specific [user permissions](/emedgene/emedgene-analyze-manual/settings/user_roles/iam-scopes-emedgene-roles.md) to update.

**Access-controlled fields include:**

* `Pathogenicity`
* `Disease`
* `Interpretation`
* `Notes`
* `Transcript`

If you do not have permission to update a field:

* **Curate** ignores the field
* The variant is still created
* The upload summary in the last step of [multiple variant upload](/emedgene/emedgene-curate-manual/curate_variants/adding-variants-to-curate/adding-multiple-variants-to-curate.md) marks the variant as **Partial Success** and lists the ignored fields

## Batch variant CSV file validation rules

Only fields listed on this page are processed. Any unsupported fields included in the CSV file are ignored.

The tables below list the fields supported for each variant type. Mandatory fields are shown in <mark style="color:red;">red font</mark>; all other fields are optional.

### Single nucleotide variants

Fill in the fields for an SNV according to the rules in Table 1.

Table 1. Validation rules for SNV fields

<table data-full-width="false"><thead><tr><th>Field (column) name</th><th>Field details</th><th>Expected input</th><th>Example</th></tr></thead><tbody><tr><td><mark style="color:red;"><code>Vartype</code></mark></td><td>Mandatory.</td><td><code>SNV</code>.<br>Applies to both SNVs and indels.</td><td><code>SNV</code></td></tr><tr><td><mark style="color:red;"><code>Chromosome</code></mark></td><td>Mandatory.</td><td><code>1</code> - <code>22</code>, <code>X</code>, <code>Y</code>, <code>M</code></td><td><code>3</code></td></tr><tr><td><mark style="color:red;"><code>Position</code></mark></td><td>Mandatory.</td><td>Integer</td><td><code>123000</code></td></tr><tr><td><mark style="color:red;"><code>REF</code></mark></td><td>Mandatory.</td><td><code>A</code>, <code>T</code>, <code>G</code>, <code>C</code>, or a string of these letters</td><td><code>G</code></td></tr><tr><td><mark style="color:red;"><code>ALT</code></mark></td><td>Mandatory.</td><td><code>A</code>, <code>T</code>, <code>G</code>, <code>C</code>, or a string of these letters</td><td><code>GACT</code></td></tr><tr><td><code>Pathogenicity</code></td><td>Optional.<br>Access-controlled.</td><td><code>Pathogenic</code>, <code>Likely pathogenic</code>, <code>VUS</code>, <code>Likely benign</code>, <code>Benign</code></td><td><code>Likely pathogenic</code></td></tr><tr><td><code>Disease</code></td><td>Optional.<br>Access-controlled.<br><br>OMIM ID.</td><td>Integer</td><td><code>113705</code></td></tr><tr><td><code>Interpretation</code></td><td>Optional.<br>Access-controlled.</td><td>Free text under 65000 characters</td><td>Free text</td></tr><tr><td><code>Notes</code></td><td>Optional.<br>Access-controlled.</td><td>Free text under 65000 characters</td><td>Free text</td></tr><tr><td><code>Transcript</code></td><td>Optional.<br>Access-controlled.<br><br>NCBI RefSeq transcript ID.</td><td>NCBI RefSeq transcript ID</td><td><code>NM_000183.3</code></td></tr></tbody></table>
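
The rules in Table 1 can be approximated in a pre-upload check. The sketch below is illustrative only and may differ from the validation that **Curate** performs: `validate_snv` is a hypothetical helper, "Integer" is assumed to mean a non-negative whole number written with ASCII digits, values are assumed to be case-sensitive, and `Transcript` is not checked because the table does not define its exact pattern.

```python
import re

CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}
BASES = re.compile(r"[ATGC]+")    # one base or a string of bases
INTEGER = re.compile(r"[0-9]+")   # assumption: non-negative integers only
PATHOGENICITY = {"Pathogenic", "Likely pathogenic", "VUS", "Likely benign", "Benign"}
MAX_TEXT = 65000                  # free text must be under this many characters


def validate_snv(row: dict) -> list:
    """Return the names of SNV fields that break the rules in Table 1."""
    errors = []
    if row.get("Vartype") != "SNV":
        errors.append("Vartype")
    if row.get("Chromosome") not in CHROMOSOMES:
        errors.append("Chromosome")
    if not INTEGER.fullmatch(row.get("Position") or ""):
        errors.append("Position")
    for name in ("REF", "ALT"):
        if not BASES.fullmatch(row.get(name) or ""):
            errors.append(name)

    # Optional fields are only checked when a value is present.
    pathogenicity = row.get("Pathogenicity") or ""
    if pathogenicity and pathogenicity not in PATHOGENICITY:
        errors.append("Pathogenicity")
    disease = row.get("Disease") or ""
    if disease and not INTEGER.fullmatch(disease):
        errors.append("Disease")
    for name in ("Interpretation", "Notes"):
        if len(row.get(name) or "") >= MAX_TEXT:
            errors.append(name)
    return errors
```

An empty list means the row passed these checks. The function reports field names rather than raising, so all problems in a row can be collected in one pass.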

### Copy number variants

Fill in the fields for a CNV according to the rules in Table 2.

Table 2. Validation rules for CNV fields

<table data-full-width="false"><thead><tr><th>Field (column) name</th><th width="186.20001220703125">Field details</th><th>Expected input</th><th>Example</th></tr></thead><tbody><tr><td><mark style="color:red;"><code>Vartype</code></mark></td><td>Mandatory.</td><td><code>DEL</code>, <code>DUP</code></td><td><code>DEL</code></td></tr><tr><td><mark style="color:red;"><code>Chromosome</code></mark></td><td>Mandatory.</td><td>Integer</td><td><code>3</code></td></tr><tr><td><mark style="color:red;"><code>Position</code></mark></td><td>Mandatory.</td><td>Integer</td><td><code>123000</code></td></tr><tr><td><mark style="color:red;"><code>End</code></mark></td><td>Mandatory.</td><td>Integer greater than <code>Position</code></td><td><code>124000</code></td></tr><tr><td><mark style="color:red;"><code>REF</code></mark></td><td>Mandatory.</td><td><code>A</code>, <code>T</code>, <code>G</code>, <code>C</code></td><td><code>G</code></td></tr><tr><td><code>Overlap</code></td><td>Optional: if absent or empty, the fallback value <code>70</code> is assigned.</td><td>Integer 0-100</td><td><code>80</code></td></tr><tr><td><code>Pathogenicity</code></td><td>Optional.<br>Access-controlled.</td><td><code>Pathogenic</code>, <code>Likely pathogenic</code>, <code>VUS</code>, <code>Likely benign</code>, <code>Benign</code></td><td><code>Likely pathogenic</code></td></tr><tr><td><code>Disease</code></td><td>Optional.<br>Access-controlled.<br><br>OMIM ID.</td><td>Integer</td><td><code>113705</code></td></tr><tr><td><code>Interpretation</code></td><td>Optional.<br>Access-controlled.</td><td>Free text under 65000 characters</td><td>Free text</td></tr><tr><td><code>Notes</code></td><td>Optional.<br>Access-controlled.</td><td>Free text under 65000 characters</td><td>Free text</td></tr></tbody></table>
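
Two CNV rules in Table 2 involve more than a simple value check: `End` must be greater than `Position`, and `Overlap` falls back to `70` when absent or empty. A minimal sketch of both follows, assuming cell values arrive as strings; the helper names `resolve_overlap` and `end_after_position` are hypothetical.

```python
from typing import Optional

DEFAULT_OVERLAP = 70  # fallback value stated in Table 2


def resolve_overlap(raw: Optional[str]) -> int:
    """Return Overlap as an integer, using 70 when the cell is absent or empty."""
    text = (raw or "").strip()
    if not text:
        return DEFAULT_OVERLAP
    value = int(text)  # raises ValueError if the cell is not an integer
    if not 0 <= value <= 100:
        raise ValueError(f"Overlap must be between 0 and 100, got {value}")
    return value


def end_after_position(position: str, end: str) -> bool:
    """Return True when End is strictly greater than Position."""
    return int(end) > int(position)
```

For the example values in Table 2, `resolve_overlap("80")` returns `80` and `end_after_position("123000", "124000")` returns `True`.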

[^1]: Comma-Separated Values

