# Organization databases

**Organization databases (DBs)** are custom variant datasets that [enhance interpretation](/emedgene/emedgene-analyze-manual/tertiary-analysis-pipeline/annotations-from-organization-databases.md) by adding population-specific frequencies, detecting technical artifacts, and referencing curated variants.

## Organization database types

### By purpose

* **Historic DBs:** Filter out variants common in the population of interest
* **Noise DBs:** Detect technical artifacts
* **Curated DBs:** Reference previously curated variants

{% tabs %}
{% tab title="Historic database" %}

### Historic database

* Serves as a **private population frequency database**, helping users evaluate variant frequencies within their population.
* An internal historic database is created from cases processed in your organization’s Emedgene account.
* Typically includes all unique cases at the time of creation but can be tailored to include only specific cases.
* Common variants are less likely to be pathogenic and can be [filtered out](/emedgene/emedgene-analyze-manual/reviewing_a_case/analysis-tools-tab-beta-v100.39.0+/filters_presets_panel/filters/polymorphism_filters.md#maximum-allele-frequency-in-organizational-databases).
  {% endtab %}

{% tab title="Noise database" %}

### Noise database

* Serves as a **quality control database** to identify recurring artifacts introduced by the sequencing technique, sequencing platform, and analysis pipeline.
* Typically, a noise database is composed of samples from unaffected individuals, such as healthy parents.
* If only patient data is available, the database remains useful for filtering out high-frequency artifacts. However, caution is necessary when filtering rare variants to avoid excluding true pathogenic ones.
* Sample size recommendations:
  * ≥ 100 samples to filter out variants with > 5% allele frequency in the database
  * ≥ 500 samples to filter out variants with > 1% allele frequency in the database
* Multiple noise database instances can be maintained to account for different assays and calling methodologies.
* Common variants can be [filtered out](/emedgene/emedgene-analyze-manual/reviewing_a_case/analysis-tools-tab-beta-v100.39.0+/filters_presets_panel/filters/polymorphism_filters.md#maximum-allele-frequency-in-organizational-databases) as likely artifacts.
  {% endtab %}

{% tab title="Curated database" %}

### Curated database

* Serves as a **reference of previously curated variants**.
* It is a static database of curated variants, implemented upon request.\
  **Note:** This database is distinct from the dynamic Curate database and the variants stored in it.
* [Filtering by known variants](/emedgene/emedgene-analyze-manual/reviewing_a_case/analysis-tools-tab-beta-v100.39.0+/filters_presets_panel/filters/variant_effect_filters.md#advanced) from your curated databases helps pinpoint significant variants, keeps interpretation consistent, and speeds up review.
  {% endtab %}
  {% endtabs %}
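
The sample-size recommendations for noise databases can be read as a small lookup: the larger the database, the lower the allele-frequency cutoff it can support. The sketch below is illustrative only and is not how Emedgene computes or applies these values. The function names are hypothetical, and the frequency formula (alternate allele count divided by twice the number of samples) assumes diploid genotypes.

```python
from typing import Optional

# (minimum samples, allele-frequency cutoff) pairs taken from the
# noise database sample-size recommendations on this page.
SAMPLE_SIZE_GUIDANCE = [(500, 0.01), (100, 0.05)]


def lowest_supported_cutoff(n_samples: int) -> Optional[float]:
    """Return the lowest cutoff the recommendations support, or None if the
    database is smaller than the smallest recommended size."""
    for min_samples, cutoff in SAMPLE_SIZE_GUIDANCE:
        if n_samples >= min_samples:
            return cutoff
    return None


def db_allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """Assumption: diploid samples, so total alleles = 2 * n_samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return alt_allele_count / (2 * n_samples)


def exceeds_cutoff(alt_allele_count: int, n_samples: int, cutoff: float) -> bool:
    """True when the variant is more frequent in the database than the cutoff."""
    return db_allele_frequency(alt_allele_count, n_samples) > cutoff


# Hypothetical usage: a 120-sample database supports a 5% cutoff.
cutoff = lowest_supported_cutoff(120)
if cutoff is not None:
    print(exceeds_cutoff(alt_allele_count=13, n_samples=120, cutoff=cutoff))
```

A database with fewer than 100 samples returns `None` here because the recommendations give no cutoff for that size.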

### By origin

* **Internal DBs:** [Built automatically](/emedgene/emedgene-analyze-manual/settings/organization_settings_-330+/workbench-and-pipeline/organization-db-management/configuring-an-organization-database/creating-a-database-vcf-file/creating-an-internal-database-vcf-file.md) from cases processed within your organization’s Emedgene account.\
  **Note:** Historic and noise databases only.
* **External DBs:** [Created](/emedgene/emedgene-analyze-manual/settings/organization_settings_-330+/workbench-and-pipeline/organization-db-management/configuring-an-organization-database/creating-a-database-vcf-file/creating-an-external-database-vcf-file.md) by the organization from other sources, such as:
  * Cases analyzed with different software
  * Research cohorts or legacy data
  * Publicly available datasets

### By included variant types

* **SNV DBs:** Store single nucleotide variants (SNVs)
* **CNV DBs:** Store copy number variants (CNVs), insertions > 50 bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH)
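
As a rough illustration of the split above, the following sketch maps a variant class to the kind of database that stores it. The function name and the type labels are hypothetical and do not come from Emedgene. Any class this list does not mention returns `None` rather than a guess.

```python
from typing import Optional

# Variant classes that the list above assigns to CNV DBs regardless of size.
CNV_DB_CLASSES = {"CNV", "STR", "ROH"}
INS_MIN_LENGTH_BP = 50  # insertions must be longer than this to go to CNV DBs


def database_kind(variant_class: str, length_bp: int = 0) -> Optional[str]:
    """Return "SNV" or "CNV" for classes covered above, else None."""
    vc = variant_class.upper()
    if vc == "SNV":
        return "SNV"
    if vc in CNV_DB_CLASSES:
        return "CNV"
    if vc == "INS" and length_bp > INS_MIN_LENGTH_BP:
        return "CNV"
    return None  # not covered by the list on this page


print(database_kind("INS", length_bp=120))
```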

