# Organization databases

**Organization databases (DBs)** are custom variant datasets that [enhance interpretation](https://help.connected.illumina.com/emedgene/emedgene-analyze-manual/tertiary-analysis-pipeline/annotations-from-organization-databases) by adding population-specific frequencies, detecting technical artifacts, and referencing curated variants.

## Organization database types

### By purpose

* **Historic DBs:** Filter out variants common in the population of interest
* **Noise DBs:** Detect technical artifacts
* **Curated DBs:** Reference previously curated variants

{% tabs %}
{% tab title="Historic database" %}

### Historic database

* Serves as a **private population frequency database**, helping users evaluate variant frequencies within their population.
* An internal historic database is created from cases processed in your organization’s Emedgene account.
* Typically includes all unique cases at the time of creation but can be tailored to include only specific cases.
* Common variants are less likely to be pathogenic and can be [filtered out](https://help.connected.illumina.com/emedgene/reviewing_a_case/analysis-tools-tab/analysis_tools_tab/filters_presets_panel/filters/polymorphism_filters#maximum-allele-frequency-in-organizational-databases).
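
As a rough illustration of how a private frequency database supports this kind of filtering, the Python sketch below computes allele frequencies from a handful of cases and keeps only the variants at or below a maximum frequency. It is not Emedgene's implementation: the data layout, the variant keys, the function names, and the assumption of diploid calls (two alleles per case) are all hypothetical.

```python
from collections import Counter


def allele_frequencies(cases):
    """Return {variant: allele frequency} for a list of cases.

    Each case is a dict mapping a variant key to its alternate-allele
    count (1 = heterozygous, 2 = homozygous). Assumes diploid calls,
    so every case contributes two alleles to the denominator.
    """
    total_alleles = 2 * len(cases)
    counts = Counter()
    for genotypes in cases:
        for variant, alt_count in genotypes.items():
            counts[variant] += alt_count
    return {variant: n / total_alleles for variant, n in counts.items()}


def keep_rare(variants, frequencies, max_af):
    """Keep variants whose database frequency does not exceed max_af."""
    return [v for v in variants if frequencies.get(v, 0.0) <= max_af]


# Four hypothetical cases; two of them carry no variant of interest.
cases = [
    {"chr1:100:A>G": 1},
    {"chr1:100:A>G": 2, "chr2:5:C>T": 1},
    {},
    {},
]
freqs = allele_frequencies(cases)
rare = keep_rare(["chr1:100:A>G", "chr2:5:C>T", "chr3:9:G>A"], freqs, max_af=0.2)
```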
  {% endtab %}

{% tab title="Noise database" %}

### Noise database

* Serves as a **quality control database** to identify recurring artifacts introduced by the sequencing technique, sequencing platform, and analysis pipeline.
* Typically, a noise database is composed of samples from unaffected individuals, such as healthy parents.
* If only patient data is available, the database remains useful for filtering out high-frequency artifacts. However, caution is necessary when filtering rare variants to avoid excluding true pathogenic ones.
* Sample size recommendations:
  * ≥ 100 samples to filter out variants with > 5% allele frequency in the database
  * ≥ 500 samples to filter out variants with > 1% allele frequency in the database
* Multiple noise database instances can be maintained to account for different assays and calling methodologies.
* Variants that are common in the noise database can be [filtered out](https://help.connected.illumina.com/emedgene/reviewing_a_case/analysis-tools-tab/analysis_tools_tab/filters_presets_panel/filters/polymorphism_filters#maximum-allele-frequency-in-organizational-databases) as likely artifacts.
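
The sample-size recommendations above can be expressed as a small lookup, sketched here in Python. The function names are hypothetical, and the allele arithmetic assumes diploid autosomal calls, so each sample contributes two alleles. Under that assumption, both recommendations correspond to the same number of alternate alleles at the cutoff (10 of 200 and 10 of 1000).

```python
from typing import Optional

# Guidance from the list above: (minimum samples, lowest allele-frequency cutoff).
RECOMMENDATIONS = [(500, 0.01), (100, 0.05)]


def lowest_recommended_cutoff(samples: int) -> Optional[float]:
    """Return the lowest cutoff the guidance supports for this database size.

    Returns None when the database is smaller than every recommendation.
    """
    for min_samples, cutoff in RECOMMENDATIONS:
        if samples >= min_samples:
            return cutoff
    return None


def alleles_at_cutoff(samples: int, cutoff: float, ploidy: int = 2) -> int:
    """Alternate alleles needed to reach the cutoff (assumes diploid samples)."""
    return round(samples * ploidy * cutoff)


small = alleles_at_cutoff(100, 0.05)   # 10 of 200 alleles
large = alleles_at_cutoff(500, 0.01)   # 10 of 1000 alleles
```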
  {% endtab %}

{% tab title="Curated database" %}

### Curated database

* Serves as a **reference of previously curated variants.**
* It is a static database of curated variants that is implemented upon request.\
  **Note:** This is not the same as variants found in the dynamic [Curate](https://help.connected.illumina.com/emedgene/emedgene-curate-manual/curate_overview/curate_overview) database.
* [Filtering by known variants](https://help.connected.illumina.com/emedgene/reviewing_a_case/analysis-tools-tab/analysis_tools_tab/filters_presets_panel/filters/variant_effect_filters#advanced) from your curated databases helps pinpoint significant variants, keeps interpretation consistent, and speeds up review.
  {% endtab %}
  {% endtabs %}

### By origin

* **Internal DBs:** [Built automatically](https://help.connected.illumina.com/emedgene/emedgene-analyze-manual/settings/organization_settings_-330+/workbench-and-pipeline/organization-db-management/configuring-an-organization-database/creating-a-database-vcf-file/creating-an-internal-database-vcf-file) from cases processed within your organization’s Emedgene account.\
  **Note:** Internal DBs are available only for historic and noise databases.
* **External DBs:** [Created](https://help.connected.illumina.com/emedgene/emedgene-analyze-manual/settings/organization_settings_-330+/workbench-and-pipeline/organization-db-management/configuring-an-organization-database/creating-a-database-vcf-file/creating-an-external-database-vcf-file) by the organization from other sources, such as:
  * Cases analyzed with different software
  * Research cohorts or legacy data
  * Publicly available datasets

### By included variant types

* **SNV DBs:** Store single nucleotide variants (SNVs)
* **CNV DBs:** Store copy number variants (CNVs), insertions > 50 bp (INS), short tandem repeats (STRs), and regions of homozygosity (ROH)
