# Connected Multiomics Walkthrough

Connected Multiomics is available for tertiary analysis of the Illumina 5-Base DNA Prep data. This walkthrough includes the following main sections:

* [Getting Started](#getting-started)
* [Demo Data](#demo-data)
* [Default 5-base Methylation Analysis](#default-5-base-methylation-analysis)
* [Custom 5-base Methylation Analysis](#custom-5-base-methylation-analysis)
  * [Analysis On 5-base DNA Methylation Data (CpG sites)](#analysis-on-5-base-dna-methylation-data-cpg-sites)
  * [Analysis On 5-base DNA Methylation Data (Regions)](#analysis-on-5-base-dna-methylation-data-regions)
  * [Analysis on DNA Variants Data](#analysis-on-dna-variants-data)
  * [Dual-omics Analysis of DNA Methylation and Variants](#dual-omics-analysis-of-dna-methylation-and-variants)

## Getting Started

[Logging into ICM](https://help.multiomics.illumina.com/icm/introduction/#log-in-to-connected-multiomics)

[Upload Data Files](https://help.multiomics.illumina.com/icm/introduction/upload-data-files)

[Creating a Study from an ICA Project](https://help.multiomics.illumina.com/icm/studies/create-study)

[Viewing Results and Navigating in ICM](https://help.multiomics.illumina.com/icm/analyses/enter-analysis)

## Demo Data

Demo data for following along with this walkthrough is available in the Connected Multiomics Demo Data repository at /Multiomics-Demo-Data/Methylation/Illumina 5-base-solution/AML-demo-data/. The dataset consists of 24 samples from an acute myeloid leukemia (AML) patient cohort. It includes both 5-base DNA methylation data and variants data, generated using the DRAGEN Somatic Tumor-only pipeline. In this walkthrough, we outline analysis steps for exploring the data, identifying differentially methylated regions (DMRs) between AML subtypes, and overlaying the DMRs with small genomic variants.

Before starting an analysis in Connected Multiomics, add the AML demo data and its sample metadata to a study. Because the AML demo data includes both 5-base DNA methylation data and variants data, seven files must be ingested per sample, with the file extensions listed below. The first five files are for methylation data and the last two are for variants data:

* .CX\_report.txt.gz
* .methyl\_metrics.csv
* .mapping\_metrics.csv
* .wgs\_coverage\_metrics.csv
* .M-bias.txt
* .vcf.gz
* .vcf.gz.tbi

These files are generated by DRAGEN analysis. The [CX\_report file](https://help.connected.illumina.com/dragen-5-base/additional-information/key-output-files-and-metrics#cx-report) is the key output file and contains methylation read counts at single-nucleotide level. The metrics files and the M-bias file contain QC metrics for read mapping quality and methylation calling, which are used to generate visualizations in the 5-base Methylation QC task in Connected Multiomics. The VCF file contains small variant data.
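Before ingestion, it can help to confirm that every sample has all seven files. The following is a minimal sketch, not part of Connected Multiomics. It assumes that each file is named `<sample><extension>`, which may not match your DRAGEN output naming, and the function name `missing_files` is hypothetical.

```python
from collections import defaultdict

# Extensions listed in this walkthrough: five methylation files, two variant files.
REQUIRED_SUFFIXES = (
    ".CX_report.txt.gz",
    ".methyl_metrics.csv",
    ".mapping_metrics.csv",
    ".wgs_coverage_metrics.csv",
    ".M-bias.txt",
    ".vcf.gz",
    ".vcf.gz.tbi",
)


def missing_files(filenames):
    """Return {sample: [missing suffixes]} for samples lacking a required file.

    Assumption (hypothetical naming): each file is named <sample><suffix>.
    Files that match none of the suffixes are ignored.
    """
    found = defaultdict(set)
    for name in filenames:
        for suffix in REQUIRED_SUFFIXES:
            if name.endswith(suffix):
                found[name[: -len(suffix)]].add(suffix)
                break
    return {
        sample: [s for s in REQUIRED_SUFFIXES if s not in have]
        for sample, have in found.items()
        if len(have) < len(REQUIRED_SUFFIXES)
    }
```

An empty dictionary means every detected sample has the full set of files.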

After the DRAGEN output files are ingested into a study, follow the steps below to add a sample metadata file to the same study:

* Click **+ Add data**, choose **Select from ICA project**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-f06151cdf7e58ac5cb115346d814384be8765374%2F5base1.1.addData.png?alt=media" alt="" width="376"><figcaption></figcaption></figure>

* At **Select data type to import** drop-down, select **Bulk** > **Proteomics**.
* Select **Illumina Protein Prep** as format, then click **Select format**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-a1e44b7882e58e68b66cc08b08a82112ace2da29%2F5base1.1.selectIPPimporte.png?alt=media" alt=""><figcaption></figcaption></figure>

* Navigate to the same demo data folder, select **AML\_24samples\_metadata.tsv** file, then click **Add selected data to your study** to add the sample metadata file to the study.

{% hint style="warning" %}
The steps above use the proteomics data importer to add a metadata file stored in an existing ICA project, because the current methylation data importer does not support adding a metadata file from an ICA project. When analyzing your own data, you can upload a metadata (.tsv or .csv) file directly from your local computer to Connected Multiomics. For more details, see [Sample Metadata](https://help.multiomics.illumina.com/icm/studies/view-studies/sample-metadata).
{% endhint %}

## Default 5-base Methylation Analysis

The default 5-base methylation analysis runs a pre-defined pipeline on the data and presents analysis results in visualizations in a data viewer.

### Creating a Default Analysis

After all 24 samples are added to a study, follow these steps to create a Default analysis in Connected Multiomics:

* Click on **+ New Analysis.**
* In the pop-up window, provide a name for the analysis, select **Default: Illumina 5-base DNA Methylation** as the Analysis Type, choose a sample group to be included in the analysis (all samples option is selected by default), and click on the **Run Analysis** button.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-b6eb7f8235610c39ec8e019b12858104680a576b%2Fcreate-default-5base-analysis.png?alt=media" alt=""><figcaption></figcaption></figure>

* Refresh the page to get the latest status of the analysis.

### View Default Analysis Results

When the analysis status is **Complete**, click on the analysis name to open results. The analysis opens into a data viewer that consists of principal components analysis (PCA) results and methylation QC plots:

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-3dc807e4a4f60114ea6bad50d103c51d1c597223%2F5base1.1.default_summary_report.png?alt=media" alt=""><figcaption></figcaption></figure>

The plots can be configured by selecting the Configure option from the toolbox on the left within each plot:

* To change color of the data points, select Configure > Style > Color by
* To change size of the data points, select Configure > Style > Point size

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-335d1fb890ccf10118c6699b6dc958f02a11fc41%2F5base1.1.default_configurePlot.png?alt=media" alt=""><figcaption></figcaption></figure>

### View Default Analysis Pipeline

To view the default analysis pipeline, click the analysis name in the breadcrumb at the top of the data viewer to go to the Analyses page. The default analysis pipeline looks like this:

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-00d3cf75528e46549a282303f06d791215c89b7d%2F5base1.1.default_pipeline.png?alt=media" alt=""><figcaption></figcaption></figure>

The default analysis imports the data, creates a [methylation QC](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/qa-qc/5-base-methylation-qc) report, [generates regions](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/region-analysis/get-regional-methylation) from CpG sites, then uses exploratory analysis methods such as [Principal Components Analysis (PCA)](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/pca) and [k-means clustering](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/k-means-clustering) to explore the data.

To go back to the visualizations, double-click on the **Summary report** task node.

## Custom 5-base Methylation Analysis

The custom 5-base methylation analysis imports the methylation data and variants data (if available). Users will use a context-sensitive menu to run analysis tasks individually and eventually build an analysis pipeline. In this section, we outline analysis steps for a workflow shown below, that covers methylation analysis using CpG data (light blue), methylation analysis using regions data (dark blue), variant analysis (orange), and dual-omics analysis that combines the methylation and the variants data (green).

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-17080d8afb0ad474892560249fc7cced4ba87b25%2F5base1.1.completeTaskGraph%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

### Creating a Custom Analysis

After all 24 samples are added to a study, follow these steps to create a Custom analysis in Connected Multiomics:

* Click on **+ New Analysis.**
* In the pop-up window, provide a name for the analysis, select **Custom: Illumina 5-base DNA Methylation** as the Analysis Type, choose a sample group to be included in the analysis (all samples option is selected by default), and click on the **Run Analysis** button.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-612090014b6f27d7ab9a8792f7d8afe7ee829c6b%2F5base-create-custom-analysis.png?alt=media" alt=""><figcaption></figcaption></figure>

* Refresh the page to get the latest status of the analysis.

When the Status is **Complete**, click on the analysis name to enter the analysis module. There is an **Import cohort** task node and two data nodes: one for **5-base Methylation** and one for **Variants**. To review the number of samples and features in the methylation data, hover over the **5-base Methylation** data node. Features refer to CpG sites.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-f658298fcaebd1f49273b92891c2d26c0c702527%2F5base1.1.imported_data.png?alt=media" alt=""><figcaption></figcaption></figure>

The 5-base Methylation data node contains raw methylated and unmethylated counts for the genomic positions present in the CX reports. For positions with methylation calls on both strands in the CX report, the strands are collapsed: the position on the positive strand is used and the methylation counts are summed. This data node also contains percent methylation levels, which are used in exploratory analysis such as PCA.
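The strand collapsing and percent methylation described above can be sketched as follows. This is an illustration under stated assumptions, not the importer's actual code: it assumes a minus-strand CpG call sits one base after its plus-strand partner, and the record layout and function names are hypothetical.

```python
from collections import defaultdict


def collapse_strands(records):
    """Collapse CpG calls from both strands onto the positive-strand position.

    records: iterable of (chrom, pos, strand, methylated, unmethylated).
    Assumption: a minus-strand CpG call is one base after its plus-strand
    partner, so it is shifted to pos - 1 before the counts are summed.
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, meth, unmeth in records:
        key = (chrom, pos if strand == "+" else pos - 1)
        totals[key][0] += meth
        totals[key][1] += unmeth
    return {key: tuple(counts) for key, counts in totals.items()}


def percent_methylation(meth, unmeth):
    """Percent methylation from counts; None when the site has no coverage."""
    total = meth + unmeth
    return 100.0 * meth / total if total else None
```

For example, a plus-strand call of 8 methylated and 2 unmethylated reads at position 100, combined with a minus-strand call of 6 and 4 at position 101, collapses to 14 methylated and 6 unmethylated reads at position 100, or 70% methylation.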

{% hint style="warning" %}
The Connected Multiomics software currently supports analysis of up to 24 samples. For a cohort larger than 24 samples, ingestion may fail at the analysis creation step. If it does, click the "Retry" button for the failed analysis on the Analyses page to relaunch the analysis.
{% endhint %}

### View Sample Metadata

The sample metadata that was added into the study is also imported into an analysis during analysis creation step. To view sample metadata, navigate to **Metadata** tab within an analysis.

A sample table is displayed on the Metadata page, with samples in rows and sample metadata (attributes) in columns. In this AML demo dataset, we have one sample attribute called Mol\_Phenotype, which consists of AML subtypes defined by molecular classifications.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-db2bf2f14842d7431dc20987742c922d82c6eae7%2F5base1.1.MetadataTab.png?alt=media" alt=""><figcaption></figcaption></figure>

## Analysis On 5-base DNA Methylation Data (CpG sites)

### 5-base Methylation QC

The [**5-base methylation QC**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/qa-qc/5-base-methylation-qc) task in the Connected Multiomics enables you to visualize sample-level QC metrics that describe reads mapping quality and CpG methylation calling. The QC metrics are extracted from the DRAGEN analysis metric files that were ingested into the study. To invoke the **5-base methylation QC** task:

* At **Analyses** page, click on the **5-base Methylation** node.
* Click **QA/QC** section in the context-sensitive task menu on the right.
* Click **5-base methylation QC**.

After the **5-base methylation QC** task is completed, double-click on the task node to open the QC report in a data viewer. The QC report consists of plots and tables organized in 2 sheets. Click sheet name at the bottom of the data viewer to navigate from one sheet to another.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d550076b1be6c51213289f2956d22f11d442cee3%2F5base1.1.QCmetricsplot.png?alt=media" alt=""><figcaption></figcaption></figure>

* Sheet **Metrics** shows sample-level QC metrics plots. Each sample is a data point, and the points are randomly spread out along the x-axis. The y-axis represents the QC metric. Each plot has a violin plot overlaid to show the distribution of the QC metric.
  * Percent methylation in samples: Percentages of CpG methylation in samples.
  * Percent methylation in unmethylated control: Percentage of CpG methylation in the unmethylated control (lambda). Low value indicates good quality.
  * Percent methylation in methylated control: Percentage of CpG methylation in the methylated control (pUC19). High value indicates good quality.
  * Percent duplicate reads: Percentage of reads marked as duplicates, which result from PCR amplification.
  * Percent mapped reads: Percentage of mapped reads, which indicates the alignment rate.
  * Average autosomal coverage: Mean autosomal coverage across the whole genome. Higher coverage means the methylated and unmethylated counts more accurately reflect the true methylation level at any particular site.
  * QC metrics table: Text representations of the QC metrics plots.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-957e459e63bee11f5bf8b159bf2fa3ebbb40a87c%2F5base1.1.Mbiasplot.png?alt=media" alt=""><figcaption></figcaption></figure>

* Sheet **M-bias** shows M-bias plots for methylation level and coverage across positions on read1 and read2. Methylation levels should be consistent across all positions on the reads. It is common for the first/last 10 bases to have uneven methylation due to end-repair and sequencing artifacts.
  * In this dataset, one sample (indicated by the orange arrow) shows M-bias. Hover over the dots on the line to show the sample name. We are going to remove this sample from the data in the next step.

### Filter samples

We have identified one sample with M-bias from the M-bias QC plot. Now, switch back to Metrics sheet and click to select this sample to see how this sample performs across all QC metrics.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-cc84137d82817c51bae71bbe064f82e72072f8c0%2F5base1.1.oneSampleQC.png?alt=media" alt=""><figcaption></figcaption></figure>

From the Metrics plots, we can see that this sample has lower percent methylation than the other samples in the dataset. It also has a higher percentage of duplicate reads, a lower percentage of mapped reads, and lower average autosomal coverage.

Let's use **Percent methylation in samples** metric to filter out the poor-quality sample.

* Click **Selection** tool from the toolbar on the left within the plot, then choose **Select and filter**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-0c4e28c762b65b36a76c43ce8f54017637ab4ae9%2F5base1.1.openSelectAndFilter.png?alt=media" alt="" width="481"><figcaption></figcaption></figure>

* In the **Select and filter** dialog, select **Criteria** at **Selection mode**.
* Under **Criteria** section, type in **65** as minimum **Sample CpG Methylation**, press **Enter**. The poor-quality sample is now deselected (dimmed) across all plots.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-7515d2faa798a3d0c112b5c8f4a8367c4b9b0a39%2F5base1.1.setfiltercriteriaQC.png?alt=media" alt=""><figcaption></figcaption></figure>

* At **Filter** section, click on **Include selected points**, then click **Apply observation filter...**

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-87ffb2e073905dd598aa14461c4e3e49db503634%2F5base1.1.includeSelectedPoints.png?alt=media" alt=""><figcaption></figcaption></figure>

* Click to select **5-base Methylation** node as the data node to apply the filter on, then click **Select**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-21d0376d8dc363d80e751585da40a7ec973580f5%2F5base1.1.selectNodeToFilter.png?alt=media" alt=""><figcaption></figcaption></figure>

* A message will show up in the middle of the screen indicating the filter task has been enqueued and you may return to the project analysis page. Click **OK** to close the message.
* Click on the analysis name on the top of the breadcrumb to go back to the Analyses page.

A **Filter counts** task node and a **Filtered samples** data node are added to the analysis task graph. After the task is completed, hovering over the **Filtered samples** data node should show that only 23 samples are available for downstream analysis.
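Conceptually, the filter above keeps samples whose percent CpG methylation meets a minimum value. A minimal sketch of that logic is shown below. The sample names and values in the usage note are made up for illustration, and treating the minimum as inclusive is an assumption about the filter's behavior.

```python
def filter_samples(percent_cpg_methylation, minimum=65.0):
    """Split samples into kept and removed by a minimum percent methylation.

    percent_cpg_methylation: {sample name: percent CpG methylation}.
    Assumption: the minimum is inclusive (value >= minimum is kept).
    """
    kept = {s: v for s, v in percent_cpg_methylation.items() if v >= minimum}
    removed = sorted(set(percent_cpg_methylation) - set(kept))
    return kept, removed
```

With hypothetical values such as `{"sample_A": 78.2, "sample_B": 52.0}`, only `sample_A` is kept and `sample_B` is reported as removed.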

### PCA

The [principal components analysis (PCA)](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/pca) scatter plot allows us to visualize similarities and differences between the samples in a dataset. To invoke a PCA task from the cleaned CpG data:

* Click on the **Filtered samples** node.
* Click **Exploratory analysis** section in the context-sensitive task menu.
* Click **PCA**.
* Set to use the top **100,000** features with the highest **variance** in calculation.
* Keep the rest of the parameters as default, and click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-25e0cbc35dbf5ca129ffe21434c2385aee5faa1e%2F5base1.1.PCAsitelevel.png?alt=media" alt=""><figcaption></figcaption></figure>

After the PCA task is completed, double click on the **PCA** node to view the PCA plot in a data viewer.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-fd697a047f524de2720b653d9d555be6daedd5a5%2F5base1.1.PCAplotCpG.png?alt=media" alt=""><figcaption></figcaption></figure>

* The scatter plot shows the samples distribution among the first three PCs. Each sample is a data point.
* The scree plot (top right panel) shows variance represented by each PC.
* The component loading table (bottom right panel) shows correlation between CpG methylation sites and PC.
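The feature selection used when setting up PCA (keeping the features with the highest variance) can be sketched as follows. This is an illustration only, using the population variance from the Python standard library. The exact variance definition and tie handling used by the PCA task are not stated in this walkthrough, and the function name is hypothetical.

```python
from statistics import pvariance


def top_variance_features(matrix, n_top):
    """Return the n_top feature names with the highest variance across samples.

    matrix: {feature name: [percent methylation per sample]}.
    """
    ranked = sorted(matrix, key=lambda f: pvariance(matrix[f]), reverse=True)
    return ranked[:n_top]
```

Restricting PCA to the most variable CpG sites reduces the size of the calculation while retaining the sites most likely to distinguish samples.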

### Detect Differentially Methylated Regions (DMRs)

[DSS](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/statistics/correlation-analysis-2) (Dispersion Shrinkage for Sequencing data) enables the detection of differentially methylated regions using count data at single-nucleotide level. It uses a beta-binomial distribution to model methylation counts at each CpG site and a Wald test to identify differentially methylated loci (DMLs). Nearby DMLs are then merged to form a differentially methylated region (DMR). Set up a DSS task to identify DMRs between two AML subtypes:

* Click on the **Filtered samples** node.
* Click **Statistics** section in the context-sensitive task menu.
* Click **Differential Methylation**.
* Select **DSS** as the Method to use for differential methylation analysis, click **Next.**
* Add **Mol\_Phenotype** as a factor for analysis, click **Next.**

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-b799c0e1731b7f40e1e7fc8f51e4cc27fef6cf69%2F5base1.1.DSSaddFactor%20(2).png?alt=media" alt=""><figcaption></figcaption></figure>

* Drag **DNMT3AR882** to the top right Numerator box and **KMT2Ar** to the bottom right Denominator box. Click on **Add comparison**.
* Keep the rest of the parameters as default, and click **Finish** to start running DSS.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-107cee4c1283d1fddfb718dcf1268f6dfb6e819f%2F5base1.1.DSSsetComparison%20(1).png?alt=media" alt="" width="419"><figcaption></figcaption></figure>

When the DSS task is completed, double-click on the **DNMT3AR882 vs KMT2Ar (DMR)** node to open the DMR report. The DSS DMR task report lists regions in rows and the test statistics (areaStat, diff.Methy, etc.) in columns. Regions are listed in descending order of abs(areaStat), so the most significant DMR is listed first. The diff.Methy statistic reports the difference in average methylation between the two groups. A negative value indicates that **DNMT3AR882** is hypomethylated compared to **KMT2Ar** in the region, while a positive value indicates that **DNMT3AR882** is hypermethylated compared to **KMT2Ar** in the region. Refer to the [DSS documentation](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/statistics/correlation-analysis-2) to learn more about the differential methylation report.
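To give a rough intuition for how significant loci become the regions listed in the report, the sketch below merges loci that lie close together on the same chromosome. This is a deliberately simplified illustration and not the DSS algorithm, which applies further criteria. The `max_gap` parameter and the function name are hypothetical.

```python
def merge_loci(loci, max_gap=50):
    """Merge significant loci into regions when neighbors are within max_gap bases.

    loci: list of (chrom, pos) for loci already called significant.
    Returns a list of (chrom, start, end, n_loci).
    """
    regions = []
    for chrom, pos in sorted(loci):
        if regions and regions[-1][0] == chrom and pos - regions[-1][2] <= max_gap:
            # Extend the current region to include this locus.
            c, start, _, n = regions[-1]
            regions[-1] = (c, start, pos, n + 1)
        else:
            # Start a new region at this locus.
            regions.append((chrom, pos, pos, 1))
    return regions
```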

On the DMR report, click on the **volcano icon** ( ![](https://help.multiomics.illumina.com/~gitbook/image?url=https%3A%2F%2F1449565646-files.gitbook.io%2F%7E%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252F2vobyLRkVGSSjQAGz1Ur%252Fuploads%252Fgit-blob-902f38d87e52780e8ffb2f82ffba390a58ea1487%252Fvolcano-icon.png%3Falt%3Dmedia\&width=300\&dpr=4\&quality=100\&sign=d6725558\&sv=2) ) next to the comparison name to visualize the DMR results. In the volcano plot, the x-axis represents the methylation difference (diff.Methy) and the y-axis represents the absolute value of areaStat (abs(areaStat)). Each data point in the plot is a DMR. Hypomethylated regions are colored blue, and hypermethylated regions are colored red. Hypo- and hypermethylation status is defined by the methylation difference (diff.Methy on the x-axis), with default thresholds of -0.2 and 0.2, respectively. To adjust the thresholds, open the **Configure** tool in the left-side toolbox within the plot, then select **Statistics**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-b3833155cdf46901700f5989524c2fe08c7b639d%2F5base1.1.DSSvolcano%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

Another option for visualizing the DMR results is a Manhattan plot. To open a Manhattan plot, click on the Manhattan plot icon ( ![](https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d1576f244670b4fc0954deb16b1fa669df61a9a1%2F5base1.1.ManhattanIcon.png?alt=media) ) next to the comparison name in the DMR report. In a Manhattan plot, the x-axis represents genomic positions from chromosome 1 to chromosome Y and the y-axis represents the absolute value of areaStat (abs(areaStat)). Each data point is a DMR. The test statistic plotted on the y-axis can be displayed on a linear or log scale, and a different test statistic can be selected to change the content of the Manhattan plot.
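A Manhattan plot places chromosomes end to end along the x-axis. One common way to compute such a coordinate is to add a per-chromosome offset to each position, as sketched below. How Connected Multiomics lays out the axis internally is not documented here, and the chromosome lengths in the usage note are toy values, not real hg38 lengths.

```python
def cumulative_offsets(chrom_lengths):
    """Map each chromosome to the x offset where it starts.

    chrom_lengths: list of (chrom, length) in plotting order.
    """
    offsets, running = {}, 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = running
        running += length
    return offsets


def manhattan_x(chrom, pos, offsets):
    """Genome-wide x coordinate for a position on a chromosome."""
    return offsets[chrom] + pos
```

With toy lengths `[("chr1", 1000), ("chr2", 800), ("chrY", 300)]`, position 50 on chr2 maps to x = 1050.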

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-84f57f4097f2a78df9f8b50c09fbb144c6313675%2F5base1.1.DSS_ManhattanPlot%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

### Filter DMRs

We recommend filtering DMRs by hypo- or hypermethylation status, using the diff.Methy statistics, to give the necessary context of which pathways are hypo- or hypermethylated from the differential comparison. To filter DMR results to hypermethylated DMRs,

* Click **DNMT3AR882 vs KMT2Ar (DMR)** node.
* Click **Filtering** section in the context-sensitive task menu.
* Click **Differential analysis filter**.
* Choose **Metadata** as Filter type.
* In Filter criteria section, set Filter features by **include DNMT3AR882 vs KMT2Ar: diff.Methy > 0.2**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-1d89084e5fe0681ab847816a337dca8c048337ef%2F5base1.1.filterDSS.png?alt=media" alt=""><figcaption></figcaption></figure>

This generates a Filtered features list node that contains the DMRs passing the filtering criteria. The same steps can be applied to generate a filtered list of hypomethylated DMRs by setting the filtering criteria to include regions with diff.Methy < -0.2. The filtering threshold can be adjusted, and more filtering criteria can be defined, based on your research questions.
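The filtering logic above can be sketched on an exported DMR table as follows. This is a minimal illustration, assuming each row is a dictionary with `diff.Methy` and `areaStat` keys. The function name and row layout are hypothetical.

```python
def split_dmrs(dmrs, threshold=0.2):
    """Split DMR rows into hyper- and hypomethylated lists by diff.Methy.

    Each list is ordered by descending abs(areaStat), mirroring the report.
    Rows with diff.Methy between -threshold and threshold are dropped.
    """
    def ranked(rows):
        return sorted(rows, key=lambda r: abs(r["areaStat"]), reverse=True)

    hyper = ranked(r for r in dmrs if r["diff.Methy"] > threshold)
    hypo = ranked(r for r in dmrs if r["diff.Methy"] < -threshold)
    return hyper, hypo
```

Keeping the two directions in separate lists matches the recommendation to run downstream annotation and enrichment on hyper- and hypomethylated DMRs separately.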

### Annotate DMRs

Next, we are going to annotate the filtered DMRs list with gene information using an annotation model.

* Click **Filtered feature list** node.
* Click **Region analysis** section in the context-sensitive task menu.
* Click [**Annotate regions**](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/region-analysis/annotate-regions).
* Select an **Annotation model**, keep the remaining settings as default, click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-55a05f8844db9c470f40c95e5139953c46cc2f96%2F5base1.1.DSSDMR_annotateRegions.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double click **Annotated regions** node to open the annotation report. The annotation report shows a pie chart on gene section breakdown for the DMRs, and a table where each row is a DMR, columns are the annotated information.

* Click **Optional columns** on the top right of the table, tick **gene\_name** checkbox to display gene name in the table.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-11cb22233d819e8c19ecca3025ece8872d3fe0de%2F5base1.1.Annotated_Regions_report_DSSDMR%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

### Gene Set Enrichment

The [**Gene set enrichment**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/biological-interpretation/gene-set-enrichment) analysis identifies gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

* Click **Annotated regions** node.
* Click **Biological interpretation** section in the context-sensitive task menu.
* Click **Gene set enrichment**.
* Select **KEGG database** as **Database** for pathway enrichment analysis. Choose **Homo sapiens hsa\_v12\_25\_04\_07** from the **KEGG database** dropdown.
* At Feature identifier section, tick **Select feature identifier** checkbox, select **gene\_name**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d61d532154de0b5b3c30159ea3592ef08f7de5f8%2F5base1.1.KEGGsetUp%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double click **Pathway enrichment** node to open the pathway enrichment report. Each row in the report is a pathway, with an enrichment score and p-value. It also lists how many genes in the pathway were in the input gene list and how many were not. Click on the gene set ID in the first column to view the pathway diagram. On the pathway diagram, clicking on a gene name links to a KEGG page for additional details.
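To illustrate what an over-representation p-value measures, the sketch below computes an upper-tail hypergeometric probability (equivalent to a one-sided Fisher's exact test). This walkthrough does not state which test the Gene set enrichment task uses, so treat the choice of test as an assumption. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """Probability of seeing at least n_overlap gene-set members by chance.

    n_background: total genes considered.
    n_in_set: genes in the pathway or gene set.
    n_selected: genes in the input list.
    n_overlap: input genes that are also in the gene set.
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the overlap between the input gene list and the gene set is larger than expected from random sampling of the background.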

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d2080f5501ad3a268666f24a345bb1c2d7265556%2F5base1.1.DSSDMR_KEGG_Report%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-85707968a12dfc98ea90e2a8a51e64bb9af12940%2F5base1.1.DSSDMR_KEGG_diagram%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

## Analysis On 5-base DNA Methylation Data (Regions)

When you have a list of regions of interest that you want to analyze, for example, a list of methylation regions relevant to the phenotype in your study, you may use the steps in this section to generate regions data from the CpG sites data, then perform exploratory analysis, differential comparisons, and pathway analysis on the regions. DMRs identified from the regions can also be used for integration analysis with DNA variants data.

### Get Regional Methylation

The [**Get Regional Methylation**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/region-analysis/get-regional-methylation) task enables you to generate regions from CpG sites. For each region specified in a region annotation file, methylation counts of all CpG sites within the region are averaged to produce a count value in a regions-by-samples count matrix.
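The per-region averaging described above can be sketched for a single sample as follows. This is an illustration, not the task's implementation. It assumes regions are half-open intervals `[start, end)` in the same coordinate system as the site positions, and the function name and data layout are hypothetical.

```python
def regional_mean(sites, regions):
    """Average a per-site value over all sites that fall within each region.

    sites: list of (chrom, pos, value) for one sample.
    regions: list of (chrom, start, end, name), treated as [start, end).
    Regions containing no sites are reported as None.
    """
    result = {}
    for chrom, start, end, name in regions:
        values = [v for c, p, v in sites if c == chrom and start <= p < end]
        result[name] = sum(values) / len(values) if values else None
    return result
```

Repeating this for every sample yields one column per sample in a regions-by-samples matrix.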

Because some AML subtypes in this demo data have only one sample per condition, we are going to filter the data to subtypes with at least 3 replicates per condition before generating regions, to get better clustering performance in the exploratory analysis later. This step is optional and specific to this dataset.
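The replicate rule above can be sketched as follows: count the samples per subtype, then keep only samples whose subtype has enough replicates. The function name is hypothetical and the sample names in the usage note are made up.

```python
from collections import Counter


def samples_with_min_replicates(sample_to_group, min_replicates=3):
    """Keep samples whose group has at least min_replicates members.

    sample_to_group: {sample name: group label, e.g. a Mol_Phenotype value}.
    """
    counts = Counter(sample_to_group.values())
    keep = {group for group, n in counts.items() if n >= min_replicates}
    return {s: g for s, g in sample_to_group.items() if g in keep}
```

For example, if three hypothetical samples are labeled `KMT2Ar` and one is labeled with another subtype, only the three `KMT2Ar` samples are retained.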

* Click on **Filtered samples** data node.
* Select **Filtering** from context-sensitive menu, select **Filter samples**.
* Set filter to include **Mol\_Phenotype** in the following AML subtypes, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d06f0e39db46230e0f561965ee7f85eb98689caf%2F5base1.1.filterSamplesForGRM.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

Next, we are going to use a regions list curated for AML to generate regions on the filtered samples. Download the region (.bed) file attached in this section to your computer before continuing with the steps.

* Click on data node downstream of **Filter observations** task.
* Select **Region analysis** from context-sensitive menu, select **Get Regional Methylation**.
* Select **Homo sapiens (human) - hg38** at **Assembly**, select **Add annotation model** at **Annotation model**.
* In the **Add annotation file** dialog, select **Homo sapiens (human)** at **Species**, **hg38** for **Assembly**, **Add annotation model** at **Annotation model**, type in a name (e.g. AML\_regions) at **Custom name**, select **Import annotation file** at **Creation options**, then click **Create**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-5c8f10096ed49c04dd3e1557122d906f1a85cacf%2F5base1.1.addGRMannotation.png?alt=media" alt=""><figcaption></figcaption></figure>

* At **Select file** page, select **My Computer**, click **+ Choose**, then browse and select the downloaded regions bed file from your computer, click **Next**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-0a08e33c9391d9e762b9780eda681a25f52f2549%2F5base1.1.GRMselectBed.png?alt=media" alt="" width="308"><figcaption></figcaption></figure>

* Click **Finish** to start generating regions based on the selected region file.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-80e49229f11e7b24034d6a0efd96e630ddc61408%2F5base1.1.startGRM.png?alt=media" alt="" width="465"><figcaption></figcaption></figure>

After the task is completed, you may hover over **Regional Methylation** data node to see the number of regions generated.

{% file src="<https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-2ab4f0a45f259ebccbcdec4a8211c8cd948bc951%2FAML_DMRs.bed?alt=media>" %}

{% hint style="info" %}
If you do not have your own regions list for Get Regional Methylation, you may use the built-in regions list, which is based on human (hg38) promoter regions from Ensembl version 114, to explore the curated promoter regions. For more details, see [using a built-in promoter regions file](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/region-analysis/get-regional-methylation#using-a-built-in-promoter-regions-file).
{% endhint %}
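
For orientation, a regions (.bed) file is plain text in which the first three columns of each line are the chromosome, the 0-based start, and the end (exclusive) of a region. The sketch below is independent of Connected Multiomics and only illustrates how the regions in such a file could be counted before upload. The function name and the example coordinates are made up for illustration and are not taken from the attached AML file.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)

def parse_bed(text: str) -> List[Region]:
    """Parse the first three BED columns (chrom, start, end) from BED-formatted text."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional comment/header lines that BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end <= start:
            raise ValueError(f"invalid region: {line!r}")
        regions.append((chrom, start, end))
    return regions

# Hypothetical example content, for illustration only.
example = "chr1\t1000\t1500\nchr2\t200\t900\tregionB\n"
regions = parse_bed(example)
print(len(regions), "regions")
```

Counting the regions this way gives a number to compare against the count shown when hovering over the **Regional Methylation** data node.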

### PCA and UMAP

[PCA](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/pca) and [UMAP](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/exploratory-analysis/umap) can be used to explore how samples cluster based on their methylation profiles in the regions data.

To run PCA,

* Click on **Regional methylation** data node.
* Select **Exploratory analysis** from context-sensitive menu, select **PCA**.
* Use the default settings, click **Finish**.

After the task is completed, double-click on the **PCA** data node to open the PCA plot. We can see that the samples are mostly separated by AML subtype, with some subtypes clustering closer to each other.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-726492b7f5015dfd0c1e0c8f393f1881ab5c80b1%2F5base1.1.GRM-pca.png?alt=media" alt=""><figcaption></figcaption></figure>

Non-linear dimensionality reduction methods such as UMAP and t-SNE can be used to further enhance the clustering. However, these methods often require parameter optimization. For UMAP, parameters such as the number of principal components and the number of nearest neighbors must be carefully tuned.

To run UMAP,

* Click on **PCA** data node.
* Select **Exploratory analysis** from context-sensitive menu, select **UMAP**.
* Set **PCs to use** to Top **4**.
* Under **Advanced options** section, click **Configure** at **Option set**, set **local neighborhood size** to **3**, then click **Apply**.
* Click **Finish** to run UMAP with the custom parameters.

{% hint style="warning" %}
The UMAP parameters applied here are optimized for this specific dataset. When analyzing your own data, you may run UMAP multiple times with different parameter settings to identify the parameter set that is most suitable for your data.
{% endhint %}

After the task is completed, double-click on the **UMAP** data node to open the UMAP plot. With these UMAP parameters, we see stronger separation of the AML subtypes.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-54b8d89e5614e7147bee7312c59f011e38955f5d%2F5base1.1.GRM-umap.png?alt=media" alt=""><figcaption></figcaption></figure>

### Annotate regions

The regions generated from a given regions list can be annotated with genomic information such as gene name and gene ID to enable biological interpretation of the regions.

* Click **Regional methylation** node.
* Click **Region analysis** section in the context-sensitive task menu.
* Click [**Annotate regions**](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/region-analysis/annotate-regions).
* Select **Homo sapiens (human) - hg38** at **Assembly**, **Ensembl Transcripts release 114** at **Annotation model**, keep the remaining settings as default, click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-cd45186cbe475ab84634cd06f892808bb0828e0f%2F5base1.1.AnnotateRegionsTaskSetUp.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double-click the **Annotated regions** node to open the annotation report. The annotation report shows a pie chart of the gene section breakdown for the regions, and a table in which each row is a region and the columns contain the annotated gene information.

* Click **Optional columns** on the top right of the table, tick **gene\_name** checkbox to display gene name in the table.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-0f6f583976d00200b076bd65a77ebe129eb53f86%2F5base1.1.GRMAnnotateRegionsReport.png?alt=media" alt=""><figcaption></figcaption></figure>

### Detect Differential Methylation

The [**Detect Differential Methylation**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/methylation-analysis/detect-differential-methylation) task can be used for differential testing on the regional data generated by the Get Regional Methylation task. It converts Beta-values to M-values and uses the M-values to perform an ANOVA-based differential analysis.
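
As background on the Beta-value to M-value conversion, the M-value is commonly defined as the log2 ratio of methylated to unmethylated signal, that is, M = log2(Beta / (1 - Beta)). The sketch below illustrates that definition only. The clamping offset `eps` is an assumption added to keep the result finite at Beta = 0 or 1, and the task's own handling of such values is not described here.

```python
import math

def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a methylation Beta-value (0..1) to an M-value: log2(beta / (1 - beta))."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    # Clamp away from 0 and 1 so the log ratio stays finite (eps is an assumed choice).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

for beta in (0.1, 0.5, 0.9):
    print(beta, round(beta_to_m(beta), 3))
```

A Beta-value of 0.5 maps to an M-value of 0, and values below or above 0.5 map to negative or positive M-values respectively. This puts methylation on an unbounded scale, which is generally considered better suited to ANOVA-style tests.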

* Click on **Annotated regions** data node.
* Select **Statistics** from context-sensitive menu, select **Detect Differential Methylation**.
* Add **Mol\_Phenotype** as a factor for analysis, click **Next**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-adfd18a871205dbd9e9cb8133a080dfa72b5dbd3%2F5base1.1.DDM-addFactor.png?alt=media" alt=""><figcaption></figcaption></figure>

* Define and **add comparison**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-435e664084a01769a72b6133924a90ded1a3f8d3%2F5base1.1.DDM-add-comparison.png?alt=media" alt="" width="380"><figcaption></figcaption></figure>

When completed, double-click the **DNMT3AR882 vs IDH** node to open the differential test report. Each row in the report is a region, and rows are sorted by p-value. Significance statistics such as the P-value and FDR step up are computed from the M-values. The LSMeans of the groups and the Difference are computed from the Beta-values.
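
To give a feel for what a step-up FDR adjustment does, the sketch below implements the Benjamini-Hochberg step-up procedure in plain Python. That the report's FDR step up column uses exactly this procedure is an assumption made for illustration, and the p-values in the example are invented.

```python
from typing import List

def fdr_step_up(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented p-values for four regions.
print(fdr_step_up([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is at least as large as the raw p-value, which is why a region can have a small p-value yet fail an FDR threshold when many regions are tested.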

Genomic annotations added in the previous Annotate regions task are carried over to the differential methylation report. Click **Optional columns** to show or hide the annotations in the report table. In the screenshot below, we display the gene name associated with each region in the report table.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-5053620f7c2f58b9c915f59501b0b46df64baa7f%2F5base1.1.DDM-report%20(2).png?alt=media" alt=""><figcaption></figcaption></figure>

Click the **Volcano icon** next to the comparison name, or the icon buttons in the **View** column, to open visualizations and additional details on the results. The volcano plot dashboard opens with a volcano plot, a differential result table, and dot plots for the top 4 results. From the volcano plot, we can see that most of the AML regions in our data are hypomethylated in patients carrying the DNMT3A R882 mutation.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-096ac21978601845c30e069134078b78e09ce1f1%2F5base1.1.DDM-volcano.png?alt=media" alt=""><figcaption></figcaption></figure>

### Filter Differential Results

Let's filter the differential results to a list of significant hypomethylated DMRs.

* Click **DNMT3AR882 vs IDH** node.
* Click **Filtering** section in the context-sensitive task menu.
* Click **Differential analysis filter**.
* Choose **Metadata** as Filter type.
* In Filter criteria section, set Filter features by **include DNMT3AR882 vs IDH: FDR <= 0.05 AND DNMT3AR882 vs IDH:Difference < -0.2**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-953498b2ad82e4e92c0c0ad0accec343986ea34e%2F5base1.1.DDM-filterResult.png?alt=media" alt=""><figcaption></figcaption></figure>

This generates a **Filtered feature list** node that contains the DMRs passing the filtering criteria. The filtering criteria can be adjusted based on your research questions.
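
The logic of the filter above can be sketched outside the software as a simple row filter: keep a region only if its FDR is at most 0.05 and its Difference is below -0.2. The field names and the example rows below are hypothetical and do not come from the demo data.

```python
from typing import Dict, List

def filter_hypomethylated(rows: List[Dict], max_fdr: float = 0.05,
                          max_diff: float = -0.2) -> List[Dict]:
    """Keep rows with FDR <= max_fdr AND Difference < max_diff."""
    return [r for r in rows if r["fdr"] <= max_fdr and r["difference"] < max_diff]

# Hypothetical differential results, one dict per region.
rows = [
    {"region": "r1", "fdr": 0.01, "difference": -0.35},  # passes both criteria
    {"region": "r2", "fdr": 0.20, "difference": -0.40},  # fails the FDR criterion
    {"region": "r3", "fdr": 0.03, "difference": -0.10},  # fails the Difference criterion
    {"region": "r4", "fdr": 0.05, "difference": 0.30},   # hypermethylated, excluded
]
kept = filter_hypomethylated(rows)
print([r["region"] for r in kept])
```

Changing `max_diff` to a positive lower bound, with the comparison reversed, would give the corresponding list of hypermethylated regions.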

### Gene Set Enrichment

With the list of significant DMRs, we can perform [**gene set enrichment**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/biological-interpretation/gene-set-enrichment) analysis to identify gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

* Click **Filtered feature list** node.
* Click **Biological interpretation** section in the context-sensitive task menu.
* Click **Gene set enrichment**.
* Select **KEGG database** as **Database** for pathway enrichment analysis. Choose **Homo sapiens hsa\_v12\_25\_04\_07** from the **KEGG database** dropdown.
* At Feature identifier section, tick **Select feature identifier** checkbox, select **gene\_name**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d61d532154de0b5b3c30159ea3592ef08f7de5f8%2F5base1.1.KEGGsetUp.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double-click the **Pathway enrichment** node to open the pathway enrichment report. Each row in the report is a pathway, with an enrichment score and p-value. The report also lists how many genes in the pathway were in the input gene list and how many were not. Click the gene set ID in the first column to view the pathway diagram. On the pathway diagram, genes in our input filtered feature list are colored. Clicking a gene name opens a KEGG page with additional details.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-4125b2e0923c67ae42d1a8dbc5b6427c3b1e925a%2F5base1.1.DDM-KEGGpathwayReport.png?alt=media" alt=""><figcaption></figcaption></figure>

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-860bc237cc664e7a3591b5d08fb598414f46fe02%2F5base1.1.DDM-KEGGpathwayMap.png?alt=media" alt=""><figcaption></figcaption></figure>
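
To illustrate what "over-represented" means in an enrichment report, the sketch below computes a one-sided hypergeometric p-value from the same kind of counts the report lists (genes in the pathway that are, or are not, in the input list). Using the hypergeometric test here is an assumption for illustration; the exact statistic behind the task's enrichment score and p-value is not stated in this walkthrough. All counts in the example are invented.

```python
from math import comb

def enrichment_pvalue(in_list_in_set: int, list_size: int,
                      set_size: int, background: int) -> float:
    """One-sided hypergeometric p-value: P(overlap >= in_list_in_set).

    in_list_in_set: genes that are both in the input list and in the gene set
    list_size:      genes in the input list
    set_size:       genes in the gene set (pathway)
    background:     all genes considered
    """
    upper = min(list_size, set_size)
    total = comb(background, list_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(in_list_in_set, upper + 1)
    )
    return tail / total

# Invented counts: 8 of 50 input genes fall in a 100-gene pathway, 20000 background genes.
print(enrichment_pvalue(8, 50, 100, 20000))
```

A small p-value indicates that the overlap between the input list and the pathway is larger than expected if genes were drawn at random from the background.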

### Correlation Engine Pathway

The [**Correlation Engine Pathway**](https://help.multiomics.illumina.com/icm/analyses/analysis-functionality/task-menu/biological-interpretation/correlation-engine-pathway) is another tool available for biological interpretation of your results. It helps determine whether your genes of interest, identified by differential analysis in Connected Multiomics, correspond to gene or protein sets from the GO consortium, MSigDB, TargetScan, and InterPro.

* Click **Filtered feature list** node.
* Click **Biological interpretation** section in the context-sensitive task menu.
* Click **Correlation Engine pathway**.
* Select **Homo sapiens** as **Organism**, **RNA expression** for **Data type**, fill in project information, click **Next**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-baac43e0553b87efc53880a4132204bc2ec55147%2F5base1.1.CEsetup.png?alt=media" alt=""><figcaption></figcaption></figure>

* Select a **contrast** of interest for Correlation Engine pathway analysis, click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-0ed993b47b031c3abe4972a6c0d5af7806f0686a%2F5base1.1.CEsetup2.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double-click the **Correlation Engine** node to open the report. In the report table, each row is a pathway. The ID in the Gene set column is clickable and opens more information about the gene set from the source database. The Taxonomy column shows the source database, while the Description column provides more information about the pathway. The results can be filtered by typing filter criteria in the text box under each column name. For example, in the screenshot below, we search for AML-related pathways by typing "AML" in the Description column. The top 30 pathways can be visualized by clicking the **Open Data Viewer auto session** link at the top of the report table.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-92e04555da2fd4d829954749a3927be1a48f503a%2F5base1.1.CEreport.png?alt=media" alt=""><figcaption></figcaption></figure>

## Analysis On DNA Variants Data

Variants data should be filtered and annotated with genomic information such as gene ID and gene name prior to integration with the DMRs. In this section, we describe analysis tasks that can be performed on the variants data to prepare it for dual-omics analysis.

### Filter Variants

Filtering is recommended, and the appropriate criteria depend on the study. For example, you may want to filter the data to remove low-quality variants, and/or to include only a sample group of interest (e.g., samples involved in the DMRs comparison).

* Click on **Variants** data node.
* Select **Variant analysis** section in the context-sensitive task menu, select [**Filter variants**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/variant-analysis/filter-variants).
* Select the correct **Assembly** for your variants data.
* Set filtering criteria based on annotation, samples, and/or variants, then click **Finish**. In this analysis, we are interested in variants in the **DNMT3AR882** samples, so we set the filter as follows:

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d773e6a1d43f57bbfcd19b64c0a7d2ada17574de%2F5base1.1.FilterVariants.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

When completed, double-click the **Variants** node downstream of the **Filter variants** task to open the variant filter report. For each sample, the report table shows the number of variants that passed the filter, the number that failed, and the percentage that passed.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-2166c70749ed498bc13098043eeeda73f1cbe8d0%2F5base1.1.VariantFilterReport.png?alt=media" alt="" width="545"><figcaption></figcaption></figure>

{% hint style="warning" %}
If too many variants are filtered out, you might want to redo the filter with more lenient filtering criteria.
{% endhint %}

### Summarized Cohort Mutations

Variants information is stored on a per-sample basis. The [**Summarize cohort mutations**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/variant-analysis/summarize-cohort-mutations) task enables you to create a summary of variants across the samples in the input data node. The summary can help identify both the frequency of variants and the samples that share a particular variant.

* Click on **Variants** data node.
* Select **Variant analysis** section in the context-sensitive task menu, select **Summarize cohort mutations**.

When completed, double-click on the **Summarized variants** node to open the summary report. The report opens to a table where each row is a variant. Additional columns can be added to the table using the **Optional columns** button. The table can be filtered by typing filtering criteria in the text box under each column name. Refer to the [Summarize cohort mutations documentation](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/variant-analysis/summarize-cohort-mutations) for more details about the task and its output.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-6185f75eb99cbcb464f9cefe405ae06781115e45%2F5base1.1.summarizedCohortMutations.png?alt=media" alt=""><figcaption></figcaption></figure>
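
Conceptually, summarizing per-sample variants across a cohort amounts to grouping identical variants and recording which samples carry them. The sketch below shows this idea in plain Python. The data structures, sample names, and coordinates are hypothetical, and the task's actual output contains more columns than shown here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)

def summarize_cohort(per_sample: Dict[str, List[Variant]]) -> Dict[Variant, List[str]]:
    """Map each variant to the sorted list of samples that carry it."""
    carriers = defaultdict(set)
    for sample, variants in per_sample.items():
        for variant in variants:
            carriers[variant].add(sample)
    return {variant: sorted(samples) for variant, samples in carriers.items()}

# Hypothetical per-sample variant calls.
calls = {
    "S1": [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T")],
    "S2": [("chr1", 100, "A", "G")],
}
for variant, samples in summarize_cohort(calls).items():
    # Frequency = fraction of cohort samples carrying the variant.
    print(variant, samples, len(samples) / len(calls))
```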

### Remove Background Mutations

The [**Remove background mutations**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/variant-analysis/remove-background-mutations) task is a variant filtering task that enables you to filter variants against built-in databases to remove noise such as common polymorphisms and benign mutations. The available built-in databases are:

* **Primate AI**: Prediction of pathogenicity on missense mutations. Percentile score ranges from 0 to 1, with 0 being benign, 1 being most pathogenic.
* **Promoter AI**: Prediction of expression-altering consequences of variants at promoter regions. Scores range from -1 to 1. A positive score indicates that the variant likely enhances transcription; a negative score indicates that the variant likely represses transcription.
* **gnomAD**: The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. Filtering by minor allele frequency is supported.
* **DRAGEN Haplotype Database**: Proprietary haplotype database built from a panel of 256 population haplotypes. Filtering by minor allele frequency is supported.

{% hint style="warning" %}
The databases in the Remove background mutations task support only human hg38 assembly.
{% endhint %}

To invoke the **Remove background mutations** task,

* Click **Summarized variants** data node.
* Click **Variant analysis** section in the context-sensitive task menu.
* Click **Remove background mutations**.
* Select a **database** for filtering, click **Next**. In this analysis, we want to filter out variants that are common in a population database and focus on the less common ones, so we select the **gnomAD** database for filtering.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d3239ec1817bd4bd2cc4a463dc7f1cfb0675efa7%2F5base1.1.selectDBforRemoveBackground.png?alt=media" alt="" width="527"><figcaption></figcaption></figure>

* Use the default **filtering criteria**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-2817cd2b60a3eef85e143b7c217a365a05def7a9%2F5base1.1.gnomADfilterThreshold.png?alt=media" alt=""><figcaption></figcaption></figure>

If you need to filter variants against more than one database, run the **Remove background mutations** task sequentially, once per database.
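
The idea behind filtering against a population database can be sketched as a threshold on allele frequency: variants that are common in the population are treated as background and dropped. In the sketch below, the field name `population_af`, the 0.01 threshold, and the choice to keep variants that are absent from the database are all assumptions made for illustration. They are not the task's documented defaults.

```python
from typing import Dict, List, Optional

def remove_common(variants: List[Dict], max_af: float = 0.01) -> List[Dict]:
    """Drop variants whose population allele frequency exceeds max_af."""
    kept = []
    for variant in variants:
        af: Optional[float] = variant.get("population_af")
        # Variants missing from the population database have no frequency; keep them.
        if af is None or af <= max_af:
            kept.append(variant)
    return kept

# Hypothetical variants with made-up population allele frequencies.
variants = [
    {"id": "v1", "population_af": 0.25},    # common polymorphism, removed
    {"id": "v2", "population_af": 0.0004},  # rare, kept
    {"id": "v3", "population_af": None},    # not in the database, kept
]
print([v["id"] for v in remove_common(variants)])
```

Filtering against a second database corresponds to applying another function of this kind to the output of the first.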

### Annotate Variants

Variants must be annotated so that they are associated with a gene ID and gene name, both of which are required for dual-omics analysis with the DMRs.

* Click **Variants** data node.
* Click **Variant analysis** section in the context-sensitive task menu.
* Click [**Annotate variants**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/variant-analysis/annotate-variants).
* Select **Homo sapiens (human) - hg38** at **Assembly**.
* Tick **Annotate with genomic features**, select **Ensembl Transcripts release 114** at **Annotation model**, click **Finish**.

{% hint style="warning" %}
It is recommended to use the same annotation model to annotate both the DMRs and the variants, because the integration step uses gene ID from the annotation when combining the data. For example, if you have annotated DMRs using Ensembl annotation model, you should use the same Ensembl annotation model to annotate variants.
{% endhint %}

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-15f2636b42a0ef7e1f9226d4520f259c6acf978f%2F5base1.1.AnnotateVariants.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double-click on the **Annotated variants** data node to open the report. You may use the **Optional columns** button to show or hide annotations in the table.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-524bd83c02d9bde4efbe7bda446e012db06bb5e4%2F5base1.1.AnnotateVariantsReport.png?alt=media" alt=""><figcaption></figcaption></figure>

## Dual-omics Analysis of DNA Methylation and Variants

### Combine DMRs And Variants For Analysis

With annotated DMRs and annotated variants, we are ready to combine both data types for dual-omics analysis.

* Click **Annotated variants** data node.
* Click **Combine multiomics data** section in the context-sensitive task menu.
* Click [**Combine 5-base methylation and variant data**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/combine-multiomics-data/combine-5-base-methylation-and-variant-data).
* Click **Select data node** under **Methylation (5-base)**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-7e15a93d90c60438786173e2e21a6deeb0cbd5c1%2F5base-variant-dialog.png?alt=media" alt=""><figcaption></figcaption></figure>

* In the **Select data node** pop-up, nodes that are eligible to be combined with variants are highlighted. Click the **Annotated regions** data node from the DSS DMR analysis branch, then click **Select**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-d93cd08b085d375ff11f39bf423f41a17fcf3121%2F5base1.1.select5baseNodeToCombineWithVariant.png?alt=media" alt=""><figcaption></figcaption></figure>

* Use the default **Intersection** option to combine the data, click **Finish**.
  * Intersection: Use gene ID to intersect DMRs and variants. Every intersected variant and DMR is reported.
  * Union: Report the intersection result and also variants that fall within a specified distance of a DMR (default is +/- 1000bp).
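
The two options can be sketched as follows, based on the descriptions above: Intersection pairs a variant with a DMR when they share a gene ID, and Union additionally pairs a variant with a DMR when the variant lies within the specified distance of the DMR. This is a simplified illustration rather than the task's implementation. The record fields, IDs, and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

def combine(dmrs: List[Dict], variants: List[Dict],
            mode: str = "intersection", distance: int = 1000) -> List[Tuple[str, str]]:
    """Pair variants with DMRs by shared gene ID; in union mode also by proximity."""
    pairs = []
    for v in variants:
        for d in dmrs:
            same_gene = v["gene_id"] is not None and v["gene_id"] == d["gene_id"]
            # Proximity: variant position within +/- distance of the DMR boundaries.
            near = (v["chrom"] == d["chrom"]
                    and d["start"] - distance <= v["pos"] <= d["end"] + distance)
            if same_gene or (mode == "union" and near):
                pairs.append((v["id"], d["id"]))
    return pairs

# Hypothetical records for illustration.
dmrs = [{"id": "d1", "chrom": "chr1", "start": 5000, "end": 6000, "gene_id": "G1"}]
variants = [
    {"id": "v1", "chrom": "chr1", "pos": 20000, "gene_id": "G1"},  # same gene, far away
    {"id": "v2", "chrom": "chr1", "pos": 6500, "gene_id": None},   # no gene, within 1000bp
    {"id": "v3", "chrom": "chr2", "pos": 5500, "gene_id": "G2"},   # unrelated
]
print(combine(dmrs, variants, mode="intersection"))
print(combine(dmrs, variants, mode="union"))
```

In this example, the intergenic variant `v2` is reported only in union mode, which is the practical difference between the two options.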

When completed, double-click on the **5-base methylation and variants** data node to open the result. The task report is similar to the **Summarize cohort mutations** task report, with an additional **Origin** column that indicates whether the result in a row is present in both methylation and variant data (VC, DMR), in variant data only (VC), or in methylation data only (DMR). The table can be filtered by typing search criteria in the text boxes under each column name. For example, you can filter the result to a gene of interest by typing the gene name in the Gene Name text box.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-7289944dfc2ea0695ed923a72fc94922456ff9fd%2F5base1.1.combineDMRVariantReport%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>

### Gene Set Enrichment On The Combined Data

After combining methylation and variant data, [**gene set enrichment analysis**](https://help.connected.illumina.com/icm/analyses/analysis-functionality/task-menu/biological-interpretation/gene-set-enrichment) can be used to identify gene sets and pathways that are over-represented in the combined data, helping to derive biological meaning from your results.

* Click **5-base methylation and variants** node.
* Click **Biological interpretation** section in the context-sensitive task menu.
* Click **Gene set enrichment**.
* Select **KEGG database** or **Gene set database** for analysis. Use the KEGG database for pathway analysis, or the gene set database for gene ontology (GO) analysis.
* At Feature identifier section, tick **Select feature identifier** checkbox, select **Gene Name**, then click **Finish**.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-a6a478c8b1faa0dfbdb0ff69f81e986b8abad10b%2F5base1.1.KEGGsetUp_combinedResult.png?alt=media" alt=""><figcaption></figcaption></figure>

When completed, double-click the **Pathway enrichment** node to open the result. Click an individual gene set ID to see more details about a pathway.

* The result can be filtered by typing in search criteria in text boxes under each column name. For example, filter the report by typing in "leukemia" in the text box under Description column to search for leukemia-related pathways.

<figure><img src="https://707781091-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FL2tTN7buOERM9NKPYYlg%2Fuploads%2Fgit-blob-4a9146d8d8e73a12b19e7051b66a8edff2eeca5e%2F5base1.1.leukemia%20(1).png?alt=media" alt=""><figcaption></figcaption></figure>
