Connected Multiomics Walkthrough

Connected Multiomics is available for tertiary analysis of the Illumina 5-Base DNA Prep data.

Getting Started

Logging into ICM

Upload Data Files

Creating a Study from a ICA Project

Viewing Results and Navigating in ICM

Demo Data

Demo data that can be used to follow along with this walkthrough is found in the Connected Multiomics Demo Data repository. The dataset can be found at /Multiomics-Demo-Data/Methylation/Illumina 5-base-solution. This demo data consists of 6 samples, from two phenotype groups. In this walkthrough, we outline analysis steps that can be performed to explore the data, identify differentially methylated regions between the two sample groups, and find pathways overrepresented in the differential test result.

We need 5 files per sample to analyze the DNA Methylation Prep data in the Connected Multiomics software. Add the following 5 files for each sample from the demo data folder to a study prior to starting an analysis:

  • <sample name>.CX_report.txt.gz

  • <sample name>.methyl_metrics.csv

  • <sample name>.mapping_metrics.csv

  • <sample name>.wgs_coverage_metrics.csv

  • <sample name>.M-bias.txt

These files are generated by DRAGEN analysis. The CX_report file is the key output file; it contains methylation read counts at single-nucleotide resolution. The metrics files and the M-bias file contain QC metrics for read mapping quality and methylation calling, which are used to generate visualizations in the 5-base Methylation QC task in Connected Multiomics.
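Before uploading, it can help to confirm that every sample has all five files. The following is a minimal sketch, not part of Connected Multiomics: the function name `missing_files` is hypothetical, and it assumes each file is named `<sample name>` followed by one of the suffixes listed above.

```python
from collections import defaultdict

# File suffixes listed above; each sample needs all five.
REQUIRED_SUFFIXES = (
    ".CX_report.txt.gz",
    ".methyl_metrics.csv",
    ".mapping_metrics.csv",
    ".wgs_coverage_metrics.csv",
    ".M-bias.txt",
)


def missing_files(filenames):
    """Return {sample_name: [missing suffixes]} for incomplete samples."""
    found = defaultdict(set)
    for name in filenames:
        for suffix in REQUIRED_SUFFIXES:
            if name.endswith(suffix):
                # The sample name is whatever precedes the known suffix.
                found[name[: -len(suffix)]].add(suffix)
                break
    return {
        sample: [s for s in REQUIRED_SUFFIXES if s not in suffixes]
        for sample, suffixes in found.items()
        if len(suffixes) < len(REQUIRED_SUFFIXES)
    }
```

An empty result means every detected sample is complete. Files that match none of the suffixes are ignored.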

Default 5-base Methylation analysis

The default 5-base methylation analysis runs a pre-defined pipeline on samples and presents analysis results in visualizations in a data viewer.

Creating a Default Analysis

After all 6 samples are added to a study, follow these steps to create a Default analysis in the Connected Multiomics:

  • Click on + New Analysis.

  • In the pop-up window, provide a name for the analysis, select Default: Illumina 5-base DNA Methylation as the Analysis Type, choose a sample group to be included in the analysis (all samples option is selected by default), and click on the Run Analysis button.

  • Refresh the page to get the latest status of the analysis.

View Default Analysis Results

When the analysis status is Complete, click on the analysis tile to open results. The analysis opens into a data viewer that consists of PCA analysis results and methylation QC plots:

Each plot can be configured by selecting the Configure option from the toolbox on the left of the plot:

  • To change color of the data points, select Configure > Style > Color by

  • To change size of the data points, select Configure > Style > Point size

View Default Analysis Pipeline

To view the default analysis pipeline, click the analysis name in the breadcrumb at the top of the data viewer to go to the Analyses page. The default analysis pipeline looks like this:

The default analysis imports the methylation data, creates a methylation QC report, generates regions from CpG sites, and then uses exploratory analysis methods such as Principal Components Analysis (PCA) and k-means clustering to explore the data.

To go back to the visualizations, double-click on the Summary report task node.

Custom 5-base Methylation Analysis

The custom 5-base methylation analysis imports the methylation data. You then use the context-sensitive menu to run analysis tasks individually and build up an analysis pipeline. In this section, we outline the analysis steps for the workflow shown below:

Creating a Custom Analysis

After all 6 samples are added to a study, follow these steps to create a Custom analysis in the Connected Multiomics:

  • Click on + New Analysis.

  • In the pop-up window, provide a name for the analysis, select Custom: Illumina 5-base DNA Methylation as the Analysis Type, choose a sample group to be included in the analysis (all samples option is selected by default), and click on the Run Analysis button.

  • Refresh the page to get the latest status of the analysis.

When the Status is Complete, click on the analysis tile to enter the analysis module. After the Import cohort task is completed, the first data node called 5-base Methylation is generated. To review the number of samples and features, hover over the 5-base Methylation data node. Features refer to CpG sites.

The 5-base Methylation data node contains raw methylated counts and unmethylated counts for positions present in the CX reports. For positions with methylation calls on both strands in the CX report, the strands are collapsed such that the position on the positive strand is used and the methylation counts are summed. This data node also contains percent methylation levels, which are used in exploratory analysis such as Principal Component Analysis (PCA).
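The strand collapsing described above can be illustrated with a short sketch. This is not the Connected Multiomics implementation. It assumes CpG-context records with 1-based positions, where a minus-strand cytosine at position p pairs with the plus-strand cytosine at p - 1. The record layout and both function names are hypothetical.

```python
from collections import defaultdict


def collapse_strands(records):
    """Sum CpG counts from both strands onto the plus-strand position.

    records: iterable of (chrom, pos, strand, methylated, unmethylated),
    restricted to CpG context. Assumption: a minus-strand cytosine at
    position p pairs with the plus-strand cytosine at p - 1.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, meth, unmeth in records:
        key = (chrom, pos if strand == "+" else pos - 1)
        counts[key][0] += meth
        counts[key][1] += unmeth
    return {key: tuple(value) for key, value in counts.items()}


def percent_methylation(meth, unmeth):
    """Percent methylation from counts; None when there is no coverage."""
    total = meth + unmeth
    return 100.0 * meth / total if total else None
```

For example, a plus-strand record at position 100 with counts (8, 2) and a minus-strand record at position 101 with counts (6, 4) collapse to position 100 with counts (14, 6), which is 70% methylation.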

Add Sample Metadata

We use the Metadata tab within an analysis to manage sample metadata. Follow these steps to create a new sample attribute called sampleGroup and assign an attribute value to each sample:

  • Click on Metadata tab. In Sample attributes menu on the left, click Manage.

  • In the Manage sample attributes page, click Add new attribute. Type in sampleGroup in the Name text box, click Add.

  • Click + button to add two category values A, B to the sampleGroup attribute.

  • Click Back to metadata tab.

  • Click Assign values under Sample attributes. Use the dropdown next to each sample to assign a category value for the sampleGroup attribute. Assign a value to each sample as shown in the screenshot below.

  • Click Apply changes to save the assigned values.

When there are many samples or many sample attributes in the dataset, it is more efficient to use a sample metadata file to associate samples with metadata information. The metadata file must have a column named 'SampleID' with values identical to the sample IDs on the Samples page in Connected Multiomics, and it must be saved with a .tsv or .csv file extension. For more details, please refer to the Sample Metadata documentation.
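A metadata file of this shape can be generated with a few lines of Python. This is a minimal sketch: only the 'SampleID' column name comes from the requirement above. The function name and the sample IDs in the usage note are hypothetical, and the real IDs must match the Samples page.

```python
import csv
import io


def build_metadata_csv(sample_groups):
    """Render {sample_id: group} as CSV text with a SampleID column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # 'SampleID' is the required column; 'sampleGroup' is the attribute
    # created in this walkthrough.
    writer.writerow(["SampleID", "sampleGroup"])
    for sample_id, group in sample_groups.items():
        writer.writerow([sample_id, group])
    return buffer.getvalue()
```

Calling `build_metadata_csv({"sample1": "A", "sample2": "B"})` returns text that can be written to a .csv file and used as the metadata file.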

5-base Methylation QC

The 5-base methylation QC task in the Connected Multiomics enables you to visualize sample-level QC metrics that describe reads mapping quality and CpG methylation calling. The QC metrics are extracted from the DRAGEN analysis metric files that were ingested into the study. To invoke the 5-base methylation QC task:

  • At Analyses page, click on the 5-base Methylation node.

  • Click QA/QC section in the context-sensitive task menu on the right.

  • Click 5-base methylation QC.

After the 5-base methylation QC task is completed, double-click on the task node to open the QC report in a data viewer. The QC report consists of plots and tables organized in 2 sheets. Click sheet name at the bottom of the data viewer to navigate from one sheet to another.

  • Sheet Metrics shows sample-level QC metric plots. Each sample is a data point, and the points are spread randomly along the x-axis. The QC metric is on the y-axis. Each plot is overlaid with a violin plot to show the distribution of the QC metric.

    • Percent methylation in samples: Percentages of CpG methylation in samples.

    • Percent methylation in unmethylated control: Percentage of CpG methylation in the unmethylated control (lambda). Low value indicates good quality.

    • Percent methylation in methylated control: Percentage of CpG methylation in the methylated control (pUC19). High value indicates good quality.

    • Percent duplicate reads: Percentage of duplicate marked reads, as a result of PCR amplification.

    • Percent mapped reads: Percentage of mapped reads, indicating the alignment rate.

    • Average autosomal coverage: Mean autosomal coverage across the whole genome. With higher coverage, the methylated and unmethylated counts more accurately reflect the true methylation level at any particular site.

    • QC metrics table: Text representations of the QC metrics plots.

  • Sheet M-bias shows M-bias plots for methylation level and coverage across positions on read 1 and read 2. The M-bias should be consistent across all positions on the reads. It is common for the first and last 10 bases to have uneven methylation due to end-repair and sequencing artifacts.

PCA

The principal components analysis (PCA) scatter plot allows us to visualize similarities and differences between the samples in a dataset. To invoke a PCA task:

  • Click on the 5-base Methylation node.

  • Click Exploratory analysis section in the context-sensitive task menu.

  • Click PCA.

  • Set to use the top 100,000 features with the highest variance in calculation.

  • Keep the rest of the parameters as default, and click Finish.

After the PCA task is completed, double click on the PCA node to view the PCA plot in a data viewer.

  • The scatter plot shows the data distribution among the first three PCs. Each sample is a data point.

  • The scree plot (top right panel) shows variance represented by each PC.

  • The component loading table (bottom right panel) shows the correlation between CpG methylation sites and PC.
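The feature selection step above (keeping the features with the highest variance before running PCA) can be sketched as follows. This is an illustration only: the function name and input layout are hypothetical, and the document does not state how the software computes variance. Population variance is used here. With no missing values, sample variance would give the same ranking.

```python
from statistics import pvariance


def top_variance_features(matrix, n_top):
    """Return the n_top feature names with the highest variance.

    matrix: {feature_name: [percent methylation per sample]}
    """
    ranked = sorted(matrix, key=lambda name: pvariance(matrix[name]), reverse=True)
    return ranked[:n_top]
```

A CpG site with identical methylation in every sample has zero variance and carries no information for separating samples, so it is ranked last.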

Detect Differentially Methylated Regions (DMRs)

DSS (Dispersion Shrinkage for Sequencing data) enables the detection of differentially methylated regions using count data at the single-nucleotide level. It uses a beta-binomial distribution to model methylation counts at each CpG site and a Wald test to identify differentially methylated loci (DMLs). Nearby DMLs are then merged to form a differentially methylated region (DMR). Set up a DSS task to identify DMRs between two sample groups:

  • Click on the 5-base Methylation node.

  • Click Statistics section in the context-sensitive task menu.

  • Click Differential Methylation.

  • Select DSS as the Method to use for differential methylation analysis, click Next.

  • Select sampleGroup as factor for analysis, click Next.

  • Drag A to the top right Numerator box and B to the bottom right Denominator box. Click on Add comparison.

  • Keep the rest of the settings as default, then click Finish.

When the DSS task is completed, double-click the A vs B (DMR) node to open the DMR report. The DSS DMR task report lists regions in rows and test statistics (areaStat, diff.Methy, etc.) in columns. Regions are listed in descending order of abs(areaStat), so the most significant DMR is listed first. The diff.Methy statistic reports the difference in average methylation between the two groups: a negative value indicates that A is hypomethylated compared to B in the region, while a positive value indicates that A is hypermethylated compared to B. Refer to the DSS documentation to learn more about the differential methylation report.

On the DMR report, click the volcano icon next to the comparison name to open a differential methylation plot in a Data Viewer. In the volcano plot, the methylation difference (diff.Methy) is on the x-axis and the absolute value of areaStat (abs(areaStat)) is on the y-axis. Each data point in the plot is a region. Hypomethylated regions are colored blue, and hypermethylated regions are colored red. Hypo- and hypermethylation status is defined by the methylation difference (diff.Methy on the x-axis), with default thresholds of -0.2 and 0.2, respectively. To adjust the thresholds, select the Configure tool in the toolbox on the left of the plot, then select Statistics.

Filter DMRs

We recommend filtering DMRs by hypo- or hypermethylation status, using the diff.Methy statistic, so that downstream pathway results can be interpreted as hypo- or hypermethylated in the differential comparison. To filter the DMR results to hypermethylated DMRs:

  • Click A vs B (DMR) node.

  • Click Filtering section in the context-sensitive task menu.

  • Click Differential analysis filter.

  • Choose Metadata as Filter type.

  • In Filter criteria section, set Filter features by include A vs B: diff.Methy > 0.2, then click Finish.

This generates a Filtered features list node that contains the DMRs passing the filtering criteria. The same steps can be applied to generate a filtered list of hypomethylated DMRs by setting the filtering criteria to include regions with diff.Methy < -0.2. The filtering threshold can be adjusted, and more filtering criteria can be defined, based on your research questions.
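The filtering logic above is simple enough to express directly. The sketch below is not the software's filter task. The function names and the dictionary layout are hypothetical. Only the column names diff.Methy and areaStat and the 0.2 thresholds come from the text.

```python
def classify_dmr(diff_methy, threshold=0.2):
    """Label a region by its methylation difference (A minus B)."""
    if diff_methy > threshold:
        return "hyper"
    if diff_methy < -threshold:
        return "hypo"
    return "neither"


def filter_dmrs(dmrs, status, threshold=0.2):
    """Keep regions with the given status, most significant first.

    dmrs: list of dicts with 'diff.Methy' and 'areaStat' keys.
    """
    kept = [d for d in dmrs if classify_dmr(d["diff.Methy"], threshold) == status]
    # Mirror the report ordering: descending abs(areaStat).
    return sorted(kept, key=lambda d: abs(d["areaStat"]), reverse=True)
```

Both comparisons are strict, so a region with diff.Methy exactly at a threshold is labeled "neither", matching the > 0.2 and < -0.2 criteria used in the steps above.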

Annotate DMRs

Next, we annotate the filtered DMR list with gene information using an annotation model.

  • Click Filtered feature list node.

  • Click Region analysis section in the context-sensitive task menu.

  • Select an Annotation model, keep the remaining settings as default, click Finish.

When completed, double-click the Annotated regions node to open the annotation report. The annotation report shows a pie chart of the gene section breakdown for the DMRs, and a table in which each row is a DMR and the columns contain the annotated gene information.

  • Click Optional columns on the top right of the table, tick gene_name checkbox to display gene name in the table.

Gene Set Enrichment

The Gene set enrichment analysis identifies gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.

  • Click Annotated regions node.

  • Click Biological interpretation section in the context-sensitive task menu.

  • Click Gene set enrichment.

  • Select KEGG database as Database for pathway enrichment analysis. Choose Homo sapiens hsa_v12_25_04_07 from the KEGG database dropdown.

  • At Feature identifier section, tick Select feature identifier checkbox, select gene_name, then click Finish.

When completed, double-click the Pathway enrichment node to open the pathway enrichment report. Each row in the report is a pathway, with an enrichment score and p-value. The report also lists how many genes in the pathway were in the input gene list and how many were not. Click the gene set ID in the first column to view the pathway diagram. On the pathway diagram, click a gene name to open its KEGG page for additional details.
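To build intuition for what "over-represented" means, the sketch below computes a one-sided hypergeometric p-value from the kinds of counts the report lists. This document does not state which statistical test Connected Multiomics uses, so treat this as a common textbook approach rather than the software's method. The function name and parameter names are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_pathway, n_list, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe: all genes considered
    n_pathway:  genes in the pathway
    n_list:     genes in the input list
    n_overlap:  input-list genes that are in the pathway
    """
    upper = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the overlap between the gene list and the pathway is larger than expected if the list had been drawn at random from the gene universe.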

Get Regional Methylation

The Get Regional Methylation task enables you to generate regions from CpG methylation count data. For each region specified in a region annotation file, methylation counts of all CpG sites within the region are averaged to produce a count value in a regions-by-samples count matrix.

  • Click on 5-base Methylation data node.

  • Select Region analysis from context-sensitive menu, select Get Regional Methylation.

  • Select an annotation model from Assembly and Annotation model drop-down, then click Finish.

A built-in annotation model based on human (hg38) promoter regions from ENSEMBL version 114 is provided for Get Regional Methylation. If you have a list of regions of interest, you may upload the custom region (.bed) file.

After the task is completed, you may hover over Regional Methylation data node to see the number of regions generated.
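The per-region averaging described in this section can be sketched as below. This is an illustration, not the task's implementation. The function name and tuple layouts are hypothetical, and it assumes region start and end are already in the same 1-based inclusive coordinates as the CpG positions. A real .bed file uses 0-based, half-open intervals and would need converting first.

```python
def regional_means(sites, regions):
    """Average site values within each region.

    sites:   list of (chrom, pos, value)
    regions: list of (name, chrom, start, end), 1-based inclusive (assumed)
    Returns {region_name: mean value, or None if no site falls inside}.
    """
    result = {}
    for name, chrom, start, end in regions:
        values = [v for c, p, v in sites if c == chrom and start <= p <= end]
        result[name] = sum(values) / len(values) if values else None
    return result
```

Running this once per sample yields one column of a regions-by-samples matrix.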

Detect Differential Methylation

The Detect Differential Methylation task can be used for differential test on the regional data generated from the Get Regional Methylation task. It converts Beta-values to M-values and uses these to perform ANOVA differential expression analysis.

  • Click on Get Regional Methylation data node.

  • Select Methylation analysis from context-sensitive menu, select Detect Differential Methylation.

  • Add sampleGroup as factor for analysis, click Next.

  • Define and add comparison, then click Finish.

When completed, double click A vs B node to open the differential test report. Each row in the report is a region, sorted by p-values. Significance such as P-value and FDR step up are computed from the M-values. The LSMeans of the groups and the Difference are computed from the Beta-values.
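The Beta-to-M conversion mentioned above is commonly defined as the base-2 logit, M = log2(Beta / (1 - Beta)). The sketch below uses that standard definition. The document does not give the exact formula used by the task, and the clipping offset `eps` is an assumption added here so that Beta values of exactly 0 or 1 stay finite.

```python
from math import log2


def beta_to_m(beta, eps=1e-6):
    """Convert a Beta-value in [0, 1] to an M-value (base-2 logit).

    eps is a hypothetical clipping offset that avoids log2(0) and
    division by zero at the extremes.
    """
    clipped = min(max(beta, eps), 1.0 - eps)
    return log2(clipped / (1.0 - clipped))
```

A Beta-value of 0.5 maps to an M-value of 0, and values near 0 or 1 are stretched out. This is why M-values are often preferred for statistical testing, while Beta-values remain easier to interpret as methylation fractions.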

Click the volcano icon next to the comparison name, or the icon buttons in the View columns, to open visualizations and additional details on the results.

With the differential test result, you may filter the result based on differential test statistics, annotate the regions with gene names, and perform gene set enrichment analysis to identify potential pathways.

Dual-omics Analysis With Variants Data

If variant calling was performed during the DRAGEN analysis, the variants data can be imported into the Connected Multiomics for integration analysis with DMRs to generate dual-omics insights.

Required Files And Analysis Creation

When both methylation data and variant data are present, 7 files with the extensions listed below are required per sample: the first 5 files for methylation data and the last 2 files for variants data.

  • .CX_report.txt.gz

  • .methyl_metrics.csv

  • .mapping_metrics.csv

  • .wgs_coverage_metrics.csv

  • .M-bias.txt

  • .vcf.gz

  • .vcf.gz.tbi

Follow the steps in Creating a Custom Analysis to create a custom analysis. After the data is imported into a custom analysis, there should be an Import cohort task node and 2 data nodes - 5-base methylation data node for methylation data, Variants data node for variants data.

Processing The 5-base Methylation Data

Analyses that can be performed on the 5-base methylation data are described in the Custom 5-base Methylation Analysis section. You can QC the data, explore the data, make comparisons to detect DMRs, filter the differential test results, and annotate the DMRs with genomic information such as gene ID and gene name. The annotated DMRs are used as input for dual-omics analysis with variants data.

Processing The Variants Data

Variants data should be filtered and annotated with genomic information such as gene ID and gene name prior to integration with DMRs. In this section, we describe analysis tasks that can be performed on the variants data to prepare it for dual-omics analysis.

Filter Variants

Filtering is recommended, and the appropriate criteria depend on the study. For example, you may want to remove variants with low variant quality, and/or include only the sample group of interest (e.g., samples involved in the DMR comparison).

  • Click on Variants data node.

  • Select Variant analysis section in the context-sensitive task menu, select Filter variants.

  • Select correct Assembly for your variants data.

  • Set filtering criteria based on annotation, samples, and/or variants, then click Finish.

When completed, double-click the Variants node downstream of the Filter variants task to open the variant filter report. The report shows a table with the number of variants that passed the filter, the number that failed, and the percentage that passed, per sample.
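As an illustration of quality-based filtering and the pass/fail summary, the sketch below filters the lines of a single-sample VCF by the QUAL column, the sixth tab-separated field in the VCF format. It is not the Filter variants task. The function name and the choice to treat a missing QUAL (".") as failing are assumptions.

```python
def filter_by_quality(vcf_lines, min_qual):
    """Split VCF records by QUAL and report the pass percentage.

    Returns (passed_lines, failed_count, percent_passed).
    Records with a missing QUAL ('.') are counted as failed (assumption).
    """
    passed, failed = [], 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        qual = line.rstrip("\n").split("\t")[5]
        if qual != "." and float(qual) >= min_qual:
            passed.append(line)
        else:
            failed += 1
    total = len(passed) + failed
    percent = 100.0 * len(passed) / total if total else 0.0
    return passed, failed, percent
```

The three returned values correspond to the passed, failed, and percentage columns of a per-sample filter summary.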

Summarized Cohort Mutations

Variant information is stored on a per-sample basis. The Summarize cohort mutations task enables you to create a summary of variants across the samples in the input data node. The summary can help identify both the frequency of variants and the samples that share a particular variant.

  • Click on Variants data node.

  • Select Variant analysis section in the context-sensitive task menu, select Summarized cohort mutations.

When completed, double-click the Summarized variants node to open the summary report. The report opens to a table in which each row is a variant. Additional columns can be shown in the table using the Optional columns button. The table can be filtered by typing filtering criteria in the text box under each column name. Refer to the Summarized cohort mutation documentation for more details about the task and its output.
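The idea of a cross-sample variant summary can be sketched in a few lines. This is a conceptual illustration, not the task's implementation or output format: the function name, the variant key (chrom, pos, ref, alt), and the result fields are all hypothetical.

```python
from collections import defaultdict


def summarize_cohort(per_sample_variants):
    """Summarize which samples carry each variant.

    per_sample_variants: {sample_id: iterable of (chrom, pos, ref, alt)}
    Returns {variant: {"samples": [...], "frequency": fraction of samples}}.
    """
    carriers = defaultdict(set)
    for sample, variants in per_sample_variants.items():
        for variant in variants:
            carriers[variant].add(sample)
    n_samples = len(per_sample_variants)
    return {
        variant: {"samples": sorted(samples), "frequency": len(samples) / n_samples}
        for variant, samples in carriers.items()
    }
```

Each entry answers the two questions raised above: how common a variant is in the cohort, and which samples share it.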

Remove Background Mutations

The Remove background mutations task is a variants filtering task that enables you to filter variants against built-in databases, to clean up noise from common polymorphisms and benign mutations. The available built-in databases are:

  • Primate AI: Prediction of pathogenicity on missense mutations. Percentile score ranges from 0 to 1, with 0 being benign, 1 being most pathogenic.

  • Promoter AI: Prediction of expression-altering consequences of variants in promoter regions. Scores range from -1 to 1. A positive score indicates that the variant likely enhances transcription, and a negative score indicates that the variant likely represses transcription.

  • gnomAD: The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. Filtering by minor allele frequency is supported.

  • DRAGEN Haplotype Database: Proprietary haplotype database built from a panel of 256 population haplotypes. Filtering by minor allele frequency is supported.

To invoke the Remove background mutations task:

  • Click Summarized variants data node.

  • Click Variant analysis section in the context-sensitive task menu.

  • Click Remove background mutations.

  • Select a database for filtering, click Next.

  • Set filtering criteria, then click Finish.

If you need to filter variants against more than one database, run the Remove background mutations task once per database, sequentially.

Annotate Variants

Variants must be annotated so that they are associated with a gene ID and gene name, both of which are required for dual-omics analysis.

  • Click Variants data node.

  • Click Variant analysis section in the context-sensitive task menu.

  • Select Assembly.

  • Tick Annotate with genomic features, select an Annotation model, click Finish.

When completed, double-click on Annotated variants data node to open report. You may use Optional column button to show/hide annotations in the table.

Combine DMRs And Variants For Analysis

With annotated DMRs and annotated variants, we are ready to combine both data types for dual-omics analysis.

  • Click Annotated regions data node from the methylation analysis branch, then click Select.

  • Select intersection or union to combine methylation and variant data, then click Finish.

    • Intersection: Use gene ID to intersect DMRs and variants. Every intersected variant and DMR is reported.

    • Union: Report the intersection (by gene ID) result and also variants that overlap a DMR within a specified distance (default is +/- 1000 bp).

When complete, double-click the 5-base methylation and variants data node to open the result. The task report is similar to the Summarized cohort mutations task report, with an additional Origin column that indicates whether the result in a row is present in both methylation and variant data (VC, DMR), in variant data only (VC), or in methylation data only (DMR).
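One way to read the intersection and union options is sketched below. This is an interpretation for illustration only. The exact matching rules and the Origin column logic used by Connected Multiomics are not specified here, and the function name and dictionary keys are hypothetical. Only the gene ID matching and the 1000 bp default come from the option descriptions above.

```python
def combine(dmrs, variants, mode="intersection", distance=1000):
    """Pair variants with DMRs.

    dmrs:     list of dicts with 'id', 'gene_id', 'chrom', 'start', 'end'
    variants: list of dicts with 'id', 'gene_id', 'chrom', 'pos'
    mode:     'intersection' pairs by shared gene ID only;
              'union' also pairs variants within `distance` bp of a DMR.
    Returns a list of (variant_id, dmr_id) pairs.
    """
    pairs = []
    for variant in variants:
        for dmr in dmrs:
            same_gene = variant["gene_id"] == dmr["gene_id"]
            nearby = (
                variant["chrom"] == dmr["chrom"]
                and dmr["start"] - distance <= variant["pos"] <= dmr["end"] + distance
            )
            if same_gene or (mode == "union" and nearby):
                pairs.append((variant["id"], dmr["id"]))
    return pairs
```

Under this reading, union mode returns everything intersection mode does, plus variants that sit close to a DMR but are annotated to a different gene.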

Gene Set Enrichment On The Combined Data

After combining methylation and variant data, gene set enrichment analysis can be used to identify gene sets and pathways overrepresented from the combined data, deriving biological meaning of your results.

  • Click 5-base methylation and variants node.

  • Click Biological interpretation section in the context-sensitive task menu.

  • Click Gene set enrichment.

  • Select KEGG database or Gene set database for analysis. KEGG database for pathway analysis, gene set database for gene ontology (GO) analysis.

  • Click Finish.

When completed, double-click the Pathway enrichment node to open the result. Click an individual gene set ID to see more details about the pathway.

  • Filter the report (e.g., Enrichment score > 5) to fewer than 100 rows.

  • When the table has fewer than 100 rows, Data Viewer plots are enabled. Click View plots in Data Viewer to display the pathways in a bar chart and a scatterplot.

Complete Task Graph

The task graph below shows the completed analysis. Blue indicates analysis specific to 5-base DNA methylation. Orange indicates analysis specific to variants. Green indicates analysis using both methylation and variants.
