Exploring the data with PCA
Principal component analysis (PCA) is a way to explore the overall similarity between samples, visualize possible groupings within the data set, and detect outliers.
Select PCA Scatter Plot from the QA/QC
Figure 1. Principal component analysis showing total allele intensities of normal (blue) and cancer (red) samples. Each dot represents a single sample.
Each dot on the plot corresponds to a single sample and can be thought of as a summary of all normalized marker intensities for the sample. The first categorical column is used to color the plot; here, tumor samples are shown in red and normal samples are shown in blue.
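To make the idea of a dot as a "summary of all marker intensities" more concrete, the sketch below projects a toy intensity matrix onto its first principal component. The data values and function names are hypothetical. Power iteration is used here only because it is short to write; this is not a description of how the software computes PCA internally.

```python
import math

def center(rows):
    """Subtract each marker's (column's) mean so every column averages zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix between markers (columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    # Arbitrary starting vector (an assumption); it must not be orthogonal to PC1.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc1_scores(rows):
    """Return one PC1 coordinate per sample (row)."""
    centered = center(rows)
    direction = first_component(covariance(centered))
    return [sum(x * d for x, d in zip(row, direction)) for row in centered]

# Hypothetical normalized intensities: 4 samples (rows) x 3 markers (columns).
samples = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [3.0, 4.2, 0.6],
    [3.1, 4.0, 0.5],
]
scores = pc1_scores(samples)
```

In this toy data the first two samples land on one side of PC1 and the last two on the other, which is the kind of separation the scatter plot makes visible across three components at once.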
To better view the data, we can rotate the plot.
Click and drag to rotate the plot
Rotating the plot lets us look for outliers along each of the three principal components (PC1-3). The percentage of the total variation explained by each PC is shown next to its axis label, and the chart label shows the combined percentage of the total variation explained by the displayed PCs.
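The percentages on the axis labels and chart label can be reproduced from the variance captured by each component. The following is a minimal sketch, assuming the per-component variances (eigenvalues) are already known; the numbers and the function name are made up for illustration.

```python
def explained_variance_percent(eigenvalues):
    """Return the percentage of total variation explained by each component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [50.0, 20.0, 12.0, 10.0, 8.0]

percent = explained_variance_percent(eigenvalues)

# Axis labels: one value per displayed component (PC1, PC2, PC3).
axis_percent = percent[:3]

# Chart label: combined percentage for the three displayed components.
chart_percent = sum(axis_percent)
```

A low combined percentage would suggest that much of the variation lies in components that are not displayed, so apparent clusters or outliers should be interpreted with that in mind.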
We can see that the peripheral blood samples (normal) cluster together whereas the cancer tissue samples (tumor) are more dispersed and show considerable variability. This corresponds well with the known genomic variability of cancer cells.
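The visual impression that one group is tighter than the other can be checked numerically. One simple option, sketched below, is the mean distance of each sample from its group centroid in PC space. The coordinates and helper names are hypothetical, and other dispersion measures would work equally well.

```python
import math

def centroid(points):
    """Average position of a group of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def mean_distance_to_centroid(points):
    """Average Euclidean distance from each point to the group centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

# Hypothetical (PC1, PC2, PC3) coordinates for each group.
normal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
tumor = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (10.0, 10.0, 0.0)]

normal_spread = mean_distance_to_centroid(normal)
tumor_spread = mean_distance_to_centroid(tumor)
```

A larger value for the tumor group than for the normal group would match the pattern described above.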
To view the similarity of paired normal and tumor samples from the same patient, we can connect dots by Subject ID.
Select 4. SubjectID from the Connect by drop-down menu in the upper right-hand corner of the plot tab
Paired tumor and normal samples are now connected by lines, illustrating the range of differences between normal and tumor copy number in the data set (Figure 2).
Figure 2. Lines connect paired tumor and normal samples
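The length of each connecting line can be thought of as the distance between a patient's normal and tumor samples in PC space. The sketch below groups samples by subject ID and computes that distance; the record layout, IDs, and coordinates are assumptions made for illustration, not an export format of the software.

```python
import math
from collections import defaultdict

# Hypothetical records: (subject_id, tissue_type, (PC1, PC2, PC3)).
records = [
    ("S1", "normal", (0.0, 0.0, 0.0)),
    ("S1", "tumor", (3.0, 4.0, 0.0)),
    ("S2", "normal", (1.0, 1.0, 1.0)),
    ("S2", "tumor", (1.0, 1.0, 3.0)),
]

def paired_distances(records):
    """Map each subject ID to the normal-to-tumor distance in PC space."""
    by_subject = defaultdict(dict)
    for subject_id, tissue, coords in records:
        by_subject[subject_id][tissue] = coords
    # Only subjects that have both a normal and a tumor sample form a pair.
    return {
        subject_id: math.dist(pair["normal"], pair["tumor"])
        for subject_id, pair in by_subject.items()
        if "normal" in pair and "tumor" in pair
    }

distances = paired_distances(records)
```

Subjects with longer lines are those whose tumor sample differs most from the matched normal sample along the displayed components.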
Select to activate Rotate Mode