Cufflinks Assembly & DE output conversion to bioset


Last updated 3 days ago


The most straightforward way to create a bioset for import into Correlation Engine is to go to the Gene Browser section, set the Significant filter option to “True”, and click the Save Filtered Table link in the bottom section. The “True” setting applies a q-value cut-off of 0.05. Users may also want to save an unfiltered version. Curators apply an FDR analysis to determine whether a particular bioset should be tagged as “Below threshold significance” and excluded from calculations in Correlation Engine.

Example content of this file is shown in the table below.

| Test ID | Gene | Locus | Status | log2(FFPESample1 FPKM) | log2(FFPESample2 FPKM) | log2(Ratio) | q Value | Significant |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A2ML1 | A2ML1 | chr12:8975149-9029381 | OK | 0.73 | -10 | -10.73 | 1.68E-04 | TRUE |
| ABHD12B | ABHD12B | chr14:51338877-51371688 | OK | -1.52 | -10 | -8.48 | 1.68E-04 | TRUE |
| ACKR1 | ACKR1 | chr1:159173802-159176290 | OK | 1.17 | -10 | -11.17 | 1.68E-04 | TRUE |
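For users who start from the unfiltered version, the q-value cut-off could be reproduced along the following lines. This is a minimal sketch, not the Gene Browser's own logic: it assumes the saved file is tab-delimited with the headers shown above, and that the cut-off is a strict "less than 0.05". The function name `significant_rows` and the row `GENE_X` are hypothetical.

```python
import csv
import io

Q_VALUE_CUTOFF = 0.05  # cut-off stated for the "True" setting


def significant_rows(table_text, delimiter="\t"):
    """Return rows whose q Value is below the cut-off.

    Assumes a delimited export with a header row containing "q Value".
    Whether the boundary value itself passes is an assumption (strict <).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return [row for row in reader if float(row["q Value"]) < Q_VALUE_CUTOFF]


example = (
    "Test ID\tGene\tlog2(Ratio)\tq Value\n"
    "A2ML1\tA2ML1\t-10.73\t1.68E-04\n"
    "GENE_X\tGENE_X\t0.50\t0.40\n"  # hypothetical non-significant row
)
kept = significant_rows(example)
kept_genes = [row["Gene"] for row in kept]
print(kept_genes)
```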

The following table lists the columns to be extracted, new column headers, and any recommended transformations:

| Original header | New header | Transformation |
| --- | --- | --- |
| Gene | Gene name | None |
| log2(Ratio) | Log2 fold change | Remove values between -0.2630344 and 0.2630344, that is, between log2(1/1.2) and log2(1.2) |
| log2(<control> FPKM) | Control expression | Unlog values |
| log2(<test> FPKM) | Test expression | Unlog values |
| q Value | q-value | None |

The FPKM column headers will vary according to the names of the test and control groups as designated by users. Correlation Engine biosets normally report these values in unlogged format, so for consistency we recommend transforming the data prior to upload. Since the q-value cut-off has already been applied, applying the fold change cut-off is the last data quality step.
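The column extraction, renaming, unlogging, and fold change cut-off described above can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the function name `to_bioset_row` is hypothetical, values exactly at the boundary are kept (the text does not say whether the range is inclusive), and FFPESample1 is treated as the control group because the example ratios equal log2(FFPESample2 FPKM) minus log2(FFPESample1 FPKM).

```python
import math

# log2(1.2), about 0.2630344; values closer to zero than this are removed
FOLD_CHANGE_CUTOFF = math.log2(1.2)


def to_bioset_row(row, control_col, test_col):
    """Convert one Cufflinks row (dict of strings) to bioset columns.

    Returns None when the log2 ratio falls inside the excluded band.
    Keeping values exactly on the boundary is an assumption.
    """
    log2_ratio = float(row["log2(Ratio)"])
    if abs(log2_ratio) < FOLD_CHANGE_CUTOFF:
        return None
    return {
        "Gene name": row["Gene"],
        "Log2 fold change": log2_ratio,
        # "Unlog" the FPKM columns: x -> 2 ** x
        "Control expression": 2 ** float(row[control_col]),
        "Test expression": 2 ** float(row[test_col]),
        "q-value": float(row["q Value"]),
    }


example_row = {
    "Gene": "A2ML1",
    "log2(FFPESample1 FPKM)": "0.73",
    "log2(FFPESample2 FPKM)": "-10",
    "log2(Ratio)": "-10.73",
    "q Value": "1.68E-04",
}
converted = to_bioset_row(
    example_row,
    control_col="log2(FFPESample1 FPKM)",  # assumed control group
    test_col="log2(FFPESample2 FPKM)",     # assumed test group
)
print(converted)
```

Passing the column names as arguments reflects the fact that the FPKM headers change with the group names chosen by the user.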

An optional step at this point is to replace the gene names with RefSeq identifiers. Correlation Engine matches RNA-Seq biosets to the correct platform model based on species-specific RefSeq identifiers. This ensures that the best statistics are used for correlation calculations. Skipping this step results in the bioset being treated as a custom platform. Note:

  1. Re-mapping files can be provided for the human, mouse, and rat genomes

  2. If users wish to upload RNA-Seq data from other species supported in Correlation Engine, gene names should be used as is, and the biosets will be ingested as custom platforms
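The optional re-mapping step might look like the sketch below. The structure of the re-mapping files is not described here, so the example assumes they have already been loaded into a plain dictionary from gene name to RefSeq identifier. The function name `remap_gene_names`, the placeholder identifiers, and the choice to set unmapped genes aside rather than keep them are all assumptions made for illustration.

```python
def remap_gene_names(rows, gene_to_refseq):
    """Replace gene names with RefSeq identifiers where a mapping exists.

    Returns (mapped_rows, unmapped_gene_names). Setting unmapped genes
    aside instead of keeping them is an assumption of this sketch.
    """
    mapped, unmapped = [], []
    for row in rows:
        refseq_id = gene_to_refseq.get(row["Gene name"])
        if refseq_id is None:
            unmapped.append(row["Gene name"])
            continue
        # Copy the row so the input list is left unchanged
        mapped.append({**row, "Gene name": refseq_id})
    return mapped, unmapped


# Placeholder identifiers, not real RefSeq accessions
gene_to_refseq = {"A2ML1": "REFSEQ_ID_1", "ACKR1": "REFSEQ_ID_2"}
rows = [
    {"Gene name": "A2ML1", "Log2 fold change": -10.73},
    {"Gene name": "ABHD12B", "Log2 fold change": -8.48},
    {"Gene name": "ACKR1", "Log2 fold change": -11.17},
]
mapped_rows, unmapped_genes = remap_gene_names(rows, gene_to_refseq)
print(mapped_rows)
print(unmapped_genes)
```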

It is advisable to add a header with descriptive information to the data table in a bioset. This informs other users of the details around the processing and group identification. The list below describes the content Correlation Engine normally provides, and the layout follows it.

  1. The Bioset summary is the same as the study title

  2. Comparison: restates the comparison using full group names

  3. Data pre-processing: fixed text

  4. Analysis summary: modified according to species

  5. Test expression: Test group name with sample count

  6. Control expression: Control group name with sample count

  7. Internal ID - <studyID>: Typically the GEO series number. Based on this ID, biosets can be matched with their Internal ID files

Bioset summary =

Comparison = <test group> v. <control group>

Data pre-processing = FASTQ files were downloaded from Sequence Read Archive (SRA). No other preprocessing was performed.

Analysis summary = Alignment genome, software used, gene identifiers used, cut-offs applied, etc.

Test expression - Median FPKM expression in <test group> (total replicates = #)

Control expression - Median FPKM expression in <control group> (total replicates = #)

Internal ID - <>

Gene name Fold change Control expression Test expression q-value
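Assembling the header and the data table into one text block could be sketched as follows. The function name `build_bioset_text`, the keys of the `meta` dictionary, the tab delimiter, and the absence of blank lines between header lines are assumptions. The column row follows the layout line above as written, and the study title and replicate counts are hypothetical placeholder values.

```python
COLUMNS = ["Gene name", "Fold change", "Control expression",
           "Test expression", "q-value"]


def build_bioset_text(meta, rows):
    """Return header lines, the column row, and data rows as one string."""
    header = [
        f"Bioset summary = {meta['study_title']}",
        f"Comparison = {meta['test_group']} v. {meta['control_group']}",
        "Data pre-processing = FASTQ files were downloaded from Sequence "
        "Read Archive (SRA). No other preprocessing was performed.",
        f"Analysis summary = {meta['analysis_summary']}",
        f"Test expression - Median FPKM expression in {meta['test_group']} "
        f"(total replicates = {meta['test_replicates']})",
        f"Control expression - Median FPKM expression in "
        f"{meta['control_group']} "
        f"(total replicates = {meta['control_replicates']})",
        f"Internal ID - {meta['internal_id']}",
    ]
    table = ["\t".join(COLUMNS)]  # tab delimiter is an assumption
    table += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(header + table) + "\n"


meta = {
    "study_title": "Example study title",  # hypothetical
    "test_group": "FFPESample2",
    "control_group": "FFPESample1",
    "analysis_summary": "Alignment genome, software used, "
                        "gene identifiers used, cut-offs applied, etc.",
    "test_replicates": 1,         # hypothetical count
    "control_replicates": 1,      # hypothetical count
    "internal_id": "<studyID>",   # placeholder kept from the layout
}
bioset_text = build_bioset_text(
    meta, [("A2ML1", -10.73, 1.6586, 0.00098, 1.68e-04)]
)
print(bioset_text)
```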

Import the finalized bioset through the Import UI and select ranking on absolute fold change, descending.
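The ranking itself is selected in the Import UI, so no code is needed for it. The short sketch below only illustrates what ordering by absolute fold change, descending, means for the three example genes. The function name `rank_by_absolute_fold_change` is hypothetical.

```python
def rank_by_absolute_fold_change(rows):
    """Order rows by |log2 fold change|, largest magnitude first."""
    return sorted(rows, key=lambda row: abs(row["Log2 fold change"]),
                  reverse=True)


rows = [
    {"Gene name": "A2ML1", "Log2 fold change": -10.73},
    {"Gene name": "ABHD12B", "Log2 fold change": -8.48},
    {"Gene name": "ACKR1", "Log2 fold change": -11.17},
]
ranked_names = [row["Gene name"] for row in rank_by_absolute_fold_change(rows)]
print(ranked_names)
```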

Note: Not all genes in the Cufflinks output will be recognized by the Correlation Engine gene tables, particularly some miRNAs and lincRNAs.