
RNA-Seq alignment conversion to Correlation Engine Biosets


Last updated 3 days ago


There are two options for transferring data to Correlation Engine from the outputs of the BaseSpace RNA-Seq alignment app. First, you can download the entire analysis from the project page.


Second, select the sample of interest and scroll down to download the FPKM or gene counts file.


The following table is an example of the reference FPKM values for genes. Filter the rows by the desired status and FPKM cut-off (at least > 0 is recommended), then create a new file that contains only the tracking_id and FPKM columns.

| tracking_id | class_code | nearest_ref_id | gene_id | gene_short_name | tss_id | locus | length | coverage | FPKM | conf_lo | conf_hi | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1BG | - | - | A1BG | A1BG | TSS12895 | chr19:58858171-58874214 | - | - | 1.30307 | 1.30307 | 1.30307 | OK |
| A1BG-AS1 | - | - | A1BG-AS1 | A1BG-AS1 | TSS15204 | chr19:58858171-58874214 | - | - | 0.328392 | 0.328392 | 0.328392 | OK |
| A1CF | - | - | A1CF | A1CF | TSS27441 | chr10:52559168-52645435 | - | - | 0 | 0 | 0 | OK |
| A2M | - | - | A2M | A2M | TSS12287 | chr12:9217772-9268558 | - | - | 465.75 | 465.75 | 465.75 | OK |
| A2M-AS1 | - | - | A2M-AS1 | A2M-AS1 | TSS19734 | chr12:9217772-9268558 | - | - | 1.52412 | 1.52412 | 1.52412 | OK |
| A2ML1 | - | - | A2ML1 | A2ML1 | TSS32184,TSS6636 | chr12:8975149-9029381 | - | - | 4.83395 | 4.83395 | 4.83395 | OK |
| A2MP1 | - | - | A2MP1 | A2MP1 | TSS26806 | chr12:9381128-9386803 | - | - | 0.0701295 | 0.0701295 | 0.0701295 | OK |
| A3GALT2 | - | - | A3GALT2 | A3GALT2 | TSS700 | chr1:33772366-33786699 | - | - | 0 | 0 | 0 | OK |
| A4GALT | - | - | A4GALT | A4GALT | TSS7799 | chr22:43088126-43116876 | - | - | 9.15569 | 9.15569 | 9.15569 | OK |

Then rename the tracking_id and FPKM columns to Gene name and Test expression, respectively.

| tracking_id | FPKM |
|---|---|
| A1BG | 1.30307 |
| A1BG-AS1 | 0.328392 |
| A1CF | 0 |
| A2M | 465.75 |
| A2M-AS1 | 1.52412 |
| A2ML1 | 4.83395 |
| A2MP1 | 0.0701295 |
| A3GALT2 | 0 |
| A4GALT | 9.15569 |

| Gene name | Test expression |
|---|---|
| A1BG | 1.30307 |
| A1BG-AS1 | 0.328392 |
| A1CF | 0 |
| A2M | 465.75 |
| A2M-AS1 | 1.52412 |
| A2ML1 | 4.83395 |
| A2MP1 | 0.0701295 |
| A3GALT2 | 0 |
| A4GALT | 9.15569 |
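The filter, extract, and rename steps above can be sketched in a few lines of Python. This is only an illustration, under the assumptions that the FPKM file is tab-delimited with the header row shown in the example table, that only rows with status OK are wanted, and that the cut-off is FPKM > 0. The function names `fpkm_to_bioset_rows` and `write_bioset` are hypothetical and not part of any Illumina tool.

```python
import csv
import io

# Column names as they appear in the example FPKM table.
COLUMNS = ["tracking_id", "class_code", "nearest_ref_id", "gene_id",
           "gene_short_name", "tss_id", "locus", "length", "coverage",
           "FPKM", "conf_lo", "conf_hi", "status"]


def fpkm_to_bioset_rows(tracking_text, min_fpkm=0.0, keep_status=("OK",)):
    """Return (tracking_id, FPKM) pairs that pass the status and FPKM filters.

    Assumes tab-delimited input with a header row (an assumption, see lead-in).
    FPKM values are kept as the original strings so they are not reformatted.
    """
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    rows = []
    for rec in reader:
        if rec["status"] not in keep_status:
            continue
        if float(rec["FPKM"]) <= min_fpkm:  # keeps only FPKM > min_fpkm
            continue
        rows.append((rec["tracking_id"], rec["FPKM"]))
    return rows


def write_bioset(rows, out):
    """Write the two-column table with the renamed headers."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene name", "Test expression"])
    writer.writerows(rows)


# Three rows taken from the example table above.
EXAMPLE_ROWS = [
    ["A1BG", "-", "-", "A1BG", "A1BG", "TSS12895",
     "chr19:58858171-58874214", "-", "-", "1.30307", "1.30307", "1.30307", "OK"],
    ["A1CF", "-", "-", "A1CF", "A1CF", "TSS27441",
     "chr10:52559168-52645435", "-", "-", "0", "0", "0", "OK"],
    ["A2M", "-", "-", "A2M", "A2M", "TSS12287",
     "chr12:9217772-9268558", "-", "-", "465.75", "465.75", "465.75", "OK"],
]
EXAMPLE = "\n".join("\t".join(r) for r in [COLUMNS] + EXAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    out = io.StringIO()
    write_bioset(fpkm_to_bioset_rows(EXAMPLE), out)
    print(out.getvalue(), end="")
```

With the three example rows, A1CF is dropped because its FPKM is 0, and the remaining two genes are written under the Gene name and Test expression headers.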

The reference gene counts file comes without headers. Before upload, add the headers Gene name and Test expression to the first and second columns, respectively.

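With headers added:

| Gene name | Test expression |
|---|---|
| FAM41C | 16 |
| LOC100130417 | 43 |
| SAMD11 | 228 |
| NOC2L | 590 |
| KLHL17 | 33 |

Original file, without headers:

| | |
|---|---|
| FAM41C | 16 |
| LOC100130417 | 43 |
| SAMD11 | 228 |
| NOC2L | 590 |
| KLHL17 | 33 |
| PLEKHN1 | 77 |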
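Adding the header row to a headerless gene counts file can be scripted. The sketch below assumes the counts file is tab-delimited with the gene name in the first column and the count in the second. The function name `add_bioset_headers` is hypothetical.

```python
def add_bioset_headers(counts_text, headers=("Gene name", "Test expression")):
    """Prepend the bioset header row to headerless, tab-delimited gene counts.

    Blank lines are dropped. If the first row already holds the headers,
    the text is returned without a second header row.
    """
    lines = [ln for ln in counts_text.splitlines() if ln.strip()]
    first = lines[0].split("\t") if lines else []
    if first[:2] == list(headers):
        return "\n".join(lines) + "\n"
    return "\n".join(["\t".join(headers)] + lines) + "\n"


if __name__ == "__main__":
    # Two rows taken from the example gene counts above.
    raw = "FAM41C\t16\nLOC100130417\t43\n"
    print(add_bioset_headers(raw), end="")
```

The check on the first row makes the function safe to run twice on the same file, which is a design choice for convenience rather than a Correlation Engine requirement.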

An optional step at this point is to replace the gene names with RefSeq identifiers. Correlation Engine matches RNA-Seq biosets to the correct platform model based on species-specific RefSeq identifiers, which ensures that the best statistics are used for correlation calculations. Skipping this step results in the bioset being treated as a custom platform. Note:

  1. Re-mapping files can be provided for the human, mouse, and rat genomes.

  2. To upload RNA-Seq data from other species supported in Correlation Engine, use the gene names as is. These biosets will be ingested as custom platforms.
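The optional re-mapping step can be illustrated with a simple lookup. The layout of the re-mapping files is not described here, so this sketch assumes the mapping has already been loaded into a dictionary from gene name to RefSeq identifier. The function name `remap_gene_names` is hypothetical, and the identifiers in the example are placeholders, not real RefSeq accessions.

```python
def remap_gene_names(rows, symbol_to_refseq):
    """Replace gene names with RefSeq identifiers where a mapping exists.

    rows: iterable of (gene_name, value) pairs.
    symbol_to_refseq: dict from gene name to RefSeq identifier (assumed
    to be built from a species-specific re-mapping file).
    Returns (remapped_rows, unmapped_gene_names) so the caller can decide
    how to handle genes that have no mapping.
    """
    remapped, unmapped = [], []
    for gene, value in rows:
        refseq = symbol_to_refseq.get(gene)
        if refseq is None:
            unmapped.append(gene)
        else:
            remapped.append((refseq, value))
    return remapped, unmapped


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    mapping = {"A1BG": "REFSEQ_ID_FOR_A1BG", "A2M": "REFSEQ_ID_FOR_A2M"}
    rows = [("A1BG", "1.30307"), ("A2M", "465.75"), ("A2MP1", "0.0701295")]
    remapped, unmapped = remap_gene_names(rows, mapping)
    print(remapped)
    print(unmapped)
```

Returning the unmapped names separately, rather than silently dropping them, makes it easy to review which genes would not be re-mapped before upload.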

It is advisable to add information as a header to the data table in a bioset. This tells other users how the data were processed and how the groups are identified. The list below describes the content Correlation Engine normally provides, and the layout follows it. Since this basic bioset does not provide a fold change comparison column, lines 2 and 7 do not necessarily apply.

  1. The Bioset summary is the same as the study title

  2. Comparison: restates the comparison using full group names

  3. Data pre-processing: fixed text

  4. Analysis summary: modified according to species

  5. Test expression: Test group name with sample count

  6. Control expression: Control group name with sample count

  7. Internal ID - <studyID>: Typically the GEO series number. Based on this ID, biosets can be matched with their Internal ID files

Bioset summary =

Comparison = <test group> v. <control group>

Data pre-processing = FASTQ files were downloaded from Sequence Read Archive (SRA). No other preprocessing was performed.

Analysis summary = Alignment genome, software used, gene identifiers used, cut-offs applied, etc.

Test expression - Median FPKM expression in <test group> (total replicates = #)

Control expression - Median FPKM expression in <control group> (total replicates = #)

Internal ID - <>

Gene name Fold change Control expression Test expression q-value
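The header layout above can be generated from a few values. The sketch below fills in the template lines exactly as they are listed. The function name `build_bioset_header` and all example values (group names, replicate counts, study ID) are hypothetical placeholders.

```python
def build_bioset_header(summary, test_group, control_group,
                        test_n, control_n, preprocessing, analysis,
                        internal_id):
    """Return the seven header lines in the layout shown above.

    The lines are returned as a list so that a caller can drop any that
    do not apply to a basic bioset without a fold change column.
    """
    return [
        f"Bioset summary = {summary}",
        f"Comparison = {test_group} v. {control_group}",
        f"Data pre-processing = {preprocessing}",
        f"Analysis summary = {analysis}",
        f"Test expression - Median FPKM expression in {test_group} "
        f"(total replicates = {test_n})",
        f"Control expression - Median FPKM expression in {control_group} "
        f"(total replicates = {control_n})",
        f"Internal ID - {internal_id}",
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    header = build_bioset_header(
        summary="Example study title",
        test_group="treated",
        control_group="untreated",
        test_n=3,
        control_n=3,
        preprocessing="FASTQ files were downloaded from Sequence Read "
                      "Archive (SRA). No other preprocessing was performed.",
        analysis="Alignment genome, software used, gene identifiers used, "
                 "cut-offs applied, etc.",
        internal_id="EXAMPLE_STUDY_ID",
    )
    print("\n".join(header))
```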

Import the finalized bioset through the Import UI.

Note: Not all genes in the output will be recognized by the Correlation Engine gene tables, particularly some miRNAs and lincRNAs.

The last step before completing the upload is to apply tags to the biosets. The standard types of tags are BIOSOURCE, TISSUE, PHENOTYPE/DISEASE, COMPOUND, GENE/GENEMODE, and BIODESIGN. GENE/GENEMODE tags are paired to indicate a directed perturbation such as GENE KNOCKOUT, ANTIBODY TARGET, etc. BIODESIGN tags reflect the structure of the experiment: DISEASE VS. NORMAL, TREATMENT VS. CONTROL, MUTANT VS. WILDTYPE, etc. In the case of single-sample analysis of expression, the user can omit the BIODESIGN tag or apply a non-differentiating tag such as NORMAL VS. NORMAL, MUTANT VS. MUTANT, or DISEASE VS. DISEASE.