TrAP - Transcriptome Analysis Pipeline


Sabrina Carpentier (1), Séverine Lair (1), Carlo Lucchesi (2), Nicolas Servant (1), Philippe La Rosa (1), Emmanuel Barillot (1).
  1. Institut Curie Bioinformatics Unit.
  2. Institut Curie Inserm U509.



TrAP is a pipeline dedicated to transcriptome microarray analysis. It was developed by the bioinformatics team of the Institut Curie and is currently used by the biologists of the institute. Its aim is to help users bring biological studies (on tumorigenesis or tumour progression) to a successful conclusion through a user-friendly interface.
TrAP allows users to:

The following figure illustrates the different steps of a microarray experiment and highlights the steps in which TrAP is involved.

The different steps implemented in TrAP are described below.
TrAP was developed in PHP, HTML and JavaScript; it uses the Oracle database system and the R statistics platform.


Data storage and project management

Data management is an important part of an analysis pipeline. It consists of storing, organizing and managing all data used or generated during the analysis.
The data management tool used is HuTuDB-transcriptome.

Normalization

The aim of normalization is to extract the biological information from the microarrays and to remove experimental biases.
For Affymetrix GeneChip arrays, we chose to integrate these three methods:


We recommend the use of GCRMA for Affymetrix GeneChip arrays. TrAP allows you to evaluate the quality of the normalization using boxplots, Q-Q plots and MvA plots.


Two-colour cDNA microarrays are normalized with within-print-tip-group intensity-dependent normalization (Yang et al.) using the lowess function. If spatial effects are detected, they are corrected with an ANOVA (ANalysis Of VAriance) model.
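
As an illustration of within-print-tip-group intensity-dependent normalization, the following is a minimal sketch in Python (chosen here for readability; TrAP itself relies on R). It computes M = log2(R/G) and A = 0.5 * log2(R*G) for each spot, fits a simplified, non-robust lowess curve of M against A within each print-tip group, and subtracts the fitted value from M. The function names, the `frac` smoothing parameter and the absence of robustness iterations are assumptions made for this example; they are not taken from TrAP or from the R lowess function.

```python
import math

def ma_values(red, green):
    """Return (M, A): M = log2(R/G), A = 0.5 * log2(R*G) for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def lowess_fit(x, y, frac=0.5):
    """Simplified lowess (no robustness iterations): at each x value,
    fit a tricube-weighted local line over the nearest frac * n points."""
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def print_tip_normalize(red, green, tips, frac=0.5):
    """Subtract a lowess fit of M on A, computed separately per print-tip group."""
    m, a = ma_values(red, green)
    normalized = list(m)
    for tip in set(tips):
        idx = [i for i, t in enumerate(tips) if t == tip]
        fit = lowess_fit([a[i] for i in idx], [m[i] for i in idx], frac)
        for i, f in zip(idx, fit):
            normalized[i] = m[i] - f
    return normalized
```

Calling `print_tip_normalize(red, green, tips)` with three equal-length lists (red intensities, green intensities and a print-tip label per spot) returns the normalized log-ratios in the original spot order. A production implementation would normally add the robustness iterations of lowess and handle missing or non-positive intensities.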

Clustering

Clustering has a twofold purpose, one of which is to identify co-regulated genes. Hierarchical clustering is the commonly used method.
The clustering consists of 3 steps:
The two-way clustering (tree of arrays and tree of genes) generated with TrAP can be displayed with VAMP (Visualization and Analysis of CGH arrays, transcriptome and other Molecular Profiles). VAMP allows users, for example, to select a subtree and zoom in on it.
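
To make the idea of hierarchical clustering concrete, here is a minimal Python sketch of agglomerative clustering. It is not TrAP's implementation: the distance (1 minus the Pearson correlation), the average-linkage rule and all function names are assumptions chosen for illustration. Applying the same function once to gene profiles and once to array profiles would yield the two trees of a two-way clustering.

```python
import math

def pearson_distance(u, v):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)

def hierarchical_clustering(profiles):
    """Average-linkage agglomerative clustering.

    profiles maps a name to a list of expression values.
    Returns the merges in order, as (members_a, members_b, distance)."""
    names = list(profiles)
    dist = {(a, b): pearson_distance(profiles[a], profiles[b])
            for a in names for b in names if a != b}

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        d, i, j = min((linkage(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The ordered list of merges is enough to draw a tree: each entry joins two existing subtrees at the height given by its distance. This quadratic-memory sketch is only suitable for small examples.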

Biclustering algorithms will be implemented soon. This will allow the identification of gene groups that show similar activity patterns under a specific subset of the experimental conditions.

Differential Analysis

The objective of differential analysis is to identify the genes whose expression differs between two predefined sample groups.
The Significance Analysis of Microarrays (SAM) procedure was implemented first (see Tusher et al. (2001) for more details).
Other methods (such as ANOVA models or non-parametric methods) are being tested and will soon be integrated into TrAP.
The result of a differential analysis launched in TrAP is a list of genes that are stimulated or repressed under one condition compared to another. This list (with additional information such as gene names, gene symbols, UniGene IDs, GenBank accession numbers and Gene Ontology annotations) can be exported in Excel format.
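
The core of SAM is a per-gene "relative difference" statistic: the difference between the two group means divided by a pooled standard error plus a small positive constant s0. The Python sketch below shows that statistic only. The function names are hypothetical, s0 is passed in as a plain parameter, and the parts of SAM that choose s0 and estimate the false discovery rate by permutation are deliberately left out.

```python
import math

def sam_statistic(group1, group2, s0=0.0):
    """Relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error of the difference in means;
    s0 is a small positive constant that stabilises d for genes
    with very low variance (here simply supplied by the caller)."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    ss = (sum((x - mean1) ** 2 for x in group1)
          + sum((x - mean2) ** 2 for x in group2))
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mean1 - mean2) / (s + s0)

def rank_genes(expression, s0=0.0):
    """Rank genes by decreasing |d|.

    expression maps a gene name to a (group1_values, group2_values) pair."""
    scores = {gene: sam_statistic(g1, g2, s0)
              for gene, (g1, g2) in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

A positive d indicates higher expression in the first group and a negative d lower expression. In the full procedure, the observed d values are compared with those obtained after permuting the sample labels in order to decide which genes are called significant.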



Functional Analysis

The typical question facing researchers is: what are the functions of the significant differentially expressed genes?
TrAP allows users to assess which functions a list of genes may represent. The analysis is based on the Gene Ontology (GO) project, which provides a controlled vocabulary (ontology) to describe gene and gene product attributes in any organism (see www.geneontology.org for more details). The three organizing principles of GO are molecular function, biological process and cellular component, each with its associated GO terms.
In TrAP, a hypergeometric test is used to determine which GO terms associated with a list of genes are significant. In addition, the distribution of significant GO terms by molecular function, biological process and cellular component is provided to users.
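
The hypergeometric test can be sketched in a few lines of Python. For a given GO term, it gives the probability of seeing at least the observed number of annotated genes in the list if the list had been drawn at random from all genes on the array. The function and parameter names below are hypothetical, and the sketch makes no claim about how TrAP handles multiple testing or the choice of the reference gene set.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric variable X.

    population: number of genes in the reference set (e.g. on the array)
    annotated:  number of reference genes annotated with the GO term
    sample:     number of genes in the list under study
    hits:       number of genes in the list annotated with the GO term"""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(hits, upper + 1))
    return tail / total
```

For example, `hypergeom_pvalue(10, 3, 3, 3)` is the probability that a random list of 3 genes out of 10 contains all 3 genes carrying the term, that is 1/120. A small p-value suggests that the term is over-represented in the list; when many GO terms are tested, some correction for multiple testing is usually applied.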