Analysis for RAS-APC mouse microarrays
This analysis of the Ras and Wnt pathways by microarrays is a joint work of UMR144 Institut Curie-CNRS,
Service Bioinformatique, Institut Curie,
Klinikum rechts der Isar, Munich, Germany, and
European Molecular Biology Laboratory, Heidelberg, Germany.
A paper describing these results is about to be submitted:
"Gene expression profiling in the pVill-K-rasV12G mouse model: dissecting the molecular contribution of oncogenic K-ras to colorectal tumorigenesis", Klaus-Peter Janssen*, Mechthild Wagner*, Sabrina Carpentier*, Fatima El Marjou, Philippe Hupé, Emmanuel Barillot, Daniel Louvard, Wilhelm Ansorge, and Sylvie Robine.
* These authors contributed equally.
The bioinformatic analysis was carried out by Sabrina Carpentier in close collaboration with the other authors.
The oncogene K-ras is mutated in 30-50% of all human colorectal tumors, but its role in tumor initiation and progression is still ill-defined.
We have used pVill-K-rasV12G mice that express oncogenic K-ras in intestinal epithelia and develop spontaneous digestive tumors
as a tool to dissect the specific contribution of K-ras to digestive tumorigenesis (Janssen, et al., Gastroenterology, 2002).
We then crossed the pVill-K-rasV12G mice with the Apc1638N mice described by Fodde et al. (PNAS, 1994), resulting in a compound model
that contains, in addition to the activated K-ras, a knock-in mutation of the tumor suppressor gene Apc.
These K-rasV12G/Apc1638N mice show drastically increased tumor development compared to single transgenic littermates (Janssen et al., in preparation).
In this study, we have analyzed tumors and normal intestinal mucosa from these animals with cDNA microarray profiling,
a key technique to identify global mechanisms of deregulated molecular functions.
Interestingly, the "molecular fingerprint" of human colon cancer was closely reproduced in the tumors from transgenic
animals: metabolic enzymes were downregulated, signal transduction pathways were strongly altered, and changes in genes responsible for
extracellular matrix remodeling, invasion and angiogenesis were prominent. Surprisingly, the tumor transcriptome of K-rasV12G/Apc1638N mice
revealed striking differences from that of the single transgenic pVill-K-rasV12G model.
These results could help to improve the prognosis of human colon cancer by the identification of novel molecular markers and
targets for diagnosis and therapeutic intervention.
The goal of the data analysis was to compare intestinal tumors vs. normal intestinal tissue in Ras animals and in double transgenic RasAPC animals.
The data set is composed of 5 experimental groups:
RasN: (4 microarrays)
Normal jejunum mucosa tissue from transgenic animals pVillin-KrasV12G
RasT1: (4 microarrays)
Pool of 6 small tumors from two Ras animals, from jejunum, size 1-2mm
RasT2: (4 microarrays)
Pool of 3 large tumors from two Ras animals, from jejunum, size 3-5mm
RasAPC-N: (6 microarrays)
Normal jejunum mucosa tissue from double transgenic mice: pVillin-K-rasV12G/Apc1638N
RasAPC-T: (6 microarrays)
Pool of large tumors from two RasAPC animals.
We used a 15k mouse cDNA clone set from the National Institute on Aging at the NIH (http://lgsun.grc.nia.nih.gov/index.html).
PCR fragments were spotted onto EMBL-made amino-silanized glass slides (Ansorge Group).
Many sources of systematic variation affect the measured gene expression levels in
microarray experiments (Yang et al., Nucleic Acids Research, 2002). To extract the biological information and remove such biases, we normalized the data.
The curvature of the plots of the intensity log-ratio $M = \log_2(R/G)$ (where $R$ is the intensity
signal for the red channel, i.e. Cy5, and $G$ is the intensity signal for the green channel, i.e. Cy3)
vs. the mean log intensity $A = \frac{1}{2}\log_2(R \cdot G)$ indicated that an intensity-dependent
normalization method is more efficient than global methods (such as normalization by
the mean or median of the M values). Moreover, we observed print-tip
effects on the fluorescence intensities: the curvature of the MA plot was clearly tip-dependent.
We therefore first applied the within-print-tip-group intensity-dependent
normalization of Yang et al., using the lowess function:
$$M^{\mathrm{norm}}_{ijk} = M_{ijk} - c_{ij}(A_{ijk})$$
where $M_{ijk}$ is the intensity log-ratio for array $i$, print-tip $j$ and gene $k$, $A_{ijk}$ is the corresponding mean log intensity,
and $c_{ij}(A)$ is the lowess() fit to the M vs. A plot for the spots of array $i$ printed using the $j$th print-tip.
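The M/A transform and the within-print-tip-group correction can be sketched in Python with NumPy only. This is an illustration under assumptions, not the authors' R code. `lowess_fit` is a hypothetical, simplified stand-in for R's lowess(): it computes tricube-weighted local linear fits and skips the robustness iterations. The span `frac=0.4` is also an assumed value, since the analysis does not state one. Each print-tip group is fitted separately, and its lowess curve of M on A is subtracted.

```python
import numpy as np


def ma_values(R, G):
    """M = log2(R/G) and A = (1/2) log2(R*G) from red (Cy5) and green (Cy3) intensities."""
    R = np.asarray(R, dtype=float)
    G = np.asarray(G, dtype=float)
    return np.log2(R / G), 0.5 * np.log2(R * G)


def lowess_fit(x, y, frac=0.4):
    """Hypothetical simplified lowess: tricube-weighted local linear fit, no robustness steps."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(2, int(np.ceil(frac * n))))  # number of points in each local window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]  # distance to the k-th nearest neighbour
        if h > 0:
            w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        else:
            w = (dist == 0).astype(float)
        X = np.column_stack([np.ones(n), x - x[i]])
        XtW = X.T * w  # weighted design, shape (2, n)
        beta = np.linalg.lstsq(XtW @ X, XtW @ y, rcond=None)[0]
        fitted[i] = beta[0]  # local intercept = fitted value at x[i]
    return fitted


def print_tip_lowess(M, A, tip, frac=0.4):
    """Subtract, within each print-tip group, the lowess fit of M on A (one array at a time)."""
    M = np.asarray(M, dtype=float)
    A = np.asarray(A, dtype=float)
    tip = np.asarray(tip)
    M_norm = np.empty_like(M)
    for j in np.unique(tip):
        idx = tip == j
        M_norm[idx] = M[idx] - lowess_fit(A[idx], M[idx], frac)
    return M_norm
```

Here `tip` is a hypothetical per-spot array of print-tip labels for one array. A production analysis would rather call an established lowess implementation than this simplified loop.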
We then applied the scale normalization described in Yang et al., so that all print-tip groups have
the same variance.
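The scale step can be sketched as follows. The sketch assumes the MAD-based estimator of Yang et al. (2002). The log-ratios of print-tip group j are divided by a_j = MAD_j / (geometric mean of the MADs of all groups), so that every group ends up with the same median absolute deviation. Function names are illustrative.

```python
import numpy as np


def mad(x):
    """Median absolute deviation (unscaled) of a 1-D array."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))


def scale_normalize(M, tip):
    """Divide the log-ratios of print-tip group j by a_j = MAD_j / geometric mean of all MADs."""
    M = np.asarray(M, dtype=float)
    tip = np.asarray(tip)
    groups = np.unique(tip)
    mads = np.array([mad(M[tip == j]) for j in groups])
    geo_mean = np.exp(np.mean(np.log(mads)))  # keeps the overall scale of the array
    M_scaled = np.empty_like(M)
    for j, mad_j in zip(groups, mads):
        M_scaled[tip == j] = M[tip == j] / (mad_j / geo_mean)
    return M_scaled
```

Dividing by the geometric mean, rather than by a single reference group, leaves the overall spread of the array unchanged while equalizing the groups.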
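Despite this normalization, we still detected a row effect in each array. We therefore
defined the following ANOVA model (analysis of variance, Scheffé):
$$M_{ijlk} = r_{il} + \varepsilon_{ijlk}$$
where $M_{ijlk}$ is the print-tip normalized and scaled log-ratio, with the row index $l$ made explicit, $r_{il}$ is the effect of the $l$th row of a "subgrid" (a subgrid is a block of spots
printed by the same tip) in the $i$th array, shared by every gene $k$ spotted in that row, and $\varepsilon_{ijlk}$ is the error term
for array $i$, grid $j$, subgrid row $l$ and gene $k$. Each array has 4 subgrids per
column and 8 subgrids per row, and each subgrid consists of 24 rows, so $j = 1, \dots, 32$ and $l = 1, \dots, 24$.
The normalized log-ratio becomes:
$$\tilde{M}_{ijlk} = M_{ijlk} - \hat{r}_{il}$$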
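This is a minimal sketch of fitting and removing the row effect for one array. It assumes a one-way model with a row effect and an error term only. In that model, the least-squares estimate of the effect of row l is the mean log-ratio of all spots lying in row l of any subgrid. `row` is a hypothetical per-spot array holding the within-subgrid row index (1 to 24). The number of spots per row is not given in this analysis, so the layout used in the test is arbitrary.

```python
import numpy as np


def remove_row_effect(M, row):
    """Least-squares fit of M = r_l + e for one array, then return M - r_hat_l.

    M:   normalized log-ratios of all spots of one array (NaN = missing spot)
    row: row index l of each spot inside its subgrid
    """
    M = np.asarray(M, dtype=float)
    row = np.asarray(row)
    M_out = np.empty_like(M)
    for l in np.unique(row):
        idx = row == l
        r_hat = np.nanmean(M[idx])  # mean over all subgrids and genes sharing row l
        M_out[idx] = M[idx] - r_hat
    return M_out
```

Missing spots stay NaN and do not bias the row means, because `np.nanmean` ignores them.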
Figure: Before normalization (left) / After normalization (right). The arrays on the right depict the normalized values of the arrays on the left; the normalized arrays clearly show much more homogeneous (unbiased) values.
We selected genes that were present in at least 90% of the arrays and for which the absolute
value of the normalized log-ratio M was larger than 2.5 in at least 3 arrays; 203 genes passed
this filter. A threshold of 2 selected 447 genes and gave very
similar clustering results.
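The gene filter can be written as a boolean mask over a genes x arrays matrix. This sketch assumes that "present" means a non-missing (non-NaN) normalized log-ratio. The parameter names are illustrative.

```python
import numpy as np


def filter_genes(M, min_present=0.9, threshold=2.5, min_arrays=3):
    """Boolean mask of genes (rows) kept by the presence and |M| filters.

    M: genes x arrays matrix of normalized log-ratios, NaN for missing values.
    """
    M = np.asarray(M, dtype=float)
    present = np.mean(~np.isnan(M), axis=1) >= min_present
    with np.errstate(invalid="ignore"):  # NaN comparisons simply count as False
        strong = np.sum(np.abs(M) > threshold, axis=1) >= min_arrays
    return present & strong
```

With the 24 arrays of this study, `min_present=0.9` requires at least 22 non-missing values per gene. Calling `filter_genes(M, threshold=2.0)` gives the looser filter mentioned above.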
The hierarchical clustering was performed using the agnes (Agglomerative Nesting)
implementation in the cluster package for R (Ihaka & Gentleman).
We tested combinations of an array/gene similarity metric (Euclidean distance or Pearson correlation coefficient)
and a cluster similarity metric (average linkage, complete linkage or Ward's method),
and chose the combination that gave the highest agglomerative coefficient.
The agglomerative coefficient (AC) measures the clustering structure of the dataset:
$$AC = \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - m(i)\bigr)$$
where $n$ is the number of observations and $m(i)$ denotes the dissimilarity of observation $i$ to the first cluster it is
merged with, divided by the dissimilarity of the last two clusters merged by the algorithm.
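The AC and the choice of metric/linkage pair can be sketched with SciPy instead of R's agnes. This is a substitution: SciPy linkage heights, notably for Ward's method, may not reproduce agnes's heights, so the AC values are not guaranteed to match agnes exactly. For each observation, m(i) is read from the linkage matrix as the height of its first merge divided by the height of the final merge. `best_combination` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def agglomerative_coefficient(Z):
    """AC = mean over observations of 1 - m(i), computed from a SciPy linkage matrix Z.

    m(i) = height at which observation i is first merged / height of the final merger.
    """
    n = Z.shape[0] + 1
    first_height = np.empty(n)
    for a, b, height, _ in Z:
        for node in (int(a), int(b)):
            if node < n:  # indices below n are original observations
                first_height[node] = height
    return float(np.mean(1.0 - first_height / Z[-1, 2]))


def best_combination(X, metrics=("euclidean", "correlation"),
                     methods=("average", "complete", "ward")):
    """Score each (dissimilarity, linkage) pair by its AC; return the best pair and all scores.

    X has one observation per row; 'correlation' is SciPy's 1 - Pearson correlation.
    """
    scores = {}
    for metric in metrics:
        d = pdist(X, metric=metric)
        for method in methods:
            scores[(metric, method)] = agglomerative_coefficient(linkage(d, method=method))
    return max(scores, key=scores.get), scores
```

To score array clusterings, pass the arrays as rows (the transpose of a genes x arrays matrix). To score gene clusterings, pass the genes as rows.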
Clustering of the arrays
We used the Euclidean distance as the array similarity metric and the complete linkage
as cluster similarity metric.
Clustering of the genes
We used the Pearson correlation coefficient as the gene similarity metric and the
Ward’s method as cluster similarity metric.
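Applying the two chosen settings to a genes x arrays matrix could look like the following sketch. It uses SciPy, not the agnes function used in the analysis. SciPy documents Ward's method as well defined only for Euclidean distances, so tree heights obtained on Pearson distances should be read with care. Function names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import pdist


def cluster_arrays_and_genes(M):
    """Two-way hierarchical clustering of a genes x arrays matrix M.

    Arrays (columns): Euclidean distance, complete linkage.
    Genes (rows): Pearson distance 1 - r (SciPy's 'correlation'), Ward's method.
    Returns both linkage matrices and their leaf orders (e.g. to order a heatmap).
    """
    M = np.asarray(M, dtype=float)
    Z_arrays = linkage(pdist(M.T, metric="euclidean"), method="complete")
    Z_genes = linkage(pdist(M, metric="correlation"), method="ward")
    return Z_arrays, Z_genes, leaves_list(Z_arrays), leaves_list(Z_genes)


def array_groups(Z_arrays, n_groups):
    """Cut the array tree into at most n_groups flat clusters (labels start at 1)."""
    return fcluster(Z_arrays, t=n_groups, criterion="maxclust")
```

For this data set, `array_groups(Z_arrays, 5)` would be one way to check whether the array tree recovers the five tissue groups.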
The left tree represents the clustering of the arrays. It clearly separates the different tissue groups of the transgenic mice:
Ras normal tissue (rasN47, rasN67, rasN70, rasN74), Ras tumoral tissue from small tumors (rasT148, rasT168, rasT172, rasT175), Ras tumoral tissue from large tumors (rasT149, rasT169, rasT173, rasT176), RasApc normal tissue (rasAPCN1, rasAPCN2, rasAPCN55, rasAPCN56, rasAPCN91, rasAPCN92) and RasApc tumoral tissue (rasAPCN3, rasAPCN4, rasAPCN57, rasAPCN58, rasAPCN93, rasAPCN94).
Moreover, it clearly separates the two mouse models into two branches: Ras animals vs. compound mutant RasApc mice.
The tree at the top represents the clustering of the genes: several gene clusters distinguish the different types of mice.
We looked for genes differentially expressed between:
- Ras normal tissue vs. Ras tumoral tissue
- Ras tumoral tissue: small vs. large tumors, corresponding to tumor progression
- RasApc normal tissue vs. RasApc tumoral tissue
- Ras tumoral tissue vs. RasApc tumoral tissue
To identify differentially expressed genes, we used the detection procedure
called Significance Analysis of Microarrays (SAM) (Tusher & Tibshirani, PNAS, 2001), as
implemented in the R package siggenes (Gentleman et al., 2004).
We used the following SAM parameters:
threshold values $\Delta = i/5$ for $i = 1, \dots, 10$ (that is, 0.2 to 2.0 in steps of 0.2) and 100 permutations.
We retained the list of significantly differentially expressed genes at a false discovery
rate (FDR) of 0.001, except for the comparison of small vs. large tumors in Ras tumoral tissue, where the FDR was 0.045.
Figure: differentially expressed genes (in green) at the selected FDR between normal and tumoral tissue in Ras transgenic mice.
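The following is a simplified two-class SAM in Python, sketched only to show how Delta, the permutations and the FDR interact. It is not the siggenes implementation. Assumptions, all hypothetical simplifications:
- the fudge factor s0 is fixed at the median of the gene-wise standard errors (siggenes estimates s0 in its own way);
- the expected order statistics are averaged over the permutations;
- the FDR is the median number of permuted scores beyond the cutoffs divided by the number of called genes, with no pi0 correction.

```python
import numpy as np


def _diff_and_s(X, labels):
    """Mean difference (group 1 - group 0) and pooled standard error s_i for each gene (row)."""
    g0, g1 = X[:, labels == 0], X[:, labels == 1]
    n0, n1 = g0.shape[1], g1.shape[1]
    diff = g1.mean(axis=1) - g0.mean(axis=1)
    ss = g0.var(axis=1) * n0 + g1.var(axis=1) * n1  # within-group sums of squares
    s = np.sqrt((1.0 / n0 + 1.0 / n1) / (n0 + n1 - 2) * ss)
    return diff, s


def sam(X, labels, delta, n_perm=100, seed=0):
    """Simplified two-class SAM: returns (mask of called genes, estimated FDR) for one Delta."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    diff, s = _diff_and_s(X, labels)
    s0 = np.median(s)  # ASSUMPTION: fudge factor fixed at the median of s_i
    d = diff / (s + s0)
    d_sorted = np.sort(d)

    rng = np.random.default_rng(seed)
    perm_d = np.empty((n_perm, d.size))
    for b in range(n_perm):
        p_diff, p_s = _diff_and_s(X, rng.permutation(labels))
        perm_d[b] = np.sort(p_diff / (p_s + s0))
    d_expected = perm_d.mean(axis=0)  # expected order statistics d_E(i)

    # Moving away from the origin, the first gene whose gap exceeds Delta sets each cutoff.
    gap = d_sorted - d_expected
    up = (gap > delta) & (d_sorted > 0)
    low = (gap < -delta) & (d_sorted < 0)
    cut_up = d_sorted[up].min() if up.any() else np.inf
    cut_low = d_sorted[low].max() if low.any() else -np.inf

    called = (d >= cut_up) | (d <= cut_low)
    n_called = int(called.sum())
    false_calls = ((perm_d >= cut_up) | (perm_d <= cut_low)).sum(axis=1)
    fdr = float(np.median(false_calls)) / n_called if n_called else 0.0
    return called, fdr


def sam_at_fdr(X, labels, target_fdr, deltas, n_perm=100, seed=0):
    """Smallest Delta (largest gene list) whose estimated FDR is <= target_fdr, or None."""
    for delta in sorted(deltas):
        called, fdr = sam(X, labels, delta, n_perm, seed)
        if fdr <= target_fdr:
            return delta, called, fdr
    return None
```

The grid used in this analysis corresponds to `deltas = [i / 5 for i in range(1, 11)]` with `n_perm=100`, and for example `target_fdr=0.001`.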
We were interested in the functions of the differentially expressed genes found by SAM.
We therefore applied GOstat (Beissbarth & Speed, Bioinformatics, 2004) to obtain the
significant GO (Gene Ontology) annotations.
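The core of a GO over-representation test can be sketched as follows. For each GO term, the number of annotated genes in the differentially expressed list is compared with the reference set, and the resulting p-values are then adjusted for multiple testing. This sketch assumes a one-sided Fisher exact test and a Benjamini-Hochberg correction. It is not GOstat itself, whose choice of test and correction may differ. Function names are illustrative.

```python
import numpy as np
from scipy.stats import fisher_exact


def go_term_pvalue(k, n_list, K, N):
    """One-sided Fisher exact test for over-representation of one GO term.

    k: genes in the list annotated with the term    n_list: genes in the list
    K: reference genes annotated with the term      N: reference genes (list included)
    """
    table = [[k, n_list - k], [K - k, (N - n_list) - (K - k)]]
    _, p = fisher_exact(table, alternative="greater")
    return float(p)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out
```

For example, 10 annotated genes in a list of 20, when only 50 of 1000 reference genes carry the term, give a very small p-value. A single annotated gene in the same list does not.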
Detailed results will soon be available online.