Analysis for RAS-APC mouse microarrays

This analysis of the Ras and Wnt pathways by microarrays is a joint work of

UMR144 Institut Curie-CNRS,

Service Bioinformatique, Institut Curie,

Klinikum rechts der Isar, Munich, Germany, and

European Molecular Biology Laboratory, Heidelberg, Germany.

A paper describing these results is about to be submitted:
"Gene expression profiling in the pVill-K-rasV12G mouse model: dissecting the molecular contribution of oncogenic K-ras to colorectal tumorigenesis", Klaus-Peter Janssen*, Mechthild Wagner*, Sabrina Carpentier*, Fatima El Marjou, Philippe Hupé, Emmanuel Barillot, Daniel Louvard, Wilhelm Ansorge, and Sylvie Robine.
*) These authors contributed equally.
The bioinformatic analysis was carried out by Sabrina Carpentier in close collaboration with the other authors.

Context

The oncogene K-ras is mutated in 30-50% of all human colorectal tumors, but its role in tumor initiation and progression is still ill-defined.
We have used pVill-K-rasV12G mice, which express oncogenic K-ras in intestinal epithelia and develop spontaneous digestive tumors, as a tool to dissect the specific contribution of K-ras to digestive tumorigenesis (Janssen et al., Gastroenterology, 2002). We then crossed the pVill-K-rasV12G mice with the Apc1638N mice described by Fodde et al. (PNAS, 1994), resulting in a compound model that carries, in addition to the activated K-ras, a knock-in mutation of the tumor suppressor gene Apc. These K-rasV12G/Apc1638N mice show drastically increased tumor development compared to single transgenic littermates (Janssen et al., in preparation).
In this study, we have analyzed tumors and normal intestinal mucosa from these animals with cDNA microarray profiling, a key technique to identify global mechanisms of deregulated molecular functions. Interestingly, the "molecular fingerprint" of human colon cancer was closely reproduced in the tumors from transgenic animals: metabolic enzymes were downregulated, signal transduction pathways were strongly altered, and changes in genes responsible for remodeling of the extracellular matrix, invasion and angiogenesis were prominent. Surprisingly, the tumor transcriptome of K-rasV12G/Apc1638N mice revealed striking differences when compared to the single transgenic pVill-K-rasV12G model.
These results could help to improve the prognosis of human colon cancer by the identification of novel molecular markers and targets for diagnosis and therapeutic intervention.

The goal of the data analysis was the comparison of intestinal tumors vs normal intestinal tissue in Ras animals and double transgenic RasAPC animals.

Data Set

The data set is composed of 5 experimental groups:

RasN: (4 microarrays)

Normal jejunum mucosa tissue from transgenic animals pVillin-K-ras^V12G

RasT1: (4 microarrays)

Pool of 6 small tumors from two Ras animals, from jejunum, size 1-2mm

RasT2: (4 microarrays)

Pool of 3 large tumors from two Ras animals, from jejunum, size 3-5mm

RasAPC-N: (6 microarrays)

Normal jejunum mucosa tissue from double transgenic mice: pVillin-K-ras^V12G/Apc^1638N

RasAPC-T: (6 microarrays)

Pool of large tumors from two RasAPC animals.

We used a 15k mouse cDNA clone set from the National Institute of Aging at the NIH (http://lgsun.grc.nia.nih.gov/index.html).
PCR fragments were spotted onto EMBL-made amino-silanized glass slides (Ansorge Group).

Microarray Normalization

Many sources of systematic variation affect the measured gene expression levels in microarray experiments (Yang et al., Nucleic Acids Research, 2002). To remove such biases and extract the biological information, we normalized the data.
The plots of the intensity log-ratio $M = \log_2(R/G)$ (where R is the intensity signal for the red channel, i.e. Cy5, and G is the intensity signal for the green channel, i.e. Cy3) vs. the mean log intensity

$A = \log_2 \sqrt{R G} = \frac{1}{2} (\log_2 R + \log_2 G)$

showed a curvature, which indicated that an intensity-dependent normalization method is more appropriate than global methods (such as normalization by the mean or median of the M values). Moreover, we observed print-tip effects on the fluorescence intensities: the curvature of the MA plot was clearly tip dependent. We therefore first applied the within-print-tip-group intensity-dependent normalization (Yang et al.) using the lowess function:

$M'_{ijk} = M_{ijk} - c_{ij}(A_{ijk})$

where $M_{ijk}$ is the intensity log-ratio for the array i, the print-tip j and the gene k, and $c_{ij}(A_{ijk})$ is the lowess() fit to the M vs A plot of array i for spots printed using the jth print-tip, evaluated at the mean log intensity $A_{ijk}$ of the same spot. We then applied the scale normalization described in Yang et al., so that all tips have the same variance.
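
The within-print-tip normalization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the local fit is a plain tricube-weighted local linear regression without the robustness iterations of the real lowess() function, and the span of 0.4 is an arbitrary choice that is not taken from the analysis. The scale normalization step is not included.

```python
import math


def log_ratio_and_intensity(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = 0.5 * log2(R * G)."""
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


def local_linear_fit(a_values, m_values, span=0.4):
    """Tricube-weighted local linear fit of M on A, evaluated at every spot.

    Simplified stand-in for lowess(): no robustness iterations (assumption).
    """
    n = len(a_values)
    k = min(n, max(2, math.ceil(span * n)))  # neighbours used per local fit
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest spot
        weights = [max(0.0, 1 - (abs(a - a0) / h) ** 3) ** 3 for a in a_values]
        sw = sum(weights)
        a_bar = sum(w * a for w, a in zip(weights, a_values)) / sw
        m_bar = sum(w * m for w, m in zip(weights, m_values)) / sw
        sxx = sum(w * (a - a_bar) ** 2 for w, a in zip(weights, a_values))
        sxy = sum(w * (a - a_bar) * (m - m_bar)
                  for w, a, m in zip(weights, a_values, m_values))
        # Fall back to the weighted mean when the local A values do not vary.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))
    return fitted


def print_tip_normalize(spots, span=0.4):
    """spots: list of (tip_id, M, A) for one array; returns M minus the tip fit."""
    by_tip = {}
    for index, (tip, _m, _a) in enumerate(spots):
        by_tip.setdefault(tip, []).append(index)
    normalized = [0.0] * len(spots)
    for indices in by_tip.values():
        a_vals = [spots[i][2] for i in indices]
        m_vals = [spots[i][1] for i in indices]
        for i, c in zip(indices, local_linear_fit(a_vals, m_vals, span)):
            normalized[i] = spots[i][1] - c
    return normalized
```

Each print-tip group gets its own curve, so a bias that differs from tip to tip is removed separately for each group instead of being averaged over the whole array.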

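In spite of this normalization, we still detected a row effect in each array. We therefore defined the following ANOVA model (Analysis of Variance, Scheffé):

$M''_{ijk} = \rho_{il} + \varepsilon_{ijlk}$

where $M''_{ijk}$ is the log-ratio after the print-tip and scale normalization, $\rho_{il}$ is the effect of the lth row in a "subgrid" (a subgrid is a block of spots printed by the same tip) in the ith array for the kth gene, and $\varepsilon_{ijlk}$ is the error term for the array i, the grid j, the "subgrid" row l and the gene k. Each array has 4 subgrids per column and 8 subgrids per row, and each subgrid consists of 24 rows, so l = 1, ..., 24.
The normalized log-ratio becomes:

$M^{\mathrm{norm}}_{ijk} = M''_{ijk} - \hat{\rho}_{il}$

where $\hat{\rho}_{il}$ is the estimated row effect.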

[Figure: arrays before normalization (left) and after normalization (right)]

The arrays on the right side depict the normalized values of the arrays on the left side. The normalized arrays clearly show much more homogeneous (unbiased) values.
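
The row-effect correction described in this section can be sketched in Python as follows. It is a minimal illustration, assuming that the row effect is estimated by least squares, which for this one-way model is simply the mean log-ratio of all spots sharing the same array and the same row position within their subgrid. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def remove_row_effect(spots):
    """Subtract the estimated subgrid-row effect from each log-ratio.

    spots: list of (array_id, subgrid_row, m) tuples, where subgrid_row is the
    row index of the spot inside its subgrid (1..24 in the layout described
    here) and m is the log-ratio after print-tip and scale normalization.
    Returns the corrected log-ratios in input order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for array_id, row, m in spots:
        sums[(array_id, row)] += m
        counts[(array_id, row)] += 1
    # Least-squares estimate of the row effect: the cell mean (assumption).
    effect = {key: sums[key] / counts[key] for key in sums}
    return [m - effect[(array_id, row)] for array_id, row, m in spots]
```

After the correction, the mean log-ratio of every subgrid row in every array is zero, which is what makes the normalized arrays look more homogeneous.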

Clustering

Filtering Genes

We selected genes that are present in at least 90% of the arrays and for which the absolute value of the normalized log-ratio M is larger than 2.5 in at least 3 arrays. This filter selected 203 genes. We also applied a threshold of 2, which selected 447 genes and gave very similar clustering results.
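
The filtering rule can be written as a short Python predicate. This is a sketch only: the function name is hypothetical, and it assumes that a missing spot is represented by None.

```python
def keep_gene(log_ratios, min_present=0.9, threshold=2.5, min_arrays=3):
    """Decide whether a gene passes the filter.

    log_ratios: one normalized log-ratio M per array, None where the spot is
    missing (assumed encoding of absent values).
    """
    present = [m for m in log_ratios if m is not None]
    # The gene must be present in at least 90% of the arrays.
    if len(present) < min_present * len(log_ratios):
        return False
    # |M| must exceed the threshold in at least `min_arrays` arrays.
    strong = sum(1 for m in present if abs(m) > threshold)
    return strong >= min_arrays
```

With the 24 arrays of this data set, a gene may be missing on at most 2 arrays; calling the function with threshold=2 corresponds to the less stringent selection.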

The hierarchical clustering was performed using the agnes (Agglomerative Nesting) implementation included in the R package cluster (Ihaka & Gentleman). We tested different pairs of array/gene similarity metric (Euclidean distance or Pearson correlation coefficient) and cluster similarity metric (average linkage, complete linkage or Ward's method), and chose the pair that led to the best agglomerative coefficient. The agglomerative coefficient (AC) measures the clustering structure of the dataset: AC = mean(1 - m(i)), where m(i) denotes the dissimilarity of the observation i to the first cluster it is merged with, divided by the dissimilarity of the last two clusters merged by the algorithm.
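
To make the definition of the agglomerative coefficient concrete, here is a small Python sketch. It is not the agnes implementation: it is a naive agglomerative clustering with the Euclidean distance, written only to show how AC is obtained from the merge heights. The function names are hypothetical, and the sketch assumes at least two observations that are not all identical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerative_coefficient(points, linkage=max):
    """Cluster `points` bottom-up and return AC = mean(1 - m(i)).

    linkage=max gives complete linkage, linkage=min gives single linkage.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}
    first_merge = {}  # observation -> height at which it is first merged
    height = 0.0
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                d = linkage(euclidean(points[p], points[q])
                            for p in clusters[ids[x]]
                            for q in clusters[ids[y]])
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        height, a, b = best
        # A singleton that takes part in this merge is merged for the first time.
        for cid in (a, b):
            if len(clusters[cid]) == 1:
                first_merge[clusters[cid][0]] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    # `height` now holds the dissimilarity of the last two clusters merged.
    return sum(1 - first_merge[i] / height for i in range(n)) / n
```

For example, four one-dimensional observations 0, 1, 10 and 11 form two tight pairs that are merged at height 1 and joined at height 11 under complete linkage, giving AC = 1 - 1/11. Values close to 1 indicate a clear clustering structure.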

Clustering of the Arrays

We used the Euclidean distance as the array similarity metric and complete linkage as the cluster similarity metric.

Clustering of the Genes

We used the Pearson correlation coefficient as the gene similarity metric and Ward’s method as the cluster similarity metric.

The left tree represents the clustering of the arrays. It clearly separates the different groups of tissues of the transgenic mice:
ras normal tissue (rasN47, rasN67, rasN70, rasN74), ras tumoral tissue, small tumours (rasT148, rasT168, rasT172, rasT175), ras tumoral tissue, large tumours (rasT149, rasT169, rasT173, rasT176), rasApc normal tissue (rasAPCN1, rasAPCN2, rasAPCN55, rasAPCN56, rasAPCN91, rasAPCN92) and rasApc tumoral tissue (rasAPCN3, rasAPCN4, rasAPCN57, rasAPCN58, rasAPCN93, rasAPCN94).
Moreover, it clearly separates the two mouse models into two branches: Ras animals vs. compound mutant RasApc mice.
The tree on the top represents the clustering of genes: we can distinguish some clusters that allow the differentiation between the different types of mice.

Differential Analysis

We have looked for genes differentially expressed between:

- Ras normal tissue vs. Ras tumoral tissue
- Ras tumoral tissue: small vs. large tumours, corresponding to tumour progression
- Ras Apc normal tissue vs. Ras Apc tumoral tissue
- Ras tumoral tissue vs. Ras Apc tumoral tissue

To identify genes that are differentially expressed, we used the detection procedure called Significance Analysis of Microarrays (SAM) (Tusher & Tibshirani, PNAS, 2001), as implemented in the R package siggenes (Gentleman et al., 2004).
We used the following parameters for SAM: threshold values Delta = i/5 for i = 1, ..., 10, and 100 permutations.
We retained the list of significantly differentially expressed genes for a false discovery rate (FDR) equal to 0.001, except for the comparison of small vs. large tumours in ras tumoral tissue, where the FDR is equal to 0.045.
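
The ingredients of SAM mentioned above can be illustrated with a short Python sketch. It is not the siggenes implementation. It only shows the relative-difference statistic d for a two-group comparison, the grid of Delta thresholds, and the permutation estimate of the expected ordered statistics. The function names, the value of the fudge factor s0 and the random seed are assumptions made for the example. The final step of SAM, which compares observed and expected ordered statistics against Delta and estimates the FDR, is left out.

```python
import math
import random


def sam_statistic(group1, group2, s0=0.0):
    """Relative difference d = (mean2 - mean1) / (s + s0) for one gene.

    s is the pooled standard error of the difference; s0 is a small positive
    constant that damps genes with very low variance (s + s0 must be > 0).
    """
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    ss = (sum((x - mean1) ** 2 for x in group1)
          + sum((x - mean2) ** 2 for x in group2))
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (mean2 - mean1) / (s + s0)


def ordered_statistics(genes, labels, s0):
    """Sorted d-values of all genes for 0/1 group labels (one per array)."""
    ds = []
    for values in genes:
        g1 = [v for v, lab in zip(values, labels) if lab == 0]
        g2 = [v for v, lab in zip(values, labels) if lab == 1]
        ds.append(sam_statistic(g1, g2, s0))
    return sorted(ds)


def expected_ordered(genes, labels, s0, n_perm=100, seed=0):
    """Average the ordered d-values over random permutations of the labels."""
    rng = random.Random(seed)
    totals = [0.0] * len(genes)
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        for rank, d in enumerate(ordered_statistics(genes, perm, s0)):
            totals[rank] += d
    return [t / n_perm for t in totals]


# Grid of thresholds used in the analysis: Delta = i/5 for i = 1, ..., 10.
deltas = [i / 5 for i in range(1, 11)]
```

In SAM, a gene is called significant when its observed ordered statistic departs from the expected one by more than Delta, so a larger Delta gives a shorter gene list and a lower estimated FDR.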

This plot represents, in green, the differentially expressed genes for the selected FDR between normal and tumoral tissue in Ras transgenic mice.

Functional Analysis

We were interested in the functions of the differentially expressed genes found by SAM.
We thus applied the GOstat method (Beissbarth & Speed, Bioinformatics, 2004) to obtain the significant GO (Gene Ontology) annotations.
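
As an illustration of what a significant GO annotation means, the following Python sketch computes a one-sided over-representation p-value from the hypergeometric distribution. This is a common way to score such enrichments, but it is an assumption that it matches the exact test and the multiple-testing correction applied by GOstat in this analysis. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_annotated, n_selected, n_selected_annotated):
    """Probability of seeing at least `n_selected_annotated` annotated genes
    among `n_selected` genes drawn without replacement from `n_genes` genes,
    of which `n_annotated` carry the GO term (hypergeometric upper tail).
    """
    total = comb(n_genes, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(comb(n_annotated, k) * comb(n_genes - n_annotated, n_selected - k)
               for k in range(n_selected_annotated, upper + 1))
    return tail / total
```

For instance, if 3 of 10 genes on an array carry a GO term and all 3 selected genes carry it, the p-value is 1/120. A small p-value means the term is found more often among the differentially expressed genes than expected by chance.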

Detailed results will soon be available online.