[BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization

Gordon K Smyth smyth at wehi.EDU.AU
Sun May 4 06:50:50 CEST 2014


Dear Stefanie,

> Date: 30 April 2014
> From: Pekka Kohonen <pkpekka at gmail.com>
> From: Stefanie Busch <stefanie.busch2 at web.de>
> To: Bioconductor <bioconductor at r-project.org>
> Subject: Re: [BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization
>
> Hello,
>
> I have two questions and I hope you can help me.
>
> I want to compare several studies with similar design but different 
> arrays. The first step was to quantile normalize all data, which works 
> well except for the two color experiment with an Agilent chip.

As you seem to have realized already, quantile normalization is not 
usually appropriate for a two colour Agilent array.  Loess normalization 
is generally used for two colour arrays, and I recommend a normexp 
background correction step before that.

> I read the limma User Guide and found out that I must preprocess with the 
> function normalizeBetweenArrays. So I get M- and A-values and my 
> question is which one shows the expression values for this experiment?

Two colour arrays don't return expression values.  Instead they return 
log-ratios, which are stored in M.

When you compare Agilent to Affymetrix Chips and Illumina Beadarrays, you 
need to compare log-fold-changes and DE results, not expression values.
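To make the meaning of M and A concrete, here is a minimal sketch of how they relate to the red and green intensities. The intensities and the names R and G below are made up for illustration and are not taken from your data:

```r
# Hypothetical background-corrected intensities for 3 probes on one array
R <- c(1024, 512, 256)   # red channel
G <- c(256, 512, 1024)   # green channel

# M is the log-ratio (red vs green), A is the average log-intensity
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

print(data.frame(R, G, M, A))
```

So M is a within-array log-fold-change between the two channels, while A only tells you how bright the spot is overall.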

> For comparing the results of the different studies I want to use the 
> package: RankProd.

As far as I know, RankProd assesses differential expression and does not 
in itself help you compare one study to another.

The usual methods to compare one study to another are (i) to make a 
scatterplot of logFC from the two experiments or (ii) to use a gene set 
test such as roast() in the limma package.  The limma package can compute 
logFC for whatever comparison you are making.
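As a rough sketch of option (i), assuming you already have one logFC per Entrez ID from each study, the comparison could look like this in base R. The data frames study1 and study2 and all the numbers in them are hypothetical:

```r
# Hypothetical logFC tables from two studies, one row per Entrez ID
study1 <- data.frame(EntrezID = c("12095", "12096", "12097", "11461"),
                     logFC    = c(1.8, -0.4, 0.9, 0.1))
study2 <- data.frame(EntrezID = c("12096", "12097", "11461", "20423"),
                     logFC    = c(-0.6, 1.1, 0.0, 2.3))

# Keep only the genes measured in both studies
both <- merge(study1, study2, by = "EntrezID", suffixes = c(".1", ".2"))

# Scatterplot and correlation of the log-fold-changes
plot(both$logFC.1, both$logFC.2,
     xlab = "logFC study 1", ylab = "logFC study 2")
abline(0, 1, lty = 2)
r <- cor(both$logFC.1, both$logFC.2)
```

Genes present in only one study are dropped by merge(), so the plot shows only what the two platforms have in common.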

> For a better comparison between the studies I used 
> the Entrez IDs and I downloaded the latest chip information directly from 
> Affymetrix and Illumina.  This revealed a new problem. For example, on 
> the chip Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at 
> stands for three gene names and Entrez IDs: Bglap /// Bglap2 /// Bglap3 - 
> 12095 /// 12096 /// 12097. On the Illumina chip each gene has a single 
> Array ID:

> Bglap-rs1 - ILMN_1233122 - 12095
> Bglap1     - ILMN_2610166 - 12096
> Bglap2      - ILMN_2944508 - 12097
>
> So I don't know what I should do to compare the results of these two 
> experiments. When I paste the expression values of 1449880_s_at three 
> times with the three different Entrez IDs, the ranking 
> calculated by the RankProd package changes.

> Example:
> Chip ID               Entrez-Id  Control1  control 2 etc.
> 1449880_s_at - 12095 -     3,855 -     4,211 ...
> 1449880_s_at - 12096 -     3,855 -     4,211 ...
> 1449880_s_at - 12097 -     3,855 -     4,211 ...
>
> The other possibility is to combine the three expression values of the 
> Illumina chip into one value. But I don't know if that is the right way. 
> What is the better way?

For this purpose, I always recommend that, for each Entrez ID, you use the 
probe on each platform with the highest overall expression level.  The 
rationale is that you are using the probe that represents the dominant 
transcript for that gene in the cell type.  This method has been used in 
many published studies by now, the first of which may have been:

  http://www.biomedcentral.com/1471-2105/7/511

For example, you can proceed like this for the Agilent data, assuming you 
have put the EntrezIDs into the genes component of the data object:

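   # Loess is a within-array normalization method
   MA <- normalizeWithinArrays(RG, method="loess")
   # Order probes by average log-intensity, highest first
   A <- rowMeans(MA$A)
   o <- order(A,decreasing=TRUE)
   MA2 <- MA[o,]
   # Keep the first (highest-intensity) probe for each EntrezID
   d <- duplicated(MA2$genes$EntrezID)
   MA2 <- MA2[!d,]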

Now you have a data object with a unique probe for each EntrezID.
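The same selection can be done for the single-channel Affymetrix or Illumina data. Here is a sketch in base R, assuming a matrix of log-expression values and a vector of Entrez IDs in the same row order. The names E and entrez, the probe names and all the values are invented for illustration:

```r
# Hypothetical log2 expression matrix: 5 probes x 2 arrays
E <- matrix(c(3.9, 4.2,
              9.1, 8.8,
              7.5, 7.7,
              6.0, 6.2,
              8.0, 8.1),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("probe", 1:5), c("array1", "array2")))
entrez <- c("12095", "12095", "12096", "12097", "12097")

# Order probes from highest to lowest average expression
o <- order(rowMeans(E), decreasing = TRUE)
E2 <- E[o, , drop = FALSE]
entrez2 <- entrez[o]

# Keep the first (highest-expressed) probe for each Entrez ID
keep <- !duplicated(entrez2)
E2 <- E2[keep, , drop = FALSE]
entrez2 <- entrez2[keep]
print(rownames(E2))
```

Note that duplicated() must be applied to the IDs after reordering, otherwise the wrong probes are dropped.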

Simply averaging the probes or probe-sets is not generally recommended, 
because different probes for the same gene can have quite different 
behaviour.  A common situation is that one probe successfully probes an 
expressed transcript while another probe is essentially unexpressed.
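A toy illustration of that situation, with made-up log2 expression values for two probes of the same gene, shows how averaging dilutes the signal:

```r
# Hypothetical log2 expression for two probes of the same gene
probeA <- c(control = 8.0, treated = 10.0)  # probe for the expressed transcript
probeB <- c(control = 4.0, treated = 4.0)   # essentially unexpressed probe

# logFC from the responsive probe alone
logFC_A <- probeA["treated"] - probeA["control"]

# logFC after averaging the two probes
avg <- (probeA + probeB) / 2
logFC_avg <- avg["treated"] - avg["control"]

print(c(probeA = unname(logFC_A), averaged = unname(logFC_avg)))
```

In this example the averaged logFC is half that of the responsive probe, simply because the second probe contributes no signal.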

Best wishes
Gordon

> Kind regards
> Stefanie Busch
