[BioC] 1. comparing chip Information in meta analysis / Rankprod and 2. two color normalization

Wed May 7 13:21:58 CEST 2014

   Dear Gordon,

   Thank you for your answer. I still have a few questions.

   1. > Two colour arrays don't return expression values. Instead they return
      > log-ratios, which are stored in M. When you compare Agilent to
      > Affymetrix chips and Illumina BeadArrays, you need to compare
      > log-fold-changes and DE results, not expression values.

   What does "DE results" mean? And what should I do with the Affymetrix chips
   and the Illumina BeadArray? I preprocess the Affymetrix chips with rma, which
   already applies a log transformation. The Illumina array was background
   corrected, then log transformed, and finally quantile normalized with the
   lumi package.

   2. > > For comparing the results of the different studies I want to use the
      > > package: RankProd.
      >
      > As far as I know, RankProd assesses differential expression and does not
      > in itself help you compare one study to another. The usual methods to
      > compare one study to another are (i) to make a scatterplot of logFC from
      > the two experiments or (ii) to use a gene set test such as roast() in the
      > limma package. The limma package can compute logFC for whatever
      > comparison you are making.

   I don't want to compare the studies directly. I want to take the results of
   all experiments and get a list of genes that are up- or downregulated
   across all studies. I think RankProd is a good choice for this. I have made
   a big Excel table that looks like the one below. I have seven different
   experiments, so it is possible that a gene such as Bglap was not measured on
   every chip. RankProd will ignore the missing values.

            Experiment 1                              Experiment 2
            con1   con2   con3   diet1  diet2  diet3  con1   con2   con3   con4   con5   diet1  diet2  diet3  diet4  diet5
   Bglap    2,8    2,4    2,7    3,3    3,66   3,1    5,1    6,6    6,2    6,6    6,3    5,9    6,5    6,4    5,7    6,9
   Copd     5,4    7,2    5,8    4,3    5      4,9    3      2,7    4      3,5    4,2    4,3    3,5    3,9    2,5    3,1
   Sirt1    7      6,5    7,2    7,3    7,1    6,7    4,5    3,7    4,2    4,6    4,1    4,2    4,5    4,8    4,5    3,9
   ...

   cl <- c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
   origin <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)  # two different experiments
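
   The cl and origin vectors can be read as a description of the column
   layout. The following is only a minimal sketch in base R, assuming the
   values in the table are already on a log2 scale (as they would be after rma
   or lumi) and writing the decimal commas as points. It shows how the two
   vectors pick out the columns of each experiment and give a per-experiment
   log-fold-change (diet minus control). The object names expr and logFC are
   made up for illustration, and this is not what RankProd computes
   internally.

```r
# Toy data: the three genes from the table above (decimal commas written as points).
expr <- rbind(
  Bglap = c(2.8, 2.4, 2.7, 3.3, 3.66, 3.1,  5.1, 6.6, 6.2, 6.6, 6.3, 5.9, 6.5, 6.4, 5.7, 6.9),
  Copd  = c(5.4, 7.2, 5.8, 4.3, 5.0,  4.9,  3.0, 2.7, 4.0, 3.5, 4.2, 4.3, 3.5, 3.9, 2.5, 3.1),
  Sirt1 = c(7.0, 6.5, 7.2, 7.3, 7.1,  6.7,  4.5, 3.7, 4.2, 4.6, 4.1, 4.2, 4.5, 4.8, 4.5, 3.9)
)
cl     <- c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)  # 1 = control, 2 = diet
origin <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)  # experiment of each column

# logFC (diet minus control) per gene, computed separately within each experiment.
experiments <- sort(unique(origin))
logFC <- sapply(experiments, function(k) {
  in_k <- origin == k
  rowMeans(expr[, in_k & cl == 2, drop = FALSE], na.rm = TRUE) -
    rowMeans(expr[, in_k & cl == 1, drop = FALSE], na.rm = TRUE)
})
colnames(logFC) <- paste0("Experiment", experiments)
round(logFC, 2)
```

   Plotting one column of logFC against another gives the kind of scatterplot
   mentioned in point 2. A gene that is missing from one experiment (all NA in
   those columns) ends up with NaN for that experiment rather than a number.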

   My aim is to have a list of up- and downregulated genes for intervention a
   (7 experiments, intervention a vs. control) and a list of up- and
   downregulated genes for intervention b (3 experiments, intervention b vs.
   control), to see if there are genes which are up- or downregulated by both
   interventions.

   3. > For this purpose, I always recommend that, for each Entrez ID, you use
   the probe on each platform with the highest overall expression level.

   Example (the columns are replicates of the same group):

            control1  control2  control3  diet1  diet2  diet3
   Bglap    2,5       3,2       3,1       3,9    4,8    3,1
   Bglap    1         0,7       0,9       1,2    0,7    1
   Bglap    4,9       3,3       4,1       4,8    5,5    5,2

   So I would keep only the last row? Is there an R command to filter for these
   rows in the Affymetrix or Illumina data?
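
   For a plain expression matrix, the "highest overall expression" rule from
   point 3 can be sketched in base R as follows. This is only an illustration
   under assumptions: the probe IDs are hypothetical, the values are the three
   Bglap rows from the example (decimal commas written as points), and gene
   stands for whatever gene symbol or Entrez ID annotation belongs to each
   row.

```r
# Toy data: three probes for the same gene, as in the example above.
expr <- rbind(
  c(2.5, 3.2, 3.1, 3.9, 4.8, 3.1),
  c(1.0, 0.7, 0.9, 1.2, 0.7, 1.0),
  c(4.9, 3.3, 4.1, 4.8, 5.5, 5.2)
)
colnames(expr) <- c("control1", "control2", "control3", "diet1", "diet2", "diet3")
rownames(expr) <- c("probe_a", "probe_b", "probe_c")   # hypothetical probe IDs
gene <- c("Bglap", "Bglap", "Bglap")                   # gene (or Entrez ID) of each row

# Sort rows from highest to lowest average expression ...
o <- order(rowMeans(expr), decreasing = TRUE)
expr_sorted <- expr[o, , drop = FALSE]
gene_sorted <- gene[o]

# ... then keep only the first (highest-expressed) row seen for each gene.
keep <- !duplicated(gene_sorted)
expr_unique <- expr_sorted[keep, , drop = FALSE]
rownames(expr_unique)
```

   With an ExpressionSet from rma or lumi, the matrix would typically come
   from exprs(). Rows without any gene annotation (NA) are best removed before
   this step, because duplicated() treats all NA values as duplicates of each
   other and would keep only one of them.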

   4. > The rationale of this is that you are using the probe that represents
      > the dominant transcript for that gene in the cell type. This method has
      > been used for many published studies by now, the first of which may have
      > been: http://www.biomedcentral.com/1471-2105/7/511
      > For example, you can proceed like this for the Agilent data, assuming
      > you have put the EntrezIDs into the object:
      >
      >    MA <- normalizeWithinArrays(RG, method="loess")
      >    A <- rowMeans(MA$A)
      >    o <- order(A, decreasing=TRUE)
      >    MA2 <- MA[o,]
      >    d <- duplicated(MA2$genes$EntrezID)
      >    MA2 <- MA2[!d,]
      >
      > Now you have a data object with a unique probe for each EntrezID.

   These commands don't work with my example,
   http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23523. When I have
   finished all the steps, there are no values left in my list. I think the
   problem could be that the EntrezID column (MA$genes$EntrezID) is missing.
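
   A possible explanation, assuming the EntrezID column really is absent from
   the annotation: in R, asking for a column that does not exist returns NULL,
   duplicated(NULL) is a logical vector of length zero, and indexing rows with
   a zero-length logical vector selects nothing. The sketch below reproduces
   this with a plain data frame and matrix standing in for MA$genes and the
   log-ratios. The stand-in objects are hypothetical, and that an MAList
   behaves the same way under row subsetting is an assumption, although it
   would match the empty result described above.

```r
# A plain data frame standing in for MA$genes (hypothetical annotation, no EntrezID column).
genes <- data.frame(ProbeName = c("p1", "p2", "p3"), stringsAsFactors = FALSE)
M <- matrix(rnorm(6), nrow = 3)   # stand-in for the log-ratios

d <- duplicated(genes$EntrezID)   # genes$EntrezID is NULL, so d has length zero
length(d)
nrow(M[!d, , drop = FALSE])       # no rows are selected: every probe is dropped

# A simple guard before filtering.
if (is.null(genes$EntrezID)) {
  message("No EntrezID column found; add the annotation before removing duplicates.")
}
```

   Running names(MA$genes) on the real object shows which annotation columns
   are actually available. If there is no Entrez ID column, it has to be added
   from the platform annotation before the duplicated() step.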

   Kind regards
   Stefanie

   Sent: Sunday, 04 May 2014 at 06:50
   From: "Gordon K Smyth" <smyth at wehi.EDU.AU>
   To: "Stefanie Busch" <stefanie.busch2 at web.de>
   Cc: "Bioconductor mailing list" <bioconductor at r-project.org>
   Subject: 1. comparing chip Information in meta analysis / Rankprod and 2.
   two color normalization

   Dear Stefanie,

   > Date: 30 April 2014
   > From: Pekka Kohonen
   > From: Stefanie Busch
   > To: Bioconductor
   > Subject: Re: [BioC] 1. comparing chip Information in meta analysis /
   > Rankprod and 2. two color normalization
   >
   > Hello,
   >
   > I have two questions and I hope you can help me.
   >
   > I want to compare several studies with similar design but different
   > arrays. The first step was to quantile normalize all data, which works
   > well except for the two color experiment with an Agilent chip.

   As you seem to have realized already, quantile normalization is not usually
   appropriate for a two colour Agilent array. Loess normalization is generally
   used for two colour arrays, and I recommend a normexp background correction
   step before that.

   > I read the limma User Guide and found out that I must preprocess with the
   > function normalizeBetweenArrays. So I get M- and A-values and my
   > question is which one shows the expression values for this experiment?

   Two colour arrays don't return expression values. Instead they return
   log-ratios, which are stored in M. When you compare Agilent to Affymetrix
   chips and Illumina BeadArrays, you need to compare log-fold-changes and DE
   results, not expression values.

   > For comparing the results of the different studies I want to use the
   > package: RankProd.

   As far as I know, RankProd assesses differential expression and does not in
   itself help you compare one study to another. The usual methods to compare
   one study to another are (i) to make a scatterplot of logFC from the two
   experiments or (ii) to use a gene set test such as roast() in the limma
   package. The limma package can compute logFC for whatever comparison you are
   making.

   > For a better comparison between the studies I used the Entrez IDs and I
   > downloaded the latest chip information directly from Affymetrix and
   > Illumina. This revealed a new problem. For example, on the chip
   > Affymetrix Mouse Genome 430 2.0 Array the ID 1449880_s_at stands for
   > three gene names and Entrez IDs:
   > Bglap /// Bglap2 /// Bglap3 -> 12095 /// 12096 /// 12097.
   > On the Illumina chip each gene has a single array ID:
   > Bglap-rs1 - ILMN_1233122 - 12095
   > Bglap1 - ILMN_2610166 - 12096
   > Bglap2 - ILMN_2944508 - 12097
   >
   > So I don't know what I should do to compare the results of these two
   > experiments. When I paste the expression values of 1449880_s_at three
   > times with the three different Entrez IDs, the ranking calculated with
   > the RankProd package changes.
   > Example:
   > Chip ID - Entrez-Id - Control1 - control 2 etc.
   > 1449880_s_at - 12095 - 3,855 - 4,211 ...
   > 1449880_s_at - 12096 - 3,855 - 4,211 ...
   > 1449880_s_at - 12097 - 3,855 - 4,211 ...
   >
   > The other possibility is to combine the three expression values of the
   > Illumina chip into one value. But I don't know if this is the right way.
   > What is the better way?

   For this purpose, I always recommend that, for each Entrez ID, you use the
   probe on each platform with the highest overall expression level. The
   rationale of this is that you are using the probe that represents the
   dominant transcript for that gene in the cell type. This method has been
   used for many published studies by now, the first of which may have been:

   [1] http://www.biomedcentral.com/1471-2105/7/511

   For example, you can proceed like this for the Agilent data, assuming you
   have put the EntrezIDs into the object:

      MA <- normalizeWithinArrays(RG, method="loess")
      A <- rowMeans(MA$A)
      o <- order(A, decreasing=TRUE)
      MA2 <- MA[o,]
      d <- duplicated(MA2$genes$EntrezID)
      MA2 <- MA2[!d,]

   Now you have a data object with a unique probe for each EntrezID.

   Simply averaging the probes or probe-sets is not generally recommended,
   because different probes for the same gene can have quite different
   behaviour. A common situation is that one probe successfully probes an
   expressed transcript while another probe is essentially unexpressed.

   Best wishes
   Gordon

   > Kind regards
   > Stefanie Busch

References

   1. http://www.biomedcentral.com/1471-2105/7/511