[BioC] help needed on avereps function

Thu Jun 18 12:25:34 CEST 2009

Dear Dr Pepin,

thank you for your help.
My sessionInfo() is:
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] limma_2.16.4

This is how I used avereps.
First I select only the "Gene" spots on my array:

isGene<-gal[,7]=="FALSE"
MAq_gene<-MAq[isGene,]

Then, on the normalized data (in this case quantile-normalized), I use 
avereps():

MAq_av<-avereps(MAq_gene)

and finally I use lmFit:

fit<-lmFit(MAq_av,design,weights=MAq_av$weights)
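
For reference, the averaging step described above can be sketched in base R on a toy matrix. This is only an illustration of what collapsing replicate spots by probe ID amounts to, not limma's implementation: the matrix `M` and the vector `ids` are invented, and the sketch ignores weights and missing values.

```r
# Toy log-ratio matrix: 4 spots x 2 arrays (values invented for illustration).
M <- matrix(c(1, 3, 5, 7,
              2, 4, 6, 8),
            nrow = 4,
            dimnames = list(NULL, c("array1", "array2")))

# Probe ID of each spot; "P1" is spotted three times, "P2" once.
ids <- c("P1", "P1", "P2", "P1")

# Sum the rows sharing an ID, then divide by the number of replicates.
# reorder = FALSE keeps the IDs in order of first appearance.
sums   <- rowsum(M, group = ids, reorder = FALSE)
counts <- as.vector(table(factor(ids, levels = unique(ids))))
M_avg  <- sums / counts

print(M_avg)
```

After this step every probe ID should appear exactly once, so the averaged object should contain no repeated IDs at all.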

Checking MAq_av, I noticed many probes with the same ID that were not 
averaged, although the first dimension of MAq_gene is 43376 and that of 
MAq_av is 41000.
I checked MAq_av$genes$ID and the entries are Agilent probe IDs.
Using avereps I obtained 334 unique probe IDs (Agilent codes), and 541 unique 
probe IDs without it.
Only 6 probe IDs are common to these two sets.
If I compare by GeneName instead, only 16 genes are in common.
I had considered that passing these two different MA objects to lmFit could 
produce two different lists of differentially expressed genes, because limma 
borrows information from the other genes. For this reason I chose to use the 
non-averaged list of differentially expressed genes, because limma then has 
much more information with which to assign significance. However, I never 
expected such a different result.
I hope the information I have given helps us understand where the problem 
is.
Thank you for your kind help.
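
The two checks mentioned above (whether duplicate probe IDs remain, and how many IDs two result lists share) can be sketched in base R as follows. The `genes` data frame is a hypothetical stand-in for MAq_av$genes, and the ID vectors are toy data, not the actual results.

```r
# Hypothetical stand-in for MAq_av$genes (toy values only).
genes <- data.frame(
  ID       = c("A_23_P135769", "A_23_P135769", "A_23_P31323"),
  GeneName = c("ACTB", "ACTB", "ACTB"),
  stringsAsFactors = FALSE
)

# Count the rows carrying each probe ID; a count above 1 means that
# the ID was not collapsed to a single row.
id_counts      <- table(genes$ID)
duplicated_ids <- names(id_counts)[id_counts > 1]
print(duplicated_ids)

# Probe IDs shared between two result lists (toy vectors).
ids_averaged   <- c("A_23_P31323", "A_24_P226554")
ids_unaveraged <- c("A_23_P31323", "A_32_P137939", "A_23_P135769")
common <- intersect(ids_averaged, ids_unaveraged)
print(common)
```

On the real objects the first check would be applied to MAq_av$genes$ID, and the second to the ID columns of the two lists of differentially expressed probes.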

Erika

----- Original Message ----- 
From: "Francois Pepin" <fpepin at cs.mcgill.ca>
To: "Erika Melissari" <erika.melissari at bioclinica.unipi.it>; "BioC" 
<bioconductor at stat.math.ethz.ch>
Sent: Wednesday, June 17, 2009 00:51 AM
Subject: Re: help needed on avereps function

> Hi Erika,
>
> I'm bringing the discussion back to the list so other people can chime in 
> and so it's archived for future reference.
>
> What are you using for the ID argument in avereps? Since the code doesn't 
> seem to work for you (i.e. you still have duplicates), I'm guessing it's 
> not using the proper identifiers. Without any code, it's impossible for us 
> to understand what is happening.
>
> As for the lists of differentially expressed genes, you'd have to tell us 
> just how many genes you get with each method and how different the lists 
> are. Methods like Limma borrow information from the other genes when 
> calculating significance, so this could change the p-values. In addition, 
> multiple hypothesis testing will also be affected if you have a different 
> number of probes.
>
> So other than guessing, there's not much that we can do. Sending your code 
> (including sessionInfo()) and giving us more details of your results will 
> allow people to get a better idea of what is happening and how to fix it, 
> if necessary.
>
> Francois
>
> Erika Melissari wrote:
>>
>> Dear Dr Pepin,
>> sorry to disturb you, but I have sent several emails to the Bioconductor 
>> list about some problems I have using the avereps function and received 
>> no answer.
>> Perhaps my question is unimportant for the Bioconductor list, but I 
>> noticed some unexpected results when using this function that concern me 
>> and that I cannot explain.
>> If you have a little time and would like to help me, I would like to 
>> have your opinion about these problems.
>> As the limma help suggests, I use the avereps function after 
>> normalization and before lmFit; that is, I run lmFit on normalized and 
>> averaged data.
>> I noted two strange results:
>> 1) I obtain a different list of differentially expressed genes depending 
>> on whether I use avereps. If I have understood this function correctly, 
>> its effect is to average the M, A and weights values for spots with the 
>> same probe ID (in my case an Agilent code). Why should my statistical 
>> significance change, and which list of differentially expressed genes is 
>> right, or safer?
>> 2) When I checked the averaged list of genes, I found spots with the same 
>> probe ID that had not been averaged. You can see an example of this 
>> below. What prevents the averaging?
>> Maybe the problems I see are not a consequence of using avereps, 
>> particularly point 1), and should be explained in other terms?
>> I apologize again for the trouble I am causing you, and I thank you in 
>> advance for any help you can give me.
>>  Best regards
>>  Erika
>>  Erika Melissari
>> Ph.D. student
>> Department of Experimental Pathology, MBIE,
>> University of Pisa
>> Santa Chiara Hospital, via Roma 67
>> 56126 Pisa
>> e-mail: erika.melissari at bioclinica.unipi.it
>> ----- Original Message -----
>> From: Erika Melissari <erika.melissari at bioclinica.unipi.it>
>> To: bioconductor at stat.math.ethz.ch; Francois Pepin 
>> <fpepin at cs.mcgill.ca>
>> Sent: Friday, June 05, 2009 18:23 PM
>> Subject: avereps function
>>
>> Dear list,
>> I used the avereps function after normalization and before lmFit to 
>> average replicate spots on the microarrays.
>> I noticed that although many spots have been averaged (the total number 
>> of spots was reduced from 43000 to 41000), other spots have not.
>> See this example:
>> Block  Column  Row  ID            Name       Sequence                                                      ProbeUID  GeneName  logFC     adj.P.Val  B
>> 1      85      183  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.266302  0.048228   0.181434
>> 1      20      393  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      -0.20687  0.068295   -0.6233
>> 1      56      294  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.110065  0.382405   -4.54642
>> 1      22      299  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.085017  0.405978   -4.66767
>> 1      53      457  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.080708  0.483304   -5.0517
>> 1      17      39   A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.063279  0.710629   -5.73913
>> 1      45      199  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.051584  0.778778   -5.87993
>> 1      64      279  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      -0.04158  0.800246   -5.91735
>> 1      16      358  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.024847  0.880504   -6.03386
>> 1      21      435  A_23_P135769  NM_001101  TTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTT  2871      ACTB      0.000153  0.999438   -6.11393
>> 1      4       111  A_23_P31323   NM_001101  ACTCTTCCAGCCTTCCTTCCTGGGCATGGAGTCCTGTGGCATCCACGAAACTACCTTCAA  8562      ACTB      0.283846  0.043577   0.472915
>> 1      17      275  A_24_P226554  NM_001101  GCACCCAGCACAATGAAGATCAAGATCATTGCTCCTCCTGAGCGCAAGTACTCCGTGTGG  21338     ACTB      0.030637  0.848504   -5.9958
>> 1      74      251  A_32_P137939  NM_001101  AGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCGCACCTT  19564     ACTB      -0.20387  0.177982   -2.87144
>>
>> Why was the group of the first 10 probes not averaged by avereps?
>> Any suggestions will be appreciated.
>> Thank you so much
>> Erika
>>
>