[BioC] Agi4x44PreProcess 1.4.0 question: use of genes.rpt.agi() and Gene Sets

Tue Oct 27 16:13:01 CET 2009

Dear Francois, Tobias, and all users

Thanks to this discussion and to those I found in the archives, as
Francois suggested, I am now more aware of the importance of not
averaging different probes that map to the same gene transcript at
different locations. I should now perhaps test, on my data, the two
options that were discussed in this thread, i.e.

1) choosing the probe with the largest experimental variation (or with
the maximum average)
2) choosing the probe that maps closest to the 3' end

Are there any functions written to address each of these topics?
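
To make option (1) concrete, here is a minimal base R sketch, not an existing Bioconductor function. The matrix `expr`, the probe-to-gene vector `gene`, and the helper `pick_probe()` are all made up for illustration; on real data the mapping would come from the array annotation.

```r
# Toy log-expression matrix: rows are probes, columns are arrays.
# All values are invented for illustration.
expr <- matrix(
  c(7.1, 7.3, 7.2, 7.0,   # A_1 (gene G1): high average, nearly flat
    5.0, 8.0, 5.5, 8.5,   # A_2 (gene G1): lower average, large variation
    6.0, 6.1, 9.0, 9.2,   # A_3 (gene G2)
    4.0, 4.1, 4.0, 4.2),  # A_4 (gene G2)
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A_1", "A_2", "A_3", "A_4"), paste0("array", 1:4))
)

# Hypothetical probe-to-gene mapping, named by probe ID.
gene <- c(A_1 = "G1", A_2 = "G1", A_3 = "G2", A_4 = "G2")

# Hypothetical helper: for each gene, keep the single probe with the
# largest score (per-probe variance across arrays by default).
pick_probe <- function(expr, gene, score = function(x) apply(x, 1, var)) {
  s <- score(expr)
  groups <- split(seq_along(s), gene[rownames(expr)])
  best <- vapply(groups, function(i) i[which.max(s[i])], integer(1))
  expr[sort(unname(best)), , drop = FALSE]
}

by_var  <- pick_probe(expr, gene)                    # largest variation
by_mean <- pick_probe(expr, gene, score = rowMeans)  # maximum average

print(rownames(by_var))
print(rownames(by_mean))
```

In this toy example the two criteria disagree for G1 (A_2 has the larger variance, A_1 the larger average), so the choice of score is worth checking on real data.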

I have been searching on GMANE, but I found only discussions of the
topic; I am particularly interested in functions that address these
issues.
For the selection in (2) above, I suppose one has to look up the
positions of the probes on the chromosomes.
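
A rough sketch of what the selection in (2) could look like once probe and gene coordinates are in hand. The table `probes`, its column names, and every coordinate are hypothetical; where the positions actually come from (annotation package, Agilent annotation files) is left open here.

```r
# Hypothetical annotation table; all coordinates are invented.
probes <- data.frame(
  probe      = c("A_1", "A_2", "A_3", "A_4"),
  gene       = c("G1", "G1", "G2", "G2"),
  strand     = c("+", "+", "-", "-"),
  probe_pos  = c(1200, 4800, 9100, 7050),  # probe midpoint on the chromosome
  gene_start = c(1000, 1000, 7000, 7000),  # lowest coordinate of the gene
  gene_end   = c(5000, 5000, 9500, 9500),  # highest coordinate of the gene
  stringsAsFactors = FALSE
)

# The 3' end is the highest coordinate on the plus strand and the
# lowest coordinate on the minus strand.
three_prime <- ifelse(probes$strand == "+", probes$gene_end, probes$gene_start)
probes$dist3 <- abs(probes$probe_pos - three_prime)

# For each gene, keep the probe with the smallest distance to the 3' end.
closest <- do.call(
  rbind,
  lapply(split(probes, probes$gene), function(d) d[which.min(d$dist3), ])
)

print(closest[, c("gene", "probe", "dist3")])
```

The strand handling is the step that is easy to get wrong: for a gene on the minus strand the 3' end is at the lower coordinate, so taking the probe with the largest position would pick the wrong one.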

Thank you, again, in advance
Massimo

Massimo Pinto
Post Doctoral Research Fellow
Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome
http://claimid.com/massimopinto

On Wed, Oct 21, 2009 at 3:48 PM, Francois Pepin <fpepin at cs.mcgill.ca> wrote:
>
> The fact that you have to summarize with Affy doesn't mean that it applies to other technologies. The Affy chips need this because they have shorter oligos (25bp) but the Agilent ones are longer (60bp) and more reliable than individual affy probes.
>
> I have to disagree with that being the most biologically relevant. As I said, many of the probes for the same gene will not be measuring the same thing: some will target differential splice sites, or preferentially track pseudogenes, etc. From talking to Agilent scientists, one of the criteria for keeping different probes for the same gene is that they give different readings on some of their test samples. Otherwise, they just take the one closest to the 3' end.
>
> I have cases where both probes for a given gene show differential expression in opposite directions. I believe one of them; the other is probably a fluke, but combining them would have been a bad idea.
>
> Francois
>
> Tobias Straub wrote:
>>
>> Hi
>>
>> key question regarding your problem is the confidence in the measurement of a single agilent feature. in affy 3' expression arrays a robust measurement is obtained by summarization of several features. for the modern affy gene st arrays the gene-based expression measurement is also obtained by feature summarization across exons (at least this is what the affy expression console forces you to do).
>>
>> hence, the most intuitive and biologically relevant procedure would be to apply feature summarization accordingly for agilent arrays before doing the statistics. the question how this summarization has to be done cannot easily be answered without analysis of reference samples. my personal experience: there is not a big difference between taking the median signal or just taking the feature with the highest variance. if you are particularly interested in categorizing responders, the variance method is probably more sensitive.
>>
>> best
>> Tobias
>>
>> On Oct 20, 2009, at 4:45 PM, Francois Pepin wrote:
>>
>>> Hi Massimo,
>>>
>>> I don't know about Agi4x44PreProcess, but Limma can do it with avereps.
>>>
>>> In the case of Agilent arrays, I would not recommend doing that from the start. The probes mapping to the same gene often do not measure the same thing: they can map to different splice variants, and some can be pretty far from the 3' end.
>>>
>>> So for differential analysis, I would suggest keeping them separate. For other analyses that assume one probe per gene, such as gene ontology analysis, I would recommend an unbiased method to choose a representative probe per gene, for example the highest-variance probe or the one closest to the 3' end.
>>>
>>> If you search in the archives, you can find more advice as this is a common topic.
>>>
>>> Francois
>>>
>>> Massimo Pinto wrote:
>>>>
>>>> Greetings all,
>>>> I realised that I was carrying forward, in my analysis, multiple
>>>> measurements for the same gene that had been carried out using
>>>> independent probes. This is a feature of Agilent arrays, as I
>>>> understand. However, while it is clear to me that Agi4x44PreProcess
>>>> offers a function to summarize replicated probes, called
>>>> summarize.probe(), I cannot see a readily available function that
>>>> performs a similar treatment to replicated genes, i.e. Gene Sets, as
>>>> these are called in the Agi4x44 Package.
>>>> The result of calling
>>>> > genes.rpt.agi(dd, "hgug4112a.db", raw.data = TRUE, WRITE.html = TRUE, REPORT = TRUE)
>>>>
>>>> is an html list of Gene Sets, but these are not summarized to a
>>>> 'virtual' measurement, like summarize.probe() does for replicated
>>>> probes.
>>>> Is there a reason why one would like to carry on multiple probes for a
>>>> given gene throughout his/her subsequent analysis, including linear
>>>> modeling and gene ontology? If not, is there a function that performs
>>>> the median of such repeats?
>>>> Thank you in advance,
>>>> Yours
>>>> Massimo Pinto
>>>> > sessionInfo()
>>>>
>>>> R version 2.9.1 (2009-06-26)
>>>> i386-apple-darwin8.11.1
>>>> locale:
>>>> C
>>>> attached base packages:
>>>> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>>>> other attached packages:
>>>>  [1] affy_1.22.0             gplots_2.7.0            caTools_1.9             bitops_1.0-4.1          gdata_2.4.2             gtools_2.5.0-1
>>>>  [7] hgug4112a.db_2.2.11     RSQLite_0.7-1           DBI_0.2-4               Agi4x44PreProcess_1.4.0 genefilter_1.24.0       annotate_1.22.0
>>>> [13] AnnotationDbi_1.6.0     limma_2.18.0            Biobase_2.4.1
>>>> loaded via a namespace (and not attached):
>>>> [1] affyio_1.11.3        preprocessCore_1.5.3 splines_2.9.1        survival_2.35-4      xtable_1.5-5
>>>> Massimo Pinto
>>>> Post Doctoral Research Fellow
>>>> Enrico Fermi Centre and Italian Public Health Research Institute (ISS), Rome
>>>> http://claimid.com/massimopinto
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> ----------------------------------------------------------------------
>> Tobias Straub   ++4989218075439   Adolf-Butenandt-Institute, München D
>