[BioC] pm summarization method

Mon Apr 16 17:54:07 CEST 2012

Hi Assa,

On 4/16/2012 9:05 AM, Assa Yeroslaviz wrote:
> Hi everybody,
>
> I have a question about the behavior of the expresso command when
> extracting the raw data from an affyBatch.
>
> I wanted to evaluate the raw intensity values of a specific gene from my
> data set and tried to extract them like this:
> rawdata<- expresso(totalExpressionData, bg.correct=FALSE,
>                         normalize=FALSE,
>                         pmcorrect.method="pmonly", summary.method="avgdiff",
>                         verbose=TRUE)
>
> I've got the result I wanted:
>            wt1    wt2    wt3  treat1  treat2  treat3
> gene_at  125.5    101  123.5    52.5    63.5      58
>
> The problem was that I expected them to be the other way around.

Why did you expect it to be the other way around? What exactly did you 
expect this call to expresso() to do?

Since affy is primarily based on S4 methods, it can be a bit difficult 
to figure out what a given function is going to do, so I can understand 
you not knowing what this call to expresso() is going to end up doing. 
However, what you are doing is pretty weird, no? The avgdiff method 
implies pm-mm, so what do you expect to happen if you then specify pmonly?

Given that the pmcorrect.method controls how we correct the PM probes, 
and there is a subtractmm option, one would normally assume that the 
'difference' part of avgdiff might happen in that step. But you said not 
to compute that, so all you are left with is the 'avg' part of avgdiff.

But let's set the logical assumptions aside and look at the actual code. 
Going through the code to expresso() is a bit like following Alice down 
the rabbit hole, so I will cut to the chase. In the end, you will be 
calling two pieces of code that will handle the pm adjustment and the 
summary statistic calculation. In your call to expresso() these will be 
(respectively):

 > pmcorrect.pmonly
function (object)
{
     return(pm(object))
}

and

 > generateExprVal.method.avgdiff
function (probes, ...)
{
     list(exprs = apply(probes, 2, median), se.exprs = apply(probes,
         2, sd)/sqrt(nrow(probes)))
}

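So pmcorrect.pmonly will just give you the pm probes, and then avgdiff will 
give you the column medians of the pm data. Therefore you have basically 
told expresso() that you want the median value for the non-background 
corrected, unnormalized pm probes. That is also why averaging the probe 
values by hand does not reproduce your table: despite its name, the 
summary step shown above takes a median, not a mean.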
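
You can check this without affy at all. The sketch below types in the PM
table from your message as a plain matrix (the name pm_values is just my
choice for this example) and applies the same column-wise median and standard
error calculation as the function body printed above.

```r
# PM intensities copied from the first probe table in the quoted message.
pm_values <- matrix(c(
  403, 379, 220, 420, 530, 316,
  117,  84, 104,  52,  57,  54,
   49,  49,  73,  38,  58,  52,
   87,  67, 110,  55,  43,  49,
   66,  61,  51,  46,  72,  62,
  118, 100, 104,  69,  87,  74,
  180, 142, 170,  45,  46,  45,
  133, 102, 137,  95, 132,  81,
   80,  71,  65,  52,  54,  46,
   63,  45,  56,  53,  53,  54,
  293, 321, 260, 444, 618, 408,
  171, 167, 169,  49,  75,  72,
  198, 197, 307,  40,  67,  50,
  247, 265, 348,  53,  60,  62),
  ncol = 6, byrow = TRUE,
  dimnames = list(paste0("probe", 1:14),
                  c("wt1", "wt2", "wt3", "treat1", "treat2", "treat3")))

# Same calculation as the avgdiff body shown above: column medians,
# plus the standard error of each column.
summary_median <- apply(pm_values, 2, median)
summary_se     <- apply(pm_values, 2, sd) / sqrt(nrow(pm_values))

# For comparison only: the plain column means, which is what a hand
# "average" would give.
summary_mean <- apply(pm_values, 2, mean)

print(summary_median)
# wt1 125.5, wt2 101, wt3 123.5, treat1 52.5, treat2 63.5, treat3 58
print(round(summary_mean, 1))
```

The medians match the gene_at row you got from expresso(). Probes 1 and 11
are not being dropped; a median is simply much less sensitive to a couple of
very bright probes than a mean, which is why the treated arrays come out
lower than the wild type here.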

Was that your intention?

Best,

Jim

> So I decided to look into the individual probe values for this
> probe-set.
> These are the values I've got from the PM and MM respectively:
>            wt1   wt2   wt3  treat1  treat2  treat3
> probe1     403   379   220     420     530     316
> probe2     117    84   104      52      57      54
> probe3      49    49    73      38      58      52
> probe4      87    67   110      55      43      49
> probe5      66    61    51      46      72      62
> probe6     118   100   104      69      87      74
> probe7     180   142   170      45      46      45
> probe8     133   102   137      95     132      81
> probe9      80    71    65      52      54      46
> probe10     63    45    56      53      53      54
> probe11    293   321   260     444     618     408
> probe12    171   167   169      49      75      72
> probe13    198   197   307      40      67      50
> probe14    247   265   348      53      60      62
>
> probe1     533   519   294     507     739     404
> probe2    1789  1271  1468    1430    1666    1552
> probe3      56    66    59      51      45      48
> probe4      49    52    64      47      45      47
> probe5      54    47    33      49      65      55
> probe6      84    72    90      53      92      71
> probe7      73    72    65      40      53      54
> probe8      83   108   115      81     111      94
> probe9      49    56    43      52      41      53
> probe10     56    46    62      68      77      57
> probe11     54    83    55      47      64      46
> probe12    106    98    76      52      66      53
> probe13     43    48    37      36      52      39
> probe14     94    92    99      43      43      49
>
> When I calculate the average of these two tables for each array I don't get
> the same values as presented in the top table.
> I would like to understand how the values in the last two tables lead to
> the summarized values I get. Even if I ignore the MM values
> completely, which I think it does, I still don't see how it comes to these
> values. The two probes (Nr. 1 and 11 of the PM values) differ strongly
> from the rest of the probes for this probe-set. Are they being ignored in
> the summarization?
>
> Thanks in advance
>
> Assa

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099