[BioC] Loading quantiles normalized root data in XPS

Paul Geeleher paulgeeleher at gmail.com
Tue Feb 7 14:17:19 CET 2012


Actually doing what I'm talking about relies on being able to modify
values in "intensity(data_qn)", which I was assuming one could do but
testing it I don't think I can!?

> head(intensity(data_qn))
  X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0             0.0000              0.000
2 1 0             0.0000              0.000
3 2 0             0.0000              0.000
4 3 0             0.0000              0.000
5 4 0             0.0000              0.000
6 5 0            85.3523            121.733

> intensity(data_qn)[6,3]
[1] 85.3523
> intensity(data_qn)[6,3] <- 0
Error in dimnames(x) : 'x' is missing

> head(intensity(data_qn))
  X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0             0.0000              0.000
2 1 0             0.0000              0.000
3 2 0             0.0000              0.000
4 3 0             0.0000              0.000
5 4 0             0.0000              0.000
6 5 0            85.3523            121.733

This means a different approach may be necessary.

What I can do is extract the probe level estimates for each gene,
which I think you outlined in a previous email to the list
(https://mailman.stat.ethz.ch/pipermail/bioconductor/2011-September/040963.html)
and from there set those probes which aren't detected above background
to zero (once I've figured out how to map probesets to probes). I can
the do what I need to do with this data (which is basically to fit a
model based on individual probe intensities for each gene) and do this
for every gene, which won't involve modifying the intensities in
data_qn and as I'm only "attaching" probe level data for one gene at a
time, shouldn't use that much memory.

Paul

On Tue, Feb 7, 2012 at 12:52 PM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
> Hi Christian, apologies for the lack of clarity in my previous email,
> I'll try to clear this up, see replies below:
>
> On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at aon.at> wrote:
>> Dear Paul,
>>
>> I am afraid that it is not quite clear to me what you want to do.
>>
>> What do you mean with "obtain the expression levels of the probes"? Do you
>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question
>> about DABG?
>
> I've read in and quantiles normalized probe level data for 176 arrays
> and what I'm trying to do now is set every probe (individual probe,
> not probeset) that is not detected above background to zero. But from
> what I've read dabg.call() can only create p-values for probeset
> level? So as compromise I'm now trying to set the expression of all
> *individual probes* which belong to a *probeset* not detected above
> background to zero. I hope that makes sense.
>
>>
>> How did you create "data_hapMap"? Maybe you could send me your complete code
>> otherwise I am only able to guess.
>
> "data_hapMap" was a mistake, sorry I copied the wrong object name it
> should have been "data_qn" which is quantiles normalized probe level
> data that I created like this:
>
> data_hapMap <- root.data(scheme.huex10stv2,
> "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root")
> #read raw data
> data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles",
> filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe*
> level
>
> So "data_qn" is the object I want to work with, based on your advice
> I've figured out that to access the expression levels in "data_qn" I
> need to use "attachInten()", then I can use "intensities()" to access
> the expression levels:
>
> treenames <- unlist(treeNames(data_qn))
> data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach the
> first two samples
>
>> head(intensity(data_qn))
>  X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0             0.0000              0.000
> 2 1 0             0.0000              0.000
> 3 2 0             0.0000              0.000
> 4 3 0             0.0000              0.000
> 5 4 0             0.0000              0.000
> 6 5 0            85.3523            121.733
>
> Does the X column in the output above represent the probe ID? If so
> and I have a mapping of probeset IDs to corresponding probe IDs, it
> should be fairly straightforward to set probes that are not detected
> above background to zero? Perhaps there is a more straightforward way
> of doing this though?
>
>
>>
>> I guess that "data_hapMap" contains the raw data. For these the slot "data"
>> is empty to save memory. So you need to use either attachData() or
>> attachInten(). However since you are using exon arrays you may not have
>> enough RAM, so it would be better to use function export() or export.data(),
>> or attach only a subset, see help ?attachData. See also vignette xps.pdf
>> (chapter 2.3).
>
> I think RAM shouldn't be an issue if I attach the samples one at a
> time? (I actually have access to a machine with 32gigs RAM but ideally
> would like to get what I'm doing to run on a standard desktop, which
> is actually why I'm using XPS!).
>
>>
>> When you talk about "expression matrix", how did you create it? Maybe you
>> could use function validExpr(), but w/o seeing your code it is hard to tell.
>> For DABG there are functions pvalData() and presCall(), see the examples in
>> help ?dabg.call.
>
> Yes I've managed to use dabg.call() at probeset level and access the
> p-values using pvalData() alright.
>
> Thanks again for all of your help and patience!
>
> Kind Regards,
>
> Paul.
>
>
>>
>> Best regards
>> Christian
>>
>>
>>
>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>
>>> Hi Christian,
>>>
>>> Thanks for your quick and informative reply.
>>>
>>> I have re-run the analysis and saved the R objects as you suggest. The
>>> next thing I'm trying to do is to obtain the expression levels of the
>>> probes, but this doesn't seem to be working for me:
>>>
>>>> a<- validData(data_hapMap)
>>>
>>> Error in .local(object, ...) : slot "data" has no data
>>>
>>> Based on the documentation I think validData() is the correct function.
>>>
>>> I've also performed probeset level DABG and I'm trying to set
>>> individual probes which belong to probesets with DABG<  .05 to 0 in
>>> the expression matrix.
>>>
>>> But it seems I can't see the expression matrix using validData().
>>> Perhaps there is another function. Any ideas?
>>>
>>> Thank you again for your help with this, I'm very grateful!
>>>
>>> Paul.
>>>
>>> On 2/2/12, cstrato<cstrato at aon.at>  wrote:
>>>>
>>>> Dear Paul,
>>>>
>>>> The functions root.data(), root.call() and root.expr() were created to
>>>> allow you access to the corresponding root files just in case that you
>>>> did not save your R session.
>>>>
>>>> In the cases where you compute expression levels stepwise, or only part
>>>> of them such as normalize.quantiles(), as seems to be the matter in your
>>>> case, there is no corresponding root.xxx() function to access the root
>>>> file directly. In these cases you need to save your R session to have
>>>> continued access to the resulting root file.
>>>>
>>>> Please note that saving the R session is the usual case to have access
>>>> to the root files.
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your quick reply. I check what kind of trees I have using
>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu"
>>>>> rather than "int", this is presumably because my analysis required no
>>>>> background correction step?
>>>>>
>>>>> So I then tried:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu")
>>>>>
>>>>>
>>>>> but that gives me a huge number of errors that look like this:
>>>>>
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error: Could not get tree<ExportSet>.
>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",  :
>>>>>    error in function ‘ExportData’
>>>>>
>>>>>
>>>>> This file "exon_quantiles.root" definitely exists in the current
>>>>> working directory though... Thanks again for your help!
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at aon.at>   wrote:
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> Please have a look at the help ?root.expr.
>>>>>>
>>>>>> If I understand you correctly, you did only do quantile normalization?
>>>>>>
>>>>>> To see the tree names in your file you should do:
>>>>>>>
>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> You will probably see trees with extension "int", see help
>>>>>> ?validTreetype.
>>>>>>
>>>>>> To load these trees you need to do:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int")
>>>>>>
>>>>>>
>>>>>> Please let me know if this did solve your problem.
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon
>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but
>>>>>>> if I try to load it the same was I'd load my raw data (using the
>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I
>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a
>>>>>>> different "root scheme" file for this normalized data? Unfortunately,
>>>>>>> I haven't been able to find an answer.
>>>>>>>
>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero
>>>>>>>
>>>>>>> Hope someone can help,
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>>
>>>>>>>
>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>
> --
> Paul Geeleher (PhD Student)
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
> --
> www.bioinformaticstutorials.com



-- 
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com



More information about the Bioconductor mailing list