[BioC] Loading quantiles normalized root data in XPS

Paul Geeleher paulgeeleher at gmail.com
Tue Feb 7 13:52:48 CET 2012


Hi Christian, apologies for the lack of clarity in my previous email,
I'll try to clear this up, see replies below:

On Mon, Feb 6, 2012 at 7:51 PM, cstrato <cstrato at aon.at> wrote:
> Dear Paul,
>
> I am afraid that it is not quite clear to me what you want to do.
>
> What do you mean with "obtain the expression levels of the probes"? Do you
> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question
> about DABG?

I've read in and quantiles normalized probe level data for 176 arrays
and what I'm trying to do now is set every probe (individual probe,
not probeset) that is not detected above background to zero. But from
what I've read dabg.call() can only create p-values for probeset
level? So as compromise I'm now trying to set the expression of all
*individual probes* which belong to a *probeset* not detected above
background to zero. I hope that makes sense.

>
> How did you create "data_hapMap"? Maybe you could send me your complete code
> otherwise I am only able to guess.

"data_hapMap" was a mistake, sorry I copied the wrong object name it
should have been "data_qn" which is quantiles normalized probe level
data that I created like this:

data_hapMap <- root.data(scheme.huex10stv2,
"/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root")
#read raw data
data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles",
filedir=rootdata_Dir, exonlevel="all") #quantiles normalize at *probe*
level

So "data_qn" is the object I want to work with, based on your advice
I've figured out that to access the expression levels in "data_qn" I
need to use "attachInten()", then I can use "intensities()" to access
the expression levels:

treenames <- unlist(treeNames(data_qn))
data_qn <- attachInten(data_qn, treenames=treenames[1:2]) #attach the
first two samples

> head(intensity(data_qn))
  X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
1 0 0             0.0000              0.000
2 1 0             0.0000              0.000
3 2 0             0.0000              0.000
4 3 0             0.0000              0.000
5 4 0             0.0000              0.000
6 5 0            85.3523            121.733

Does the X column in the output above represent the probe ID? If so
and I have a mapping of probeset IDs to corresponding probe IDs, it
should be fairly straightforward to set probes that are not detected
above background to zero? Perhaps there is a more straightforward way
of doing this though?


>
> I guess that "data_hapMap" contains the raw data. For these the slot "data"
> is empty to save memory. So you need to use either attachData() or
> attachInten(). However since you are using exon arrays you may not have
> enough RAM, so it would be better to use function export() or export.data(),
> or attach only a subset, see help ?attachData. See also vignette xps.pdf
> (chapter 2.3).

I think RAM shouldn't be an issue if I attach the samples one at a
time? (I actually have access to a machine with 32gigs RAM but ideally
would like to get what I'm doing to run on a standard desktop, which
is actually why I'm using XPS!).

>
> When you talk about "expression matrix", how did you create it? Maybe you
> could use function validExpr(), but w/o seeing your code it is hard to tell.
> For DABG there are functions pvalData() and presCall(), see the examples in
> help ?dabg.call.

Yes I've managed to use dabg.call() at probeset level and access the
p-values using pvalData() alright.

Thanks again for all of your help and patience!

Kind Regards,

Paul.


>
> Best regards
> Christian
>
>
>
> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>
>> Hi Christian,
>>
>> Thanks for your quick and informative reply.
>>
>> I have re-run the analysis and saved the R objects as you suggest. The
>> next thing I'm trying to do is to obtain the expression levels of the
>> probes, but this doesn't seem to be working for me:
>>
>>> a<- validData(data_hapMap)
>>
>> Error in .local(object, ...) : slot "data" has no data
>>
>> Based on the documentation I think validData() is the correct function.
>>
>> I've also performed probeset level DABG and I'm trying to set
>> individual probes which belong to probesets with DABG<  .05 to 0 in
>> the expression matrix.
>>
>> But it seems I can't see the expression matrix using validData().
>> Perhaps there is another function. Any ideas?
>>
>> Thank you again for your help with this, I'm very grateful!
>>
>> Paul.
>>
>> On 2/2/12, cstrato<cstrato at aon.at>  wrote:
>>>
>>> Dear Paul,
>>>
>>> The functions root.data(), root.call() and root.expr() were created to
>>> allow you access to the corresponding root files just in case that you
>>> did not save your R session.
>>>
>>> In the cases where you compute expression levels stepwise, or only part
>>> of them such as normalize.quantiles(), as seems to be the matter in your
>>> case, there is no corresponding root.xxx() function to access the root
>>> file directly. In these cases you need to save your R session to have
>>> continued access to the resulting root file.
>>>
>>> Please note that saving the R session is the usual case to have access
>>> to the root files.
>>>
>>> Best regards
>>> Christian
>>>
>>>
>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>
>>>> Hi Christian,
>>>>
>>>> Thanks for your quick reply. I check what kind of trees I have using
>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu"
>>>> rather than "int", this is presumably because my analysis required no
>>>> background correction step?
>>>>
>>>> So I then tried:
>>>>>
>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu")
>>>>
>>>>
>>>> but that gives me a huge number of errors that look like this:
>>>>
>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<PreprocesSet>.
>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<PreprocesSet>.
>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>> Error: Could not get directory<PreprocesSet>.
>>>> Error: Could not get tree<ExportSet>.
>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",  :
>>>>    error in function ‘ExportData’
>>>>
>>>>
>>>> This file "exon_quantiles.root" definitely exists in the current
>>>> working directory though... Thanks again for your help!
>>>>
>>>> Paul.
>>>>
>>>>
>>>>
>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at aon.at>   wrote:
>>>>>
>>>>> Dear Paul,
>>>>>
>>>>> Please have a look at the help ?root.expr.
>>>>>
>>>>> If I understand you correctly, you did only do quantile normalization?
>>>>>
>>>>> To see the tree names in your file you should do:
>>>>>>
>>>>>> getTreeNames("exon_quantiles.root")
>>>>>
>>>>>
>>>>> You will probably see trees with extension "int", see help
>>>>> ?validTreetype.
>>>>>
>>>>> To load these trees you need to do:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int")
>>>>>
>>>>>
>>>>> Please let me know if this did solve your problem.
>>>>>
>>>>> Best regards
>>>>> Christian
>>>>> _._._._._._._._._._._._._._._._._._
>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>> _._._._._._._._._._._._._._._._._._
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon
>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but
>>>>>> if I try to load it the same was I'd load my raw data (using the
>>>>>> scheme file I created for Affy exon arrays) I get the error below? I
>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a
>>>>>> different "root scheme" file for this normalized data? Unfortunately,
>>>>>> I haven't been able to find an answer.
>>>>>>
>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> Error in if (chipname != treetitle) { : argument is of length zero
>>>>>>
>>>>>> Hope someone can help,
>>>>>>
>>>>>> Paul.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> sessionInfo()
>>>>>>
>>>>>>
>>>>>> R version 2.11.0 (2010-04-22)
>>>>>> x86_64-redhat-linux-gnu
>>>>>>
>>>>>> locale:
>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>



-- 
Paul Geeleher (PhD Student)
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com



More information about the Bioconductor mailing list