[BioC] Loading quantiles normalized root data in XPS

cstrato cstrato at aon.at
Tue Feb 7 21:48:39 CET 2012


Dear Paul,

see replies below:

On 2/7/12 1:52 PM, Paul Geeleher wrote:
> Hi Christian, apologies for the lack of clarity in my previous email,
> I'll try to clear this up, see replies below:
>
> On Mon, Feb 6, 2012 at 7:51 PM, cstrato<cstrato at aon.at>  wrote:
>> Dear Paul,
>>
>> I am afraid that it is not quite clear to me what you want to do.
>>
>> What do you mean with "obtain the expression levels of the probes"? Do you
>> mean "probes" (i.e. oligos) or do you mean "probesets" as in your question
>> about DABG?
>
> I've read in and quantiles normalized probe level data for 176 arrays
> and what I'm trying to do now is set every probe (individual probe,
> not probeset) that is not detected above background to zero. But from
> what I've read, dabg.call() can only create p-values at the probeset
> level? So as a compromise I'm now trying to set the expression of all
> *individual probes* which belong to a *probeset* not detected above
> background to zero. I hope that makes sense.
>


Did you read the exon array whitepapers which you can download from 
Affymetrix?

Every probeset consists of only 1-4 probes, and every exon usually
consists of 1-2 probesets. Each gene has a transcript_cluster_id, which
comprises one or more probeset_ids. (You can see the mapping between
ids using function export.scheme(..,treetype="anp",..).)

Since the smallest unit is the probeset, function dabg.call() will only
work at the probeset (and transcript) level. If you set all probes of a
probeset to zero you may lose an entire exon.
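
To illustrate the risk, here is a minimal base-R sketch. It uses toy values only, not real array data; the data frame `probes`, the vector `pval` and the 0.05 cutoff are hypothetical and are not part of xps.

```r
# Toy probe-level table: exon 1 is covered by a single probeset,
# exon 2 by two probesets (all values are made up for illustration).
probes <- data.frame(
  exon      = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  probeset  = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
  intensity = c(85, 120, 97, 60, 300, 280, 310, 40, 55, 35),
  stringsAsFactors = FALSE
)

# Hypothetical probeset-level DABG p-values.
pval <- c(a = 0.20, b = 0.001, c = 0.30)

# Zero every probe whose probeset is not detected above background.
not_detected <- pval[as.character(probes$probeset)] > 0.05
probes$masked <- ifelse(not_detected, 0, probes$intensity)

# Number of probes per exon that still carry signal after masking.
surviving <- tapply(probes$masked > 0, probes$exon, sum)
print(surviving)
```

In this toy table exon 1 is covered only by probeset "a", so masking that probeset leaves the exon without any signal, while exon 2 keeps the probes of probeset "b".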


>>
>> How did you create "data_hapMap"? Maybe you could send me your complete code
>> otherwise I am only able to guess.
>
> "data_hapMap" was a mistake, sorry I copied the wrong object name it
> should have been "data_qn" which is quantiles normalized probe level
> data that I created like this:
>
> # read raw data
> data_hapMap <- root.data(scheme.huex10stv2,
>     "/data2/paul/normalization_project/root_data/hapMap_root_data_cel.root")
> # quantiles normalize at *probe* level
> data_qn <- normalize.quantiles(data_hapMap, "exon_quantiles",
>     filedir=rootdata_Dir, exonlevel="all")
>
> So "data_qn" is the object I want to work with. Based on your advice
> I've figured out that to access the expression levels in "data_qn" I
> need to use "attachInten()"; then I can use "intensity()" to access
> the expression levels:
>
> treenames <- unlist(treeNames(data_qn))
> # attach the first two samples
> data_qn <- attachInten(data_qn, treenames=treenames[1:2])
>
>> head(intensity(data_qn))
>    X Y GSM188869.cqu_MEAN GSM188870.cqu_MEAN
> 1 0 0             0.0000              0.000
> 2 1 0             0.0000              0.000
> 3 2 0             0.0000              0.000
> 4 3 0             0.0000              0.000
> 5 4 0             0.0000              0.000
> 6 5 0            85.3523            121.733
>
> Does the X column in the output above represent the probe ID? If so
> and I have a mapping of probeset IDs to corresponding probe IDs, it
> should be fairly straightforward to set probes that are not detected
> above background to zero? Perhaps there is a more straightforward way
> of doing this though?
>

No, (X,Y) are the coordinates of the single probes on the exon array. 
You can use function export.scheme(..,treetype="scm",..) to get the 
mapping between (X,Y) and an internal PROBESET_ID, e.g.:

UNIT_ID	X	Y	ProbeLength	Mask	EXON_ID	PROBESET_ID
31	986	1674	25	512	31	31
31	1092	677	25	512	31	31
31	796	1862	25	512	31	31
31	917	193	25	512	31	31
31	341	1677	25	512	32	32
31	144	2250	25	512	32	32
31	689	262	25	512	32	32
31	579	1670	25	512	32	32

Then you can use export.scheme(..,treetype="pbs",..) to map PROBESET_ID 
(=UNIT_ID) to the ProbesetID, e.g.:

UNIT_ID	ProbesetID	NumCells	NumAtoms	NumSubunits	UnitType
31	2315101	4	4	1	512
32	2315102	4	4	1	512

As you can see, PROBESET_IDs 31 and 32 each have 4 probes and correspond
to the Affymetrix ProbesetIDs 2315101 and 2315102, respectively.
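
As a rough sketch, the two excerpts above can be joined in base R to go from (X,Y) directly to the Affymetrix ProbesetID. The rows are typed in by hand from the excerpts; in practice you would read the exported tables from file. The object names `scm`, `pbs` and `xy2probeset` are only illustrative.

```r
# Rows copied from the two export.scheme() excerpts shown above
# (only the columns needed for the mapping are kept).
scm <- data.frame(
  UNIT_ID     = rep(31, 8),
  X           = c(986, 1092, 796, 917, 341, 144, 689, 579),
  Y           = c(1674, 677, 1862, 193, 1677, 2250, 262, 1670),
  PROBESET_ID = rep(c(31, 32), each = 4)
)
pbs <- data.frame(
  UNIT_ID    = c(31, 32),
  ProbesetID = c(2315101, 2315102)
)

# PROBESET_ID in the "scm" table corresponds to UNIT_ID in the "pbs" table.
xy2probeset <- merge(scm[, c("X", "Y", "PROBESET_ID")], pbs,
                     by.x = "PROBESET_ID", by.y = "UNIT_ID")
print(xy2probeset)
```

The resulting table has one row per probe, so it could in turn be merged by (X,Y) with a probe-level intensity table that carries the same coordinate columns.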

Alternatively, you could use the functions indexUnits() and
unitID2probesetID() or unitID2transcriptID().

Best regards
Christian


>
>>
>> I guess that "data_hapMap" contains the raw data. For these the slot "data"
>> is empty to save memory. So you need to use either attachData() or
>> attachInten(). However since you are using exon arrays you may not have
>> enough RAM, so it would be better to use function export() or export.data(),
>> or attach only a subset, see help ?attachData. See also vignette xps.pdf
>> (chapter 2.3).
>
> I think RAM shouldn't be an issue if I attach the samples one at a
> time? (I actually have access to a machine with 32gigs RAM but ideally
> would like to get what I'm doing to run on a standard desktop, which
> is actually why I'm using XPS!).
>
>>
>> When you talk about "expression matrix", how did you create it? Maybe you
>> could use function validExpr(), but w/o seeing your code it is hard to tell.
>> For DABG there are functions pvalData() and presCall(), see the examples in
>> help ?dabg.call.
>
> Yes I've managed to use dabg.call() at probeset level and access the
> p-values using pvalData() alright.
>
> Thanks again for all of your help and patience!
>
> Kind Regards,
>
> Paul.
>
>
>>
>> Best regards
>> Christian
>>
>>
>>
>> On 2/6/12 4:33 PM, Paul Geeleher wrote:
>>>
>>> Hi Christian,
>>>
>>> Thanks for your quick and informative reply.
>>>
>>> I have re-run the analysis and saved the R objects as you suggest. The
>>> next thing I'm trying to do is to obtain the expression levels of the
>>> probes, but this doesn't seem to be working for me:
>>>
>>>> a<- validData(data_hapMap)
>>>
>>> Error in .local(object, ...) : slot "data" has no data
>>>
>>> Based on the documentation I think validData() is the correct function.
>>>
>>> I've also performed probeset level DABG and I'm trying to set
>>> individual probes which belong to probesets with DABG<    .05 to 0 in
>>> the expression matrix.
>>>
>>> But it seems I can't see the expression matrix using validData().
>>> Perhaps there is another function. Any ideas?
>>>
>>> Thank you again for your help with this, I'm very grateful!
>>>
>>> Paul.
>>>
>>> On 2/2/12, cstrato<cstrato at aon.at>    wrote:
>>>>
>>>> Dear Paul,
>>>>
>>>> The functions root.data(), root.call() and root.expr() were created to
>>>> allow you access to the corresponding root files just in case that you
>>>> did not save your R session.
>>>>
>>>> In the cases where you compute expression levels stepwise, or only part
>>>> of them such as normalize.quantiles(), as seems to be the matter in your
>>>> case, there is no corresponding root.xxx() function to access the root
>>>> file directly. In these cases you need to save your R session to have
>>>> continued access to the resulting root file.
>>>>
>>>> Please note that saving the R session is the usual case to have access
>>>> to the root files.
>>>>
>>>> Best regards
>>>> Christian
>>>>
>>>>
>>>> On 2/2/12 1:12 PM, Paul Geeleher wrote:
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Thanks for your quick reply. I checked what kind of trees I have using
>>>>> "getTreeNames()" as you'd suggested, it seems they are of type "cqu"
>>>>> rather than "int", this is presumably because my analysis required no
>>>>> background correction step?
>>>>>
>>>>> So I then tried:
>>>>>>
>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "cqu")
>>>>>
>>>>>
>>>>> but that gives me a huge number of errors that look like this:
>>>>>
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error in<TFile::cd>: Unknown directory PreprocesSet
>>>>> Error: Could not get directory<PreprocesSet>.
>>>>> Error: Could not get tree<ExportSet>.
>>>>> Error in root.expr(scheme.huex10stv2, "exon_quantiles.root",  :
>>>>>     error in function ‘ExportData’
>>>>>
>>>>>
>>>>> This file "exon_quantiles.root" definitely exists in the current
>>>>> working directory though... Thanks again for your help!
>>>>>
>>>>> Paul.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 1, 2012 at 9:01 PM, cstrato<cstrato at aon.at>     wrote:
>>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> Please have a look at the help ?root.expr.
>>>>>>
>>>>>> If I understand you correctly, you did only do quantile normalization?
>>>>>>
>>>>>> To see the tree names in your file you should do:
>>>>>>>
>>>>>>> getTreeNames("exon_quantiles.root")
>>>>>>
>>>>>>
>>>>>> You will probably see trees with extension "int", see help
>>>>>> ?validTreetype.
>>>>>>
>>>>>> To load these trees you need to do:
>>>>>>>
>>>>>>> data_qn<- root.expr(scheme.huex10stv2, "exon_quantiles.root", "int")
>>>>>>
>>>>>>
>>>>>> Please let me know if this did solve your problem.
>>>>>>
>>>>>> Best regards
>>>>>> Christian
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>>>>>> V.i.e.n.n.a           A.u.s.t.r.i.a
>>>>>> e.m.a.i.l:        cstrato at aon.at
>>>>>> _._._._._._._._._._._._._._._._._._
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/1/12 7:07 PM, Paul Geeleher wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've used xps to quantiles normalize (at probe level) some Affy Exon
>>>>>>> Array data. I now have a "root" file called "exon_quantiles.root", but
>>>>>>> if I try to load it the same way I'd load my raw data (using the
>>>>>>> scheme file I created for Affy exon arrays) I get the error below? I
>>>>>>> can load my raw data just fine though. Any ideas? Do I perhaps need a
>>>>>>> different "root scheme" file for this normalized data? Unfortunately,
>>>>>>> I haven't been able to find an answer.
>>>>>>>
>>>>>>>> scheme.huex10stv2<- root.scheme("huex10stv2.root")
>>>>>>>> data_qn<- root.data(scheme.huex10stv2, "exon_quantiles.root")
>>>>>>>
>>>>>>>
>>>>>>> Error in if (chipname != treetitle) { : argument is of length zero
>>>>>>>
>>>>>>> Hope someone can help,
>>>>>>>
>>>>>>> Paul.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> sessionInfo()
>>>>>>>
>>>>>>>
>>>>>>> R version 2.11.0 (2010-04-22)
>>>>>>> x86_64-redhat-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>    [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
>


