[BioC] exonmap RMA function

Fri Jan 18 18:52:51 CET 2008

Jay an <jayuan2007 at yahoo.com> writes:

> The whole R source code should be:
> > raw.data <- read.exon()
> > raw.data@cdfName <- "exon.pmcdf"
> > x.rma <- rma(raw.data)
> > write.table(x.rma, file="x.csv", sep="\t")
>
>   The generated file x.csv has 1,411,190 columns titled "x231501" "X2315102"....
>   Can you please answer the following questions?
>   1. How do I find the relation between probesets and probes in the exonmap library?

These are in the exon.pmcdf environment, e.g.,

> library(exonmap)
> library(exon.pmcdf)
> data(exonmap)
> mget(featureNames(x.rma), exon.pmcdf)

library(exon.pmcdf) attaches the exon.pmcdf environment to the search
path.  data(exonmap) loads the sample data set 'x.rma' from the
exonmap package. data(exonmap) is included here to provide an example
that all on the list can reproduce; do not perform this step if you
have your own data. x.rma is of class 'ExpressionSet', and is a subset
of the result of applying rma to six CEL files as described on its
help page (?x.rma). featureNames extracts the names of the 'features'
(here, probe set names) of the ExpressionSet, as described on the
ExpressionSet help page (?ExpressionSet). mget looks up each feature
name in the exon.pmcdf environment. The return value of mget is a named
list (as described on the mget help page, ?mget), with each element of
the list corresponding to one feature name and containing the probes
that map to it.

This assumes that you have downloaded and installed the exon.pmcdf
library, following the instructions in the exonmap package.
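
The lookup can be tried without exon.pmcdf installed. The following is
a minimal sketch in base R, assuming a toy environment (toy.cdf) in
place of exon.pmcdf and invented probe set and probe IDs. Standard affy
cdf environments store a matrix of probe indices per probe set; plain
integer vectors are used here only to keep the example short.

```r
# Toy stand-in for the exon.pmcdf environment: each key is a probe set
# name and each value holds the probes belonging to it. The IDs below
# are invented for illustration and are not real exon array identifiers.
toy.cdf <- new.env()
assign("2315101", c(101L, 102L, 103L, 104L), envir = toy.cdf)
assign("2315102", c(201L, 202L), envir = toy.cdf)

# Stand-in for featureNames(x.rma)
probesets <- c("2315101", "2315102")

# mget() returns a named list with one element per probe set
probes <- mget(probesets, envir = toy.cdf)
str(probes)

# Number of probes in each probe set
n.probes <- sapply(probes, length)
n.probes
```

With real data, probesets would be featureNames(x.rma) and toy.cdf
would be exon.pmcdf, as in the commands above.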

>   2. Is it possible to find the normalized value for each probe?

Follow Kasper's suggestion and review your understanding of the rma
function (see the rma help page, ?rma for references): the rma
function performs background correction, normalization, and
summarization of probes into probe sets, so it does not make sense to
ask for probe-level values from rma. Perhaps you would like to perform
background correction and / or normalization without summary. See the
help page ?bg.correct or ?normalize in the 'affy' package, or the
affyPLM package for more advanced use.
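
As a rough sketch of background correction and normalization without
summarization, the following uses the 'Dilution' AffyBatch from the
affydata package rather than exon array data. It assumes that affy,
affydata, and the matching cdf package (hgu95av2cdf) are installed.
The choice of the "rma" background method and "quantiles"
normalization mirrors the first two steps of rma, but is only one
possible combination.

```r
library(affy)      # bg.correct(), normalize(), pm()
library(affydata)  # provides the example AffyBatch 'Dilution'
data(Dilution)

# Background-correct with the RMA background model, without summarizing
bg <- bg.correct(Dilution, method = "rma")

# Quantile-normalize across arrays, still at the probe level
nrm <- normalize(bg, method = "quantiles")

# Perfect-match intensities on the log2 scale:
# one row per probe, one column per array
probe.values <- log2(pm(nrm))
dim(probe.values)
```

The same calls should apply to an AffyBatch created by read.exon(),
although that has not been checked here.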

>> can you please tell me what the meaning of columns? are they
>> expression levels for all probesets?

Each column is a different probe set. Each row is a different
sample. Each probe set name is prefixed with an 'X'. This is a
consequence of using write.table, which coerces the ExpressionSet to a
data.frame; column names that begin with a digit are altered to be
valid R names. For an ExpressionSet, the coercion also appends columns
of phenotypic data, so several columns at the end of the table may not
be probe sets.
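
The 'X' prefix can be reproduced in base R. This sketch assumes
invented numeric probe set names and a small random matrix; it shows
the default behaviour of data.frame, which is most likely what
produces the prefix when an ExpressionSet is coerced.

```r
# Hypothetical numeric probe set names, like those on an exon array
ids <- c("2315101", "2315102")

# A small matrix with samples as rows and probe sets as columns
m <- matrix(rnorm(4), nrow = 2, dimnames = list(c("s1", "s2"), ids))

# data.frame() passes column names through make.names() by default,
# which prepends 'X' to names that start with a digit
names(data.frame(m))
make.names(ids)

# Names are kept as they are when check.names = FALSE
names(data.frame(m, check.names = FALSE))
```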

As mentioned by James, write.exprs is usually more appropriate.
write.exprs saves the expression values with rows as probe sets and
columns as samples. In this case, row names (i.e., probe sets) are not
prefixed with 'X'.

One would normally only use these functions to export the data for use
in other programs. Methods described on the 'ExpressionSet' help page
(?ExpressionSet) would be used in an interactive session or R script.
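
For illustration, a short sketch of both routes using the
sample.ExpressionSet data shipped with Biobase (not exon array data).
The temporary output file is an assumption of this example; any file
name can be used.

```r
library(Biobase)
data(sample.ExpressionSet)

# Expression matrix: rows are probe sets, columns are samples
e <- exprs(sample.ExpressionSet)
dim(e)
head(featureNames(sample.ExpressionSet), 3)

# Phenotypic data, one row per sample
head(pData(sample.ExpressionSet), 3)

# Export only the expression values, keeping probe set names unchanged
out <- tempfile(fileext = ".txt")
write.exprs(sample.ExpressionSet, file = out)
```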

Martin

>> How can I get the RMA expression level for each probe?
>
>
> Kasper Daniel Hansen <khansen at stat.Berkeley.EDU> wrote:
>   
> On Jan 17, 2008, at 4:57 PM, Jay an wrote:
>
>> Thanks Martin and James,
>>
>> after doing write.table(x.rma, file="x.csv", sep="\t")
>> the file x.csv has 1,411,190 columns titled "x231501" "X2315102"....
>> can you please tell me what the meaning of columns? are they
>> expression levels for all probesets?
>> how can i get RMA expression level for each probe?
>
> What do you mean by this? Typically summarization is considered part 
> of the RMA method and after summarization you have probeset level 
> values, not probe level values.
>
> Kasper
>
>
>>
>> Yuan
>>
>>
>> Martin Morgan wrote:
>> "James W. MacDonald" writes:
>>
>>> Hi Yuan,
>>>
>>> Jay an wrote:
>>>> Hello,
>>>> I got several .CEL files to do RMA using exonmap library below:
>>>>> raw.data <- read.exon()
>>>>> raw.data@cdfName <- "exon.pmcdf"
>>>>> x.rma <- rma(raw.data)
>>>>> write.table(x.rma, file="x.csv", sep="\t")
>>>
>>> I'm surprised that worked. If you want to write out the expression
>>
>> There's a method that converts an ExpressionSet into a data.frame,
>> taking the transpose of the expression values and cbind'ing (adding as
>> columns) the phenotypic data.
>>
>>> library(Biobase)
>>> data(sample.ExpressionSet)
>>> names(as(sample.ExpressionSet, "data.frame"))
>> [1] "AFFX.MurIL2_at" "AFFX.MurIL10_at"
>> [3] "AFFX.MurIL4_at" "AFFX.MurFAS_at"
>> [5] "AFFX.BioB.5_at" "AFFX.BioB.M_at"
>> ...
>> [499] "X31738_at" "X31739_at"
>> [501] "sex" "type"
>> [503] "score"
>>
>> This gets invoked by write.table and other functions that try to
>> coerce their arguments to a data.frame. This can be convenient for
>> instance in machine learning, where you can write a formula with '.'
>> to indicate 'all columns' of a data.frame, and just provide the
>> (probably filtered) ExpressionSet as the data argument.
>>
>> The affyPLM package might help to 'get RMA expression levels for each
>> probe'.
>>
>> Martin
>>
>>> values you want to use write.exprs(), not write.table(), which is
>>> intended to write out data.frames or matrices (and x.rma is neither).
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>> the file x.csv has 1,411,190 columns titled "x231501" "X2315102"....
>>>> can you please tell me what the meaning of columns? are they
>>>> expression levels for all probesets?
>>>> how can i get RMA expression level for each probe?
>>>>
>>>>
>>>> thanks
>>>> Yuan
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>> -- 
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> Affymetrix and cDNA Microarray Core
>>> University of Michigan Cancer Center
>>> 1500 E. Medical Center Drive
>>> 7410 CCGC
>>> Ann Arbor MI 48109
>>> 734-647-5623
>>>
>>
>> -- 
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M2 B169
>> Phone: (206) 667-2793
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793