[BioC] Manual annotation of ExpressionSet object created from scratch

Sean Davis sdavis2 at mail.nih.gov
Tue Oct 14 00:24:58 CEST 2008


On Mon, Oct 13, 2008 at 6:00 PM, Michael Muratet
<mmuratet at hudsonalpha.org> wrote:
>
> On Oct 13, 2008, at 4:48 PM, Sean Davis wrote:
>
>> On Mon, Oct 13, 2008 at 5:34 PM, Michael Muratet
>> <mmuratet at hudsonalpha.org> wrote:
>>>
>>> Greetings
>>>
>>> I have an ExpressionSet object that I created from scratch with
>>> expression data for features identified with Ensembl transcript IDs.
>>> The ExpressionSet constructor wants a character string for annotation
>>> data. Is there another way to populate the slot? From an
>>> AnnotatedDataFrame? Should I write a function that pulls in the data
>>> with biomaRt?
>>
>> Hi, Mike.  Perhaps you can show us what you mean.  If you are talking
>> about the annotation data slot, that is meant to be the string name of
>> the annotation data package associated with the array.  I guess that
>> you do not have an annotation data package for the array, so you can
>> leave out that slot when creating the ExpressionSet.  If you have
>> problems, it is best to post the code and, of course, your
>> sessionInfo().
>
> Sean
>
> Here's what I'm trying to do....
>
>     library("Biobase")
>     exprMatrix <- as.matrix(read.table("exprset.txt", header=TRUE, sep="\t",
>                                        row.names=1, as.is=TRUE))
>     pData <- read.table("phenoData.txt", row.names=1, header=TRUE, sep="\t")
>     phenoData <- new("AnnotatedDataFrame", data=pData)
>     rnaseq_exprs <- new("ExpressionSet", exprs=exprMatrix,
>                         phenoData=phenoData)
>     save(rnaseq_exprs, file="rnaseq_data.Robj")
>
> The data consists of RNAseq reads that I have mapped to Ensembl transcripts
> and normalized appropriately, e.g.,
>
>                  SL265  SL264  SL266  SL310                 SL312                 SL313
> ENST00000369829  0      0      0      0.00288159443768686   0.000696405393229021  0.000473063478950364
> ENST00000393415  0      0      0      0.000428628056614047  0.000621528594887718  0.00047497519763826
>
> So far this looks like a fairly useful way of looking at the data.
>
> I'd like to be able to use all the functionality I see in the docs for
> annotation of ExpressionSets. The ExpressionSet vignette talks about using
> an AnnotatedDataFrame but it doesn't really say where it goes. I haven't
> seen an annotation data package for Ensembl although I see how you might be
> able to create one with biomaRt. I'm looking for some expert advice so I
> don't go down any blind alleys.

For building annotation packages, see the AnnotationDbi package and
the SQLForge vignette.  See the vignettes in Biobase for discussion of
AnnotatedDataFrame.  In short, though, an ExpressionSet contains two
AnnotatedDataFrames: one for the sample information (the phenoData)
and one for the features on the array (the featureData).  The
featureData slot is often redundant if you build an annotation data
package.  However, you could use it to store a data frame of
annotation pulled from Ensembl if you like.
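
As a minimal sketch of the featureData route, something along these
lines should work.  The sample table and the feature table below are
made-up placeholders for illustration (not real annotation); in
practice the feature table might be a data frame retrieved with
biomaRt, with row names matching the row names of the expression
matrix.

```r
library(Biobase)

# Toy expression matrix using the two transcripts and six samples from the
# example data in this thread (values rounded for readability).
exprMatrix <- matrix(
  c(0, 0, 0, 0.00288, 0.000696, 0.000473,
    0, 0, 0, 0.000429, 0.000622, 0.000475),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENST00000369829", "ENST00000393415"),
                  c("SL265", "SL264", "SL266", "SL310", "SL312", "SL313")))

# Hypothetical sample table; 'group' is an invented column for illustration.
sampleInfo <- data.frame(group = rep(c("A", "B"), each = 3),
                         row.names = colnames(exprMatrix))

# Hypothetical feature table; the values are placeholders, not real
# annotation. Row names must match the row names of the expression matrix.
featureInfo <- data.frame(transcript_id = rownames(exprMatrix),
                          description = c("placeholder 1", "placeholder 2"),
                          row.names = rownames(exprMatrix),
                          stringsAsFactors = FALSE)

# phenoData describes the samples (columns); featureData describes the
# features (rows). The annotation argument is simply left out.
eset <- new("ExpressionSet",
            exprs = exprMatrix,
            phenoData = new("AnnotatedDataFrame", data = sampleInfo),
            featureData = new("AnnotatedDataFrame", data = featureInfo))

fData(eset)   # feature annotation as a data frame
pData(eset)   # sample annotation as a data frame
```

Subsetting the ExpressionSet (for example eset[1, ]) keeps the
expression values and both annotation tables in sync, which is the
main reason to store the feature table in the object rather than
alongside it.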

Sean


