[BioC] RE : create an new AnnotatedDataFrame

Thu Aug 9 15:52:58 CEST 2007

Hi Yad --

An AnnotatedDataFrame is a data frame, with additional information
about the columns of the data frame (such as a longer description of
the column name). The 'additional information about the columns' is
the 'varMetadata' (i.e., *Metadata* about the *var*iables in the data
frame).  The varMetadata is itself a data frame, and it must have a
column named 'labelDescription'.
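
As a minimal sketch of that structure (the sample names, columns and
descriptions below are made up for illustration, and Biobase is assumed
to be installed), the varMetadata has one row per *column* of the data
frame. A table with one row per feature therefore does not belong in
varMetadata, which is probably where the row.names error comes from.

```r
library(Biobase)  # assumes Biobase is installed

# Hypothetical per-sample data: two samples (rows), two variables (columns)
pd <- data.frame(sex = c("F", "M"), dose = c(0.5, 1.0),
                 row.names = c("A", "B"), stringsAsFactors = FALSE)

# One row of metadata per column of pd; row names are pd's column names
vm <- data.frame(labelDescription = c("Sex of the donor", "Dose in mg"),
                 row.names = colnames(pd), stringsAsFactors = FALSE)

adf <- new("AnnotatedDataFrame", data = pd, varMetadata = vm)

varMetadata(adf)  # the metadata about the columns
pData(adf)        # the underlying data frame
```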

You need to create an AnnotatedDataFrame for your feature
data. Suppose the 'MetaData' file is D:/MetaData.txt, then read the
data into a data.frame with something like

> featureDataFrame <- read.table("D:/MetaData.txt", header=TRUE,
+ sep="", as.is=TRUE)

You might also have 'meta-data' about the columns in the
featureDataFrame, which you would arrange in another data.frame, but
this is not essential.
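
To see what the read step should produce, here is a small sketch that
uses inline text as a stand-in for the file (only three of the rows
from your example). Because the header line has one fewer field than
the data lines, read.table() uses the first column as row names, which
is what you want here.

```r
# Inline stand-in for the MetaData file: the header has one field,
# the data lines have two, so the first column becomes the row names.
txt <- "gene
A_75_P0000001 TEL01L
A_75_P0000002 YAL067W-A
A_75_P0000003 YAL067C"

featureDataFrame <- read.table(text = txt, header = TRUE, sep = "",
                               as.is = TRUE)

rownames(featureDataFrame)  # probe identifiers
featureDataFrame$gene       # gene names, kept as character by as.is=TRUE
```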

Then create an AnnotatedDataFrame

> featureData <- new("AnnotatedDataFrame", data=featureDataFrame)

and finally use this, along with the matrix of expression values, to
create an ExpressionSet

> es <- new("ExpressionSet", exprs=exprs, featureData=featureData)
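
A self-contained sketch of this step, with a made-up 3 x 2 expression
matrix (the values and the sample names "A" and "B" are only
illustrative, and Biobase is assumed to be installed). The row names of
the feature data frame are taken from the row names of the matrix so
that the two agree.

```r
library(Biobase)  # assumes Biobase is installed

# Hypothetical expression matrix: 3 features (rows) by 2 samples (columns)
exprs <- matrix(c(7.1, 8.3, 6.9, 7.4, 8.0, 7.2), nrow = 3,
                dimnames = list(c("A_75_P0000001", "A_75_P0000002",
                                  "A_75_P0000003"),
                                c("A", "B")))

# Feature annotation with the same row names as the matrix
featureDataFrame <- data.frame(gene = c("TEL01L", "YAL067W-A", "YAL067C"),
                               row.names = rownames(exprs),
                               stringsAsFactors = FALSE)

featureData <- new("AnnotatedDataFrame", data = featureDataFrame)
es <- new("ExpressionSet", exprs = exprs, featureData = featureData)

# Look up the gene name recorded for one probe
fData(es)["A_75_P0000002", "gene"]
```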

If you had phenotype data as well, you could do the same steps

> phenoDataFile <- "D:/phenoData.txt"
> phenoDataFrame <- read.table(phenoDataFile,
+ header=TRUE,sep="",as.is=TRUE)
> phenoData <- new("AnnotatedDataFrame", data=phenoDataFrame)

and create an ExpressionSet with both phenoData and featureData

> es <- new("ExpressionSet", exprs=exprs, phenoData=phenoData,
+ featureData=featureData)
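
Before making that call it may be worth checking two things in plain R.
read.table() returns a data.frame, whereas exprs should be a matrix,
and as far as I know the sample and feature names need to line up
between the pieces. A rough sketch with made-up values ('treatment' is
a hypothetical phenotype column):

```r
# Stand-in for what read.table() returns: a data.frame, not yet a matrix
exprsFrame <- data.frame(A = c(7.1, 8.3, 6.9), B = c(7.4, 8.0, 7.2),
                         row.names = c("A_75_P0000001", "A_75_P0000002",
                                       "A_75_P0000003"))
exprs <- as.matrix(exprsFrame)  # numeric matrix, row and column names kept

# Hypothetical phenotype and feature tables
phenoDataFrame <- data.frame(treatment = c("control", "treated"),
                             row.names = c("A", "B"),
                             stringsAsFactors = FALSE)
featureDataFrame <- data.frame(gene = c("TEL01L", "YAL067W-A", "YAL067C"),
                               row.names = rownames(exprs),
                               stringsAsFactors = FALSE)

# Samples are columns of exprs and rows of the phenotype table;
# features are rows of exprs and rows of the feature table.
samplesMatch  <- identical(rownames(phenoDataFrame), colnames(exprs))
featuresMatch <- identical(rownames(featureDataFrame), rownames(exprs))
stopifnot(samplesMatch, featuresMatch)
```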

Notice that featureData is really meant to mark up features with
information _unique to the experiment_; what you might really want to
do is to create an annotation package, as this can then be used by
other experiments on the same chip, and by other Bioconductor software
packages that expect something in the 'annotation' slot of an
ExpressionSet. This could be fairly challenging, but would be better
in the long run.  Also, ExpressionSets are produced or readily
constructed from several of the preprocessing steps that are often
used, so perhaps creating an expression set 'by hand' is not really
what you want to be doing.

Martin

"GHAVI-HELM Yad" <yad.ghavi-helm at cea.fr> writes:

> Hi Martin,
>
>>AnnotatedDataFrame coordinates a data.frame with its metadata. From
>>your naming convention, I'm guessing that what your command is doing
>>is trying to coordinate an expression matrix with its varMetadata.
>
> In fact, that's exactly what I want to do.
> Since I don't have any annotation package for the chips I'm using,
> I would like to add useful information (for example, the correspondence between oligo IDs and gene names) in the mData.
>
> My MetaData file looks like :
>
> 		gene	
> A_75_P0000001	TEL01L	
> A_75_P0000002	YAL067W-A	
> A_75_P0000003	YAL067C	
> A_75_P0000004	YAL067C	
> A_75_P0000005	YAL067C	
> A_75_P0000006	YAL067C	
> A_75_P0000007	YAL067C	
> A_75_P0000008	YAL067C	
>
> with as many features as my assayData (exprs2)
>
> I think it is possible to do this, because I read: 
>
> "It is also possible to record information about features that are
> unique to the experiment (e.g.,flagging particularly relevant
> features). This is done by creating or modifying an Annotated Data
> Frame like that for phenoData but with rownames of the
> AnnotatedDataFrame matching rows of the assaydata."
>
> in the Biobase "ExpressionSetIntroduction.pdf" manual.
>
> Yad.
>
>
> -------- Original Message --------
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Date: Wed. 08/08/2007 18:54
> To: GHAVI-HELM Yad
> Cc: Bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] create an new AnnotatedDataFrame
>  
> "GHAVI-HELM Yad" <yad.ghavi-helm at cea.fr> writes:
>
>>
>>
>> exprsFile<-"D:/exprsData.txt"
>> exprs<-read.table(exprsFile,header=TRUE,sep="",as.is=TRUE)
>>
>> pDataFile<-"D:/pData.txt"
>> pData<-read.table(pDataFile,header=TRUE, sep="", as.is=TRUE)
>>
>> metaData<-"D:/mData.txt"
>> mData<-read.table(metaData,header=TRUE,sep="",as.is=TRUE)
>> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData)
>>
>> At this step I have the following error: 
>> Error in `row.names<-.data.frame`(`*tmp*`, value = c("A", "B")) : 
>>         length of 'row.names' incorrect
>>
>> It seems strange because "A" and "B" are the colnames of exprsData
>> (or the rownames of pData).
>
> AnnotatedDataFrame coordinates a data.frame with its metadata. From
> your naming convention, I'm guessing that what your command is doing
> is trying to coordinate an expression matrix with its varMetadata. I
> think what you want to do is
>
>> phenoData = new("AnnotatedDataFrame", data=pData, varMetadata=mData)
>
> You might then use this to create an ExpressionSet (for example)
>
>> new("ExpressionSet", exprs=exprs, phenoData=phenoData)
>
> The read.AnnotatedDataFrame page might provide some additional hints
> on reading data from files; a warning is that read.AnnotatedDataFrame
> will change (hopefully for the better) in the next release of
> Bioconductor.
>
> Hope that helps,
>
> Martin
>
>> I tried to do :
>>
>> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData, row.names=1)
>>
>> or
>>
>> rown=rownames(exprs)
>> metData<-new("AnnotatedDataFrame",data=exprs2,varMetadata=mData, row.names=rown)
>>
>>
>> but I still got the same error
>>
>> I hope someone can help me...
>>
>>
>>> sessionInfo()
>> R version 2.5.0 (2007-04-23) 
>> i386-pc-mingw32 
>>
>> locale:
>> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>>
>> attached base packages:
>>  [1] "tcltk"     "splines"   "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
>>
>> other attached packages:
>>       YEAST     convert      marray   tkWidgets      DynDoc widgetTools  arrayMagic  genefilter    survival         vsn        affy      affyio       limma 
>>    "1.16.0"    "1.10.0"    "1.14.0"    "1.14.0"    "1.14.0"    "1.12.0"    "1.14.0"    "1.14.1"      "2.31"     "2.2.0"    "1.14.2"     "1.4.1"    "2.10.5" 
>>     Biobase 
>>    "1.14.1" 
>>  
>>
>>
>> Yad.
>>
>
>
> -- 
> Martin Morgan
> Bioconductor / Computational Biology
> http://bioconductor.org
>

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org