[BioC] Expressionset from ArrayExpress processed data

Martin Morgan mtmorgan at fhcrc.org
Wed Jan 21 02:25:30 CET 2009


Yovanny Izquierdo Núñez <yovanny at ibp.co.cu> writes:

> Hi Martin,
>

> Thank you for your suggestions. Here's an example of how to create a
> data.frame from a sdrf file as explained in
> 'ExpressionSetIntroduction.pdf' (provided that the file is in the
> current working directory):
>
>   pData <- read.table("file.sdrf", row.names = 1, header = TRUE, sep = "\t")
>
> From here it is possible to follow your suggestion.
>
> However, I found that my expression data contains 3 replicates per
> array, but these are not listed separately in pData (the expression
> data has three times as many columns as pData has rows). So, not
> surprisingly, I get the error:
>
>> eset <- new("ExpressionSet", exprs=exprs, phenoData=phenoData)
> Error in validObject(.Object) :
>   invalid class "ExpressionSet" object: 1: sample numbers differ between assayData and phenoData
> invalid class "ExpressionSet" object: 2: sampleNames differ between assayData and phenoData
> In addition: Warning message:
> In sampleNames(assayData(object)) == sampleNames(phenoData(object)) :
>   longer object length is not a multiple of shorter object length
>
>
> Any ideas on how I can make them match?
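The read.table() call and the mismatch behind the error above can be reproduced with a small base-R sketch. The two-sample sdrf contents, the temporary file and the toy exprs matrix are made up for illustration; the point is only that ExpressionSet's validity check needs nrow(pData) to equal ncol(exprs).

```r
# Hypothetical two-sample sdrf, written to a temporary file for illustration
sdrf <- tempfile(fileext = ".sdrf")
writeLines(c("Source Name\tCharacteristics [Organism]",
             "A\tArabidopsis thaliana",
             "B\tArabidopsis thaliana"), sdrf)

# Same call as quoted above: the first column becomes the row names
pData <- read.table(sdrf, row.names = 1, header = TRUE, sep = "\t")

# Hypothetical processed data: 2 arrays x 3 replicates = 6 columns
exprs <- matrix(rnorm(4 * 6), nrow = 4,
                dimnames = list(paste0("probe", 1:4),
                                c("A1", "B1", "A2", "B2", "A3", "B3")))

# These must agree before an ExpressionSet can be built
nrow(pData)
ncol(exprs)
```

Here pData has 2 rows and exprs has 6 columns, which is the "sample numbers differ" situation in the error message.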

Hi Yovanny -- 

There must be as many rows in pData as there are columns in exprs, and
the rows of pData must correspond, in order, to the columns of exprs.
If exprs has two arrays A and B, each with replicates 1, 2, 3, and columns

A1 B1 A2 B2 A3 B3

then you might

  pData3 <- cbind(rbind(pData, pData, pData), Replicate=rep(1:3, each=nrow(pData)))

This binds three copies of pData together by row, and then adds a
column to indicate which replicate each row represents. Then build
phenoData from pData3 (new("AnnotatedDataFrame", data=pData3)) when
creating the ExpressionSet. It might be necessary to adjust the
row.names of pData3 to match the colnames of exprs, e.g.,

   row.names(pData3) <- colnames(exprs)

These are just suggestions; you'll have to manipulate pData and exprs
in a way that makes sense for the ExpressionSet and pData you actually
have.
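Putting these steps together, here is a minimal sketch on toy data. It assumes, as in the example, that the columns of exprs are ordered A1 B1 A2 B2 A3 B3; the Treatment column, probe names and values are made up. do.call(rbind, rep(list(pData), nRep)) is just rbind(pData, pData, pData) generalized to any number of replicates. The last step assumes a reasonably recent Biobase that exports the AnnotatedDataFrame() and ExpressionSet() constructors, and is skipped if Biobase is not installed.

```r
# Toy data (hypothetical): one pData row per array, one exprs column
# per array x replicate, ordered A1 B1 A2 B2 A3 B3
pData <- data.frame(Treatment = c("control", "stress"),
                    row.names = c("A", "B"))
exprs <- matrix(rnorm(4 * 6), nrow = 4,
                dimnames = list(paste0("probe", 1:4),
                                c("A1", "B1", "A2", "B2", "A3", "B3")))

nRep <- ncol(exprs) %/% nrow(pData)   # replicates per array
stopifnot(nRep * nrow(pData) == ncol(exprs))

# Stack nRep copies of pData (rows A, B, A, B, A, B) and record the
# replicate number of each row (1, 1, 2, 2, 3, 3)
pData3 <- do.call(rbind, rep(list(pData), nRep))
pData3$Replicate <- rep(seq_len(nRep), each = nrow(pData))

# Row names must equal the column names of exprs, in the same order
row.names(pData3) <- colnames(exprs)

# Optional: build the ExpressionSet if Biobase is available
if (requireNamespace("Biobase", quietly = TRUE)) {
    phenoData <- Biobase::AnnotatedDataFrame(data = pData3)
    eset <- Biobase::ExpressionSet(assayData = exprs, phenoData = phenoData)
}
```

If the replicates are instead grouped by array (A1 A2 A3 B1 B2 B3), one option is to use pData[rep(seq_len(nrow(pData)), each = nRep), , drop = FALSE] for the rows and rep(seq_len(nRep), times = nrow(pData)) for the Replicate column.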

Hope that helps,

Martin

> Thanks
> Yovanny
>
>
> ________________________________________
> From: Martin Morgan [mtmorgan at fhcrc.org]
> Sent: Tuesday, January 20, 2009 9:26
> To: Yovanny Izquierdo Núñez
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Expressionset from ArrayExpress processed data
>
> Hi Yovanny
>
> Yovanny Izquierdo Núñez <yovanny at ibp.co.cu> writes:
>
>> Dear BioC users,
>>
>> I'm working with experiments from the ArrayExpress database and some
>> of them do not provide the CEL files, but instead the already
>> processed data in a table format (easy to read with read.delim, for
>> instance). The PhenoData of the experiment comes separately in the
>> sdrf file. Is there a way to create an ExpressionSet object from these
>> two? The ArrayExpress package only provides functions for creating an
>> AffyBatch object from the raw data and the sdrf, adf and idf files,
>> but has nothing so far to deal with the processed data.
>
> See the 'ExpressionSetIntroduction.pdf' in the Biobase package
>
>   http://bioconductor.org/packages/2.3/bioc/html/Biobase.html
>
> I don't know how to parse the PhenoData into a data.frame, but once
> that is done you'll likely be able to do
>
>   phenoData <- new("AnnotatedDataFrame", data=PhenoData)
>   eset <- new("ExpressionSet", exprs=exprs, phenoData=phenoData)
>
> Martin
>
>>
>> Thanks so much,
>>
>> Yovanny
>>
>> Instituto de Biotecnología de las Plantas
>> Universidad Central "Marta Abreu" de Las Villas
>> Carretera a Camajuaní km 5½, Santa Clara, Villa Clara, Cuba
>> Tel: 53 (42) 281257, 281268, 281693
>> Fax: 53 (42) 281329
>> Web: http://www.ibp.co.cu
>> E-Mail: info at ibp.co.cu
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793