[BioC] phenotypic information of ALLMLL data set
whuber at embl.de
Sat Jun 5 10:32:46 CEST 2010
Ben might be able to provide more insight into the phenoData of the
data in the ALLMLL package, but note that 20 samples is a very small
number for a study of patient samples, so the results might not be
very powerful biologically.
Since there have been quite a few experiments on pediatric blood cancers
over the last decade, you could also have a look at the ArrayExpress or
GEO databases for other datasets relevant to your question. The
Bioconductor packages ArrayExpress and GEOquery help with downloading
them directly into Bioconductor objects.
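As a minimal sketch of the GEOquery route: the helper below downloads a GEO series and returns its sample annotation as a data frame. The function name fetch_geo_pheno is made up for illustration and is not part of GEOquery. The accession in the usage comment assumes the usual convention that an ArrayExpress accession of the form E-GEOD-N is an import of the GEO series GSE N; please check this for the dataset you pick.

```r
# Sketch only: fetch the sample annotation (phenoData) of a GEO series.
# fetch_geo_pheno is a hypothetical helper name, not part of GEOquery.
fetch_geo_pheno <- function(accession) {
  if (!requireNamespace("GEOquery", quietly = TRUE) ||
      !requireNamespace("Biobase", quietly = TRUE)) {
    stop("please install the Bioconductor packages GEOquery and Biobase")
  }
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet
  # objects, one per platform in the series.
  esets <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  # Take the first platform and return its sample annotation.
  Biobase::pData(esets[[1]])
}

# Usage (needs a network connection and may take a while):
# pheno <- fetch_geo_pheno("GSE11877")
# dim(pheno)
# colnames(pheno)
```

Which columns of the returned data frame are useful for grouping samples depends entirely on what the submitters provided, so it is worth looking at colnames(pheno) before going further.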
A query for "childhood leukemia" in ArrayExpress leads to 26 datasets.
For example:
library("ArrayExpress")
x = ArrayExpress("E-GEOD-11877") ## may take a little while
x
# size of arrays=1164x1164 features (488 kb)
# cdf=HG-U133_Plus_2 (54675 affyids)
# number of samples=207
# number of genes=54675
I have not delved deeper into this particular dataset, but it seems that
you would then need to do some further parsing of the slot x$Description
in order to extract the phenotypic variables (such things depend on the
amount of care that the submitters, or the curators at GEO or
ArrayExpress, have spent on this).
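To give an idea of what such parsing could look like, here is a small sketch in base R. The description strings, the separators (";" between fields, ":" between key and value) and the helper name parse_description are all invented for illustration; the real content of x$Description may be laid out quite differently and should be inspected first.

```r
# Hypothetical description strings; the real format of x$Description
# may differ and must be checked before reusing this.
desc <- c(
  s1 = "sex: male; age: 4; status: relapse",
  s2 = "sex: female; age: 11; status: remission"
)

# Split one description into a named character vector of key/value pairs.
# parse_description is a made-up helper, not a Bioconductor function.
parse_description <- function(d, field_sep = ";", key_sep = ":") {
  fields <- trimws(strsplit(d, field_sep, fixed = TRUE)[[1]])
  kv <- strsplit(fields, key_sep, fixed = TRUE)
  # Everything after the first key separator is treated as the value.
  values <- vapply(kv, function(p) trimws(paste(p[-1], collapse = key_sep)),
                   character(1))
  names(values) <- vapply(kv, function(p) trimws(p[1]), character(1))
  values
}

# One row per sample; this assumes every sample has the same keys
# in the same order.
parsed <- lapply(desc, parse_description)
pheno <- as.data.frame(do.call(rbind, parsed), stringsAsFactors = FALSE)
pheno$age <- as.numeric(pheno$age)
```

A column such as pheno$status could then be turned into a factor and used to group the samples for a differential expression analysis.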
On 02/06/10 09:07, Javier Pérez Florido wrote:
> Dear list,
> I'm using the ALLMLL data set (from the ALLMLL Bioconductor package).
> This package provides probe-level data for 20 HGU133A (MLL.A) and 20
> HGU133B (MLL.B) arrays, which are a subset of arrays from a large ALL
> study.
> I would like to know the phenotypic information about these data sets to
> run a differential expression analysis: I need the phenotypic info to
> group the samples by conditions. I had a look at the supplementary
> information of the paper related to this data set, but I cannot
> establish a relationship between the sample names and conditions.
> Any suggestions?