[BioC] Empty featureData in ExpressionSet generated by affy::justRMA

David Shih djh.shih at gmail.com
Fri May 20 21:12:57 CEST 2011


Hi Jim,

It turns out that the assayData slot was not filled during the
post-processing normalization.

The original ExpressionSet object returned by justRMA() correctly
populates the assayData slot.

> exprs(eset)[1:5, 1:5]
          5500024024214122006605.A01.CEL 5500024024214122006605.A02.CEL
1007_s_at                       9.261711                       9.672637
1053_at                         5.732471                       7.267763
117_at                          5.316724                       4.759532
121_at                          6.999956                       6.848935
1255_g_at                       4.614048                       4.572704
          5500024024214122006605.A03.CEL 5500024024214122006605.A05.CEL
1007_s_at                       9.780707                       9.684070
1053_at                         8.433437                       7.870544
117_at                          4.760007                       4.783464
121_at                          6.647044                       6.635504
1255_g_at                       4.577906                       4.548474
          5500024024214122006605.A06.CEL
1007_s_at                       9.384182
1053_at                         7.388526
117_at                          5.662874
121_at                          6.433151
1255_g_at                       4.640982


I'll have to update the post-processing normalization script to
reconstruct the ExpressionSet object properly.


On another note, it seems that the probe ids in the expression matrix
returned by exprs() are largely in the same order as the keys returned
by mappedkeys(hgu133a2ACCNUM):

> keys <- mappedkeys(hgu133a2ACCNUM);
> probes <- rownames(exprs(eset));

> length(probes)
[1] 22277
> length(keys)
[1] 22277
> sum(probes == keys)
[1] 22230
> probes[probes != keys]
 [1] "AFFX-hum_alu_at"             "AFFX-HUMGAPDH/M33197_3_at"  
 [3] "AFFX-HUMGAPDH/M33197_5_at"   "AFFX-HUMGAPDH/M33197_M_at"  
 [5] "AFFX-HUMISGF3A/M97935_3_at"  "AFFX-HUMISGF3A/M97935_5_at" 
 [7] "AFFX-HUMISGF3A/M97935_MA_at" "AFFX-HUMISGF3A/M97935_MB_at"
 [9] "AFFX-HUMRGE/M10098_3_at"     "AFFX-HUMRGE/M10098_5_at"    
[11] "AFFX-HUMRGE/M10098_M_at"     "AFFX-LysX-3_at"             
[13] "AFFX-LysX-5_at"              "AFFX-LysX-M_at"             
[15] "AFFX-M27830_3_at"            "AFFX-M27830_5_at"           
[17] "AFFX-M27830_M_at"            "AFFX-PheX-3_at"             
[19] "AFFX-PheX-5_at"              "AFFX-PheX-M_at"             
[21] "AFFX-r2-Bs-dap-3_at"         "AFFX-r2-Bs-dap-5_at"        
[23] "AFFX-r2-Bs-dap-M_at"         "AFFX-r2-Bs-lys-3_at"        
[25] "AFFX-r2-Bs-lys-5_at"         "AFFX-r2-Bs-lys-M_at"        
[27] "AFFX-r2-Bs-phe-3_at"         "AFFX-r2-Bs-phe-5_at"        
[29] "AFFX-r2-Bs-phe-M_at"         "AFFX-r2-Bs-thr-3_s_at"      
[31] "AFFX-r2-Bs-thr-5_s_at"       "AFFX-r2-Bs-thr-M_s_at"      
[33] "AFFX-r2-Ec-bioB-3_at"        "AFFX-r2-Ec-bioB-5_at"       
[35] "AFFX-r2-Ec-bioB-M_at"        "AFFX-r2-Ec-bioC-3_at"       
[37] "AFFX-r2-Ec-bioC-5_at"        "AFFX-r2-Ec-bioD-3_at"       
[39] "AFFX-r2-Ec-bioD-5_at"        "AFFX-r2-P1-cre-3_at"        
[41] "AFFX-r2-P1-cre-5_at"         "AFFX-ThrX-3_at"             
[43] "AFFX-ThrX-5_at"              "AFFX-ThrX-M_at"             
[45] "AFFX-TrpnX-3_at"             "AFFX-TrpnX-5_at"            
[47] "AFFX-TrpnX-M_at" 

The orders are the same, except that the order of the some control probes have
changed.

But I guess this match is not guaranteed to be true in general.


Thank you for you help.


Regards,
David

On Thu, May 19, 2011 at 09:55:51AM -0400, James W. MacDonald wrote:
> Hi David,
> 
> On 5/18/2011 2:04 PM, David Shih wrote:
> >Hi Jim,
> >
> >Thank you for clarifying the purpose of the featureData slot.
> >The assayData slot also appears empty in my ExpressionSet.
> 
> It's not empty, you just didn't access it correctly.
> 
> > showMethods(exprs, class="ExpressionSet", includeDefs = TRUE)
> Function: exprs (package Biobase)
> object="ExpressionSet"
> function (object)
> assayDataElement(object, "exprs")
> 
> and
> 
> > assayDataElement
> function (object, elt)
> assayData(object)[[elt]]
> <environment: namespace:Biobase>
> 
> So exprs() is extracting data from your assayData slot, but you have
> to choose the "exprs" element of the AssayData object that
> assayData() returns.
> 
> >
> >The rownames are missing in my expression matrix extracted using.
> >>exprs(eset)
> 
> Well, they *should* be there, so this indicates some bug in the
> processing. What does exprs(eset)[1:5,1:5} look like, exactly?
> 
> Best,
> 
> Jim
> 
> 
> >
> >I was hoping to fill the rownames of my matrix using keys obtained
> >from
> >>keys<- mappedkeys(hgu133a2ACCUM)
> >
> >I checked that all the keys are mapped, and I am under the impression
> >that they should be for hgu133a2ACCUM.
> >
> >My main concern is that hgu133a2ACCUM may not return the keys
> >(probe ids) in the same order as they appear in the expression matrix,
> >which would cause the probes to be incorrectly mapped.
> >
> >I have no problem with normalizing older platforms like U133. All the
> >probe ids are automatically assigned upon extraction with exprs().
> >However, the hthgu133a platform does not appear to be as
> >straightforward.
> >
> >I had a similar problems with the HT Human Gene 1.0 ST platform.
> >I ended up using the oligo package, whicih provides a getNetAffx()
> >function to retrieve probe annotation data.
> >However, the existing normalization code precludes from switching to
> >the oligo package.
> >
> >Best regards,
> >David
> >
> >On Wed, May 18, 2011 at 01:51:01PM -0400, James W. MacDonald wrote:
> >>Hi David,
> >>
> >>On 5/18/2011 1:31 PM, David Shih wrote:
> >>>Dear list,
> >>>
> >>>I am using the affy package to normalize expression data on the HT
> >>>Human Genome U133A platform. According to earlier posting on the list,
> >>>this is equivalent to U133Av2.
> >>>
> >>>I used the justRMA() function to normalize the data, and obtained an
> >>>ExpressionSet object with an empty featureData slot.
> >>
> >>You may be misunderstanding the purpose of the featureData slot.
> >>According to ?eSet-class:
> >>
> >>'featureData': Contains variables describing features (i.e., rows
> >>           in 'assayData') unique to this experiment. Use the
> >>           'annotation' slot to efficiently reference feature data
> >>           common to the annotation package used in the experiment.
> >>           Class: 'AnnotatedDataFrame-class'
> >>
> >>The probeset IDs are not unique to any particular experiment, so are
> >>stored as part of the assayData, which can be extracted using e.g.,
> >>exprs():
> >>
> >>>data(sample.ExpressionSet)
> >>>head(exprs(sample.ExpressionSet))
> >>                        A         B        C        D        E
> >>F       G
> >>AFFX-MurIL2_at  192.7420  85.75330 176.7570 135.5750 64.49390
> >>76.3569 160.5050
> >>AFFX-MurIL10_at  97.1370 126.19600  77.9216  93.3713 24.39860
> >>85.5088 98.9086
> >>AFFX-MurIL4_at   45.8192   8.83135  33.0632  28.7072  5.94492
> >>28.2925 30.9694
> >>AFFX-MurFAS_at   22.5445   3.60093  14.6883  12.3397 36.86630
> >>11.2568 23.0034
> >>AFFX-BioB-5_at   96.7875  30.43800  46.1271  70.9319 56.17440
> >>42.6756 86.5156
> >>AFFX-BioB-M_at   89.0730  25.84610  57.2033  69.9766 49.58220
> >>26.1262 75.0083
> >>
> >>Best,
> >>
> >>Jim
> >>
> >>
> >>>
> >>>I understand that I can retrive the annotation information from:
> >>>library(hgu133a2.db)
> >>>
> >>>However, I am uncertain whether the probes in the annotation library
> >>>are in the same order as the ExpressionSet object.
> >>>
> >>>Can I add the probe IDs using the following code?
> >>>
> >>>library(hgu133a2.db)
> >>>keys<- mappedkeys(hgu133ai2ACCNUM);
> >>>expr<- exprs(eset);
> >>>rownames(expr)<- keys;
> >>>
> >>>
> >>>Best regards,
> >>>
> >>>David Shih, Graduate Student
> >>>The Hospital for Sick Children
> >>>Brain Tumour Research Centre
> >>>101 College Street, TMDT-11-401M
> >>>Toronto, ON  M5G1L7
> >>>Canada
> >>>Tel:  416-813-7654 x4327
> >>>
> >>>
> >>>>sessionInfo()
> >>>R version 2.12.2 (2011-02-25)
> >>>Platform: x86_64-unknown-linux-gnu (64-bit)
> >>>
> >>>locale:
> >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> >>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >>>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >>>
> >>>attached base packages:
> >>>[1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>>other attached packages:
> >>>[1] hgu133a2.db_2.4.5    org.Hs.eg.db_2.4.6   RSQLite_0.9-4
> >>>[4] DBI_0.2-5            AnnotationDbi_1.12.0 hthgu133acdf_2.7.0
> >>>[7] affy_1.28.0          Biobase_2.10.0
> >>>
> >>>loaded via a namespace (and not attached):
> >>>[1] affyio_1.18.0         preprocessCore_1.12.0 tools_2.12.2
> >>>
> >>>_______________________________________________
> >>>Bioconductor mailing list
> >>>Bioconductor at r-project.org
> >>>https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>--
> >>James W. MacDonald, M.S.
> >>Biostatistician
> >>Douglas Lab
> >>University of Michigan
> >>Department of Human Genetics
> >>5912 Buhl
> >>1241 E. Catherine St.
> >>Ann Arbor MI 48109-5618
> >>734-615-7826
> >>**********************************************************
> >>Electronic Mail is not secure, may not be read every day, and should
> >>not be used for urgent or sensitive issues
> >>
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should
> not be used for urgent or sensitive issues
>



More information about the Bioconductor mailing list