[BioC] summarized expression values from beadarray versus GenomeStudio

Mark Dunning mark.dunning at gmail.com
Wed Apr 13 13:57:14 CEST 2011


Hi Ina,

Could you send me the Illumina IDs and/or ArrayAddress IDs of any bead
types that do not get summarized by beadarray? The Humanv4 platform
that you are using has some extra spike controls that were not used on
older arrays. My guess is that the mapping files used by beadarray to
convert ArrayAddressIDs into Illumina IDs do not know about these
IDs yet. This would go a long way to explaining the difference in row
numbers.
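
In case it helps with pulling out those IDs, here is a minimal sketch in base R. It assumes you can get the probe IDs from each source as a character vector; the two vectors below are made-up placeholders, not real Illumina IDs. For real data they might come from featureNames(BSData) and from the ID column of the GenomeStudio export, but treat both of those as assumptions to check against your own objects.

```r
# Placeholder probe IDs (hypothetical values, for illustration only).
# In practice: IDs from the beadarray summary and from the GenomeStudio export.
beadarray_ids    <- c("ILMN_1", "ILMN_2", "ILMN_3")
genomestudio_ids <- c("ILMN_1", "ILMN_2", "ILMN_3", "ILMN_4", "ILMN_5")

# IDs that GenomeStudio reports but the beadarray summary does not
missing_from_beadarray <- setdiff(genomestudio_ids, beadarray_ids)

# IDs that the beadarray summary reports but GenomeStudio does not
missing_from_genomestudio <- setdiff(beadarray_ids, genomestudio_ids)

print(missing_from_beadarray)
print(length(missing_from_genomestudio))
```

The first vector is the list worth sending: if it is dominated by control-type IDs, that would point towards the annotation mapping rather than the summarization itself.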

Could you give a bit more detail on how the GenomeStudio data were
exported, i.e. with or without normalisation?

Regards,

Mark

On Tue, Apr 12, 2011 at 3:22 PM, Ina Hoeschele <inah at vbi.vt.edu> wrote:
> Hi Mark and Wei,
>
> thank you very much for your suggestions.
>
> For all of my 8 BSData objects the first dimension is 48,107 probes (47,224 gene probes, 883 control probes). The corresponding dataset produced by GenomeStudio contains 47,320 gene probes and 886 control probes, so I seem to have 96 fewer gene probes and 3 fewer control probes. I do not know why there is this difference, but these numbers do not suggest that anything is seriously wrong.
>
> I would not be so worried about the discrepancy in values, but since the correlations among (control) samples (on different chips) are so much worse for Bioconductor compared to GenomeStudio (.91-.92 versus .98-.99), something must be going wrong somewhere.
>
> Related to this, for each sample run on a bead chip, there may be some bead types that failed. For all samples that are combined in a 'project' in GenomeStudio, bead types that have failed in any of these samples are excluded from the summarized data (unless one checks the impute option). I wonder how this is handled in the summarization in beadarray. Since beadarray deals with a single chip at a time, a project in beadarray would be a single chip. So if beadarray also excludes failed bead types, then different BSData objects (each representing a single chip) may have different bead types represented. I need to check whether this might have messed up my correlations between control samples from different chips. But for my first batch of 8 chips, all BSData objects have the same 1st dimension, which is a bit smaller than the number of summarized probes from GenomeStudio.
>
> Ina
>
>
>
> ----- Original Message -----
> From: "Mark Dunning" <mark.dunning at gmail.com>
> To: "Ina Hoeschele" <inah at vbi.vt.edu>
> Cc: bioconductor at stat.math.ethz.ch
> Sent: Thursday, April 7, 2011 5:33:09 AM
> Subject: Re: summarized expression values from beadarray versus GenomeStudio
>
> Hi Ina,
>
> Nothing seems to be wrong with your approach and it should re-create
> the BeadStudio intensities. We tried it out on some of our own data
> and managed to get very close to the BeadStudio values.
>
> Does the number of observations reported by beadarray and GenomeStudio
> agree? What are the dimensions of your BSData object and are they what
> you are expecting? It could be that summarize is incorrectly trying to
> combine data from multiple strips.
>
> Best,
>
> Mark
>
>
>
> On Mon, Apr 4, 2011 at 11:13 PM, Ina Hoeschele <inah at vbi.vt.edu> wrote:
>> Hi Mark et al.,
>>  I have calculated correlations among the expression vectors of different samples (in particular for a control sample that we use on each BeadChip), both for the expression data that I have processed in Bioconductor using the beadarray package and for the expression data produced by GenomeStudio (selecting quantile normalization). The correlations (especially for the control samples from different chips) are clearly worse for the Bioconductor processed data and I have been trying to track down where I have a problem.
>>
>> I also have the summarized (bead-type) intensities from GenomeStudio without normalization. I obtain the corresponding summarized values from beadarray with the following code
>>
>> myMean = function(x) mean(x, na.rm = TRUE)
>> mySe = function(x) sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))  # divide by the number of non-NA beads
>> GreenChannelTransform <- function (BLData, array)
>> {
>>        x = getBeadData(BLData, array = array, what = "Grn")
>>        return(x)
>> }
>> greenChannel = new("illuminaChannel",GreenChannelTransform,illuminaOutlierMethod,myMean,mySe,"G")
>>
>> for (iChip in 1:nChips)
>> {
>>        setwd(Chip.Dir[iChip])
>>        BLData = readIllumina(useImages=FALSE, illuminaAnnotation="Humanv4")
>>        BSData <- summarize(BLData,list(greenChannel),useSampleFac=TRUE,sampleFac=NULL,removeUnMappedProbes=TRUE)
>>        save(BSData,file="BSData.rda")
>>        rm(BLData); rm(BSData); gc()
>> }
>>
>>
>> If the data are summarized in this way using Bioconductor/beadarray, would you not expect the summarized values to be identical to those from GenomeStudio?
>>
>> I checked the summarized value for one beadtype on the first several sections of chip 1.
>> The summary values from GenomeStudio are: 77.93, 159.16, 174.93, 131.05, 484.39
>> The summary values from beadarray are: 90.0, 192.0, 188.5, 157.0, 492.0
>> (I also calculated the first summary value by hand and come up with 103.36!)
>>
>> Why are these values different, any hint?
>>
>> Many thanks as always, Ina
>>
>


