[BioC] lumi and plotHousekeepingGene

Janet Young jayoung at fhcrc.org
Sat Jun 4 03:17:31 CEST 2011


Hi Pan,

Thanks for the reply.  

I think we're talking about different adjustments to the control data.  It does clarify things a little to know that these are raw numbers, but I'm still confused about one section of the code for the plotHousekeepingGene function, where it subtracts the minimum from the control data:

        if (max(selControlData) > 50) {
            selControlData <- selControlData - min(selControlData) + 1
            selControlData <- log2(selControlData)
        }

Subtracting the minimum value makes little difference for most of the housekeeping genes in our data, but for one gene it makes the probe look very variable across arrays, when in reality it doesn't change nearly as much as the plot implies.  I've attached two plots from our data to show what I mean: the plot as given by the function, and the plot I get if I just extract the controlData slot and plot the log2 values manually.  The numbers in my first email were extracted from the controlData slot and processed in the same way that plotHousekeepingGene processes the data for plotting.
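
To put rough numbers on it, here is a minimal sketch in R. The raw intensities are back-calculated from the plain log2 values for probe 101 in my first email (quoted below), so they are approximate. The variable names are made up for illustration, and the sketch assumes that probe 101 on array 5 is the smallest value in the whole housekeeping control matrix, which is what the 0.00000 entry in the first table suggests.

```r
# Plain log2 values for probe 101, arrays 1-6, copied from the quoted table
log2_raw <- c(8.21820, 8.943101, 9.198936, 8.944858, 8.124121, 8.842036)

# Approximate raw intensities (back-calculated, so not the exact originals)
raw <- 2^log2_raw

# Same shift-then-log step as in plotHousekeepingGene.
# Assumption: min(raw) here equals the minimum of the full control matrix.
shifted <- log2(raw - min(raw) + 1)

# Compare the spread across arrays with and without the subtraction
spread_plain   <- diff(range(log2_raw))
spread_shifted <- diff(range(shifted))

print(round(shifted, 2))
print(c(plain = spread_plain, shifted = spread_shifted))
```

The shifted values should come out close to the probe 101 row of the first table. The spread across arrays is about 1 log2 unit without the subtraction and about 8 with it, because values near the minimum get pushed towards log2(1) = 0.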

Hope my question makes more sense now!

Janet





On Jun 2, 2011, at 9:16 AM, Pan Du wrote:

> Hi Janet
> 
> The plotHousekeepingGene function plots the housekeeping gene data in the controlData slot, which is the raw data output by BeadStudio/GenomeStudio. The current implementation of the lumi package does not update the controlData during preprocessing, so even after normalization the controlData stays the same.  plotHousekeepingGene and the other controlData functions are therefore for QC of the raw data, although some of the variation they show can be corrected by preprocessing.
> 
> Hope this clarifies your question.
> 
> Pan
> 
> On Wed, Jun 1, 2011 at 6:29 PM, Janet Young <jayoung at fhcrc.org> wrote:
> Hi,
> 
> I have a set of Illumina arrays and have been playing with lumi a little.  It seems really useful - thank you very much.
> 
> I'm pretty new to array analysis, so I'm not sure if this is a bug or intended behavior, but here goes: I've found that plotHousekeepingGene behaves a little oddly with our data (for now, processed using lumiExpresso with default settings). I think the problem is caused by the following portion of the function, which subtracts the minimum control value from all the control data points before taking the log and plotting.  What's the rationale for that subtraction?  I might be missing something.
> 
>    if (logMode) {
>        if (max(selControlData) > 50) {
>            selControlData <- selControlData - min(selControlData) + 1
>            selControlData <- log2(selControlData)
>        }
>        ylab <- "Expression Amplitude (log2)"
>    }
> 
> In the plotHousekeepingGene plot of our data, one of the housekeeping genes looks quite bad, so initially I was concerned: that gene has much lower expression than the rest of them, and its expression appears to vary a lot across the arrays.  Expression is indeed low, but it doesn't really vary much across the arrays when I look at the normalized data myself, so in reality I don't think I need to worry too much (although I will be checking in with the biologists about whether that gene should be high or low in the cells they've assayed).
> 
> Here's the control data that got plotted by plotHousekeepingGene, i.e. the control data after that subtraction of the minimum value (probe 101 is low, and varies widely across arrays).  If there is a good rationale for the subtraction step, maybe I should actually be worried about this gene?
> 
>      array_1   array_2   array_3  array_4  array_5   array_6
> 101  4.307429  7.742815  8.274728  7.74685  0.00000  7.499049
> 102 14.052568 14.134073 14.201274 14.12779 14.14360 14.181657
> 103 14.667866 14.725973 14.759108 14.59437 14.53636 14.606914
> 104 13.095512 12.862831 12.729939 13.30600 13.08711 13.063597
> 105 13.768515 13.506642 14.023174 13.58535 13.93239 14.050444
> 106 13.313818 12.773840 12.792241 13.15706 13.29373 12.848232
> 107 13.916514 13.466714 13.714310 13.49404 13.82293 13.475049
> 
> But here's how the control data looks when I just take the log2 myself (probe 101 is somewhat low, but fairly constant across arrays, and not as low as the negative controls from the same arrays - their values tend to be around 6-7).
> 
>     array_1   array_2   array_3   array_4   array_5   array_6
> 101  8.21820  8.943101  9.198936  8.944858  8.124121  8.842036
> 102 14.07598 14.156210 14.222410 14.150025 14.165590 14.203080
> 103 14.68319 14.740698 14.773500 14.610494 14.553143 14.622898
> 104 13.14062 12.915693 12.787801 13.345073 13.132484 13.109700
> 105 13.79697 13.540697 14.047064 13.617617 13.957818 14.073891
> 106 13.35268 12.830000 12.847703 13.200316 13.333127 12.901621
> 107 13.94222 13.501713 13.743846 13.528393 13.850343 13.509849
> 
> 
> Hope this is helpful...
> 
> Thanks very much,
> 
> Janet Young
> 
> 
> -------------------------------------------------------------------
> 
> Dr. Janet Young
> 
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
> 
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung  ...at...  fhcrc.org
> 
> 
> -------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HousekeepingGeneProfiles.pdf
Type: application/pdf
Size: 22298 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110603/04abc085/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: HousekeepingGeneProfilesWithoutSubtraction.pdf
Type: application/pdf
Size: 19660 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110603/04abc085/attachment-0001.pdf>

