[BioC] Some Genefilter questions

Robert Gentleman rgentlem at fhcrc.org
Thu Nov 30 19:15:02 CET 2006


Hi again,

  Some parts of my answer and of Jim's are in disagreement - it might be 
nice to hear other points of view here.

   The question is really whether there is anything to be gained by 
removing the probes (probesets) we know are not involved before 
normalization and background correction.

   Clearly these probes will help with background correction, but they 
could substantially interfere with normalization. I don't personally 
think (no evidence at all though) that this is a problem - but I would 
love to see some quantitative comparisons of results that took both 
approaches to see if the end results are qualitatively different.

   best wishes
     Robert


James W. MacDonald wrote:
> Hi Amy,
> 
> Amy Mikhail wrote:
>> Dear Bioconductors,
>>
>> I am analysing 6 PlasmodiumAnopheles genechips, which have only Anopheles
>> mosquito samples hybridised to them (i.e. they are not infected
>> mosquitoes).  The 6 chips include 3 replicates, each consisting of two
>> time points.  The design matrix is as follows:
>>
>>
>>> design
>>      M15d M43d
>> [1,]    1    0
>> [2,]    0    1
>> [3,]    1    0
>> [4,]    0    1
>> [5,]    1    0
>> [6,]    0    1
>>
>>
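The design matrix printed above can be rebuilt from a factor of time points. This is only a minimal base-R sketch: the factor name `age` is an assumption, and the sample order (M15d and M43d alternating across the three replicates) is read off the printed matrix.

```r
# Time point of each chip, in the order shown in the printed matrix.
# The name `age` is hypothetical; it does not appear in the original post.
age <- factor(rep(c("M15d", "M43d"), times = 3))

# One indicator column per time point, with no intercept column.
design <- model.matrix(~ 0 + age)
colnames(design) <- levels(age)
design
```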
>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy). 
>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
>> DE genes, respectively... far fewer than I was expecting.
>>
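For readers unfamiliar with the BH adjustment mentioned above, here is a small sketch using base R's `p.adjust`. The p-values are made up for illustration and are not from these chips; the point is only that the number of genes below 0.05 can drop noticeably after adjustment.

```r
# Made-up raw p-values, for illustration only (not from the real chips).
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

# Count how many values fall below 0.05 before and after adjustment.
n_raw <- sum(p_raw < 0.05)
n_bh <- sum(p_bh < 0.05)
c(raw = n_raw, adjusted = n_bh)
```

With thousands of probesets and only three replicates per time point, the adjusted list can easily be much shorter than the raw one.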
>> As this affy chip contains probesets for both mosquito and malaria
>> parasite genes, I am wondering:
>>
>> (a) if it is better to remove all the parasite probesets before my analysis;
> 
> Probably. It's not the easiest thing to do. Here is a link to some code 
> you can use:
> 
> http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf
> 
> Read what Ariel and Jenny write there very closely so you don't make 
> mistakes.
> 
>> (b) if so at what stage I should do this (before or after normalisation
>> and background correction, or does it matter?)
> 
> Before doing anything, most likely, which is what the above code will do 
> for you.
> 
>> (c) how would I filter out these probesets using genefilter (all the
>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
>> to filter out the probesets, and if so how?)
>>
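On question (c), one possible way to use the "Pf." prefix is plain pattern matching on the probeset IDs. This is a sketch under assumptions: the IDs below are invented, and the toy matrix stands in for an already-summarised expression matrix. It only drops rows after preprocessing; it does not remove probes from the CDF before normalisation, which is what the code linked in the reply is for.

```r
# Toy expression matrix; row names stand in for Affy probeset IDs.
# These IDs are invented for illustration.
ids <- c("Pf.1.1_at", "Ag.2.3_at", "Pf.7.9_s_at",
         "AFFX-BioB-5_at", "Ag.10.1_at")
x <- matrix(seq_len(5 * 6), nrow = 5, dimnames = list(ids, NULL))

# TRUE for probesets whose ID starts with the literal prefix "Pf."
# (the dot is escaped so it is not treated as a regex wildcard).
is_parasite <- grepl("^Pf\\.", rownames(x))

# Keep only the non-parasite probesets.
x_mosquito <- x[!is_parasite, , drop = FALSE]
rownames(x_mosquito)
```

For an ExpressionSet, `featureNames(eset)` from Biobase would take the place of `rownames(x)`, and `eset[!is_parasite, ]` would do the subsetting.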
>> Secondly, I did not add any of the polyA controls to my samples.  I would
>> like to know:
>>
>> (d) Do any of the bg correct / normalisation methods I tried utilise
>> affymetrix control probesets, and if so, how?
> 
> No.
> 
>> (e) Should I also filter out the control sets - again, if so at what stage
>> in the analysis and what would be an appropriate code to use?
> 
> No, there aren't enough of them to have an effect on your data.
> 
>> I did try the code for non-specific filtering (on my RMA dataset) from pg.
>> 232 of the bioconductor monograph, but the reduction in the number of
>> probesets was quite drastic;
>>
>>
>>> f1 <- pOverA(0.25, log2(100))
>>> f2 <- function(x) (IQR(x) > 0.5)
>>> ff <- filterfun(f1, f2)
>>> selected <- genefilter(Baseage.transformed, ff)
>>> sum(selected)
>> [1] 404   ###(The original no. of probesets is 22,726)###
>>
>>> Baseage.sub <- Baseage.transformed[selected, ]
>>
>> Also, I understood from the monograph that "100" was to filter out
>> fluorescence intensities less than this, but I am not clear if this is
>> from raw intensities or log2 values?
> 
> It has to be data on the natural scale. The intensities for an Affy chip 
> come from a 16-bit TIFF image, so the brightest possible value is 
> 2^16 - 1, which is just under 16 on the log2 scale; you cannot even have 
> a value that approaches 100 on the log scale.
> 
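The scale question can be checked numerically. The sketch below uses base R only; `p_over_a` and `iqr_above` are hypothetical stand-ins written for this note, not the genefilter functions. `p_over_a` follows my reading of `pOverA` (keep a probeset when the proportion of samples above A is at least p). The toy values are on the log2 scale, as RMA output is, which is why the cutoff is `log2(100)` rather than 100.

```r
# 100 on the natural scale is about 6.64 on the log2 scale.
cutoff <- log2(100)

# Largest value a 16-bit image can hold, on the log2 scale (just under 16).
max_log2 <- log2(2^16 - 1)

# Base-R stand-ins for the two filters (hypothetical helper names).
p_over_a <- function(x, p, A) mean(x > A) >= p
iqr_above <- function(x, width) IQR(x) > width

# Toy log2 expression values: 3 probesets by 6 chips, invented for illustration.
x <- rbind(low     = rep(5, 6),               # never above the cutoff
           flat    = rep(7, 6),               # above the cutoff, no variation
           varying = c(6, 8, 6, 8, 6, 8))     # above the cutoff in 3 of 6 chips

# A probeset is kept only if it passes both filters.
selected <- apply(x, 1, function(g) {
  p_over_a(g, p = 0.25, A = cutoff) && iqr_above(g, width = 0.5)
})
selected
```

With six chips, a proportion of 0.25 means at least two chips must exceed the cutoff, and the IQR requirement then removes probesets that barely change between chips, so a drastic reduction is not surprising.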
> Best,
> 
> Jim
> 
> 
> 
>> All the parasite probesets have raw intensities <35 .... so could I apply
>> this as a simple filter, and would this have to be on raw (rather than
>> normalised data)?
>>
>> Apologies for the long posting...
>>
>> Looking forward to any replies,
>> Regards,
>> Amy
>>
>>
>>> sessionInfo()
>> R version 2.4.0 (2006-10-03)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>>  [1] "tcltk"     "splines"   "tools"     "methods"   "stats"    
>> "graphics"  "grDevices" "utils"     "datasets"  "base"
>>
>> other attached packages:
>> plasmodiumanophelescdf "1.14.0"
>> tkWidgets              "1.12.0"
>> DynDoc                 "1.12.0"
>> widgetTools            "1.10.0"
>> agahomology            "1.14.2"
>> affyPLM                "1.10.0"
>> gcrma                  "2.6.0"
>> matchprobes            "1.6.0"
>> affydata               "1.10.0"
>> annaffy                "1.6.0"
>> KEGG                   "1.14.0"
>> GO                     "1.14.0"
>> limma                  "2.9.1"
>> geneplotter            "1.12.0"
>> annotate               "1.12.0"
>> affy                   "1.12.0"
>> affyio                 "1.2.0"
>> genefilter             "1.12.0"
>> survival               "2.29"
>> Biobase                "1.12.0"
>>
>>
>>
>> -------------------------------------------
>> Amy Mikhail
>> Research student
>> University of Aberdeen
>> Zoology Building
>> Tillydrone Avenue
>> Aberdeen AB24 2TZ
>> Scotland
>> Email: a.mikhail at abdn.ac.uk
>> Phone: 00-44-1224-272880 (lab)
>>        00-44-1224-273256 (office)
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


