[BioC] Some Genefilter questions

Amy Mikhail a.mikhail at abdn.ac.uk
Wed Nov 29 21:32:00 CET 2006


Dear Bioconductors,

I am analysing 6 Plasmodium/Anopheles GeneChips, which have only Anopheles
mosquito samples hybridised to them (i.e. the mosquitoes were not
infected).  The 6 chips comprise 3 replicates, each consisting of two
time points.  The design matrix is as follows:

> design
     M15d M43d
[1,]    1    0
[2,]    0    1
[3,]    1    0
[4,]    0    1
[5,]    1    0
[6,]    0    1
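For reference, a design matrix with this layout can be built in base R from a factor recording each array's time point. This is a minimal sketch, assuming the arrays are ordered as in the matrix above (alternating M15d, M43d); the variable name `time` is illustrative. A matrix of this form is what limma's `lmFit()` takes as its `design` argument.

```r
# Time point of each of the 6 arrays, in the order the CEL files were read
# (assumed to alternate M15d, M43d, as in the design matrix above)
time <- factor(rep(c("M15d", "M43d"), times = 3))

# One 0/1 indicator column per time point, with no intercept column
design <- model.matrix(~ 0 + time)
colnames(design) <- levels(time)

design
```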


I have tried gcRMA (in affylmGUI) as well as RMA, MBEI and MAS5 (in affy).
Using a cut-off of 0.05 on the BH-adjusted p-values, these gave 2, 12, 0
and 0 DE genes respectively, far fewer than I was expecting.
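The DE counts above come from thresholding Benjamini-Hochberg (BH) adjusted p-values. As a base-R illustration of that step, `p.adjust()` performs the BH adjustment; limma's `topTable()` and `decideTests()` apply the same adjustment when `adjust.method = "BH"`. The p-values below are made-up toy numbers, not results from these arrays.

```r
# Toy raw p-values for five hypothetical probesets (illustrative only)
p <- c(0.001, 0.01, 0.02, 0.3, 0.5)

# BH adjustment: each p-value is scaled by n / rank, then made monotone
p_bh <- p.adjust(p, method = "BH")

# Number of probesets called DE at a 5% false discovery rate
n_de <- sum(p_bh < 0.05)
n_de  # 3
```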

As this affy chip contains probesets for both mosquito and malaria
parasite genes, I am wondering:

(a) whether it is better to remove all the parasite probesets before my analysis;

(b) if so, at what stage I should do this (before or after normalisation
and background correction, or does it not matter?);

(c) how I would filter out these probesets using genefilter (all the
parasite Affymetrix IDs begin with "Pf."; could I use this prefix
to filter out the probesets, and if so, how?)
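Because this is a filter on probeset names rather than on expression values, a plain regular-expression match on the row names may be simpler than a genefilter filter function. A minimal sketch, assuming the expression values are in a matrix whose row names are the Affymetrix IDs; the IDs below are hypothetical placeholders, not real probesets from this chip.

```r
# Hypothetical probeset IDs standing in for the real row names
ids <- c("Pf.hypothetical1_at", "Ag.hypothetical2_at",
         "Pf.hypothetical3_s_at", "Ag.hypothetical4_at")
expr <- matrix(seq_len(4 * 6), nrow = 4, dimnames = list(ids, NULL))

# "^Pf\\." matches IDs that start with the literal text "Pf."
# (the dot is escaped because an unescaped "." matches any character)
is_parasite <- grepl("^Pf\\.", rownames(expr))

# Keep only the non-parasite (mosquito) rows
expr_mosquito <- expr[!is_parasite, , drop = FALSE]
rownames(expr_mosquito)
```

With an ExpressionSet (here a hypothetical object `eset`), the same logical index should work on its feature names, e.g. `eset[!grepl("^Pf\\.", featureNames(eset)), ]`.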

Secondly, I did not add any of the polyA controls to my samples.  I would
like to know:

(d) Do any of the background correction / normalisation methods I tried
use the Affymetrix control probesets, and if so, how?

(e) Should I also filter out the control probesets? Again, if so, at what
stage in the analysis, and what code would be appropriate?
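Affymetrix control probesets conventionally have IDs beginning with "AFFX"; the sketch below assumes this chip follows that convention (worth confirming with `grep("^AFFX", ids, value = TRUE)` on the real row names). It drops those rows from an already-summarised expression matrix and does not address whether they should be excluded at an earlier stage. The IDs are hypothetical placeholders.

```r
# Hypothetical row names: two control-style IDs and two ordinary probesets
ids <- c("AFFX-hypotheticalControl1_at", "AFFX-hypotheticalControl2_at",
         "Ag.hypothetical1_at", "Pf.hypothetical2_at")
expr <- matrix(seq_len(4 * 6), nrow = 4, dimnames = list(ids, NULL))

# Flag rows whose ID starts with "AFFX" (assumed control probesets)
is_control <- grepl("^AFFX", rownames(expr))

# Remove the control rows, keeping the matrix structure
expr_nocontrol <- expr[!is_control, , drop = FALSE]
rownames(expr_nocontrol)
```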

I did try the code for non-specific filtering from p. 232 of the
Bioconductor monograph (on my RMA dataset), but it reduced the number of
probesets quite drastically:

> f1 <- pOverA(0.25, log2(100))
> f2 <- function(x) (IQR(x) > 0.5)
> ff <- filterfun(f1, f2)
> selected <- genefilter(Baseage.transformed, ff)
> sum(selected)
[1] 404   ###(The original no. of probesets is 22,726)###
> Baseage.sub <- Baseage.transformed[selected, ]
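To see which of the two criteria removes most of the probesets, each can be evaluated separately and cross-tabulated. A base-R sketch, assuming a matrix of log2 expression values with probesets in rows; `prop_over` is a hypothetical stand-in reflecting my reading of genefilter's `pOverA` (TRUE when at least a proportion p of the values exceed A), and the toy matrix is invented. With six arrays, 0.25 means at least 2 of the 6 values must exceed the threshold.

```r
# Hypothetical stand-in for genefilter's pOverA(p, A)
prop_over <- function(x, p, A) mean(x > A, na.rm = TRUE) >= p

# Toy log2 expression matrix: 4 probesets x 6 arrays (invented values)
x <- rbind(
  high_variable = c(8, 10, 8, 10, 8, 10),      # bright, IQR 2
  high_flat     = c(9, 9.1, 9, 9.1, 9, 9.1),   # bright, IQR 0.1
  low_variable  = c(3, 5, 3, 5, 3, 5),         # dim, IQR 2
  low_flat      = c(3, 3, 3, 3, 3, 3)          # dim, IQR 0
)

# Evaluate the intensity and variability criteria separately
pass_f1 <- apply(x, 1, prop_over, p = 0.25, A = log2(100))
pass_f2 <- apply(x, 1, function(r) IQR(r) > 0.5)

# How many rows pass or fail each criterion
table(intensity = pass_f1, variability = pass_f2)
```

On the real data, the same two `apply()` calls on the expression matrix (e.g. `exprs(Baseage.transformed)`, assuming it is a Biobase expression object) would show whether the intensity or the IQR criterion accounts for most of the drop.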

Also, I understood from the monograph that "100" is a threshold for
filtering out fluorescence intensities below that value, but I am not
clear whether it refers to raw intensities or log2 values.
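The filter in the code above is built with `log2(100)`, i.e. the unlogged value 100 converted to the log2 scale (about 6.64), which is the scale RMA output is on. Because log2 is strictly increasing, thresholding unlogged values at 100 and log2 values at `log2(100)` select the same probesets. A small check with invented values:

```r
# Invented unlogged intensities for one probeset across 6 arrays
raw <- c(50, 80, 120, 150, 90, 300)

log2(100)  # the threshold on the log2 scale, about 6.64

above_raw  <- raw > 100               # threshold applied on the raw scale
above_log2 <- log2(raw) > log2(100)   # same threshold on the log2 scale

identical(above_raw, above_log2)
```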

All the parasite probesets have raw intensities below 35, so could I apply
this as a simple filter, and would it have to be applied to the raw (rather
than normalised) data?
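A sketch of such a simple intensity filter, assuming a hypothetical matrix `raw_expr` of unlogged, probeset-level values (probesets in rows): probesets whose maximum across all arrays is below 35 are dropped. On log2-scale data the equivalent cut-off is `log2(35)`; genefilter's `kOverA(1, 35)` should express much the same rule (at least one value above 35). Whether 35 remains a sensible cut-off after normalisation is not something this sketch settles.

```r
# Hypothetical unlogged intensities: 3 probesets x 6 arrays (invented values)
raw_expr <- rbind(
  probeset_a = c(20, 25, 30, 22, 28, 31),    # never reaches 35
  probeset_b = c(40, 500, 60, 80, 45, 900),
  probeset_c = c(10, 12, 36, 15, 11, 14)     # reaches 35 on one array only
)

# Keep probesets with at least one value >= 35
keep <- apply(raw_expr, 1, max) >= 35
filtered <- raw_expr[keep, , drop = FALSE]

# The same rule on log2-transformed data uses log2(35) as the cut-off
keep_log <- apply(log2(raw_expr), 1, max) >= log2(35)
```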

Apologies for the long posting...

Looking forward to any replies,
Regards,
Amy

> sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
 [1] "tcltk"     "splines"   "tools"     "methods"   "stats"
 [6] "graphics"  "grDevices" "utils"     "datasets"  "base"

other attached packages:
plasmodiumanophelescdf              tkWidgets                 DynDoc            widgetTools
              "1.14.0"               "1.12.0"               "1.12.0"               "1.10.0"
           agahomology                affyPLM                  gcrma            matchprobes
              "1.14.2"               "1.10.0"                "2.6.0"                "1.6.0"
              affydata                annaffy                   KEGG                     GO
              "1.10.0"                "1.6.0"               "1.14.0"               "1.14.0"
                 limma            geneplotter               annotate                   affy
               "2.9.1"               "1.12.0"               "1.12.0"               "1.12.0"
                affyio             genefilter               survival                Biobase
               "1.2.0"               "1.12.0"                 "2.29"               "1.12.0"
>


-------------------------------------------
Amy Mikhail
Research student
University of Aberdeen
Zoology Building
Tillydrone Avenue
Aberdeen AB24 2TZ
Scotland
Email: a.mikhail at abdn.ac.uk
Phone: 00-44-1224-272880 (lab)
       00-44-1224-273256 (office)


