[BioC] Some Genefilter questions

James W. MacDonald jmacdon at med.umich.edu
Wed Nov 29 23:32:56 CET 2006


Hi Amy,

Amy Mikhail wrote:
> Dear Bioconductors,
> 
> I am analysing 6 PlasmodiumAnopheles genechips, which have only Anopheles
> mosquito samples hybridised to them (i.e. they are not infected
> mosquitoes).  The 6 chips include 3 replicates, each consisting of two
> time points.  The design matrix is as follows:
> 
> 
>>design
> 
>      M15d M43d
> [1,]    1    0
> [2,]    0    1
> [3,]    1    0
> [4,]    0    1
> [5,]    1    0
> [6,]    0    1
> 
> 
> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy). 
> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
> DE genes, respectively... much less than I was expecting.
> 
> As this affy chip contains probesets for both mosquito and malaria
> parasite genes, I am wondering:
> 
> (a) if it is better to remove all the parasite probesets before my analysis;

Probably, although it's not the easiest thing to do. Here is a link to 
some code you can use:

http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf

Read what Ariel and Jenny wrote there very closely so you don't make 
mistakes.

> 
> (b) if so at what stage I should do this (before or after normalisation
> and background correction, or does it matter?)

Before doing anything, most likely, which is what the above code will do 
for you.

> 
> (c) how would I filter out these probesets using genefilter (all the
> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
> to filter out the probesets, and if so how?)
> 
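
On (c): genefilter's filter functions are applied to the expression 
values of each probeset rather than to the probeset names, so a prefix 
filter is probably simpler with plain indexing. A minimal sketch, 
assuming the parasite IDs really do all start with "Pf." as you say. The 
matrix `expr` and its probeset IDs are made up for illustration; with a 
real object you would take the IDs from something like featureNames() 
instead of rownames(). Note that this only subsets data that have already 
been summarised, so it is not a substitute for removing the probes before 
background correction and normalisation.

```r
# Toy expression matrix; the probeset IDs are hypothetical, chosen only to
# mimic a chip carrying both parasite ("Pf." prefix) and mosquito probesets.
expr <- matrix(
  c(7.1, 7.3, 6.9, 7.0,
    3.2, 3.1, 3.3, 3.0,
    9.4, 9.8, 9.1, 9.6,
    2.9, 3.0, 3.1, 2.8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Ag.1.1_at", "Pf.1.1_at", "Ag.2.1_s_at", "Pf.2.1_at"),
                  paste("chip", 1:4, sep = ""))
)

# TRUE for every ID that starts with "Pf." (the dot is escaped so that it
# matches a literal dot rather than any character).
is.parasite <- grepl("^Pf\\.", rownames(expr))

# Keep only the non-parasite rows.
expr.mosquito <- expr[!is.parasite, , drop = FALSE]
```

The logical vector can be inspected with table(is.parasite) before 
subsetting, which is a cheap check that the prefix catches the number of 
probesets you expect.
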
> Secondly, I did not add any of the polyA controls to my samples.  I would
> like to know:
> 
> (d) Do any of the bg correct / normalisation methods I tried utilise
> affymetrix control probesets, and if so, how?

No.

> 
> (e) Should I also filter out the control sets - again, if so at what stage
> in the analysis and what would be an appropriate code to use?

No, there aren't enough of them to have an effect on your data.

> 
> I did try the code for non-specific filtering (on my RMA dataset) from pg.
> 232 of the bioconductor monograph, but the reduction in the number of
> probesets was quite drastic;
> 
> 
>>f1 <- pOverA(0.25, log2(100))
>>f2 <- function(x) (IQR(x) > 0.5)
>>ff <- filterfun(f1, f2)
>>selected <- genefilter(Baseage.transformed, ff)
>>sum(selected)
> 
> [1] 404   ###(The original no. of probesets is 22,726)###
> 
>>Baseage.sub <- Baseage.transformed[selected, ]
> 
> 
> Also, I understood from the monograph that "100" was to filter out
> fluorescence intensities less than this, but I am not clear if this is
> from raw intensities or log2 values?

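The 100 refers to the natural (unlogged) scale, which is why the code 
wraps it in log2() before applying it to RMA values, which are already on 
the log2 scale. The intensities for an Affy chip come from a 16-bit TIFF 
image, which means the brightest value can be 2^16 - 1 = 65535, which in 
log2 scale is just under 16, so you cannot even have a value that 
approaches 100 on the log scale.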
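
To make the scales concrete, here is a small base R sketch of what the 
two filters in your code do on log2 data. The helper names p.over.a and 
iqr.filter and the toy matrix are my own stand-ins for illustration, not 
genefilter functions; pOverA(p, A) itself keeps a probeset when at least 
a proportion p of its values exceed A.

```r
# A 16-bit scanner value cannot exceed 2^16 - 1, so log2 values stay below 16.
max.log2 <- log2(2^16 - 1)

# The cutoff of 100 is on the natural scale; on log2 data it is about 6.64.
cutoff <- log2(100)

# Hypothetical stand-ins mimicking pOverA(0.25, log2(100)) and the IQR filter.
p.over.a   <- function(x) mean(x > cutoff) >= 0.25
iqr.filter <- function(x) IQR(x) > 0.5

# Toy log2 expression values: 3 probesets x 6 chips (made-up numbers).
x <- rbind(
  low      = c(4.0, 4.1, 3.9, 4.2, 4.0, 4.1),  # never above the cutoff
  flat     = c(8.0, 8.1, 8.0, 8.1, 8.0, 8.1),  # expressed but hardly varies
  changing = c(7.0, 9.0, 7.1, 9.1, 6.9, 8.9)   # expressed and variable
)

# A probeset is kept only if it passes both filters.
selected <- apply(x, 1, function(row) p.over.a(row) && iqr.filter(row))
```

In this toy example only the "changing" row passes both filters. The 
IQR > 0.5 requirement throws out every probeset that barely varies 
between chips, however strongly it is expressed, which may be part of why 
the combined filter reduced your set so drastically.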

Best,

Jim



> 
> All the parasite probesets have raw intensities <35 .... so could I apply
> this as a simple filter, and would this have to be on raw (rather than
> normalised data)?
> 
> Apologies for the long posting...
> 
> Looking forward to any replies,
> Regards,
> Amy
> 
> 
>>sessionInfo()
> 
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
>  [1] "tcltk"     "splines"   "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "base"
> 
> other attached packages:
> plasmodiumanophelescdf     tkWidgets        DynDoc   widgetTools   agahomology
>               "1.14.0"      "1.12.0"      "1.12.0"      "1.10.0"      "1.14.2"
>                affyPLM         gcrma   matchprobes      affydata       annaffy
>               "1.10.0"       "2.6.0"       "1.6.0"      "1.10.0"       "1.6.0"
>                   KEGG            GO         limma   geneplotter      annotate
>               "1.14.0"      "1.14.0"       "2.9.1"      "1.12.0"      "1.12.0"
>                   affy        affyio    genefilter      survival       Biobase
>               "1.12.0"       "1.2.0"      "1.12.0"        "2.29"      "1.12.0"
> 
> 
> 
> -------------------------------------------
> Amy Mikhail
> Research student
> University of Aberdeen
> Zoology Building
> Tillydrone Avenue
> Aberdeen AB24 2TZ
> Scotland
> Email: a.mikhail at abdn.ac.uk
> Phone: 00-44-1224-272880 (lab)
>        00-44-1224-273256 (office)


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

