[BioC] Some Genefilter questions

Amy Mikhail a.mikhail at abdn.ac.uk
Thu Nov 30 19:07:09 CET 2006


Hi Robert and Jim,

Many thanks for your advice.  I have some more questions...

First, I tried what Robert suggested on my expression set.  However, I got
a strange result:

> load("E:\\Amy - Bioconductor analysis\\03. Base age\\Affymetrix - Base Age results & analysis\\Baseage - RMA normalised.RData")
> ls()
[1] "Data"      "eset"      "phenodata" "x"         "xy"        "y"

> parasites = grep("^Pf", featureNames(eset))
> parasites
   [1] 18192 18193 18194 18195 18196 18197 18198 18199 18200 18201 18202 18203
  [13] 18204 18205 18206 18207 18208 18209 18210 18211 18212 18213 18214 18215
  [25] 18216 18217 18218 18219 18220 18221 18222 18223 18224 18225 18226 18227
### this list continues until no. 4,514 ###

I was expecting the parasite affy IDs to be listed here, but these are (I
think) the probeset numbers (I can't tell if they are the right ones or
not...)?

> mossie.sub = eset[!parasites,]
> mossie.sub
Expression Set (exprSet) with
        0 genes
        6 samples
        phenoData object with 3 variables and 6 cases
        varLabels
                Name: short name of datasets for graphs
                Population: Age of adult mosquitoes (in days) included in the sample
                Replicate: Replicate number of the experiment

So now it has removed all the genes... I don't understand why this would
happen since the subset called "parasites" only contains a fraction of the
total number of probesets (4,514 out of 22,769).
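
A likely cause, sketched on a toy matrix rather than the real exprSet: grep() returns integer positions, and `!` applied to non-zero numbers gives FALSE for every element, so the logical subscript keeps no rows. A negative index (`-parasites`) drops the matched rows instead. The matrix `m` and its row names below are made up for illustration, and it is an assumption that an exprSet subsets its rows the same way a matrix does.

```r
# Toy stand-in for the expression matrix; row names are hypothetical probe set IDs
m <- matrix(1:12, nrow = 4,
            dimnames = list(c("Ag.1_at", "Pf.1_at", "Ag.2_at", "Pf.2_at"), NULL))

parasites <- grep("^Pf", rownames(m))   # integer positions of the matches
not.parasites <- !parasites             # all FALSE: non-zero numbers count as TRUE

# A logical index that is FALSE everywhere keeps no rows
empty.sub <- m[!parasites, , drop = FALSE]

# A negative integer index drops exactly the matched rows
mossie.sub <- m[-parasites, , drop = FALSE]

print(nrow(empty.sub))
print(rownames(mossie.sub))
```

One caveat with negative indexing: if grep() finds no matches it returns integer(0), and `m[-integer(0), ]` selects zero rows, so it may be worth checking `length(parasites) > 0` first.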

Next, I wanted to try Jim's suggestion on the raw data.  I can follow
Jenny's post up to:

" all you need now is your affybatch object, and a character vector of
probe set names"

I have an affybatch object, but how do I create a character vector for the
probesets I want to remove?
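
One way to get such a character vector, sketched with base R only: grep() with `value = TRUE` returns the matching strings themselves rather than their positions. The IDs in `ids` are hypothetical; for an AffyBatch they would presumably come from something like geneNames() in the affy package, which is an assumption here and is not run below.

```r
# Hypothetical probe set IDs standing in for those of the real chip
ids <- c("Ag.1_at", "Pf.1_at", "Ag.2_at", "Pf.2_at", "AFFX-BioB-5_at")

# value = TRUE returns the matching names instead of their integer positions
parasite.ids <- grep("^Pf", ids, value = TRUE)

print(parasite.ids)
print(is.character(parasite.ids))
```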

I'm still not very R-literate, so I tried using the same code as before,
except with the raw data instead of my expression set, but the
"featureNames" bit was a problem:

> parasites = grep("^Pf", featureNames(data))
Error in function (classes, fdef, mtable)  :
        unable to find an inherited method for function "featureNames", for signature "function"

Any ideas?
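
A possible reading of this error, sketched in base R: the phrase `for signature "function"` suggests that `data` resolved to R's built-in data() function from the utils package, whereas the ls() output above lists an object called "Data" with a capital D. R names are case-sensitive. The object `Data` below is only a stand-in, and whether the real `Data` is the AffyBatch is an assumption.

```r
# Stand-in for the workspace object listed by ls(); the real one is not available here
Data <- c(a = 1, b = 2)

# Lower-case `data` is not this object: it is the data() function from utils
data.is.function <- is.function(data)
Data.is.function <- is.function(Data)

print(class(data))
print(class(Data))
```

If that reading is right, calling featureNames() (or geneNames()) on `Data` rather than `data` would be the thing to try.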

Regards,

Amy

---------------------------------------------------------------------------

> Hi Amy,
>
> Amy Mikhail wrote:
>> Dear Bioconductors,
>>
>> I am analysing 6 Plasmodium/Anopheles genechips, which have only Anopheles
>> mosquito samples hybridised to them (i.e. they are not infected
>> mosquitoes).  The 6 chips include 3 replicates, each consisting of two
>> time points.  The design matrix is as follows:
>>
>>
>>>design
>>
>>      M15d M43d
>> [1,]    1    0
>> [2,]    0    1
>> [3,]    1    0
>> [4,]    0    1
>> [5,]    1    0
>> [6,]    0    1
>>
>>
>> I have tried both gcRMA (in AffyLMGUI), and RMA, MBEI and MAS5 (in affy).
>> Looking at the (BH) adjusted p values <0.05, this gave me 2, 12, 0 and 0
>> DE genes, respectively... far fewer than I was expecting.
>>
>> As this affy chip contains probesets for both mosquito and malaria
>> parasite genes, I am wondering:
>>
>> (a) if it is better to remove all the parasite probesets before my
>> analysis;
>
> Probably. It's not the easiest thing to do. Here is a link to some code
> you can use:
>
> http://article.gmane.org/gmane.science.biology.informatics.conductor/9869/match=remove+probes+cdf
>
> Read what Ariel and Jenny write there very closely so you don't make
> mistakes.
>
>>
>> (b) if so at what stage I should do this (before or after normalisation
>> and background correction, or does it matter?)
>
> Before doing anything, most likely, which is what the above code will do
> for you.
>
>>
>> (c) how would I filter out these probesets using genefilter (all the
>> parasite affy IDs begin with Pf. - could I use this prefix in the affy IDs
>> to filter out the probesets, and if so how?)
>>
>> Secondly, I did not add any of the polyA controls to my samples.  I would
>> like to know:
>>
>> (d) Do any of the bg correct / normalisation methods I tried utilise
>> affymetrix control probesets, and if so, how?
>
> No.
>
>>
>> (e) Should I also filter out the control sets - again, if so at what stage
>> in the analysis and what would be an appropriate code to use?
>
> No, there aren't enough of them to have an effect on your data.
>
>>
>> I did try the code for non-specific filtering (on my RMA dataset) from pg.
>> 232 of the Bioconductor monograph, but the reduction in the number of
>> probesets was quite drastic:
>>
>>
>>>f1 <- pOverA(0.25, log2(100))
>>>f2 <- function(x) (IQR(x) > 0.5)
>>>ff <- filterfun(f1, f2)
>>>selected <- genefilter(Baseage.transformed, ff)
>>>sum(selected)
>>
>> [1] 404   ###(The original no. of probesets is 22,726)###
>>
>>>Baseage.sub <- Baseage.transformed[selected, ]
>>
>>
>> Also, I understood from the monograph that "100" was to filter out
>> fluorescence intensities less than this, but I am not clear if this is
>> from raw intensities or log2 values?
>
> It has to be data on the natural scale. The intensities for an Affy chip
> come from a 16-bit TIFF image, which means the brightest value can be at
> most 2^16 - 1 (65535), which is just under 16 on the log2 scale, so you
> cannot even have a value that approaches 100 on the log scale.
>
> Best,
>
> Jim
>
>
>
>>
>> All the parasite probesets have raw intensities <35 .... so could I apply
>> this as a simple filter, and would this have to be on raw (rather than
>> normalised) data?
>>
>> Apologies for the long posting...
>>
>> Looking forward to any replies,
>> Regards,
>> Amy
>>
>>
>>>sessionInfo()
>>
>> R version 2.4.0 (2006-10-03)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>>  [1] "tcltk"     "splines"   "tools"     "methods"   "stats"
>> "graphics"  "grDevices" "utils"     "datasets"  "base"
>>
>> other attached packages:
>> plasmodiumanophelescdf              tkWidgets                 DynDoc
>>               "1.14.0"               "1.12.0"               "1.12.0"
>>            widgetTools            agahomology                affyPLM
>>               "1.10.0"               "1.14.2"               "1.10.0"
>>                  gcrma            matchprobes               affydata
>>                "2.6.0"                "1.6.0"               "1.10.0"
>>                annaffy                   KEGG                     GO
>>                "1.6.0"               "1.14.0"               "1.14.0"
>>                  limma            geneplotter               annotate
>>                "2.9.1"               "1.12.0"               "1.12.0"
>>                   affy                 affyio             genefilter
>>               "1.12.0"                "1.2.0"               "1.12.0"
>>               survival                Biobase
>>                 "2.29"               "1.12.0"
>>
>>
>>
>> -------------------------------------------
>> Amy Mikhail
>> Research student
>> University of Aberdeen
>> Zoology Building
>> Tillydrone Avenue
>> Aberdeen AB24 2TZ
>> Scotland
>> Email: a.mikhail at abdn.ac.uk
>> Phone: 00-44-1224-272880 (lab)
>>        00-44-1224-273256 (office)
>>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623




