[BioC] nsFilter and GSEA

Robert Gentleman rgentlem at fhcrc.org
Fri Jan 11 17:50:55 CET 2008


Hi,
  It looks like something fairly odd is going on, and that we are not 
seeing all of the code that is being run.

  What chip are you using?  What is very odd is that in your first 
example 1098 "duplicate" probes are found, but in the second run only 3. 
Basically this cannot happen (since the probes are the same) and 
suggests that some piece of code has manipulated the names, and at that 
point I think fairly bad things are going to happen. So this would be 
one place to try and fix things.
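
  One way to check whether the probe names have been altered between the
two runs is to compare the ID vectors directly. The following is only a
sketch in base R: the helper compare_ids and the probe IDs are
hypothetical stand-ins, and with Biobase loaded one would pass
featureNames(eset.mas) and featureNames(eset.rma) instead.

```r
# Sketch: compare the probe IDs of two expression sets.
# compare_ids is a hypothetical helper, not part of any package.
compare_ids <- function(a, b) {
  list(
    identical  = identical(a, b),           # same IDs in the same order?
    only_in_a  = setdiff(a, b),             # IDs missing from b
    only_in_b  = setdiff(b, a),             # IDs missing from a
    duplicated = unique(a[duplicated(a)])   # IDs repeated within a
  )
}

# Hypothetical IDs standing in for the real feature names.
ids_mas <- c("probe_1", "probe_2", "probe_3")
ids_rma <- c("probe_1", "probe_2", "probe_3")

res <- compare_ids(ids_mas, ids_rma)
str(res)
```

If the first element is FALSE, or either setdiff is non-empty, some step
between reading the data and filtering has probably changed the names.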

  Second, nsFilter filters by default at the median, so you should 
retain about 0.5 of your probe sets. But you lose so many that (you 
didn't tell us the chip, so I can't be sure) it looks like all of the 
values are corrupt for that example as well.
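
  To make the "about 0.5" expectation concrete, here is a small base R
sketch of filtering at the median, assuming the cutoff is applied as a
quantile of the per-probe IQR. It does not call nsFilter itself; the
simulated matrix and the variable names are illustrative only.

```r
# Sketch: keep the probe sets whose IQR exceeds the median IQR.
# The matrix is simulated and stands in for exprs(eset).
set.seed(1)
x <- matrix(rnorm(1000 * 15), nrow = 1000,
            dimnames = list(paste0("probe_", 1:1000), NULL))

iqr  <- apply(x, 1, IQR)                    # per-probe variability
keep <- iqr > quantile(iqr, probs = 0.5)    # cutoff at the median IQR

mean(keep)                                  # fraction of probes retained
x.f <- x[keep, , drop = FALSE]              # filtered matrix
dim(x.f)
```

A retained fraction far from one half under such a rule would point to a
problem with the values rather than with the cutoff.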

  So, I think that you are looking in the wrong place. Your problem is 
probably earlier on.

  best wishes
    Robert


Paolo Innocenti wrote:
> Hi again,
> 
> I tried with a different normalisation method, and I was pretty 
> surprised by the results:
> 
>  > eset.mas <- mas5(mydata)
> background correction: mas
> PM/MM correction : mas
> expression values: mas
> background correcting...done.
> 14010 ids to be processed
> |                    |
> |####################|
>  > eset.mas.f <- nsFilter(eset.mas)
>  > eset.mas.f$filter.log
> $numDupsRemoved
> [1] 1098
> 
> $numLowVar
> [1] 1
> 
> $feature.exclude
> [1] 3
> 
> $numRemoved.ENTREZID
> [1] 786
> 
>  > eset.rma <- rma(mydata)
> Background correcting
> Normalizing
> Calculating Expression
>  > eset.rma.f <- nsFilter(eset.rma)
>  > eset.rma.f$filter.log
> $numDupsRemoved
> [1] 3
> 
> $numLowVar
> [1] 13047
> 
> $feature.exclude
> [1] 3
> 
> $numRemoved.ENTREZID
> [1] 786
> 
>  > dim(eset.rma.f$eset)
> Features  Samples
>       171       15
>  > dim(eset.mas.f$eset)
> Features  Samples
>     12122       15
> 
> I don't understand how this is possible. Any suggestion about what to do? 
> Should I lower the cutoff for the rma, or does that processing method not 
> work for my dataset?
> 
> Paolo
> PS: I also tried a really low cutoff, but the situation doesn't change 
> unless I choose a cutoff=0.1:
> 
>  > eset.filter <- nsFilter(eset,var.cutoff=0.2)
>  > eset.filter$filter.log
> $numDupsRemoved
> [1] 69
> 
> $numLowVar
> [1] 10560
> 
> $feature.exclude
> [1] 3
> 
> $numRemoved.ENTREZID
> [1] 786
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


