[BioC] Filtering by fold change

Mon Jun 28 12:32:47 CEST 2004

On Mon, 2004-06-28 at 07:47, Anthony Bosco wrote:
> The easiet way is as follows for Affymetrix data.
> 
> Load and analyse data
> 
> data<-ReadAffy()
> 
> eset<-rma(data)
> 
> exp<-eset at exprs
> Calculate fold change (m) values for 2 chips
> 
> m<-exp[,1]-exp[,1]

Err, isn't this equivalent to m <- x - x which always produce 0. Besides
I think the definition of Fold Change is the difference of the group
_means_. 

Using exp is bad idea as it is also an R function, see help("exp").

> Ask for top up and down regulated genes
> 
> X<-names(m[abs(m)>2])

Your solution can produce spurious results. You should surround the
abs(m) > 2 with which(). See below :

> x <- 1:5
> names(x) <- letters[1:5]
> x
a b c d e
1 2 3 4 5
>
> names( x > 2.5 )
[1] "a" "b" "c" "d" "e"
>
> names( which( x > 2.5 ) )
[1] "c" "d" "e"

Note : There is the issue of translating from log2 base if you use RMA.
So a FC of 2 would correspond to log2(FC) of 1.

> Keep adjusting m cut off (ie 2 in above example) to get desired number of genes
> 
> Filter data
> 
> exp.subset<-exp[X,]
> 
> Alternatively Sort on fold change
> 
> m.sort <- sort(m,dec=T)
> top.genes <- m.sort[1:1000]
> 
> exp.subset<-exp[top.genes,]

It is much easier and faster to use indices than matching rownames.

data <- matrix( rnorm(100000), nc=10 )
rownames(data) <- paste("g", 1:10000, sep="")

> w <- which(abs(rowMeans(data)) > 0.5)
> names.w <- rownames(data)[w]
length(w); length(names.w)
[1] 1173
[1] 1173

> system.time( sub1 <- data[ w, ] )
[1] 0 0 0 0 0
> system.time( sub1 <- data[ names.w, ] )
[1] 0.47 0.00 0.48 0.00 0.00