[BioC] Filtering by fold change
Adaikalavan Ramasamy
ramasamy at cancer.org.uk
Mon Jun 28 12:32:47 CEST 2004
On Mon, 2004-06-28 at 07:47, Anthony Bosco wrote:
> The easiet way is as follows for Affymetrix data.
>
> Load and analyse data
>
> data<-ReadAffy()
>
> eset<-rma(data)
>
> exp<-eset at exprs
> Calculate fold change (m) values for 2 chips
>
> m<-exp[,1]-exp[,1]
Err, isn't this equivalent to m <- x - x which always produce 0. Besides
I think the definition of Fold Change is the difference of the group
_means_.
Using exp is bad idea as it is also an R function, see help("exp").
> Ask for top up and down regulated genes
>
> X<-names(m[abs(m)>2])
Your solution can produce spurious results. You should surround the
abs(m) > 2 with which(). See below :
> x <- 1:5
> names(x) <- letters[1:5]
> x
a b c d e
1 2 3 4 5
>
> names( x > 2.5 )
[1] "a" "b" "c" "d" "e"
>
> names( which( x > 2.5 ) )
[1] "c" "d" "e"
Note : There is the issue of translating from log2 base if you use RMA.
So a FC of 2 would correspond to log2(FC) of 1.
> Keep adjusting m cut off (ie 2 in above example) to get desired number of genes
>
> Filter data
>
> exp.subset<-exp[X,]
>
> Alternatively Sort on fold change
>
> m.sort <- sort(m,dec=T)
> top.genes <- m.sort[1:1000]
>
> exp.subset<-exp[top.genes,]
It is much easier and faster to use indices than matching rownames.
data <- matrix( rnorm(100000), nc=10 )
rownames(data) <- paste("g", 1:10000, sep="")
> w <- which(abs(rowMeans(data)) > 0.5)
> names.w <- rownames(data)[w]
length(w); length(names.w)
[1] 1173
[1] 1173
> system.time( sub1 <- data[ w, ] )
[1] 0 0 0 0 0
> system.time( sub1 <- data[ names.w, ] )
[1] 0.47 0.00 0.48 0.00 0.00
More information about the Bioconductor
mailing list