[R] Microarray analysis

Phil Spector spector at stat.Berkeley.EDU
Fri Nov 3 22:04:59 CET 2006


Friederike -
    The program is doing exactly what you asked it to do, but please
notice that

varprobe<-c(var(schizo.matrix[x,]))

creates a vector containing all the elements of the variance-covariance 
matrix of schizo.matrix[x,], which is a 20000x150 matrix, giving a 
variance-covariance matrix of 150x150 or 22500 elements.  You then rank all
of the elements of this variance-covariance matrix, and ask for the
ones that are ranked higher than 10000, leaving 22500-10000=12500
elements.  So now we should probably look at what you wanted to do:

x <- 1:20000
y <- 2:151
schizo.matrix<-data.matrix(schizo[,y])
varprobe<-apply(schizo.matrix[x,],1,var)  # calculate the variance of each row
# reorder the rows of schizo.matrix to correspond to varprobe and 
# select the top 10000
schizo.sub = schizo.matrix[order(varprobe,decreasing=TRUE),][1:10000,]

There are other ways of extracting the top 10000, but I think that this
shows how to calculate what you really want.

                                        - Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu



On Fri, 3 Nov 2006, Friederike Barthel wrote:

> Dear List
>
> I am currently running a microarray analysis on the dataset schizo and would
> like to filter out all genes with a low variance. However, when running the
> code detailed below, I end up with 12,500 genes in my final set rather than
> the 10,000 I was looking for. Can anyone pinpoint where I am going wrong?
>
> ***********reading in data**********
>
> schizo<-read.table("octassign_data.txt",header=T, sep="\t")
>
> dim(schizo)
>
> head(schizo)
>
> attach(schizo)
>
> ***********creating matrix and calculating variance across probesets********
>
> x<-c(1:20000)
>
> y<-c(2:151)
>
> schizo.matrix<-data.matrix(schizo[,y])
>
> varprobe<-c(var(schizo.matrix[x,]))
>
> hist(varprobe)
>
> **************filter out low variance*************
>
> top10000 <- which(rank(varprobe)>10000)
>
> schizo.sub<-schizo[top10000,]
>> dim(schizo.sub)
> [1] 12500   151
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list