[BioC] genefilter displaying the expression set

Peder Worning pwo at exiqon.com
Wed Feb 25 13:12:33 CET 2009


Hi Dhaarini,

Filtering genes is a delicate but important matter, and you can filter
them on low values and on variance. Expression values that do not change
across your samples are not very informative.
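
As a minimal sketch of these two criteria, the example below uses base R only on a made-up log2 matrix. The gene names, values and thresholds are all assumptions for illustration. The low-value rule is written out by hand as "at least k values above A", which is the rule that kOverA(k, A) from genefilter applies.

```r
# Toy log2 expression matrix: 4 genes (rows) x 6 samples (columns).
# All values are made up for illustration.
expr <- rbind(
  geneA = c(9.0, 9.1, 9.0, 9.1, 9.0, 9.1),   # high but nearly constant
  geneB = c(3.0, 3.5, 2.8, 3.1, 3.3, 2.9),   # low everywhere
  geneC = c(6.0, 9.5, 7.2, 10.1, 6.4, 9.8),  # high and variable
  geneD = c(8.0, 11.0, 7.5, NA, 10.5, 8.2)   # high, variable, one NA
)

# Low-value filter: keep genes with at least k values above A.
k <- 3
A <- 7
high.enough <- rowSums(expr > A, na.rm = TRUE) >= k

# Variance filter: keep genes whose variance across samples is at least 0.1.
row.var <- apply(expr, 1, var, na.rm = TRUE)
variable.enough <- row.var >= 0.1

# The result of a filter is a logical vector; use it as a row index
# to get the filtered expression values.
keep <- high.enough & variable.enough
expr.f <- expr[keep, , drop = FALSE]
rownames(expr.f)
```

The logical vector returned by genefilter() can be used as a row index in the same way, for example tumor[ans, ], assuming the data object is a matrix or an ExpressionSet.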

I have written my own function, built on the genefilter package, that
combines filters for low values, NAs, low variance and range. I use it, but I
always try it with different parameters to see what happens to my
data.
I am working with microRNA arrays and my data are on a log scale, but the
principles should be the same.

Here is the code; beware of line breaks introduced by Outlook:

library(genefilter)

Data.filter <- function(e.matrix, kk = as.integer(ncol(e.matrix) / 8),
                        aa = 7, na = 5, var = 0.1, er = 300) {
  # Takes an expression matrix with genes in rows and samples in columns
  # and removes the genes that do not meet the criteria.
  # kk: minimum number of values > aa; na: maximum number of NAs;
  # var: minimum variance of the values; er: minimum range of 2^values
  keep <- genefilter(e.matrix, kOverA(k = kk, A = aa, na.rm = TRUE))
  e.matrix.f <- e.matrix[keep, , drop = FALSE]
  nna <- apply(e.matrix.f, 1, function(x) sum(is.na(x)))
  e.matrix.f <- e.matrix.f[nna <= na, , drop = FALSE]
  # stats::var is spelled out because the argument 'var' has the same name
  rvar <- apply(e.matrix.f, 1, function(x) stats::var(x, na.rm = TRUE))
  # which() also drops genes whose variance is NA (fewer than two values)
  e.matrix.f <- e.matrix.f[which(rvar >= var), , drop = FALSE]
  exp.range <- apply(e.matrix.f, 1, function(x) {
    2^max(x, na.rm = TRUE) - 2^min(x, na.rm = TRUE)
  })
  e.matrix.f <- e.matrix.f[exp.range > er, , drop = FALSE]
  e.matrix.f
}
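
Trying different parameters can be sketched as a small grid sweep that counts how many genes survive each setting. This is only an illustration on simulated data: filter_sketch is a hypothetical base-R stand-in that mirrors the four criteria (it is not the function above and does not call genefilter), and the grid values are arbitrary assumptions.

```r
set.seed(1)
# Simulated log2 matrix: 200 genes x 16 samples (made-up data).
e <- matrix(rnorm(200 * 16, mean = 7, sd = 1.5), nrow = 200)

# Hypothetical base-R stand-in mirroring the four criteria:
# low values, number of NAs, variance, and range of 2^values.
filter_sketch <- function(m, kk, aa, na, min.var, er) {
  ok.high <- rowSums(m > aa, na.rm = TRUE) >= kk
  ok.na   <- rowSums(is.na(m)) <= na
  ok.var  <- apply(m, 1, var, na.rm = TRUE) >= min.var
  ok.rng  <- apply(m, 1, function(x) {
    2^max(x, na.rm = TRUE) - 2^min(x, na.rm = TRUE)
  }) > er
  # which() treats an NA criterion as "do not keep"
  m[which(ok.high & ok.na & ok.var & ok.rng), , drop = FALSE]
}

# Sweep the low-value threshold (aa) and the range threshold (er),
# holding the other parameters fixed, and count the surviving genes.
grid <- expand.grid(aa = c(6, 7, 8), er = c(100, 300, 1000))
grid$n.kept <- mapply(function(aa, er) {
  nrow(filter_sketch(e, kk = 2, aa = aa, na = 5, min.var = 0.1, er = er))
}, grid$aa, grid$er)
grid
```

A table like this makes it easy to see which threshold removes most of the genes before settling on values for real data.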

Good luck
Peder 

Best regards 

Exiqon A/S

 Peder Worning, Ph.D.

Senior Scientist, Biomarker Discovery

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of dhaarini s
Sent: Wednesday, February 25, 2009 9:40 AM
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] genefilter displaying the expression set

Hi all!
I am new to R and Bioconductor. I have a dataset of 22283 genes and 190
samples. Because the dataset is so large, I want to filter out some irrelevant
genes. I tried the "genefilter" package of BioC, but as I understand it, it
only reports whether each gene satisfies the filter condition, marking it as
TRUE or FALSE. This is how I proceeded:
> library(genefilter)
> f1 <- kOverA(5, 10)
> flist <- filterfun(f1)
> ans <- genefilter(tumor, flist)
(The object "tumor" contains my expression dataset.) The output is something
like this:
"x"
"1007_s_at" TRUE
"1053_at" FALSE
"117_at" FALSE
"121_at" FALSE
"200001_at" TRUE
"200002_at" TRUE
..........................
But I would like to know whether genefilter can return an expression set
containing the filtered genes and their expression values for the samples.
Please help me out!
Thanks in advance.
Regards,
Dhaarini



