[BioC] genefilter displaying the expression set

Jenny Drnevich drnevich at illinois.edu
Wed Feb 25 15:32:46 CET 2009


Hi Dhaarini,

What is in Peder's code but is NOT in the genefilter vignette is what 
you should do with the output of genefilter(), which is a logical 
vector the same length as the number of genes. You can use this 
vector to subset your expression data object like so:

 > ans <- genefilter(tumor, flist)
 > sum(ans)   # tells you how many genes pass your filter
 > # subset your expression object by the TRUE/FALSE vector
 > tumor.filt <- tumor[ans,]
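
To see the subsetting step in isolation, here is a minimal sketch on a toy
matrix. It assumes nothing beyond base R: the toy data and the names toy, k,
A and keep are made up for illustration, and the rowSums() line is only a
stand-in that mimics what kOverA(k, A) tests (at least k values larger than
A); it is not a call into genefilter itself.

```r
# Toy expression matrix: 4 genes (rows) x 6 samples (columns).
# All values and names here are invented for illustration.
toy <- matrix(
  c(12, 11, 13, 12, 14, 11,   # geneA: six values > 10
     2,  3,  1,  2,  3,  2,   # geneB: no values > 10
    11, 12,  3,  2, 13,  1,   # geneC: three values > 10
    15,  1,  1,  1,  1,  1),  # geneD: one value > 10
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)

k <- 3   # require at least k samples ...
A <- 10  # ... with expression larger than A

# Base-R stand-in for genefilter(toy, filterfun(kOverA(k, A))):
# one TRUE/FALSE per gene, named by the row names.
keep <- rowSums(toy > A, na.rm = TRUE) >= k

sum(keep)                               # number of genes passing the filter
toy.filt <- toy[keep, , drop = FALSE]   # keep only the passing rows
dim(toy.filt)                           # genes kept x samples
```

The drop = FALSE is there so that the result stays a matrix even if only one
gene passes. With a real dataset you would replace the rowSums() line with
the genefilter() call above and index your own object in the same way.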

IMO, the vignette (genefilter/doc/howtogenefilter.pdf) should also 
give an example of how to use the output of genefilter() to subset 
your expression object (hint, hint Biocore Team c/o BioC user list, 
the maintainer of genefilter).

Cheers,
Jenny


At 06:12 AM 2/25/2009, Peder Worning wrote:
>Hi Dhaarini,
>
>Filtering genes is a delicate but important matter, and you can filter
>them on low values and low variance. Expression values that do not change
>over your samples are not very informative.
>
>I have written my own function, built on the genefilter package, that
>combines filters on low values, NAs, low variance and range. I use it, but
>I always try it out with different parameters to see what happens to my
>data.
>I am working with microRNA arrays and my data are on a log scale, but the
>principles should be the same.
>
>Here is the code; beware of line breaks introduced by Outlook:
>
>library(genefilter)  # provides genefilter() and kOverA()
>
>Data.filter <- function(e.matrix, kk = as.integer(ncol(e.matrix) / 8),
>                        aa = 7, na = 5, var = 0.1, er = 300) {
>   # Takes an expression matrix with genes in rows and samples in columns
>   # and filters out genes that do not meet the criteria:
>   #   kk  minimal number of values > aa
>   #   na  maximum number of NAs
>   #   var minimal variance of the values
>   #   er  minimal range of 2^values
>   # drop = FALSE keeps a matrix even if only one gene is left
>   pass <- genefilter(e.matrix, kOverA(k = kk, A = aa, na.rm = TRUE))
>   e.matrix.f <- e.matrix[pass, , drop = FALSE]
>   nna <- apply(e.matrix.f, 1, function(x) sum(is.na(x)))
>   e.matrix.f <- e.matrix.f[nna <= na, , drop = FALSE]
>   # var() below still finds the function; the argument var is the cutoff
>   rvar <- apply(e.matrix.f, 1, function(x) var(x, na.rm = TRUE))
>   e.matrix.f <- e.matrix.f[rvar >= var, , drop = FALSE]
>   exp.range <- apply(e.matrix.f, 1, function(x) {
>      2^max(x, na.rm = TRUE) - 2^min(x, na.rm = TRUE)
>   })
>   e.matrix.f <- e.matrix.f[exp.range > er, , drop = FALSE]
>   e.matrix.f
>}
>
>Good luck
>Peder
>
>Best regards
>
>Exiqon A/S
>
>  Peder Worning, Ph.D.
>
>Senior Scientist, Biomarker Discovery
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of dhaarini s
>Sent: Wednesday, February 25, 2009 9:40 AM
>To: bioconductor at stat.math.ethz.ch
>Subject: [BioC] genefilter displaying the expression set
>
>Hi all!
>I am new to R and Bioconductor. I have a dataset of 22283 genes and 190
>samples. Because of the huge size of the data, I want to filter out some
>irrelevant genes. I tried the "genefilter" package of BioC, but as far as
>I understand it, it filters genes simply by reporting whether each gene
>satisfies the filter condition, marking it as TRUE or FALSE. This is how
>I proceeded:
> > library(genefilter)
> > f1 <- kOverA(5, 10)
> > flist <- filterfun(f1)
> > ans <- genefilter(tumor, flist)
>(The object "tumor" contains my expression dataset.) The output is
>something like this:
>"x"
>"1007_s_at" TRUE
>"1053_at" FALSE
>"117_at" FALSE
>"121_at" FALSE
>"200001_at" TRUE
>"200002_at" TRUE
>..........................
>But I would like to know whether genefilter can return an expression
>set containing the filtered genes and their expression values for the
>samples. Please help me out!
>Thanks in advance.
>Regards,
>Dhaarini
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu


