[BioC] genefilter kOverA filter by range

James W. MacDonald jmacdon at uw.edu
Thu Jan 23 23:50:13 CET 2014


Hi Kristina,

This part:

filtest<-rowSums(abs(otu_table(comp))>3)>1

should actually be

filtest<-rowSums(abs(otu_table(comp))>3)>0

returns a logical vector, which you would then use as input to 
prune_taxa(), not filter_taxa(). I am surprised it worked for you, as 
internally filter_taxa is doing something like

apply(OTU, 1, flist)

where flist is supposed to be a function, not a logical vector.

Alternatively you could just create a kOverA() version that does what 
you want:

kOverAbsA <- function(k, A, na.rm = TRUE{
         function(x) {
             if(na.rm) x <- x[!is.na(x)]
             sum(abs(x) > A) >= k
        }
}

and then do what you tried before

filtered <- filter_taxa(comp, filterfun(kOverAbsA(1, 3)), TRUE)

Best,

Jim




On 1/23/2014 5:15 PM, Kristina M Fontanez wrote:
> Hi James,
>
> Unfortunately, your proposed solution didn’t work for me. I think it’s 
> because I am working with objects built with the phyloseq package 
> which I am trying to subsequently filter with the genefilter functions.
>
> First, a review of the two phyloseq objects from my last post:
> > comp
> phyloseq-class experiment-level object
> otu_table()   OTU Table:         [ 2151 taxa and 3 samples ]
> tax_table()   Taxonomy Table:    [ 2151 taxa by 6 taxonomic ranks ]
> > LFC3
> phyloseq-class experiment-level object
> otu_table()   OTU Table:         [ 164 taxa and 3 samples ]
> tax_table()   Taxonomy Table:    [ 164 taxa by 6 taxonomic ranks ]
>
> Now, your solution:
> > filtest<-rowSums(abs(otu_table(comp))>3)>1
> > filtest[1:5]
> OTU1206 OTU1203 OTU1297 OTU1338 OTU1144
>    TRUE    TRUE    TRUE    TRUE    TRUE
> > LFC3true=filter_taxa(comp,filtest,TRUE)
> > LFC3true
> phyloseq-class experiment-level object
> otu_table()   OTU Table:         [ 66 taxa and 3 samples ]
> tax_table()   Taxonomy Table:    [ 66 taxa by 6 taxonomic ranks ]
> > min(otu_table(comp))
> [1] -7.4
> > min(otu_table(LFC3true))
> [1] -5.5
>
> BUT, as you can see the new LFC3true object is still missing that -7.4 
> value and now it contains even less taxa than the LFC3 object. If the 
> filter works correctly, I should be getting MORE taxa added to that 
> object.
>
> I also tried your solution verbatim but ran into trouble because my 
> phyloseq object can’t be subset in the way you suggested:
> > newcomp=comp[filtest,]
> Error in comp[filtest, ] : object of type 'S4' is not subsettable
>
> I believe that in order to use the filter_taxa function to subset the 
> phyloseq object, I need a genefilter list object. Pasted below is the 
> information in the phyloseq manual from the filter_taxa object.
>
> filter_taxa Filter taxa based on across-sample OTU abundance criteria
>
> Description
>
> This function is directly analogous to the genefilter function for 
> microarray filtering, but is used for filtering OTUs from phyloseq 
> objects. It applies an arbitrary set of functions — as a function 
> list, for instance, created by filterfun — as across-sample criteria, 
> one OTU at a time. It takes as input a phyloseq object, and returns a 
> logical vector indicating whether or not each OTU passed the criteria. 
> Alternatively, if the "prune" option is set to FALSE, it returns the 
> already-trimmed version of the phyloseq object.
>
> Usage
>
> Arguments
>
> physeq (Required). A phyloseq-class object that you want to trim/filter.
>
> flist (Required). A function or list of functions that take a vector 
> of abundance values and return a logical. Some canned useful function 
> types are included in the genefilter-package.
>
> prune (Optional). A logical. Default FALSE. If TRUE, then the function 
> returns the pruned phyloseq-class object, rather than the logical 
> vector of taxa that passed the filter.
>
> Value
>
> A logical vector equal to the number of taxa in physeq. This can be 
> provided directly to prune_taxa as first argument. Alternatively, if 
> prune==TRUE, the pruned phyloseq-class object is returned instead.
>
>
> Thanks,
> Kristina
> ------------------------------------------------------------------
> Kristina Fontanez, Postdoctoral Fellow
> fontanez at mit.edu <mailto:fontanez at mit.edu>
> Massachusetts Institute of Technology
> Department of Civil and Environmental Engineering
> 48-120E
> 15 Vassar Street
> Cambridge, MA 02139
>
>
>
> On Jan 23, 2014, at 4:38 PM, James W. MacDonald <jmacdon at uw.edu 
> <mailto:jmacdon at uw.edu>> wrote:
>
>> Hi Kristina,
>>
>> On 1/23/2014 4:25 PM, Kristina M Fontanez wrote:
>>> Dear Bioconductors,
>>>
>>> I am trying to use the genefilter package to filter a set of 
>>> Log2fold changes so that I can keep those taxa with Log2fold changes 
>>> > 3. However, the data itself consists of both positive and negative 
>>> values, as is the case with log 2 fold comparisons.
>>
>> You don't need the genefilter package to do this, and in fact 
>> genefilter is intended for a completely different task.
>>
>> Instead you just need to use simple R commands.
>>
>> filt <- rowSums(abs(comp) > 3) > 1
>> comp[filt,]
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>> Example data:
>>> OTU Table:          [5 taxa and 3 samples]
>>>                      taxa are rows
>>>         LvS DvS LvD
>>> OTU1206    10.3     1.3     9.0
>>> OTU1203     8.3     2.7     5.5
>>> OTU1297     6.8    -0.9     7.7
>>> OTU1338     6.2    -1.4     7.7
>>> OTU1144     7.4     1.6     5.8
>>>
>>> I want to create a filter so that the OTUs with Log2 fold changes > 
>>> magnitude 3 in either the positive or negative direction are kept. 
>>> However, the documentation for kOverA in the genefilter package 
>>> implies that you can only input “values you want to exceed”. As the 
>>> code below is currently written, I am only keeping taxa with a log2 
>>> fold change > +3 in any one sample. However, taxa with a log2 fold 
>>> change of -7 in a particular sample would be left out. I tested 
>>> whether I was missing any OTUs by looking for the minimum value in 
>>> the original OTU table (comp) and in the filtered OTU table (LFC3). 
>>> As you can see the minimum -7.4 log2 fold change value in comp does 
>>> not exist in the LFC3 object so it was excluded by my flist2 filter.
>>>
>>> Is there a similar function like kOverA that I can use to get large 
>>> magnitude changes in both the positive and negative directions?
>>>
>>> I tried the code:
>>>> comp
>>> phyloseq-class experiment-level object
>>> otu_table()   OTU Table:         [ 2151 taxa and 3 samples ]
>>> tax_table()   Taxonomy Table:    [ 2151 taxa by 6 taxonomic ranks ]
>>>
>>>> flist2<-filterfun(kOverA(1,3.0))
>>>> LFC3=filter_taxa(comp,flist2,TRUE)
>>>> LFC3
>>> phyloseq-class experiment-level object
>>> otu_table()   OTU Table:         [ 164 taxa and 3 samples ]
>>> tax_table()   Taxonomy Table:    [ 164 taxa by 6 taxonomic ranks ]
>>>> min(otu_table(comp))
>>> [1] -7.4
>>>> min(otu_table(LFC3))
>>> [1] -5.5
>>>
>>> Thank you,
>>> Kristina
>>>
>>>> sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] genefilter_1.44.0 ggplot2_0.9.3.1   scales_0.2.3 
>>>      phyloseq_1.7.12
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] ade4_1.6-2            annotate_1.40.0       AnnotationDbi_1.24.0
>>>  [4] ape_3.0-11            Biobase_2.22.0        BiocGenerics_0.8.0
>>>  [7] biom_0.3.11           Biostrings_2.30.1     cluster_1.14.4
>>> [10] codetools_0.2-8       colorspace_1.2-4      DBI_0.2-7
>>> [13] DESeq2_1.2.8          dichromat_2.0-0       digest_0.6.4
>>> [16] foreach_1.4.1         GenomicRanges_1.14.4  grid_3.0.2
>>> [19] gtable_0.1.2          igraph_0.6.6          IRanges_1.20.6
>>> [22] iterators_1.0.6       labeling_0.2          lattice_0.20-24
>>> [25] locfit_1.5-9.1        MASS_7.3-29           Matrix_1.1-1.1
>>> [28] multtest_2.18.0       munsell_0.4.2         nlme_3.1-113
>>> [31] parallel_3.0.2        permute_0.8-0         plyr_1.8
>>> [34] proto_0.3-10          RColorBrewer_1.0-5    Rcpp_0.10.6
>>> [37] RcppArmadillo_0.4.000 reshape2_1.2.2        RJSONIO_1.0-3
>>> [40] RSQLite_0.11.4        splines_3.0.2         stats4_3.0.2
>>> [43] stringr_0.6.2         survival_2.37-4       tools_3.0.2
>>> [46] vegan_2.0-10          XML_3.95-0.2          xtable_1.7-1
>>> [49] XVector_0.2.0
>>>
>>> ------------------------------------------------------------------
>>> Kristina Fontanez, Postdoctoral Fellow
>>> fontanez at mit.edu <mailto:fontanez at mit.edu><mailto:fontanez at mit.edu>
>>> Massachusetts Institute of Technology
>>> Department of Civil and Environmental Engineering
>>> 48-120E
>>> 15 Vassar Street
>>> Cambridge, MA 02139
>>>
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the 
>>> archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list