[BioC] filter low expression tags

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Nov 29 05:28:46 CET 2012


You keep the genes where at least 2 samples have a cpm greater than 100.

rowSums(cpm(d) > 100)

counts, for each gene (row), how many samples have a cpm > 100.
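
So the samples are not averaged: each sample is compared to the cutoff on its own, and a gene is kept if enough samples pass. Here is a minimal sketch in base R. The matrix `cpm_mat` and its values are made up for illustration and simply stand in for the output of `cpm(d)`; the gene and sample names are hypothetical too.

```r
# Hypothetical CPM values: 4 genes (rows) x 3 samples (columns).
# This stands in for cpm(d); the numbers are invented for illustration.
cpm_mat <- matrix(
  c(250, 180,  90,
    150,  20,   5,
    100, 100, 100,
    400, 300, 500),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("sample1", "sample2", "sample3"))
)

# Logical matrix with the same dimensions: TRUE where a sample exceeds 100 cpm.
above <- cpm_mat > 100

# Per-gene count of samples above the cutoff (TRUE counts as 1).
n_above <- rowSums(above)

# Keep genes where at least 2 samples pass; the samples are never averaged.
keep <- n_above >= 2
filtered <- cpm_mat[keep, , drop = FALSE]

print(n_above)
print(rownames(filtered))
```

With these numbers geneA and geneD are kept. geneB passes in only one sample, and geneC sits at exactly 100 in every sample, which does not count because the comparison is strict.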

Kasper

On Wed, Nov 28, 2012 at 10:54 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Wed, Nov 28, 2012 at 10:14 PM, Vittoria Roncalli <roncalli at hawaii.edu> wrote:
>> Hi,
>>
>> I would like to understand how the filter of low expression tags works. If
>> I run the command
>>
>> keep <- rowSums(cpm(d) > 100) >= 2
>> d <- d[keep,]
>> dim(d)
>>
>> as in the user guide, page 32, this means that I am using a cutoff of 100 cpm,
>> but how are the 2 samples treated? Are they averaged and then the low
>> tags are removed?
>> Is each sample considered separately and filtered by itself?
>> Thanks for the help in advance.
>
> How many samples (columns) do you have?
>
> You should first look at the output of `cpm(d) > 100` to see what you
> are getting -- this will be a logical (boolean) matrix that has the
> same dimensionality as `cpm(d)`.
>
> rowSums( a logical matrix )
>
> returns a vector that is as long as there are rows in the logical
> matrix, and each value indicates how many columns are TRUE in that
> row.
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact