[R] subsetting a data.frame based on a specific group of columns

Fri Nov 6 14:53:27 CET 2015

Sorry for the misunderstanding. Here is a more detailed description of
what I would like to achieve.

I have a data set of counts from an RNA-Seq experiment and would like to
filter out low counts. I don't want to set everything to 0
automatically.

I would like to set each categorical group (e.g. condition) to 0 if and
only if all replicates in the group together have fewer than 100 reads.
In my examples I used X and Y to represent the categories. Usually they
have more distinctive names like "control", "knockout1", "dKo" etc.
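
To make the grouping concrete, here is a minimal sketch of how the
category of each column could be derived from its name. It assumes a
naming scheme of <category>_<replicate> (e.g. "control_1"); that scheme
and the column names below are made up for illustration and may not match
the real data.

```r
# Hypothetical column names following <category>_<replicate>
cols <- c("control_1", "control_2",
          "knockout1_1", "knockout1_2",
          "dKo_1", "dKo_2")

# Strip the trailing "_<replicate>" part to get each column's category
groups <- sub("_[^_]+$", "", cols)

# Column indices belonging to each category
indx <- split(seq_along(cols), groups)
```

Unlike grouping on the first character of the name, this keeps categories
such as "knockout1" and "knockout2" apart, as long as the assumed naming
scheme holds.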

So what I would really like to do is check whether the sum of all the
"control" samples is lower than 100. If so, set all control samples to 0.
I would like to run this check *for each category* in every row of the
data set.
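
As a rough sketch of that check (not a definitive solution), the loop
below sums the replicates of each category per row and zeroes the
category wherever the sum is below the threshold. The data are the toy
values from my first mail; the column names, the <category>_<replicate>
naming scheme, and the variable names `counts`, `threshold` and
`filtered` are assumptions for illustration only.

```r
# Toy counts with hypothetical column names (<category>_<replicate>)
counts <- data.frame(
  control_1 = c(1232, 0, 43),
  control_2 = c(357, 71, 919),
  control_3 = c(23, 9, 1111),
  dKo_1     = c(0, 811, 0),
  dKo_2     = c(9871, 795, 76),
  dKo_3     = c(72, 743, 14)
)

threshold <- 100

# Category of each column, assuming the trailing "_<replicate>" suffix
groups <- sub("_[^_]+$", "", colnames(counts))

filtered <- counts
for (g in unique(groups)) {
  cols <- which(groups == g)
  # Rows where all replicates of this category together fall below the threshold
  low <- rowSums(counts[, cols, drop = FALSE]) < threshold
  filtered[low, cols] <- 0
}
```

Because the values are overwritten in place, the columns stay in their
original order even if the replicates of one category are not adjacent.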

I hope it is clearer now.

Thanks
Assa

On Fri, Nov 6, 2015 at 2:29 PM, jim holtman <jholtman at gmail.com> wrote:

> Is this what you want:
>
> > x <- read.table(text = "X1    X2    X3    Y1    Y2    Y3
> + 1232    357    23    0    9871    72
> + 0    71    9    811    795    743
> + 43    919    1111    0    76    14", header = TRUE)
> > x
>     X1  X2   X3  Y1   Y2  Y3
> 1 1232 357   23   0 9871  72
> 2    0  71    9 811  795 743
> 3   43 919 1111   0   76  14
> >
> > # create indices of columns that start with the same character
> > indx <- split(seq(ncol(x)), substring(colnames(x), 1, 1))
> > names(indx) <- NULL  # remove names so output not messed up
> >
> > result <- lapply(indx, function(a){
> +     row_sum <- rowSums(x[, a])
> +     x[row_sum < 100, a] <- 0
> +     x[, a]
> + })
> > # combine back together
> > do.call(cbind, result)
>     X1  X2   X3  Y1   Y2  Y3
> 1 1232 357   23   0 9871  72
> 2    0   0    0 811  795 743
> 3   43 919 1111   0    0   0
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
>
>> Hi,
>>
>> I have a data frame with multiple columns that belong to several
>> groups, like this:
>> X1    X2    X3    Y1    Y2    Y3
>> 1232    357    23    0    9871    72
>> 0    71    9    811    795    743
>> 43    919    1111    0    76    14
>>
>> I would like to filter out rows where the sum within one group is lower
>> than a specific value. For example, I would like to set all the values in a
>> group of columns to zero if the sum in that group is less than 100.
>> In my example table I would like to set the values in the second row for
>> the three X-columns to 0, so that the table looks like this:
>>
>> X1    X2    X3    Y1    Y2    Y3
>> 1232    357    23    0    9871    72
>> 0    0    0    811    795    743
>> 43    919    1111    0    0    0
>>
>> The same applies to the Y-values in the last row.
>> Is there a more efficient way of doing this than going row by row and
>> using the apply function on each of the subgroups I have in the columns?
>>
>> thanks
>> Assa