[R] subsetting a data.frame based on a specific group of columns

jim holtman jholtman at gmail.com
Fri Nov 6 15:10:42 CET 2015


I assume the solution is somewhat the same; you just have to define how to
determine what the "distinctive" names are to create the groupings.  My
solution assumed it was the first character.  If the group names end in a
unique sequence, you can use this to form the groups, or you can provide a
list of the first part of the names to match on to form the groups.  Please
provide a representative subset of the data so that we can understand
exactly what the data look like and how they should be grouped.
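
As a rough illustration of grouping by name rather than by first character,
here is a minimal sketch. It assumes, hypothetically, that every column is
named as a condition label followed by a replicate number ("control1",
"dKo2", ...); the data frame `counts` and those column names are made up for
the example, reusing the numbers from earlier in this thread. The threshold
of 100 is the one mentioned in the question.

```r
# Hypothetical example data: condition label + replicate number as column names
counts <- data.frame(
  control1 = c(1232, 0, 43),
  control2 = c(357, 71, 919),
  control3 = c(23, 9, 1111),
  dKo1     = c(0, 811, 0),
  dKo2     = c(9871, 795, 76),
  dKo3     = c(72, 743, 14)
)

# Assumption: the group label is the column name with trailing digits removed
group <- sub("[0-9]+$", "", colnames(counts))

# Column positions for each group, keyed by the group label
indx <- split(seq_len(ncol(counts)), group)

filtered <- counts
for (cols in indx) {
  # rows where the replicates of this group together have fewer than 100 reads
  low <- rowSums(counts[, cols, drop = FALSE]) < 100
  filtered[low, cols] <- 0
}
filtered
```

If the real names do not follow such a pattern, `group` could instead be
written out by hand as a character vector with one label per column; the
rest of the sketch would stay the same.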


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Nov 6, 2015 at 8:53 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:

> Sorry for the misunderstanding. Here is a more elaborate description of
> what I would like to achieve.
>
> I have a data set of counts from an RNA-Seq experiment and would like to
> filter reads with low counts. I don't want to set everything to 0
> automatically.
>
> I would like to set each categorical group (e.g. condition) to 0 if and
> only if all replicates in the group together have fewer than 100 reads.
> In my examples I used X and Y to represent the categories. Usually they
> have more distinct names like "control", "knockout1", "dKo" etc.
>
> So what I would really like to do is check whether the sum of all the
> "control" samples is lower than 100. If so, set all control samples to 0.
> I would like to check this *for each category* in every row of the data set.
>
> I hope it is clearer now.
>
> thanks
> Assa
>
>
> On Fri, Nov 6, 2015 at 2:29 PM, jim holtman <jholtman at gmail.com> wrote:
>
>> Is this what you want:
>>
>> > x <- read.table(text = "X1    X2    X3    Y1    Y2    Y3
>> + 1232    357    23    0    9871    72
>> + 0    71    9    811    795    743
>> + 43    919    1111    0    76    14", header = TRUE)
>> > x
>>     X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 2    0  71    9 811  795 743
>> 3   43 919 1111   0   76  14
>> >
>> > # create indices of columns that start with the same character
>> > indx <- split(seq_len(ncol(x)), substring(colnames(x), 1, 1))
>> > names(indx) <- NULL  # drop names so cbind() does not prefix the column names
>> >
>> > result <- lapply(indx, function(a){
>> +     # per-row total across the columns of this group
>> +     row_sum <- rowSums(x[, a, drop = FALSE])
>> +     x[row_sum < 100, a] <- 0  # modifies a local copy of x only
>> +     x[, a, drop = FALSE]
>> + })
>> > # combine back together
>> > do.call(cbind, result)
>>     X1  X2   X3  Y1   Y2  Y3
>> 1 1232 357   23   0 9871  72
>> 2    0   0    0 811  795 743
>> 3   43 919 1111   0    0   0
>>
>>
>>
>> On Fri, Nov 6, 2015 at 5:40 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a data frame with multiple columns that belong to several groups,
>>> like this:
>>> X1    X2    X3    Y1    Y2    Y3
>>> 1232    357    23    0    9871    72
>>> 0    71    9    811    795    743
>>> 43    919    1111    0    76    14
>>>
>>> I would like to filter rows where the sum within one group is lower than
>>> a specific value. For example, I would like to set all the values in a
>>> group of columns to zero if their sum in that row is less than 100.
>>> In my example table I would like to set the values in the second row for
>>> the three X-columns to 0, so that the table looks like this:
>>>
>>> X1    X2    X3    Y1    Y2    Y3
>>> 1232    357    23    0    9871    72
>>> 0    0    0    811    795    743
>>> 43    919    1111    0    0    0
>>>
>>> The same also applies to the Y-values in the last row.
>>> Is there a more efficient way of doing it than going row by row and using
>>> the apply function on each of the subgroups I have in the columns?
>>>
>>> thanks
>>> Assa
>>>
>>
>>
>
