[R] how to split a data frame by two variables

MacQueen, Don macqueen1 at llnl.gov
Thu Sep 1 20:28:08 CEST 2011


Even though it's not needed, here's a small follow-up.

I usually use this:
  split(x, paste(x$let,x$g))

But since
   split(x, list(x$let,x$g))
works, and a data frame is itself a list of columns, so does
   split(x, x[,c('let','g')])

> all.equal( split(x, x[,c('let','g')]) , split(x,list(x$let,x$g)))
[1] TRUE
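For reference, here is a self-contained sketch that runs the three forms side by side on the example data from the question quoted below. It assumes a current R (4.0 or later), where data.frame() no longer turns strings into factors by default, so stringsAsFactors = TRUE is passed to mimic the 2011 behaviour. The object names s1, s2 and s3 are illustrative only. The groups are the same in all three, but the names differ: paste() joins the keys with a space, while the list forms go through interaction(), which uses "." and varies the first factor fastest.

```r
# Example data from the question; stringsAsFactors = TRUE mimics the
# factor column that data.frame() produced by default in 2011
x <- data.frame(num = c(10, 11, 12, 43, 23, 14, 52, 52, 12, 23,
                        21, 23, 32, 31, 24, 45, 56, 56, 76, 45),
                let = letters[1:5], g = 1:2,
                stringsAsFactors = TRUE)

# 1. Paste the two keys into one character key ("a 1", "b 2", ...)
s1 <- split(x, paste(x$let, x$g))

# 2. A list of grouping vectors; split() combines them with interaction()
s2 <- split(x, list(x$let, x$g))

# 3. A data frame is a list of columns, so a column subset works the same way
s3 <- split(x, x[, c("let", "g")])

all.equal(s2, s3)        # TRUE: same groups, same names
length(s1); length(s2)   # 10 groups each for this data
names(s1)[1:3]           # "a 1" "a 2" "b 1"
names(s2)[1:3]           # "a.1" "b.1" "c.1"
```

Every let/g combination occurs exactly twice in this data, so each of the 10 groups holds two rows whichever form is used.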


As to which is best, it's hard to say. If the names of the variables you want
to split by are held in a character vector, then the third form has an advantage:

  splt.by <- c('let','g')
  split(x, x[,splt.by] )
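A minimal sketch of the character-vector approach wrapped in a small helper. split_by() is a hypothetical name, not a base R function, and the column-name check and the pass-through drop argument are additions for illustration. Subsetting with drop = FALSE keeps even a one-column selection as a data frame, so split() always receives a list of grouping columns however many names are given.

```r
# Hypothetical helper (not part of base R): split a data frame by the
# columns named in a character vector
split_by <- function(df, cols, drop = FALSE) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0)
    stop("unknown column(s): ", paste(missing_cols, collapse = ", "))
  # drop = FALSE keeps a one-column subset as a data frame (a list),
  # so split() combines the grouping columns the same way in every case
  split(df, df[, cols, drop = FALSE], drop = drop)
}

x <- data.frame(num = 1:6,
                let = c("a", "b", "a", "b", "a", "b"),
                g   = c(1, 1, 2, 2, 1, 1))

splt.by <- c("let", "g")
res <- split_by(x, splt.by)
names(res)                   # "a.1" "b.1" "a.2" "b.2"
names(split_by(x, "let"))    # "a" "b"
```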

If x were large, and the number of columns to split by were large, there
might be performance differences, but I suspect they would have to be
*very* large before it mattered.
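One behavioural difference between the forms that may be relevant at the sizes mentioned above, shown as a small sketch on made-up data: the list and data-frame forms build every combination of levels through interaction(), so the number of groups before dropping is the product of the level counts, and combinations that never occur come back as empty data frames. split()'s drop = TRUE argument discards them, and the paste() form only ever creates keys that actually occur. Whether this dominates run time in practice is an assumption worth timing on real data.

```r
# Made-up data in which some let/g combinations never occur
x <- data.frame(num = 1:4,
                let = c("a", "a", "b", "c"),
                g   = c(1, 2, 1, 2))

# interaction() builds all 3 x 2 = 6 combinations; two come back empty
full <- split(x, x[, c("let", "g")])
sapply(full, nrow)   # a.1 = 1, b.1 = 1, c.1 = 0, a.2 = 1, b.2 = 0, c.2 = 1

# drop = TRUE keeps only the combinations that actually occur
kept <- split(x, x[, c("let", "g")], drop = TRUE)
names(kept)          # "a.1" "b.1" "a.2" "c.2"

# paste() only ever creates keys that occur in the data
names(split(x, paste(x$let, x$g)))   # "a 1" "a 2" "b 1" "c 2"
```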

-Don


-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 9/1/11 11:08 AM, "Changbin Du" <changbind at gmail.com> wrote:

>Thanks for the great help from David, Jim and Liviu. It solved my
>problem.
>
>Appreciated!
>
>On Thu, Sep 1, 2011 at 11:01 AM, David Winsemius
><dwinsemius at comcast.net> wrote:
>
>>
>> On Sep 1, 2011, at 1:53 PM, Changbin Du wrote:
>>
>>  Hi, dear R community,
>>>
>>> I want to split a data frame by two variables: let and g.
>>>
>>> x = data.frame(num =
>>>     c(10,11,12,43,23,14,52,52,12,23,21,23,32,31,24,45,56,56,76,45),
>>>     let = letters[1:5], g = 1:2)
>>>
>>> > x
>>>  num let g
>>> 1   10   a 1
>>> 2   11   b 2
>>> 3   12   c 1
>>> 4   43   d 2
>>> 5   23   e 1
>>> 6   14   a 2
>>> 7   52   b 1
>>> 8   52   c 2
>>> 9   12   d 1
>>> 10  23   e 2
>>> 11  21   a 1
>>> 12  23   b 2
>>> 13  32   c 1
>>> 14  31   d 2
>>> 15  24   e 1
>>> 16  45   a 2
>>> 17  56   b 1
>>> 18  56   c 2
>>> 19  76   d 1
>>> 20  45   e 2
>>>
>>> I tried the following:
>>>
>>> xs = split(x,x$g*x$let)
>>>
>>
>> Probably
>>
>>  xs = split(x,list(x$g,x$let))
>>
>>>
>>> Warning message:
>>> In Ops.factor(x$g, x$let) : * not meaningful for factors
>>>
>>>
>>> xs = split(x,c(x$g*x$let))
>>>
>>> Warning message:
>>> In Ops.factor(x$g, x$let) : * not meaningful for factors
>>>
>>> Can someone give some hints?
>>>
>>> Thanks!
>>>
>>>
>>> --
>>> Sincerely,
>>> Changbin
>>> --
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>
>
>-- 
>Sincerely,
>Changbin
>--
>
>Changbin Du
>Data Analysis Group, Affymetrix Inc
>6550 Emeryville, CA, 94608
>


