[R] split function

David Winsemius dwinsemius at comcast.net
Fri Feb 26 21:06:03 CET 2010


On Feb 26, 2010, at 2:40 PM, rusers.sh wrote:

> Your method seems to only re-express the data "data.frame(x, g)" using
> another format.

In all fairness to the first respondent to your question, that _was_  
what it appeared you were requesting. My other thoughts would be:

 > cbind(x[order(g)], sort(as.numeric(as.character(g))))
 # g is a factor, so passing sort(g) or g[order(g)] to cbind()
 # would give the internal integer codes rather than the labels.
             [,1] [,2]
  [1,] -0.0678237    0
  [2,]  2.2538149    0
  [3,]  1.8951257    0
  [4,]  2.2079620    0
  [5,]  3.2011267    1
  [6,] -0.5524036    1
  [7,]  0.7891743    1
  [8,]  2.2520006    1
  [9,]  1.1191421    1
[10,]  2.2923470    1
[11,]  3.5831695    1
[12,]  2.2299013    2
[13,]  1.5140759    2

or:

 > split(data.frame(x=x,g=g), g)
$`0`
             x g
6  -0.0678237 0
15  2.2538149 0
18  1.8951257 0
30  2.2079620 0

$`1`
             x g
1   3.2011267 1
3  -0.5524036 1
10  0.7891743 1
12  2.2520006 1
17  1.1191421 1
19  2.2923470 1
29  3.5831695 1

$`2`
            x g
2  2.2299013 2
<snipped output>

Which also re-expresses it. But if that is not what you want, then offer
a better explanation and a different example of the desired output.
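For what it is worth, here is a minimal sketch of two ways to get a single data frame with the values and their group side by side, using the x and g from the original example. The seed and the names codes, labels, df, df_sorted and stacked are my own assumptions for illustration, not anything from the original post. It also shows the difference between a factor's internal codes and its labels.

```r
set.seed(123)                      # arbitrary seed, only for reproducibility
n <- 3; nn <- 10
g <- factor(round(n * runif(n * nn)))
x <- rnorm(n * nn) + sqrt(as.numeric(g))

# A factor stores integer codes: as.numeric(g) returns the codes (1, 2, ...),
# while as.numeric(as.character(g)) recovers the numeric labels.
codes  <- as.numeric(g)
labels <- as.numeric(as.character(g))

# One data frame, rows ordered by group, with the factor kept as a column.
df <- data.frame(x = x, g = g)
df_sorted <- df[order(df$g), ]

# Same content built from the split list: stack() turns a named list
# into a two-column data frame with columns 'values' and 'ind'.
stacked <- stack(split(x, g))
```

Either df_sorted or stacked should give the "value plus factor" layout asked about, if that is indeed what is wanted.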


> The results are really just the generated data frame. Maybe
> that is not good.
>> table(g)
> g
> 0 1 2 3
> 7 9 8 6
>  I hope to randomly split the values of 'x' according to the sample
> sizes of the different levels displayed above, that is, 7 for level 0,
> 9 for level 1, etc.
>  Thanks.

Or maybe you don't want the value of x but the number of elements?

 > tapply(x, g, length)  # another way to get a table
   0  1  2  3
   4  7 11  8
 # (different numbers from yours since you did not use set.seed(123))

Please do clarify.
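If the request is read as "shuffle x and then cut it into groups of the same sizes as the levels of g", a rough sketch might look like the following. That reading is only my guess at the intent, and the seed and the names sizes, labels and random_split are hypothetical.

```r
set.seed(123)                      # arbitrary seed, only for reproducibility
n <- 3; nn <- 10
g <- factor(round(n * runif(n * nn)))
x <- rnorm(n * nn) + sqrt(as.numeric(g))

sizes <- table(g)                  # same counts as tapply(x, g, length)

# Assumed reading of the request: shuffle x, then assign the shuffled
# values to groups whose sizes match the observed group sizes.
labels <- factor(rep(names(sizes), times = as.vector(sizes)),
                 levels = levels(g))
random_split <- split(sample(x), labels)

lengths(random_split)              # group sizes of the random split
```

Note that this breaks the original pairing between x and g; only the group sizes are preserved.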

>
> 2010/2/26 Henrique Dallazuanna <wwwhsd at gmail.com>
>
>> Try this:
>>
>> split(data.frame(x, g), g)
>>
>> On Fri, Feb 26, 2010 at 3:55 PM, rusers.sh <rusers.sh at gmail.com>  
>> wrote:
>>> Hi,
>>> I am using the split function and wonder how to add the factor to the
>>> split results.
>>> #Example
>>> n <- 3; nn <- 10
>>> g <- factor(round(n * stats::runif(n * nn)))   #factor
>>> x <- rnorm(n * nn) + sqrt(as.numeric(g))    #value
>>> xg <- split(x, g)
>>> xg
>>> $`0`
>>> [1]  0.82513702 -0.03911584  2.32955347  0.36745335  1.75572642  2.65461438  0.41675829
>>> $`1`
>>> [1]  0.8583493  2.4264804 -0.3622378  3.1770015  0.5162129
>>> $`2`
>>> [1] 1.7914651 1.1440121 0.8097543 1.2064742 1.6411988 1.3743778 1.7094387 2.1204501 1.9330132 2.0731997
>>> [11] 2.8931865 2.5825309 0.6978723
>>> $`3`
>>> [1] 3.0246214 1.6870782 0.9685926 1.6449350 0.9378751
>>>> g
>>> [1] 2 2 3 2 1 3 2 3 3 1 2 2 2 2 0 0 3 0 2 2 1 1 2 2 0 1 2 0 0 0
>>> Levels: 0 1 2 3
>>>
>>> Can anybody tell me how to add the corresponding values of factor "g" to
>>> the split results 'xg' to get a data frame?
>>> Something like,
>>>
>>> Splitted/xg     factor/g
>>> 0.82513702        0
>>> -0.03911584       0
>>> 2.32955347        0
>>>   ...
>>> I know I can use "xg$'0',xg$'1',xg$'2',xg$'3'" to get the values of each
>>> class and then add a new variable to indicate the factor.
>>> But I hope to find a method that does this automatically. Any ideas?
>>> Thanks.
>>>
>>>
>>> --


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


