[R] splitting dataframe, assign to new dataframe, add new rows to new dataframe

cls59 chuck at sharpsteen.net
Tue Oct 13 03:41:20 CEST 2009




wk yeo wrote:
> 
> 
> Hi, all,
> 
> My objective is to split a dataframe named "cmbine" according to the value
> of "classes". After the split, I will take the first instance from each
> class and bin them into a new dataframe, "df1". In the 2nd iteration, I
> will take the 2nd available instance and bin them into another new
> dataframe, "df2".
> 
> 
>>cmbine$names
> apple tiger pencil chicken banana pear
> 
>>cmbine$mass
> 0.50 100.00 0.01 1.00 0.15 0.30
> 
>>cmbine$classes
> 1 2 3 2 1 1
> 
> 

If possible, it would be helpful to provide sample data in a form that could
be copied and pasted directly into an R session, like so:

cmbine <- data.frame( names = c('apple', 'tiger', 'pencil', 'chicken',
'banana', 'pear' ) )
cmbine['mass'] <- c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30)
cmbine['classes'] <- factor(c(1, 2, 3, 2,1 ,1))

It saves people on the list a bunch of copying/pasting/quote adding. Another
quick way to do this is to use the dump() function, which spits out the
structure of your object in a way that can be copied and pasted:

dump( 'cmbine', file='' )
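
A closely related option, not mentioned above, is dput(), which prints just
the expression that rebuilds the object without the assignment. The sketch
below is only an illustration: it recreates the sample data and checks that
the printed text can be parsed back into an equivalent data frame. The names
txt and rebuilt are made up for this example.

```r
# Sample data from this thread.
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# dput() prints an R expression that recreates the object.
dput(cmbine)

# Round trip: capture the printed text and evaluate it again.
# 'txt' and 'rebuilt' are illustrative names only.
txt     <- capture.output(dput(cmbine))
rebuilt <- eval(parse(text = txt))
```

Pasting the dput() output after "cmbine <- " in a message gives readers the
same copy-and-paste convenience as dump().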



wk yeo wrote:
> 
> 
> These are the results which I want to obtain:
> 
>>df1
> classes  mass
> apple  0.50
> tiger  100.00
> pencil  0.01
> 
>>df2
> classes  mass
> banana  0.15
> chicken  1.00
> 
>>df3
> classes  mass
> pear  0.30
> 
> Below shows what I have tried. The main problem I have = I don't know how
> to assign the selected instance into a new dataframe with a name which is
> generated 'on-the-fly' based on the value of j (the jth row).
> 
> 
> for (i in 1:3) {
> same_cell <- cmbine[cmbine$classes == i, ]
> if (nrow(same_cell)!=0){
>   for (j in 1:nrow(same_cell)){
>     picked <- same_cell[j, ]
>     assign(paste("df", j, sep=""), picked)
>     #assign(paste("df",j, sep=""), paste("df", j, sep=""))
>   }
> }
> 
> 

I'm assuming you want the results grouped by class, i.e. all the 1s in one
data frame, all the 2s in another. This can be done with a slight
modification of your loop:

for (i in 1:3) {
  same_cell <- cmbine[cmbine$classes == i, ]

  if (nrow(same_cell)!=0){
    assign(paste("df", i, sep=""), same_cell)
  }

}

However, the results I get aren't the same as the results you said you
wanted:

> df1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1

> df2
    names mass classes
2   tiger  100       2
4 chicken    1       2

> df3
   names mass classes
3 pencil 0.01       3
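
If the grouping in the original question is really what is wanted (the first
row of every class in one data frame, the second row of every class in the
next, and so on), one possible sketch is to number the rows within each class
with ave() and split on that number. This is an assumption about the intent,
not part of the loop above; pos and by_pos are names invented for this
example.

```r
# Sample data from this thread.
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# Position of each row within its class: 1 for the first row seen in a
# class, 2 for the second, and so on.
pos <- ave(seq_len(nrow(cmbine)), cmbine$classes, FUN = seq_along)

# One data frame per position, kept together in a named list rather than
# created as separate variables with assign().
by_pos <- split(cmbine, pos)
names(by_pos) <- paste("df", names(by_pos), sep = "")

by_pos$df1   # apple, tiger, pencil
by_pos$df2   # chicken, banana
by_pos$df3   # pear
```

Rows keep their original order inside each piece, so the second piece lists
chicken before banana. Keeping the pieces in a list avoids having to build
variable names on the fly.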


The "R way" of doing this is to use the by() function, which breaks a data
frame into sub-data frames based on a column of factors-- such as the
classes. For your example, it would be used as:

by( cmbine, cmbine[['classes']], function( df ){

  # Lots of stuff can happen inside this function; in this case we are
  # really just returning the subset that got passed in.

  return( df )

})

cmbine[["classes"]]: 1
   names mass classes
1  apple 0.50       1
5 banana 0.15       1
6   pear 0.30       1
----------------------------------------------------------------------- 
cmbine[["classes"]]: 2
    names mass classes
2   tiger  100       2
4 chicken    1       2
----------------------------------------------------------------------- 
cmbine[["classes"]]: 3
   names mass classes
3 pencil 0.01       3
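
As an illustration of doing something more inside the function, the sketch
below computes the mean mass of each class instead of returning the subset.
The choice of a mean and the name class_means are made up for this example.

```r
# Sample data from this thread.
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

# The function receives one sub-data frame per class and returns a single
# number for it.
class_means <- by( cmbine, cmbine[['classes']], function( df ){
  mean( df[['mass']] )
})

class_means
```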

The by() function returns a fancy list (an object of class "by"), each
component of which can be accessed using the [[ ]] operator.
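
A minimal sketch of pulling out one component, assuming the sample data from
earlier in this message. The names res, class_2 and pieces are invented for
this example, and split() is shown only as a plain-list alternative that was
not part of the example above.

```r
# Sample data from this thread.
cmbine <- data.frame(
  names   = c('apple', 'tiger', 'pencil', 'chicken', 'banana', 'pear'),
  mass    = c(0.50, 100.00, 0.01, 1.00, 0.15, 0.30),
  classes = factor(c(1, 2, 3, 2, 1, 1))
)

res <- by( cmbine, cmbine[['classes']], function( df ){ return( df ) } )

# Components are named after the factor levels.
class_2 <- res[['2']]
class_2

# split() does the same partitioning but returns an ordinary named list.
pieces <- split( cmbine, cmbine[['classes']] )
names( pieces )
```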

Hope this helps!

-Charlie


-----
Charlie Sharpsteen
Undergraduate
Environmental Resources Engineering
Humboldt State University