[R] Does SQL group by have a heavy duty equivalent in R

Farrel Buchinsky fjbuch at gmail.com
Sun Dec 31 22:16:55 CET 2006


I converted the whole data frame to character with as.matrix, then
followed a posting that explained how to get the names back (they had
been lost in the conversion to a matrix).
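
A possible alternative to the round trip through as.matrix, sketched on a
small made-up data frame (toy and its contents are hypothetical, not the
real BigDF): converting each column in place keeps the object a data frame,
so the column names never go missing.

```r
# Hypothetical stand-in for BigDF: factor, character and numeric columns.
toy <- data.frame(SAMPLE_ID   = factor(c("s1", "s2")),
                  GENOTYPE_ID = c("AA", "AG"),
                  PLATE       = c(1, 2),
                  stringsAsFactors = FALSE)

# Assigning into toy[] replaces the columns but keeps the data.frame class,
# the column names and the row names.
toy[] <- lapply(toy, as.character)

str(toy)
```

After this, every column is character, which is the "all one type"
situation described in the quoted reply.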

Any column that I did not list among the ids was still included with the
measured variables. In other words, melt would not let me drop columns,
despite this call:

melted <- melt(BigDF, id = c("SAMPLE_ID", "ASSAY_ID"),
               measured = c("GENOTYPE_ID", "DESCRIPTION"))

unique(melted$variable)
 [1] CUSTOMER       PROJECT        PLATE          EXPERIMENT     CHIP           WELL_POSITION  GENOTYPE_ID    DESCRIPTION    ENTRY_OPERATOR
[10] INTERACT       PLATEc
Levels: CUSTOMER PROJECT PLATE EXPERIMENT CHIP WELL_POSITION GENOTYPE_ID DESCRIPTION ENTRY_OPERATOR INTERACT PLATEc


I should have got only GENOTYPE_ID and DESCRIPTION.
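
One workaround that might help, offered as a sketch and not as a
description of how melt treats its arguments: drop the unwanted columns
before reshaping, so that only the ids and the two wanted variables are
left. The toy data frame is made up, and the long format is built by hand
in base R as a stand-in for melt.

```r
# Hypothetical miniature of BigDF, all columns character.
toy <- data.frame(SAMPLE_ID   = c("s1", "s2"),
                  ASSAY_ID    = c("a1", "a1"),
                  GENOTYPE_ID = c("AA", "AG"),
                  DESCRIPTION = c("ok", "ok"),
                  CUSTOMER    = c("c1", "c1"),
                  stringsAsFactors = FALSE)

ids      <- c("SAMPLE_ID", "ASSAY_ID")
measured <- c("GENOTYPE_ID", "DESCRIPTION")

# Keep only the columns of interest; everything else is dropped here.
small <- toy[, c(ids, measured)]

# Hand-rolled long format: one block of rows per measured column.
long <- do.call(rbind, lapply(measured, function(v) {
  data.frame(small[ids], variable = v, value = small[[v]],
             stringsAsFactors = FALSE)
}))

unique(long$variable)
```

Passing the subset (small) to melt instead of the full data frame should
leave nothing extra for it to pick up as a measured variable.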

"hadley wickham" <h.wickham at gmail.com> wrote in message 
news:f8e6ff050612310758p11f96c0dl256ac5b15d11dc2c at mail.gmail.com...
>> nr.attempts <- aggregate(RawSeq$GENOTYPE_ID,
>>                          list(sample = RawSeq$SAMPLE_ID,
>>                               assay = RawSeq$ASSAY_ID),
>>                          length)
>> This was simply to figure out how many times the same piece of
>> information had been obtained. I ran out of patience. It took beyond
>> forever and tapply did not perform much better. The reshape package
>> did not help - it implied one was out of luck if the data was not
>> numeric. All of my data is character or factor.
>
> The reshape package will work if all your data is numeric, or all of
> it is character - it doesn't work with a mix.  I will try and make
> this more clear in the documentation.
> However, depending on the size and structure of your data it may not
> be any faster than tapply or aggregate.
>
> Hadley
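
For the counting task quoted above, a rough base-R sketch of the SQL
"GROUP BY ... COUNT(*)" idea: table() on the two key columns counts rows
per combination without calling an R function once per group. RawSeq here
is a small made-up data frame, and whether this is faster on the real data
is untested.

```r
# Hypothetical miniature of RawSeq.
RawSeq <- data.frame(SAMPLE_ID   = c("s1", "s1", "s2", "s1"),
                     ASSAY_ID    = c("a1", "a1", "a1", "a2"),
                     GENOTYPE_ID = c("AA", "AA", "AG", "GG"),
                     stringsAsFactors = FALSE)

# Cross-tabulate the two keys, then flatten to one row per combination.
counts <- as.data.frame(table(sample = RawSeq$SAMPLE_ID,
                              assay  = RawSeq$ASSAY_ID),
                        responseName = "attempts",
                        stringsAsFactors = FALSE)

# table() also reports combinations that never occur; drop those.
nr.attempts <- counts[counts$attempts > 0, ]
nr.attempts
```

Note that table() allocates a cell for every sample/assay combination, so
with very many distinct keys the intermediate table can itself get large.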


