[R] R 3.1.2 using a custom function in aggregate() function on Windows 7 OS 64bit

Bert Gunter gunter.berton at gene.com
Thu Mar 5 17:59:55 CET 2015


That's not what ?aggregate says:

"aggregate.data.frame is the data frame method. If x is not a data
frame, it is coerced to one, which must have a non-zero number of
rows. Then, each of the variables (columns) in x is split into subsets
of cases (rows) of identical combinations of the components of by, and
FUN is applied to each such subset with further arguments in ...
passed to it."


As I read this, the argument passed to FUN is a data frame: a subset
of the rows of the original frame, defined by the values of the "by"
variables.


No?
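
One way to check which reading is right is to ask FUN itself what it
was handed. The following is only a sketch: it rebuilds the small test
data frame from further down this thread (with all three columns a, x
and g) and uses a throwaway anonymous function that reports whether
its argument is a data frame.

```r
# Test data as in the thread, built with data.frame() rather than cbind()
dat <- data.frame(a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
                  x = 1:10,
                  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))

# FUN only reports what kind of object it received for each group
res <- aggregate(dat[c("a", "x")],
                 by = list(g = dat$g),
                 FUN = function(v) is.data.frame(v))

print(res)
```

If every value in the a and x columns of res is FALSE, FUN was given
plain vectors (pieces of one column at a time); any TRUE would mean it
was given a data frame subset.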


-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Thu, Mar 5, 2015 at 8:55 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I don't see your point. No matter which version of aggregate you use, FUN is applied to vectors. Those vectors may be columns in a data frame or not, but FUN is always given one vector at a time by aggregate.
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On March 5, 2015 8:12:39 AM PST, Bert Gunter <gunter.berton at gene.com> wrote:
>>Sorry, Jeff. aggregate() is generic.
>>
>>From ?aggregate:
>>
>>"## S3 method for class 'data.frame'
>>aggregate(x, by, FUN, ..., simplify = TRUE)"
>>
>>Cheers,
>>Bert
>>
>>On Thu, Mar 5, 2015 at 7:54 AM, Jeff Newmiller
>><jdnewmil at dcn.davis.ca.us> wrote:
>>> The aggregate function applies FUN to vectors, not data frames. For
>>> example, a typical choice such as "mean" accepts a vector such as a
>>> column in a data frame and returns a scalar (well, a vector of
>>> length 1). Aggregate then calls this function once for each piece of
>>> the column(s) you give it. Your function wants two vectors, but
>>> aggregate has no way to pass it two columns at once.
>>>
>>> (In the future, please follow R-help mailing list guidelines and post
>>> using plain text so your code does not get messed up.)
>>>
>>> You could use split to break your data frame into a list of data
>>> frames, and then sapply to extract the results you are looking for. I
>>> prefer to use the plyr or dplyr or data.table packages to do all this
>>> for me.
>>>
>>> d_rule <- function( DF ) {
>>>   i <- which( DF$a==max( DF$a ) )
>>>   if ( length( i ) == 1 ){
>>>     DF[ i, "x" ]
>>>   } else {
>>>     min( DF[ , "x" ] ) # did you mean min( DF$x[i] ) ?
>>>   }
>>> }
>>>
>>> dat <- data.frame( a=c(2,2,1,4,2,5,2,3,4,4)
>>>     , x = c(1:10)
>>>     , g = c(1,1,2,2,3,3,4,4,5,5)
>>>     )
>>> # note that cbind on vectors creates a matrix
>>> # in a matrix all columns must be of the same type
>>> # but data frames generally have a variety of types
>>> # so don't use cbind when making a data frame
>>>
>>> library( dplyr )
>>>
>>> result <- dat %>% group_by( g ) %>% do( answer = d_rule( . ) ) %>%
>>>   as.data.frame
>>>
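
The split/sapply route mentioned above can be sketched in base R as
follows. This is a minimal illustration rather than a tuned solution:
it restates d_rule and dat so that the block runs on its own, and the
names pieces, answer and result are just placeholders chosen here.

```r
# Same rule as d_rule above, restated so this block is self-contained
d_rule <- function(DF) {
  i <- which(DF$a == max(DF$a))
  if (length(i) == 1) {
    DF[i, "x"]          # unique maximum of a: take the matching x
  } else {
    min(DF[, "x"])      # ties: fall back to the smallest x in the group
  }
}

dat <- data.frame(a = c(2, 2, 1, 4, 2, 5, 2, 3, 4, 4),
                  x = 1:10,
                  g = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))

pieces <- split(dat, dat$g)        # named list of data frames, one per group
answer <- sapply(pieces, d_rule)   # one value per group, named by group
result <- data.frame(g = names(answer), answer = unname(answer))

print(result)
```

Here each call to d_rule sees a whole per-group data frame, so it can
use the a and x columns together, which is what aggregate cannot offer.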
>>>
>>> On March 4, 2015 2:02:06 PM PST, Typhenn Brichieri-Colombi via R-help
>>> <r-help at r-project.org> wrote:
>>>>Hello,
>>>>
>>>>I am trying to use the following custom function in an aggregate
>>>>function, but cannot get R to recognize my data. I’ve read the help
>>>>on function() and on aggregate() but am unable to solve my problem.
>>>>How can I get R to recognize the data inputs for the custom function
>>>>nested within aggregate()?
>>>>
>>>>My custom function is found below, as well as the error message I get
>>>>when I run it on a test data set (I will be using this function on a
>>>>much larger dataset (over 600,000 rows)).
>>>>
>>>>Thank you for your time and your help!
>>>>
>>>>
>>>>d_rule<-function(a,x){
>>>>  i<-which(a==max(a))
>>>>  out<-ifelse(length(i)==1, x[i], min(x))
>>>>  return(out)
>>>>}
>>>>
>>>>a<-c(2,2,1,4,2,5,2,3,4,4)
>>>>x<-c(1:10)
>>>>g<-c(1,1,2,2,3,3,4,4,5,5)
>>>>dat<-as.data.frame(cbind(x,g))
>>>>
>>>>test<-aggregate(dat, by=list(g), FUN=d_rule,dat$a, dat$x)
>>>>
>>>>Error in dat$x : $ operator is invalid for atomic vectors
>>>>
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide
>>>>http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>


