[R] Why mean is not working in by?

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Wed Dec 9 00:18:54 CET 2015


Got it - thank you, everybody!
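by() splits the data into one data frame per group and passes each whole data frame to FUN.
Lesson: use aggregate(), which calls FUN on each column separately.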
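
For the archives, a minimal sketch of the two approaches, assuming the built-in iris data and the myvars vector from my earlier message. The object names (group_means, group_sds, agg_means, agg_sds) are only illustrative. If you want to stay with by(), FUN has to be something that handles a whole data frame, such as colMeans() or an sapply() over the columns:

```r
myvars <- c("Sepal.Length", "Sepal.Width")

# by() hands FUN one data frame per group, so FUN must work on a data frame.
# colMeans() returns one mean per column.
group_means <- by(iris[myvars], iris["Species"], FUN = colMeans)

# sd() does not accept a data frame, so apply it to each column explicitly.
group_sds <- by(iris[myvars], iris["Species"],
                FUN = function(d) sapply(d, sd))

# aggregate() passes one column at a time to FUN,
# so mean() and sd() can be used directly.
agg_means <- aggregate(iris[myvars], by = iris["Species"], FUN = mean)
agg_sds   <- aggregate(iris[myvars], by = iris["Species"], FUN = sd)

print(group_means)
print(agg_means)
```

The by() results come back as a list-like "by" object with one element per species, while aggregate() returns an ordinary data frame with one row per species, which is usually easier to work with afterwards.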

On Tue, Dec 8, 2015 at 6:17 PM, William Dunlap <wdunlap at tibco.com> wrote:
> by() calls FUN with a data.frame as the argument.  summary(), sum(), etc.
> have methods that work on data.frames but sd() and mean() do not.
>
> aggregate() calls its FUN with each column of a data.frame as the argument.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Dec 8, 2015 at 3:08 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>>
>> Sorry, I omitted the first line:
>>
>> myvars <- c("Sepal.Length", "Sepal.Width")
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = min)
>>
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
>> by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)
>>
>> The first lines are doing what I expected them to do: for each level
>> of the factor "Species" they gave me a summary, a sum, a variance, a
>> max, a min for each of the 2 variables in question (myvars).
>> I expected by to generate the sd and the mean for the 2 variables in
>> question for each level of "Species".
>>
>> On Tue, Dec 8, 2015 at 5:50 PM, Sarah Goslee <sarah.goslee at gmail.com>
>> wrote:
>> > Hi Dimitri,
>> >
>> > I changed this into a reproducible example (we don't know what myvars
>> > is). Assuming length(myvars) > 1, I'm not convinced that your first
>> > five lines "work" either: what do you expect?
>> >
>> > I get:
>> >
>> >> by(data = iris[, -5], INDICES = iris["Species"], FUN = min)
>> > Species: setosa
>> > [1] 0.1
>> > ------------------------------------------------------------------
>> > Species: versicolor
>> > [1] 1
>> > ------------------------------------------------------------------
>> > Species: virginica
>> > [1] 1.4
>> >
>> > But was expecting:
>> >
>> >> aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=min)
>> >      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
>> > 1     setosa          4.3         2.3          1.0         0.1
>> > 2 versicolor          4.9         2.0          3.0         1.0
>> > 3  virginica          4.9         2.2          4.5         1.4
>> >
>> >
>> >
>> > aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=sd)
>> > aggregate(iris[,-5], by=iris[,"Species", drop=FALSE], FUN=mean)
>> >
>> > provide the answers I would expect. If you want clearer advice, you
>> > need to provide an actually reproducible example, and tell us more
>> > about what you expect to get.
>> >
>> > Sarah
>> >
>> >
>> > On Tue, Dec 8, 2015 at 5:30 PM, Dimitri Liakhovitski
>> > <dimitri.liakhovitski at gmail.com> wrote:
>> >> Hello!
>> >> Could you please explain why the first 5 lines work but the last 2
>> >> lines don't?
>> >> Thank you!
>> >>
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = summary)
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = sum)
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = var)
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = max)
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = min)
>> >>
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = sd)
>> >> by(data = iris[myvars], INDICES = iris["Species"], FUN = mean)
>> >>
>> >> --
>> >> Dimitri Liakhovitski
>> >>
>
>



-- 
Dimitri Liakhovitski


