[R] column selection for aggregate()

Gabor Grothendieck ggrothendieck at gmail.com
Mon Jan 18 19:30:43 CET 2010


It looks OK, except that you have both specified the wanted factors and
removed the unwanted factors from the data frame.  You only need to
do one of these, as in the example I gave, not both, so the solution
could be simpler.
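
As a way to cross-check a result like this, here is a minimal sketch in
base R (not doBy), using the built-in CO2 data. It computes the same
group means twice: once by selecting columns by position, and once by
dropping the unwanted factors and letting "." stand for the remaining
columns. The object names co2, by_index and by_drop are made up for
illustration, and the formula form assumes a version of R whose
aggregate() accepts a formula.

```r
# CO2 has the factors Plant, Type, Treatment (columns 1-3)
# and the numeric columns conc, uptake (columns 4-5).
co2 <- as.data.frame(CO2)   # plain data frame, drops the groupedData class

# Route 1: pick the numeric columns by index and name the grouping factor.
by_index <- aggregate(co2[4:5], co2["Treatment"], mean)

# Route 2: drop the unwanted factors, then "." means "all other columns".
by_drop <- aggregate(. ~ Treatment,
                     data = subset(co2, select = -c(Plant, Type)),
                     FUN = mean)

# If both routes were set up correctly, the two results should agree.
all.equal(by_index, by_drop, check.attributes = FALSE)
```

For ssfa, assuming the column order shown by names(ssfa), route 1 would
presumably be aggregate(ssfa[10:24], ssfa[c(1:3, 5:7)], mean), which can
then be compared with the summaryBy result in the same way.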

On Mon, Jan 18, 2010 at 11:19 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
> Hi!
>
> It looks like it works perfectly.
> However, since I cannot check whether the result is correct, could you
> please let me know if you see any mistakes?
>
> Here is the code:
> ssfamean <- summaryBy(.~SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA, data =
> subset(ssfa, select = - c(MEASUREM, SEL_FACET, SEL_MEAS)), FUN=mean)
>
> That should give me the mean for all numerical variables grouped by
> SPECSHOR+BONE+TO_POS+FACETTE+SHEARFAC+ENA_BA (i.e. the mean of the rows with
> equal values for all these variables) on the data file ssfa without the
> columns for MEASUREM, SEL_FACET, SEL_MEAS, right?
>
> Sorry to ask such a stupid question, but this line will give me the data I
> have to analyze, so I cannot afford to make any mistake here (nor anywhere
> else, of course, but here I cannot really check).
>
> Thanks in advance
> Ivan
>
>
> Gabor Grothendieck wrote:
>
> Try summaryBy in the doBy package. For example, using the built-in CO2
> data set, summarize each numeric variable by each factor except for the
> factors Plant and Type:
>
> library(doBy)
> summaryBy(. ~ ., data = subset(CO2, select = - c(Plant, Type)))
>
>
> On Mon, Jan 18, 2010 at 9:53 AM, Ivan Calandra
> <ivan.calandra at uni-hamburg.de> wrote:
>
>
> Hi everybody!
>
> I'm working with R today, so I have a lot of questions (you may have
> noticed that this is my 3rd email today). I'm new to R, so please excuse
> the "spam"!
>
> I have a dataset "ssfa" with many rows and the column names are:
>  > names(ssfa)
>  [1] "SPECSHOR"  "BONE"      "TO_POS"    "MEASUREM"  "FACETTE"   "SHEARFAC"
>  [7] "ENA_BA"    "SEL_FACET" "SEL_MEAS"  "Asfc"      "Smc"       "epLsar"
> [13] "HAsfc4"    "HAsfc9"    "HAsfc16"   "HAsfc25"   "HAsfc36"   "HAsfc49"
> [19] "HAsfc64"   "HAsfc81"   "HAsfc100"  "HAsfc121"  "Tfv"       "Ftfv"
>
> I want to aggregate it this way:
> ssfamean <- aggregate(ssfa[c("Asfc", "Smc", "epLsar", "HAsfc4",
> "HAsfc9", "HAsfc16", "HAsfc25", "HAsfc36", "HAsfc49", "HAsfc64",
> "HAsfc81", "HAsfc100", "HAsfc121", "Tfv", "Ftfv")], ssfa[c("SPECSHOR",
> "BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")], mean)
>
> As you can see, it is very long since I have many variables. Basically I
> want to select all numerical variables (columns 10 to 24) and all
> categorical variables except MEASUREM, SEL_FACET and SEL_MEAS without
> having to write each of them. I would also like to avoid writing the
> names; using the column indexes would be nice.
> I tried with:
>  > ssfamean <- aggregate(ssfa[c(ssfa[[10]]:ssfa[[24]])],
> ssfa[c("SPECSHOR", "BONE", "TO_POS", "FACETTE", "SHEARFAC", "ENA_BA")],
> mean)
> but it obviously doesn't work (well "obviously"...)
>
> Could anyone help me on this?
> Thanks in advance
> Ivan