[R] Weird behavior of aggregate() function

Bastien.Ferland-Raymond at mffp.gouv.qc.ca
Mon Jan 26 19:52:41 CET 2015


Thanks, Ista, for your help; it works and I understand why.

However, I'm still confused about why the previous code lost the "factor key". It could simply have converted to factors and output factors, but instead it's outputting integers...

I'm not a big fan of the default stringsAsFactors = TRUE, but that's another debate.
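The "factor key" is lost inside faire.paires() rather than by aggregate() itself. A factor is just integer codes plus a "levels" attribute. rbind() has no factor method, so it keeps only the codes. Before R 4.1.0, c() on factors already returned the bare codes as well.

Below is a minimal sketch using the values of group "B" from the original post, assuming vari has already been turned into a factor. The names TE_chr, gg_codes and gg_labels are illustrative only.

```r
## A factor stores integer codes plus a "levels" attribute (the "key").
## Assumed input: the two values of group "B" after conversion to a factor.
TE <- factor(c("mj2", "mj1"))
print(levels(TE))      # levels are sorted: "mj1" is code 1, "mj2" is code 2
print(as.integer(TE))  # the underlying codes

## Same construction as in faire.paires(): rbind() returns a plain integer
## matrix, so the levels attribute is gone.
gg_codes <- rbind(c(TE[1], TE[2]))
print(gg_codes)

## Converting to character before combining keeps the labels.
TE_chr <- as.character(TE)
gg_labels <- rbind(c(TE_chr[1], TE_chr[2]))
print(gg_labels)
```

The same mechanism explains why passing as.factor(vari) does not help: the function still combines factors with c() and rbind().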

Anyway, thanks again,

Bastien 

-----Original Message-----
From: Ista Zahn [mailto:istazahn at gmail.com] 
Sent: 26 January 2015 11:51
To: Ferland-Raymond, Bastien (DIF)
Cc: r-help at r-project.org
Subject: Re: [R] Weird behavior of aggregate() function

?aggregate informs you that unless x is a time series, it will be converted to a data.frame. data.frame() will convert your character vector to a factor unless you tell it not to.
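Here is a small illustration of the conversion described above, using a few values of vari from the original post. When this thread was written, stringsAsFactors defaulted to TRUE. R 4.0.0 changed the default to FALSE, so this sketch sets the argument explicitly to show both behaviors. The names df_fac and df_chr are illustrative only.

```r
## A few values of vari from the original post (assumed subset).
vari <- c("rs2", "rs2", "mj2", "mj1")

## stringsAsFactors = TRUE (the pre-4.0.0 default): the column becomes a factor.
df_fac <- data.frame(TE = vari, stringsAsFactors = TRUE)
print(class(df_fac$TE))
print(as.integer(df_fac$TE))  # codes into the sorted levels "mj1", "mj2", "rs2"

## stringsAsFactors = FALSE: the column stays character.
df_chr <- data.frame(TE = vari, stringsAsFactors = FALSE)
print(class(df_chr$TE))
```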

You can prevent this by converting vari to a data.frame yourself and passing the stringsAsFactors argument, like this:

aggregate(data.frame(TE = vari, stringsAsFactors = FALSE),
          by = list(gr), faire.paires)
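The following is a self-contained sketch of this fix. It reuses the data and the faire.paires() function from the original post, with only the layout changed. Groups produce either one or two pairs, so aggregate() cannot simplify the results into a single matrix. Under that assumption, TE comes back as a list column holding one character matrix per group.

```r
## Data from the original post.
gr   <- c("A","A","B","B","C","C","C","D","D","E","E","E")
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

## The original faire.paires(): pair TE[1] with TE[2] and with TE[3], then drop
## any pair containing NA (which happens for groups with only two values).
faire.paires <- function(TE) {
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

## Build the data frame yourself so the TE column stays character.
res <- aggregate(data.frame(TE = vari, stringsAsFactors = FALSE),
                 by = list(gr), faire.paires)

## TE is a list column: one character matrix per group.
str(res)
res$TE[[which(as.character(res$Group.1) == "C")]]
```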

Best,
Ista

On Mon, Jan 26, 2015 at 11:30 AM, <Bastien.Ferland-Raymond at mffp.gouv.qc.ca> wrote:
>
> Hello list,
>
> I have found a weird behavior of the aggregate() function when used with character vectors. I think the problem has to do with converting characters to factors.
>
> I'm trying to aggregate a character vector using a homemade function. My function gives me all the possible pairs of observed modalities.
>
>
> Reproducible code:
>
> #######
> ### my grouping variable
> gr <- c("A","A","B","B","C","C","C","D","D","E","E","E")
> ### my variable
> vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")
>
> ### what the table would look like
> cbind(gr,vari)
>
> ###  My function that gives every possible pair of modalities (my real
> ###  function can go up to length(TE) == 5, but for the sake of the
> ###  example, I've reduced it here)
> faire.paires <- function(TE) {
>   gg <- rbind(c(TE[1], TE[2]),
>               c(TE[1], TE[3]))
>   gg <- gg[rowSums(is.na(gg)) == 0, , drop = FALSE]  # keep complete pairs only
>   gg
> }
>
> ###  The function gives exactly what I want when I run it on a
> ###  specific entry:
> faire.paires(TE = vari[gr == "B"])
>
> ###  But with aggregate(), it transforms everything into integers
> res <- aggregate(list(TE = vari), by = list(gr), faire.paires)
> res
> str(res)
>
> ###  it's as if it converts to factors and then loses the key telling me
> ###  which integer means which modality
>
>
> ###  if I give it factors directly:
> res2 <- aggregate(list(TE = as.factor(vari)), by = list(gr), faire.paires)
> res2
> str(res2)
>
> ###  does not fix the problem.
> ############
>
> Any idea?
>
> I know my function may not be the best or most efficient way to do this.
> However, I'm still puzzled about why aggregate() gives me this weird output.
>
> Best regards,
>
> Bastien Ferland-Raymond, M.Sc. Stat., M.Sc. Biol.
> Division des orientations et projets spéciaux
> Direction des inventaires forestiers
> Ministère des Forêts, de la Faune et des Parcs
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
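The thread does not propose the following variant, which coerces to character inside the summary function. With this change, the labels survive whether aggregate() hands the function a character vector or a factor, which also covers the res2 case in the original post. faire.paires.chr() is a hypothetical name; apart from the added as.character() line, its logic is the original faire.paires().

```r
## Data from the original post.
gr   <- c("A","A","B","B","C","C","C","D","D","E","E","E")
vari <- c("rs2","rs2","mj2","mj1","rs1","rs1","rs2","mj1","mj1","rs1","mj1","mj2")

## Hypothetical variant: convert to character first, so rbind() builds a
## character matrix instead of a matrix of factor codes.
faire.paires.chr <- function(TE) {
  TE <- as.character(TE)
  gg <- rbind(c(TE[1], TE[2]),
              c(TE[1], TE[3]))
  gg[rowSums(is.na(gg)) == 0, , drop = FALSE]
}

## Character input and factor input now give the same labelled result.
res_chr <- aggregate(list(TE = vari), by = list(gr), faire.paires.chr)
res_fac <- aggregate(list(TE = as.factor(vari)), by = list(gr), faire.paires.chr)
str(res_fac)
```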

