[R] Sort problem in merge()

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 6 17:48:10 CET 2006


I think you will need to reorder it:

out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = FALSE)
# sort the merged rows back into tmp1's order, then drop the helper column
out[order(out$seq), -2]
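
The same idea can be wrapped in a small function. This is only a sketch: the name merge_keep_order and the helper column .row_id are made up for illustration and are not part of base R. It assumes x has no column called .row_id already and that the join is a left join on the common columns.

```r
# Hypothetical helper (not in base R): left-join x with y and return the
# rows in the original order of x.
merge_keep_order <- function(x, y, ...) {
  x$.row_id <- seq_len(nrow(x))                   # remember original positions
  out <- merge(x, y, all.x = TRUE, sort = FALSE, ...)
  out <- out[order(out$.row_id), , drop = FALSE]  # restore the order of x
  out$.row_id <- NULL                             # drop the helper column
  rownames(out) <- NULL
  out
}

tmp1 <- data.frame(col1 = c("A", "A", "C", "C", "0", "0"),
                   stringsAsFactors = FALSE)
tmp2 <- data.frame(col1 = c("C", "D", "E", "F"), col2 = 1:4,
                   stringsAsFactors = FALSE)
res <- merge_keep_order(tmp1, tmp2)
res
```

Sorting with order() on the stored positions does not depend on whatever row order merge() happens to return.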



On 3/6/06, Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si> wrote:
> Gabor Grothendieck wrote:
> > If you make the levels the same does that give what you want:
> >
> > levs <- c(LETTERS[1:6], "0")
> > tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> > tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> > merge(tmp2, tmp1, all = TRUE, sort = FALSE)
> > merge(tmp1, tmp2, all = TRUE, sort = FALSE)
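
If the full set of levels is not known in advance, it could be derived from the data instead of being typed in. A minimal sketch, assuming both data frames have a factor column named col1 as in the original example; note that this only aligns the levels and does not by itself change the row order that merge() returns.

```r
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Common level set: levels of tmp1 first, then any new ones from tmp2
levs <- union(levels(tmp1$col1), levels(tmp2$col1))

# Re-create both factors on the shared levels; the values are unchanged
tmp1$col1 <- factor(tmp1$col1, levels = levs)
tmp2$col1 <- factor(tmp2$col1, levels = levs)

str(tmp1)
str(tmp2)
```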
>
> Gabor, thanks for this, but unfortunately the result is the same. I get
> the following either way - note that I use all.x = TRUE or all.y = TRUE.
>
> > merge(tmp2, tmp1, all.y = TRUE, sort = FALSE)
>  col1 col2
> 1    C    1
> 2    C    1
> 3    A   NA
> 4    A   NA
> 5    0   NA
> 6    0   NA
>
> But I want the same order as in tmp1:
>
>  col1
> 1    A
> 2    A
> 3    C
> 4    C
> 5    0
> 6    0
>
>
>
>
> >>Hello!
> >>
> >>I am merging two datasets and I have encountered a problem with sorting.
> >>Can someone please point out my error? Here is an example.
> >>
> >>## I have two data frames: the first with a factor, the second with a
> >>## factor and an integer
> >>
> >>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
> >>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
> >>>tmp1
> >>
> >> col1
> >>1    A
> >>2    A
> >>3    C
> >>4    C
> >>5    0
> >>6    0
> >>
> >>>tmp2
> >>
> >> col1 col2
> >>1    C    1
> >>2    D    2
> >>3    E    3
> >>4    F    4
> >>
> >>## Now merge them
> >>
> >>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> >>                 all.x = TRUE, sort = FALSE))
> >> col1 col2
> >>1    C    1
> >>2    C    1
> >>3    A   NA
> >>4    A   NA
> >>5    0   NA
> >>6    0   NA
> >>
> >>## As you can see, the rows were reordered: the row order is not the
> >>## same as in tmp1. The help page ?merge does not say much about
> >>## sorting. For the non-default case (sort = FALSE) it says that the
> >>## order should be the same as in 'y', so the result above makes
> >>## sense.
> >>
> >>## Now merge again, but swap x and y
> >>
> >>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> >>                 all.y = TRUE, sort = FALSE))
> >> col1 col2
> >>1    C    1
> >>2    C    1
> >>3    A   NA
> >>4    A   NA
> >>5    0   NA
> >>6    0   NA
> >>
> >>## The result is the same. I am stumped here. But looking a bit more
> >>## closely at these objects, I found something peculiar
> >>
> >>
> >>>str(tmp1)
> >>
> >>`data.frame':   6 obs. of  1 variable:
> >> $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
> >>
> >>>str(tmp2)
> >>
> >>`data.frame':   4 obs. of  2 variables:
> >> $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
> >> $ col2: int  1 2 3 4
> >>
> >>>str(tmp12)
> >>
> >>`data.frame':   6 obs. of  2 variables:
> >> $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
> >> $ col2: int  1 1 NA NA NA NA
> >>
> >>>str(tmp21)
> >>
> >>`data.frame':   6 obs. of  2 variables:
> >> $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
> >> $ col2: int  1 1 NA NA NA NA
> >>
> >>## Is it OK that the internal representation of factors varies between
> >>## the two merges? The levels also differ: in the first merge only the
> >>## levels from the original data.frame are used, while in the second
> >>## all levels are propagated.
> >>
> >>## I have tried the same with characters
> >>
> >>>tmp1$col1 <- as.character(tmp1$col1)
> >>>tmp2$col1 <- as.character(tmp2$col1)
> >>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
> >>                 all.x = TRUE, sort = FALSE))
> >> col1 col2
> >>1    C    1
> >>2    C    1
> >>3    A   NA
> >>4    A   NA
> >>5    0   NA
> >>6    0   NA
> >>
> >>
> >>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
> >>                 all.y = TRUE, sort = FALSE))
> >> col1 col2
> >>1    C    1
> >>2    C    1
> >>3    A   NA
> >>4    A   NA
> >>5    0   NA
> >>6    0   NA
> >>
> >>## The same happens with characters. Is this a bug? It definitely does
> >>## not agree with the help page, since the order is not the same as in
> >>## 'y'. Can someone please check on newer versions?
> >>
> >>## Is there any other way to get the same order as in 'y', i.e. tmp1?
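
One other way that avoids merge() altogether is a lookup with match(), which never moves the rows of tmp1. This is a sketch rather than a general replacement: it assumes the key values in tmp2 are unique (match() returns only the first hit), and the names idx and res are just illustrative.

```r
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Position of each tmp1 key in tmp2 (NA where there is no match);
# compare as character so differing factor levels do not matter
idx <- match(as.character(tmp1$col1), as.character(tmp2$col1))

# Copy tmp1 and attach the looked-up column; the row order is untouched
res <- tmp1
res$col2 <- tmp2$col2[idx]
res
```

Keys that are missing from tmp2 simply get NA in col2, which corresponds to what all.x = TRUE does in merge().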
> >>
> >>
> >>>R.version
> >>
> >>        _
> >>platform i486-pc-linux-gnu
> >>arch     i486
> >>os       linux-gnu
> >>system   i486, linux-gnu
> >>status
> >>major    2
> >>minor    2.0
> >>year     2005
> >>month    10
> >>day      06
> >>svn rev  35749
> >>language R
> >>
> >>Thank you very much!
> >>
> >>--
> >>Lep pozdrav / With regards,
> >>   Gregor Gorjanc
> >>
> >>----------------------------------------------------------------------
> >>University of Ljubljana     PhD student
> >>Biotechnical Faculty
> >>Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
> >>Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
> >>
> >>SI-1230 Domzale             tel: +386 (0)1 72 17 861
> >>Slovenia, Europe            fax: +386 (0)1 72 17 888
> >>
> >>----------------------------------------------------------------------
> >>"One must learn by doing the thing; for though you think you know it,
> >> you have no certainty until you try." Sophocles ~ 450 B.C.
> >>
> >>
>
>
>



