[R] Sort problem in merge()

Gregor Gorjanc gregor.gorjanc at gmail.com
Mon Mar 6 21:40:16 CET 2006


Gabor and Jean, thank you for your time and answers. Gabor's approach does
not do what I want (with or without sort). Gabor, note that when we merge
data frames, the resulting data.frame gets new row.names, so we cannot rely
on them to recover the original order.

> (out <- merge(cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE))
  col1 seq col2
1    0   5   NA
2    0   6   NA
3    A   1   NA
4    A   2   NA
5    C   3    1
6    C   4    1

> out[out$seq, -2]
  col1 col2
5    C    1
6    C    1
1    0   NA
2    0   NA
3    A   NA
4    A   NA

> (out <- merge(cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE,
                sort = TRUE))
  col1 seq col2
1    0   5   NA
2    0   6   NA
3    A   1   NA
4    A   2   NA
5    C   3    1
6    C   4    1

> out[out$seq, -2]
  col1 col2
5    C    1
6    C    1
1    0   NA
2    0   NA
3    A   NA
4    A   NA

But the result I want is

A NA
A NA
C 1
C 1
0 NA
0 NA

i.e. with the same order as in tmp1. I really need the same order, since
I will cbind this data frame to another one and I need to keep the order
intact.
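
For what it is worth, here is a minimal sketch of one possible way to get
back the order of tmp1. It assumes the reordering should sort on the helper
column seq rather than use seq as row positions: out[out$seq, ] picks rows
5, 6, 1, 2, 3, 4 of the merged result, while order(out$seq) gives the
permutation that puts seq back into 1, 2, ..., 6. The name res is only for
illustration, and I have not checked this on every R version.

```r
# Example data from the original post
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# Tag each row of tmp1 with its original position before merging
out <- merge(cbind(tmp1, seq = seq_len(nrow(tmp1))), tmp2, all.x = TRUE)

# Sort on the helper column (instead of indexing by it), then drop it
res <- out[order(out$seq), c("col1", "col2")]
rownames(res) <- NULL  # reset row names so they match tmp1 again
res
```

This does not depend on how merge() itself orders the rows, since the
original positions travel with the data in seq.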

I am quite confident that this points to a bug in the merge code, or at
least in the merge documentation. NAs seem to introduce problems, as shown
by Jean. Can someone (R core) also confirm this?
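
Independently of whether merge() is at fault, a lookup with match() might
sidestep the ordering question altogether, because it never reorders tmp1.
The sketch below assumes that the key values in tmp2$col1 are unique
(match() returns only the first match); idx and res2 are illustrative
names.

```r
# Example data from the original post
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

# For each row of tmp1, find the position of its key in tmp2 (NA if absent).
# Comparing as character avoids any dependence on the differing factor levels.
idx <- match(as.character(tmp1$col1), as.character(tmp2$col1))

# Pull col2 across; rows stay in the order of tmp1
res2 <- cbind(tmp1, col2 = tmp2$col2[idx])
res2
```

Unlike merge(), this cannot duplicate rows of tmp1 when a key occurs more
than once in tmp2, so it only fits a many-to-one lookup like this one.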

Gabor Grothendieck wrote:
> Actually we don't need sort = FALSE if we are reordering it anyways:
> 
> out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE)
> out[out$seq, -2]
> 
> On 3/6/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> 
>>I think you will need to reorder it:
>>
>>out <- merge( cbind(tmp1, seq = 1:nrow(tmp1)), tmp2, all.x = TRUE, sort = FALSE)
>>out[out$seq, -2]
>>
>>
>>
>>On 3/6/06, Gregor Gorjanc <gregor.gorjanc at bfro.uni-lj.si> wrote:
>>
>>>Gabor Grothendieck wrote:
>>>
>>>>If you make the levels the same does that give what you want:
>>>>
>>>>levs <- c(LETTERS[1:6], "0")
>>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
>>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
>>>>merge(tmp2, tmp1, all = TRUE, sort = FALSE)
>>>>merge(tmp1, tmp2, all = TRUE, sort = FALSE)
>>>
>>>Gabor, thanks for this, but unfortunately the result is the same. I get
>>>the following both ways; note that I use all.x or all.y = TRUE.
>>>
>>>
>>>>merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
>>>
>>> col1 col2
>>>1    C    1
>>>2    C    1
>>>3    A   NA
>>>4    A   NA
>>>5    0   NA
>>>6    0   NA
>>>
>>>But I want the order as it is in tmp1:
>>>
>>> col1
>>>1    A
>>>2    A
>>>3    C
>>>4    C
>>>5    0
>>>6    0
>>>
>>>
>>>
>>>
>>>
>>>>>Hello!
>>>>>
>>>>>I am merging two datasets and I have encountered a problem with sort.
>>>>>Can someone please point me to my error? Here is the example.
>>>>>
>>>>>## I have two data frames: the first with a factor, the second with a
>>>>>## factor and an integer
>>>>>
>>>>>
>>>>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
>>>>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
>>>>>>tmp1
>>>>>
>>>>>col1
>>>>>1    A
>>>>>2    A
>>>>>3    C
>>>>>4    C
>>>>>5    0
>>>>>6    0
>>>>>
>>>>>
>>>>>>tmp2
>>>>>
>>>>>col1 col2
>>>>>1    C    1
>>>>>2    D    2
>>>>>3    E    3
>>>>>4    F    4
>>>>>
>>>>>## Now merge them
>>>>>
>>>>>
>>>>>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>>                all.x = TRUE, sort = FALSE))
>>>>>col1 col2
>>>>>1    C    1
>>>>>2    C    1
>>>>>3    A   NA
>>>>>4    A   NA
>>>>>5    0   NA
>>>>>6    0   NA
>>>>>
>>>>>## As you can see, sorting was applied, since the row order is not the
>>>>>## same as in tmp1. The help page for ?merge did not reveal much about
>>>>>## sorting. However, I tried the "non-default" setting; the help page
>>>>>## says that the order should then be the same as in 'y'. So the above
>>>>>## makes sense.
>>>>>
>>>>>## Now merge again, but swap x and y
>>>>>
>>>>>
>>>>>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>>                all.y = TRUE, sort = FALSE))
>>>>>col1 col2
>>>>>1    C    1
>>>>>2    C    1
>>>>>3    A   NA
>>>>>4    A   NA
>>>>>5    0   NA
>>>>>6    0   NA
>>>>>
>>>>>## The result is the same. I am stumped here. But looking a bit at these
>>>>>## objects, I found something peculiar
>>>>>
>>>>>
>>>>>
>>>>>>str(tmp1)
>>>>>
>>>>>`data.frame':   6 obs. of  1 variable:
>>>>>$ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
>>>>>
>>>>>
>>>>>>str(tmp2)
>>>>>
>>>>>`data.frame':   4 obs. of  2 variables:
>>>>>$ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
>>>>>$ col2: int  1 2 3 4
>>>>>
>>>>>
>>>>>>str(tmp12)
>>>>>
>>>>>`data.frame':   6 obs. of  2 variables:
>>>>>$ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
>>>>>$ col2: int  1 1 NA NA NA NA
>>>>>
>>>>>
>>>>>>str(tmp21)
>>>>>
>>>>>`data.frame':   6 obs. of  2 variables:
>>>>>$ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
>>>>>$ col2: int  1 1 NA NA NA NA
>>>>>
>>>>>## Is it OK that the internal representation of factors varies between
>>>>>## different merges? The levels also differ: in the first case only the
>>>>>## levels from the original data.frame are used, while in the second
>>>>>## all levels are propagated.
>>>>>
>>>>>## I have tried the same with characters
>>>>>
>>>>>
>>>>>>tmp1$col1 <- as.character(tmp1$col1)
>>>>>>tmp2$col1 <- as.character(tmp2$col1)
>>>>>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>>>>                all.x = TRUE, sort = FALSE))
>>>>>col1 col2
>>>>>1    C    1
>>>>>2    C    1
>>>>>3    A   NA
>>>>>4    A   NA
>>>>>5    0   NA
>>>>>6    0   NA
>>>>>
>>>>>
>>>>>
>>>>>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>>>>                all.y = TRUE, sort = FALSE))
>>>>>col1 col2
>>>>>1    C    1
>>>>>2    C    1
>>>>>3    A   NA
>>>>>4    A   NA
>>>>>5    0   NA
>>>>>6    0   NA
>>>>>
>>>>>## The same with characters. Is this a bug? It definitely does not agree
>>>>>## with the help page, since the order is not the same as in 'y'. Can
>>>>>## someone please check on newer versions?
>>>>>
>>>>>## Is there any other way to get the same order as in 'y' i.e. tmp1?
>>>>>
>>>>>
>>>>>
>>>>>>R.version
>>>>>
>>>>>       _
>>>>>platform i486-pc-linux-gnu
>>>>>arch     i486
>>>>>os       linux-gnu
>>>>>system   i486, linux-gnu
>>>>>status
>>>>>major    2
>>>>>minor    2.0
>>>>>year     2005
>>>>>month    10
>>>>>day      06
>>>>>svn rev  35749
>>>>>language R
>>>>>
>>>>>Thank you very much!
>>>>>
>>>>>--
>>>>>Lep pozdrav / With regards,
>>>>>  Gregor Gorjanc
>>>>>
>>>>>----------------------------------------------------------------------
>>>>>University of Ljubljana     PhD student
>>>>>Biotechnical Faculty
>>>>>Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
>>>>>Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
>>>>>
>>>>>SI-1230 Domzale             tel: +386 (0)1 72 17 861
>>>>>Slovenia, Europe            fax: +386 (0)1 72 17 888
>>>>>
>>>>>----------------------------------------------------------------------
>>>>>"One must learn by doing the thing; for though you think you know it,
>>>>>you have no certainty until you try." Sophocles ~ 450 B.C.
>>>>>


-- 
Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si

SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888

----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.



