[R] Sort problem in merge()

Gregor Gorjanc gregor.gorjanc at bfro.uni-lj.si
Mon Mar 6 15:52:58 CET 2006


Gabor Grothendieck wrote:
> If you make the levels the same does that give what you want:
> 
> levs <- c(LETTERS[1:6], "0")
> tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0"), levs))
> tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F"), levs), col2 = 1:4)
> merge(tmp2, tmp1, all = TRUE, sort = FALSE)
> merge(tmp1, tmp2, all = TRUE, sort = FALSE)

Gabor, thanks for this, but unfortunately the result is the same. I get
the following either way - note that I use all.x or all.y = TRUE.

> merge(tmp2, tmp1, all.x = TRUE, sort = FALSE)
  col1 col2
1    C    1
2    C    1
3    A   NA
4    A   NA
5    0   NA
6    0   NA

But I want the same order as in tmp1:

  col1
1    A
2    A
3    C
4    C
5    0
6    0
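
One possible workaround, as a sketch only and not something taken from
?merge: record the original row position of tmp1 in a helper column before
merging, then reorder the result by that column. The column name row_id is
just a hypothetical name chosen for illustration. The second part shows a
lookup with match() instead of merge(), which assumes each key occurs at
most once in tmp2.

```r
tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)

## Approach 1: remember the row positions of tmp1, merge, then restore them
tmp1$row_id <- seq_len(nrow(tmp1))            # hypothetical helper column
res <- merge(tmp1, tmp2, by = "col1", all.x = TRUE, sort = FALSE)
res <- res[order(res$row_id), , drop = FALSE] # back to the order of tmp1
res$row_id <- NULL                            # drop the helper column
rownames(res) <- NULL                         # renumber the rows 1..n
tmp1$row_id <- NULL                           # leave tmp1 as it was

## Approach 2: look up col2 directly; match() compares factors as
## character strings, so differing levels do not matter here.
## Only the first matching row of tmp2 is used for each key.
looked_up <- tmp2$col2[match(tmp1$col1, tmp2$col1)]

res
```

Both approaches keep the rows in the order of tmp1; the first one also
works when tmp2 contributes several columns.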




>>Hello!
>>
>>I am merging two datasets and I have encountered a problem with sort.
>>Can someone please point me to my error? Here is the example.
>>
>>## I have two data frames: the first one with a factor and the second
>>## one with a factor and an integer
>>
>>>tmp1 <- data.frame(col1 = factor(c("A", "A", "C", "C", "0", "0")))
>>>tmp2 <- data.frame(col1 = factor(c("C", "D", "E", "F")), col2 = 1:4)
>>>tmp1
>>
>> col1
>>1    A
>>2    A
>>3    C
>>4    C
>>5    0
>>6    0
>>
>>>tmp2
>>
>> col1 col2
>>1    C    1
>>2    D    2
>>3    E    3
>>4    F    4
>>
>>## Now merge them
>>
>>>(tmp12 <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>                 all.x = TRUE, sort = FALSE))
>> col1 col2
>>1    C    1
>>2    C    1
>>3    A   NA
>>4    A   NA
>>5    0   NA
>>6    0   NA
>>
>>## As you can see, sort was applied, since the row order is not the same
>>## as in tmp1. Reading the help page ?merge did not reveal much about
>>## sorting. However, I did try to see the result of "non-default" -
>>## the help page says that the order should be the same as in 'y'. So
>>## the above makes sense.
>>
>>## Now merge - but swap x and y
>>
>>>(tmp21 <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>                 all.y = TRUE, sort = FALSE))
>> col1 col2
>>1    C    1
>>2    C    1
>>3    A   NA
>>4    A   NA
>>5    0   NA
>>6    0   NA
>>
>>## The result is the same. I am stumped here. But looking a bit at these
>>## objects I found something peculiar
>>
>>
>>>str(tmp1)
>>
>>`data.frame':   6 obs. of  1 variable:
>> $ col1: Factor w/ 3 levels "0","A","C": 2 2 3 3 1 1
>>
>>>str(tmp2)
>>
>>`data.frame':   4 obs. of  2 variables:
>> $ col1: Factor w/ 4 levels "C","D","E","F": 1 2 3 4
>> $ col2: int  1 2 3 4
>>
>>>str(tmp12)
>>
>>`data.frame':   6 obs. of  2 variables:
>> $ col1: Factor w/ 3 levels "0","A","C": 3 3 2 2 1 1
>> $ col2: int  1 1 NA NA NA NA
>>
>>>str(tmp21)
>>
>>`data.frame':   6 obs. of  2 variables:
>> $ col1: Factor w/ 6 levels "C","D","E","F",..: 1 1 6 6 5 5
>> $ col2: int  1 1 NA NA NA NA
>>
>>## Is it OK that the internal representation of factors varies between
>>## different merges? The levels also differ: in the first merge only the
>>## levels from the original data.frame are used, while in the second
>>## example all levels are propagated.
>>
>>## I have tried the same with characters
>>
>>>tmp1$col1 <- as.character(tmp1$col1)
>>>tmp2$col1 <- as.character(tmp2$col1)
>>>(tmp12c <- merge(tmp1, tmp2, by.x = "col1", by.y = "col1",
>>                 all.x = TRUE, sort = FALSE))
>> col1 col2
>>1    C    1
>>2    C    1
>>3    A   NA
>>4    A   NA
>>5    0   NA
>>6    0   NA
>>
>>
>>>(tmp21c <- merge(tmp2, tmp1, by.x = "col1", by.y = "col1",
>>                 all.y = TRUE, sort = FALSE))
>> col1 col2
>>1    C    1
>>2    C    1
>>3    A   NA
>>4    A   NA
>>5    0   NA
>>6    0   NA
>>
>>## The same with characters. Is this a bug? It definitely does not agree
>>## with the help page, since the order is not the same as in 'y'. Can
>>## someone please check on newer versions?
>>
>>## Is there any other way to get the same order as in 'y' i.e. tmp1?
>>
>>
>>>R.version
>>
>>        _
>>platform i486-pc-linux-gnu
>>arch     i486
>>os       linux-gnu
>>system   i486, linux-gnu
>>status
>>major    2
>>minor    2.0
>>year     2005
>>month    10
>>day      06
>>svn rev  35749
>>language R
>>
>>Thank you very much!
>>
>>--
>>Lep pozdrav / With regards,
>>   Gregor Gorjanc
>>
>>----------------------------------------------------------------------
>>University of Ljubljana     PhD student
>>Biotechnical Faculty
>>Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
>>Groblje 3                   mail: gregor.gorjanc <at> bfro.uni-lj.si
>>
>>SI-1230 Domzale             tel: +386 (0)1 72 17 861
>>Slovenia, Europe            fax: +386 (0)1 72 17 888
>>
>>----------------------------------------------------------------------
>>"One must learn by doing the thing; for though you think you know it,
>> you have no certainty until you try." Sophocles ~ 450 B.C.
>>
>>


-- 
Lep pozdrav / With regards,
    Gregor Gorjanc
