[R] (Nothing to do with) merge problem... extra lines appear in the presence of NAs

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat May 20 14:58:43 CEST 2006


I think you forgot to read over your own message before sending it: take a 
look at a1, which has FOUR rows with mdate == 2005-06-09.  Those correspond 
to rows 9:12 in the result, as you are merging on 'mdate' and each of those 
four rows matches the single 2005-06-09 row in a2.
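
As a minimal sketch of that point, the following rebuilds a1 and a2 from the 
values printed in the quoted message (retyped by hand, so treat them as 
copies rather than the original objects) and counts the rows produced for 
the repeated date.  The name dup_date is introduced here purely for 
illustration.

```r
# a1 as printed in the quoted message: rows 1, 4, 8 and 9 share one date
a1 <- data.frame(
  mdate = as.Date(c("2005-06-09", "2005-06-02", "2005-06-03", "2005-06-09",
                    "2005-06-05", "2005-06-06", "2005-06-07", "2005-06-09",
                    "2005-06-09", "2005-06-10")),
  value = c(NA, 0.5287683, 0.7563833, NA, 0.1027646,
            0.7775884, 0.2993592, NA, 0.7434682, 0.2096477)
)

# a2 as printed: ten consecutive dates, each appearing exactly once
a2 <- data.frame(
  mdate  = as.Date("2005-06-01") + 0:9,
  value2 = c(0.5347852, 0.9322765, 0.9106499, 0.6810564, 0.5871867,
             0.8123808, 0.9675379, 0.9470369, 0.7493767, 0.8864103)
)

dup_date <- as.Date("2005-06-09")   # hypothetical helper name
sum(a1$mdate == dup_date)           # 4 rows of a1 carry this key

# Every a1 row with a given key is paired with every a2 row with that key,
# and all = TRUE additionally keeps the a2 dates that are missing from a1.
atot <- merge(a1, a2, by = "mdate", all = TRUE)
nrow(atot)                          # 13
sum(atot$mdate == dup_date)         # 4
```

The NAs in 'value' play no part in the row count: the extra rows come from 
the repeated key alone.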

Your example is not reproducible, of course, since you used random values 
without setting a seed.  Perhaps you intended

a1[floor(runif(nacount)*count), "value"] <- NA
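
A reproducible variant of the whole example might look like the sketch 
below.  The call to set.seed() and the seed value are assumptions added 
here (any seed would do), and idx is a name introduced only for 
readability; the assignment itself is the form suggested above.

```r
set.seed(1)    # assumed seed, only so that reruns give the same draws

count   <- 10
nacount <- 3

a1 <- data.frame(mdate = as.Date("2005-06-01") + 0:(count - 1),
                 value = runif(count))
a2 <- data.frame(mdate  = as.Date("2005-06-01") + 0:(count - 1),
                 value2 = runif(count))

# Row indices to blank out.  floor(runif(n) * count) lies in 0..(count - 1),
# so it can contain 0 (which selects nothing) and repeated values; fewer
# than nacount NAs may therefore result.
idx <- floor(runif(nacount) * count)

# Assign into the 'value' column only, leaving 'mdate' untouched
a1[idx, "value"] <- NA

anyDuplicated(a1$mdate)            # 0 means every date is still unique
atot <- merge(a1, a2, by = "mdate", all = TRUE)
nrow(atot) == count                # one result row per date
```

If exactly nacount distinct rows should be set to NA, sample(count, nacount) 
is one possible replacement for the floor(runif(...)) index; that 
substitution is not part of the original exchange.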


On Sat, 20 May 2006, Sean O'Riordain wrote:

> Good morning!

[Or afternoon in Europe, ....]

> I've searched the docs etc...  Am I doing something wrong or is this a bug?
>
> I'm doing a merge of two dataframes and getting extra rows in the
> resulting dataframe - the dataframes being merged might have NAs...
>
> count <- 10
> nacount <- 3
> a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> names(a1) <- "mdate"
> a1$value <- runif(count)
> a1[floor(runif(nacount)*count),]$value <- NA
>
> a2 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
> names(a2) <- "mdate"
> a2$value2 <- runif(count)
> #a2[floor(runif(nacount)*count),]$value2 <- NA
>
>> a1
>        mdate     value
> 1  2005-06-09        NA
> 2  2005-06-02 0.5287683
> 3  2005-06-03 0.7563833
> 4  2005-06-09        NA
> 5  2005-06-05 0.1027646
> 6  2005-06-06 0.7775884
> 7  2005-06-07 0.2993592
> 8  2005-06-09        NA
> 9  2005-06-09 0.7434682
> 10 2005-06-10 0.2096477
>> a2
>        mdate    value2
> 1  2005-06-01 0.5347852
> 2  2005-06-02 0.9322765
> 3  2005-06-03 0.9106499
> 4  2005-06-04 0.6810564
> 5  2005-06-05 0.5871867
> 6  2005-06-06 0.8123808
> 7  2005-06-07 0.9675379
> 8  2005-06-08 0.9470369
> 9  2005-06-09 0.7493767
> 10 2005-06-10 0.8864103
>> atot <- merge(a1,a2,all=T)
>
> However, I find the following results to be quite un-intuitive - are
> they correct?  May I draw your attention to lines 9:12...  Should
> lines 9:11 be there?
>
>> atot
>        mdate     value    value2
> 1  2005-06-01        NA 0.5347852
> 2  2005-06-02 0.5287683 0.9322765
> 3  2005-06-03 0.7563833 0.9106499
> 4  2005-06-04        NA 0.6810564
> 5  2005-06-05 0.1027646 0.5871867
> 6  2005-06-06 0.7775884 0.8123808
> 7  2005-06-07 0.2993592 0.9675379
> 8  2005-06-08        NA 0.9470369
> 9  2005-06-09        NA 0.7493767
> 10 2005-06-09        NA 0.7493767
> 11 2005-06-09        NA 0.7493767
> 12 2005-06-09 0.7434682 0.7493767
> 13 2005-06-10 0.2096477 0.8864103
>
> Note with no NAs, it works perfectly and as expected...
>> a1 <- as.data.frame(as.Date("2005-06-01")+0:(count-1))
>> names(a1) <- "mdate"
>> a1$value <- runif(count)
>> #a1[floor(runif(nacount)*count),]$value <- NA
>>
>> atot <- merge(a1,a2,all=T)
>>
>> atot
>        mdate      value    value2
> 1  2005-06-01 0.35002519 0.5347852
> 2  2005-06-02 0.76318940 0.9322765
> 3  2005-06-03 0.32759570 0.9106499
> 4  2005-06-04 0.47218729 0.6810564
> 5  2005-06-05 0.74435374 0.5871867
> 6  2005-06-06 0.81415290 0.8123808
> 7  2005-06-07 0.04774783 0.9675379
> 8  2005-06-08 0.21799101 0.9470369
> 9  2005-06-09 0.99472758 0.7493767
> 10 2005-06-10 0.41974293 0.8864103
>
> R started in each case with --vanilla
>               _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status         Patched
> major          2
> minor          3.0
> year           2006
> month          05
> day            11
> svn rev        38037
> language       R
> version.string Version 2.3.0 Patched (2006-05-11 r38037)
>
> win-xp-pro sp2 - binary installs from CRAN
>
>
> it works in a similar way if I say
> atot <- merge(a1,a2,by.x="mdate",by.y="mdate",all=T)
> or even
> atot <- merge(a1,a2,by="mdate",all=T)
>
> also tested on versions 2.2.1, 2.3.0
>
> cheers,
> Sean O'Riordain
>
> (ps. ctrl-v paste wouldn't work on 2.4.0-dev downloaded this morning -
> didn't try very hard though)
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


