[R] Trying to merge new data set to bottom of old data set. Both are zoo objects.

Gabor Grothendieck ggrothendieck at gmail.com
Wed Apr 4 15:05:41 CEST 2012


On Wed, Apr 4, 2012 at 1:47 AM, knavero <knavero at gmail.com> wrote:
> Here's a case where it doesn't work. Again, the problem is that when I use
> the rbind or concatenate functions, the 2012 data set seems to go ahead of
> the 2010 and 2011 portions of the data set. The problem seems dependent on
> the text files I read in:
>
> http://r.789695.n4.nabble.com/file/n4531011/old.txt old.txt
>
> http://r.789695.n4.nabble.com/file/n4531011/new.txt new.txt
>
> using this code:
>
> http://pastebin.com/8W6KaaPQ
>
> In a case where it works, and the data seemed to be in the right order, I
> read in a different old.txt named old1.txt and somehow it seemed to work.
> The contents and format were similar to that of new.txt where there was 18
> columns with the same headers. Here are the files to use:
>
> http://r.789695.n4.nabble.com/file/n4531011/old1.txt old1.txt
>
> http://r.789695.n4.nabble.com/file/n4531011/new.txt new.txt
>
> using this code:
>
> http://pastebin.com/6iNF5bPd
>
> That should clarify the issue I'm having. Let me know if a dput is necessary
> here. However all the vectors and vector modes seem to check out okay.
>

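The problem is that the dates in the new file are of the form 2/23/12
but they are being read in using "%m/%d/%Y %H:%M".  The %Y should be
%y.  With %Y the two-digit year is taken literally as the year 12, so
the new observations sort ahead of the 2010 and 2011 data.  For the
old file the format is correct.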
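
As a minimal sketch of the effect: the two date strings below are made
up for illustration and are not taken from old.txt or new.txt, and
as.Date is used rather than a date-time class purely to keep time
zones out of the picture.

```r
# Hypothetical sample values; the real files are not reproduced here.
old_style <- "12/31/2011"   # four-digit year, which is what %Y expects
new_style <- "2/23/12"      # two-digit year, the form described for new.txt

d_old   <- as.Date(old_style, format = "%m/%d/%Y")
d_wrong <- as.Date(new_style, format = "%m/%d/%Y")  # "12" is read as the year 12
d_right <- as.Date(new_style, format = "%m/%d/%y")  # "12" is read as 2012

# With %Y the 2012 observation sorts ahead of the 2011 one;
# with %y it sorts after it, as intended.
d_wrong < d_old
d_right > d_old
```

The same reasoning applies when a time part such as %H:%M is appended
to the format string.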

A few other points:

- it would be better to use library() than require() here.  If the
package cannot be loaded, library() fails with an error right at that
point, which best reveals where the problem is.  require() simply
returns FALSE and keeps processing, so the error only shows up later
in the code, where it is less convenient to figure out what went
wrong.  Alternatively you can use stopifnot(require(...whatever...)).

- please try to cut your data down as far as feasible.  If each file
had 3 lines, say, the same error would have been revealed and it would
have been easier to manage.  Also it would have been possible to
remove all the columns not used and still illustrate this error.  The
very process of reducing it to the smallest dataset you can will often
reveal the error.

- if you must post in this fashion then note that read.zoo uses
read.table which can read directly off the net:

new.txt <- "http://r.789695.n4.nabble.com/file/n4531011/new.txt"
new <- read.zoo(new.txt, ...whatever...)

- it's better to write out TRUE and FALSE since T and F are ordinary
variables that a program can reassign, but TRUE and FALSE are reserved
words so they can't be overwritten.

- you may or may not prefer this style but it would be possible to replace this:

cls <- c("NULL", NA, "numeric",
      "NULL", "NULL",
      "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
      "NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

with this:

cls <- rep(c("NULL", NA, "numeric", "NULL"), c(1, 1, 1, 15))
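
The points above about require(), T versus TRUE, and rep() can be
checked in a short base R session.  This is only a sketch: the package
name notARealPackage is a made-up placeholder that is assumed not to
be installed, and the variable names are chosen for illustration.

```r
# "notARealPackage" is a hypothetical name assumed not to be installed.
pkg <- "notARealPackage"

# require() warns and returns FALSE, so a script would carry on regardless.
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

# library() signals an error at the point of the call.
lib_result <- try(library(pkg, character.only = TRUE), silent = TRUE)
lib_failed <- inherits(lib_result, "try-error")

# T is an ordinary variable, so it can be silently reassigned.
T <- 0
t_still_true <- isTRUE(T)
rm(T)  # remove the shadowing copy so T is TRUE again

# The long-hand and rep() versions of the colClasses vector agree.
cls_long <- c("NULL", NA, "numeric",
              "NULL", "NULL",
              "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
              "NULL", "NULL", "NULL", "NULL", "NULL", "NULL")
cls_short <- rep(c("NULL", NA, "numeric", "NULL"), c(1, 1, 1, 15))
same <- identical(cls_long, cls_short)
```

Here loaded ends up FALSE while the library() call stops with an
error, which is the difference described above.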

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


