[R] Wrong output due to what I think might be a data type issue (zoo read in problem)

Gabor Grothendieck ggrothendieck at gmail.com
Tue Mar 20 14:02:36 CET 2012


On Tue, Mar 20, 2012 at 1:24 AM, knavero <knavero at gmail.com> wrote:
> found a temporary fix (I'm sure it's redundant and not as elegant, but here
> it is):
>
> require(zoo)
> require(chron)
> setwd("/home/knavero/Desktop/")
>
> fmt = "%m/%d/%Y %H:%M"
> tail1 = function(x) tail(x, 1)
> rawData = read.zoo("weatherData.txt", header = T, FUN = as.chron,
>   format = fmt, sep = "\t", aggregate = tail1)
>   #colClasses = c(NA, "matrix"))
>
> rawData = zoo(cbind(temp = as.vector(rawData)), time(rawData))
>
> oneMin = seq(start(rawData), end(rawData), by = times("01:00:00"))
> intData = na.approx(rawData, xout = oneMin)
>
> par(mfrow = c(3, 1), oma = c(0, 0, 2, 0), mar = c(2, 4, 1, 1))
>
> plot(rawData, type = "p", ylim = c(0, 100))
> grid(col = "darkgrey")
>
> plot(intData, type = "p", ylim = c(0, 100))
> grid(col = "darkgrey")
>
> Silly coding huh? It works though....the plots were just to double check
> btw...nothing significant obviously
>

If you specify the column classes, a more informative error message is produced:

> weatherData.txt <- "http://r.789695.n4.nabble.com/file/n4487682/weatherData.txt"
> rawData = read.zoo(weatherData.txt, header = T, FUN = as.chron,
+    format = fmt, sep = "\t", aggregate = tail1, colClasses = c(NA, "numeric"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got 'M'

from which we see that there is an M in the second column.  We can
fix it in a text editor, or we can specify that M is a comment
character (make sure there are no M's in the header, though), in
which case we get an NA in that position:

> rawData <- read.zoo(weatherData.txt, header = T, FUN = as.chron,
+     format = fmt, sep = "\t", aggregate = tail1, comment = "M")
> rawData[9553]
(01/03/12 10:53:00)
                 NA

We could then use na.omit(rawData) to drop that observation.
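
As a minimal, self-contained sketch of the comment-character idea, the
following uses base R's read.table on a made-up three-row sample.  The
inline text and the names txt, d and d2 are illustrative assumptions, not
the poster's file.  read.zoo passes extra arguments on to read.table,
where the full argument name is comment.char:

```r
# Hypothetical sample standing in for weatherData.txt (tab separated)
txt <- paste("time\ttemp",
             "01/03/2012 09:53\t51.1",
             "01/03/2012 10:53\tM",
             "01/03/2012 11:53\t55.0",
             sep = "\n")

# Treat "M" as a comment character: everything from the M to the end of
# the line is discarded, leaving an empty (NA) numeric field
d <- read.table(text = txt, header = TRUE, sep = "\t",
                comment.char = "M",
                colClasses = c("character", "numeric"))
print(d)

# Drop the incomplete row, analogous to na.omit(rawData) on the zoo series
d2 <- na.omit(d)
print(d2)
```

Note that the match is case sensitive, so the lower-case m's in this
sample header are not affected; an upper-case M anywhere else in the
file would truncate that line as well.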

Another way to locate the offending value is:

> L <- read.table(weatherData.txt, colClasses = "character", header = TRUE, sep = "\t")
> ix <- is.na(as.numeric(L[[2]])); which(ix); L[ix, 2]
Warning message:
NAs introduced by coercion
[1] 9553
[1] "M"
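
The same check can be tried without the original file.  This is only a
sketch on a made-up sample (the inline text and the names txt, num and
ix are assumptions for illustration): read every column as character,
attempt the numeric conversion, and report the entries that fail.

```r
# Hypothetical sample standing in for weatherData.txt (tab separated)
txt <- paste("time\ttemp",
             "01/03/2012 09:53\t51.1",
             "01/03/2012 10:53\tM",
             "01/03/2012 11:53\t55.0",
             sep = "\n")

# Read all columns as character so nothing fails during input
L <- read.table(text = txt, colClasses = "character",
                header = TRUE, sep = "\t")

# Entries that cannot be converted become NA; the coercion warning is
# expected here, so it is suppressed
num <- suppressWarnings(as.numeric(L[[2]]))
ix <- is.na(num)

which(ix)   # row numbers of the non-numeric entries
L[ix, 2]    # the offending strings themselves
```

Row numbers reported this way count data rows, excluding the header
line.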


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
