[R] fill 0-row data.frame with 1 line of NAs

Thu Jul 12 00:37:52 CEST 2012

In that case, I think that using a subscript of NA is the
best way to go.  It works for both matrices and data.frames
(unlike an integer larger than nrow(data), which is an error
for a matrix) and its meaning is pretty clear.
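
A small sketch of that behaviour, using made-up objects (d, d0 and m are
hypothetical, not from this thread).  It assumes you want exactly one
row back, so it uses NA_integer_; a bare logical NA would be recycled
over all the rows.

```r
# Hypothetical toy objects, for illustration only.
d  <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
d0 <- d[0, ]                      # zero-row data.frame, same columns
m  <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

naRowDf  <- d[NA_integer_, ]      # one row, every column NA
naRowDf0 <- d0[NA_integer_, ]     # also one row of NAs
naRowMat <- m[NA_integer_, , drop = FALSE]   # 1 x 2 matrix of NAs

# An out-of-range integer subscript fails for the matrix.
outOfRange <- try(m[nrow(m) + 1, ], silent = TRUE)
inherits(outOfRange, "try-error")

# A logical NA is recycled, giving nrow(d) rows of NAs instead of one.
nrow(d[NA, ])
```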

Also, you will probably get better results if the function
in your call to apply() returns the index (perhaps NA) of a row
of a data.frame instead of the row itself.  Then subscript that
data.frame once with the output of apply() rather than subscripting
it many times and rbinding the results back together.  This is
natural if you use match(), as it returns NA for no match (merge()
does this sort of thing).
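
For the plain match() case, the idea might look like the sketch below.
The lookup table, the queries and their column names are all invented
for illustration; substitute your own data.

```r
# Hypothetical lookup table and queries, for illustration only.
lookup  <- data.frame(key = c("a", "b", "c"), value = c(10, 20, 30),
                      stringsAsFactors = FALSE)
queries <- data.frame(string = c("b", "z", "a"), stringsAsFactors = FALSE)

# One index per query row; NA where there is no match.
idx <- match(queries$string, lookup$key)

# Subscript the lookup table once; unmatched rows come back as NAs.
matched <- lookup[idx, ]
result  <- data.frame(queries, value = matched$value)
result
```

The result has one row per row of queries, so the structure of the
original data.frame is preserved without any rbind() calls.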

Here is an example of the same approach with a non-standard kind
of match.  The function below matches a long/lat pair to the nearest
city in the table, but returns NA if the point is too far from
every city:

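nearestTo <- function(x, table, limit = 1)
{
    # x: named vector with "long" and "lat" elements.
    # table: data.frame with "long" and "lat" columns.
    # limit: largest acceptable distance, in degrees.
    stopifnot(all(is.element(c("long", "lat"), names(x))),
              all(is.element(c("long", "lat"), names(table))))
    # Plain Euclidean distance in degrees, not great-circle distance.
    dists <- sqrt((x["lat"] - table[, "lat"])^2 +
                  (x["long"] - table[, "long"])^2)
    retval <- which.min(dists)
    # Return an NA index, not a row, when no city is close enough.
    if (dists[retval] > limit) {
        retval <- NA_integer_
    }
    retval
}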

cities <- data.frame(
     long = c(-117.833, -116.217, -123.083, -123.9, -121.733, 
        -117.033, -122.683, -122.333, -117.433),
     lat = c(44.7833, 43.6, 44.05, 46.9833, 42.1667, 
        46.4, 45.5167, 47.6167, 47.6667),
     row.names = c("Baker", "Boise", "Eugene", "Hoquiam", 
        "Klamath Falls", "Lewiston", "Portland", 
        "Seattle", "Spokane")
)

df <- data.frame(
     long = c(-116.77, -123.68, -122.96, -120.81, -116.26, 
        -123.54, -121.22, -115.12),
     lat = c(47.3, 44.53, 44.35, 45.99, 46.75, 43.78, 
        42.71, 46.66))

whichCity <- apply(df, 1, nearestTo, cities, limit=1)
whichCity
# [1]  9  3  3 NA  6  3  5 NA
cbind(df, nearbyCity = rownames(cities)[whichCity])
#      long   lat    nearbyCity
# 1 -116.77 47.30       Spokane
# 2 -123.68 44.53        Eugene
# 3 -122.96 44.35        Eugene
# 4 -120.81 45.99          <NA>
# 5 -116.26 46.75      Lewiston
# 6 -123.54 43.78        Eugene
# 7 -121.22 42.71 Klamath Falls
# 8 -115.12 46.66          <NA>

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: Liviu Andronic [mailto:landronimirc at gmail.com]
> Sent: Wednesday, July 11, 2012 2:19 PM
> To: William Dunlap
> Cc: arun; R help
> Subject: Re: [R] fill 0-row data.frame with 1 line of NAs
> 
> On Wed, Jul 11, 2012 at 9:56 PM, William Dunlap <wdunlap at tibco.com> wrote:
> > Why does one want to replace a zero-row data.frame
> > with a one-row data.frame of NA's?  Unless this is for
> > an external program that cannot handle zero-row inputs,
> > this suggests that there is an unnecessary limitation (i.e.,
> > a bug) in the R code that uses this data.frame.
> >
> I'm running an apply(df, 1, f) function, where f() matches a df$string
> in another matrix and fetches data associated with this string. When
> no match is made I do not need a zero-row data frame, but to preserve
> the structure of the original df I need a data frame with 1 row of
> NAs. There may be a nicer approach, but I'm not aware of any.
> 
> Regards
> Liviu