[R] maintaining variable types in data frames

jim holtman jholtman at gmail.com
Fri Jan 23 04:59:42 CET 2009


How about this:

> Y <- as.data.frame(matrix(c("c","d",NA,4),2,2), stringsAsFactors=FALSE)
> X <- as.data.frame(matrix(c("a","b",1,2),2,2), stringsAsFactors=FALSE)
> Y
  V1   V2
1  c <NA>
2  d    4
> X
  V1 V2
1  a  1
2  b  2
> Y[] <- lapply(seq(ncol(Y)), function(.col){
+     ifelse(is.na(Y[,.col]), X[,.col], Y[,.col])
+ })
>
> Y
  V1 V2
1  c  1
2  d  4
>
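
The session above builds both columns as character, so by itself it does
not show that an integer column survives the fill. Here is a minimal sketch,
assuming the original poster's data with V2 stored as integer. It does the
same column-by-column fill, but with Map() and plain subassignment instead
of ifelse(), because ifelse() drops attributes and may not suit factor or
Date columns:

```r
# Rebuild the example data with V2 stored as integer (assumed setup).
X <- data.frame(V1 = c("a", "b"), V2 = c(1L, 2L), stringsAsFactors = FALSE)
Y <- data.frame(V1 = c("c", "d"), V2 = c(NA, 4L), stringsAsFactors = FALSE)

# Map() pairs up the columns of Y and X. Within one column, replacing the
# NA positions by subassignment keeps that column's type.
Y[] <- Map(function(y, x) {
    miss <- is.na(y)
    y[miss] <- x[miss]
    y
}, Y, X)

str(Y)
```

With this setup, str(Y) should still report V2 as int.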


On Thu, Jan 22, 2009 at 10:44 PM, Mike Miller <mbmiller at taxa.epi.umn.edu> wrote:
> On Thu, 22 Jan 2009, Mike Miller wrote:
>
>> Suppose X and Y are two data frames with the same structures, variable
>> names and dimensions but with different data and different patterns of
>> missing values.  I want to replace the missing values in Y with the
>> corresponding values from X.  I'll construct a simple two-by-two case:
>>
>>> X <- as.data.frame(matrix(c("a","b",1,2),2,2), stringsAsFactors=FALSE)
>>> X[,2] <- as.integer(X[,2])
>>> str(X)
>>
>> 'data.frame':   2 obs. of  2 variables:
>>  $ V1: chr  "a" "b"
>>  $ V2: int  1 2
>>
>>> Y <- as.data.frame(matrix(c("c","d",NA,4),2,2), stringsAsFactors=FALSE)
>>> Y[,2] <- as.integer(Y[,2])
>>> str(Y)
>>
>> 'data.frame':   2 obs. of  2 variables:
>>  $ V1: chr  "c" "d"
>>  $ V2: int  NA 4
>>
>> This seems to be what I want to do...
>>
>>> Y[is.na(Y)] <- X[is.na(Y)]
>>
>> ...and it works except that the structure of Y is changed so that Y$V2 is
>> now of type chr instead of type int:
>>
>>> str(Y)
>>
>> 'data.frame':   2 obs. of  2 variables:
>>  $ V1: chr  "c" "d"
>>  $ V2: chr  "1" "4"
>
>
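
A likely explanation for the type change, sketched on the same two-by-two
data: is.na(Y) is a logical matrix, and indexing a data frame with a matrix
goes through as.matrix(), which yields a character matrix when the columns
have mixed types. The replacement value is therefore already character
before it reaches Y. The names miss and fill are only for illustration:

```r
X <- data.frame(V1 = c("a", "b"), V2 = c(1L, 2L), stringsAsFactors = FALSE)
Y <- data.frame(V1 = c("c", "d"), V2 = c(NA, 4L), stringsAsFactors = FALSE)

miss <- is.na(Y)      # logical matrix with the same shape as Y
fill <- X[miss]       # extracted from as.matrix(X), so it is character

is.matrix(miss)
class(fill)

# Assigning a character value into the integer column coerces the column.
Y[miss] <- fill
str(Y)
```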
> I figured out a good answer.  We can just decide the list of columns we want
> to work with and then use a for loop.  This avoids problems with changing
> variable types:
>
> cols <- 38:47
> keep <- is.na(Y)
> for (i in cols) { nas <- which(keep[,i]); if ( length(nas) > 0 ) { Y[nas,i] <- X[nas,i] }}
>
> Something like that makes for a good one-liner on the interactive command
> line, but this looks neater in a script:
>
> cols <- 38:47
> keep <- is.na(Y)
> for (i in cols) {
>    nas <- which(keep[,i])
>    if ( length(nas) > 0 ) {
>       Y[nas,i] <- X[nas,i]
>    }
> }
>
> It shouldn't be too hard to write a function that does that kind of thing.
>
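
One way such a function might look, as a rough sketch only. The name
fill_na_from and its cols argument are hypothetical, not part of base R,
and the function assumes X and Y have identical dimensions and column order:

```r
# fill_na_from() is a hypothetical helper, not an existing R function.
# It replaces NA entries of Y with the values at the same positions in X,
# one column at a time, so each column keeps its own type.
fill_na_from <- function(Y, X, cols = seq_along(Y)) {
    stopifnot(identical(dim(Y), dim(X)))
    for (i in cols) {
        nas <- which(is.na(Y[[i]]))
        if (length(nas) > 0) {
            Y[[i]][nas] <- X[[i]][nas]
        }
    }
    Y
}

X <- data.frame(V1 = c("a", "b"), V2 = c(1L, 2L), stringsAsFactors = FALSE)
Y <- data.frame(V1 = c("c", "d"), V2 = c(NA, 4L), stringsAsFactors = FALSE)
Y <- fill_na_from(Y, X)
str(Y)
```

Passing something like cols = 38:47 would restrict the fill to those columns.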
> The only problem I know of arises with factors: if X and Y don't have
> exactly the same levels for a factor column, the assignment could cause
> problems.  It would probably take a few more lines to deal with this.
>
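
A sketch of one possible way to handle factor columns, assuming it is
acceptable to give Y the union of both sets of levels. Without this step,
assigning a level that Y's factor does not have would normally produce NA
with a warning. The column name f and the example levels are made up:

```r
X <- data.frame(f = factor(c("low", "high")))
Y <- data.frame(f = factor(c(NA, "mid")))

for (i in seq_along(Y)) {
    if (is.factor(Y[[i]])) {
        # Union of levels, with Y's own levels first.
        lv <- union(levels(Y[[i]]), levels(X[[i]]))
        Y[[i]] <- factor(Y[[i]], levels = lv)
        xi <- factor(X[[i]], levels = lv)
    } else {
        xi <- X[[i]]
    }
    nas <- which(is.na(Y[[i]]))
    if (length(nas) > 0) {
        Y[[i]][nas] <- xi[nas]
    }
}

str(Y)
```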
> A couple of people wrote to me with helpful suggestions, but no one had a
> really great, established kind of solution.  I'm a little surprised.  But,
> with an average of 125 messages per day (!) on this list, I shouldn't be
> surprised that a long message like this one won't be read by everyone.
>
> Best,
> Mike



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



