[R] Correcting for missing data combinations

Gray Calhoun gray.calhoun at gmail.com
Sat Dec 12 00:32:50 CET 2009


This is nice; the matching could be shortened by using merge:

### quoted from the previous message
> #generate data - two factors - 4 levels in factor1, 26 levels in factor2
> df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=T),
>    factor2 = sample(letters, 100, replace=T), value = runif(100))
>
> #generate possible combinations
> poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)
>

## to merge the two dataframes:

adf <- merge(poss.comb, df, all.x = TRUE)
adf$value[is.na(adf$value)] <- 0
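
For anyone who wants to check this end to end, here is a minimal self-contained sketch of the merge approach on the simulated data from the previous message. The set.seed() call and the final checks are my additions for illustration, not part of the original example.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

# simulated data: 4 levels in factor1, 26 levels in factor2
df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace = TRUE),
                 factor2 = sample(letters, 100, replace = TRUE),
                 value = runif(100))

# all possible combinations of the two factors
poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)

# left join: keep every row of poss.comb, fill unmatched rows with NA
adf <- merge(poss.comb, df, all.x = TRUE)
adf$value[is.na(adf$value)] <- 0

# check: every combination is present, and no NA remains in value
combos <- unique(adf[, c("factor1", "factor2")])
nrow(combos) == nrow(poss.comb)
!any(is.na(adf$value))
```

Combinations that occur several times in df keep all of their rows, so adf can have more rows than poss.comb.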

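Though you may want to leave the missing combinations as NA rather than
setting them to 0.  Building the grid from the levels actually present in
your data,

expand.grid(factor1 = unique(df$factor1), factor2 = unique(df$factor2))

could also help.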
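
As a rough sketch of how this might look on the small table from the original question (the object names dat, grid, full and filled are made up for illustration, and the dates are kept as plain character strings):

```r
# the four rows from the original question
dat <- data.frame(Factor1 = c("A", "A", "B", "B"),
                  Factor2 = c("11/11/2009", "11/12/2009",
                              "11/11/2009", "11/13/2009"),
                  Value = c(5, 4, 7, 8),
                  stringsAsFactors = FALSE)

# grid built only from the levels that occur in the data
grid <- expand.grid(Factor1 = unique(dat$Factor1),
                    Factor2 = unique(dat$Factor2),
                    stringsAsFactors = FALSE)

# left join: combinations absent from dat get Value = NA
full <- merge(grid, dat, all.x = TRUE)

# optional step: replace the NAs with 0, then order for display
filled <- full
filled$Value[is.na(filled$Value)] <- 0
filled <- filled[order(filled$Factor1, filled$Factor2), ]
filled
```

Keeping full around alongside filled makes it easy to tell a true 0 from a combination that was simply never observed.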

--Gray

On Fri, Dec 11, 2009 at 3:42 PM, Greg Hirson <ghirson at ucdavis.edu> wrote:
> One approach would be to use expand.grid to generate all combinations and
> then match against what you have.
>
> A short example:
>
> #generate data - two factors - 4 levels in factor1, 26 levels in factor2
> df <- data.frame(factor1 = sample(LETTERS[1:4], 100, replace=T),
>    factor2 = sample(letters, 100, replace=T), value = runif(100))
>
> #generate possible combinations
> poss.comb <- expand.grid(factor1 = LETTERS[1:4], factor2 = letters)
>
> #find matches
> present <- paste(poss.comb$factor1, poss.comb$factor2) %in% paste(df$factor1,
> df$factor2)
>
> #find possible combinations not in the data
> poss.comb[!present, ]
>
> #add 0 as value
> zerodata <- cbind(poss.comb[!present, ], value=0)
>
> #and append to data
> rbind(df, zerodata)
>
> In place of letters and LETTERS, you could use unique(Factor1) and
> unique(Factor2) from your own data in creating the poss.comb list.
>
> HTH,
>
> Greg
>
>
> On 12/11/09 10:19 AM, GL wrote:
>>
>> I can think of many brute-force ways to do this outside of R, but was
>> wondering if there was a simple/elegant solution within R instead.
>>
>> I have a table that looks something like the following:
>>
>> Factor1 Factor2         Value
>> A       11/11/2009      5
>> A       11/12/2009      4
>> B       11/11/2009      7
>> B       11/13/2009      8
>>
>> From that I need to generate all permutations of Factor1 and Factor2 and
>> force a 0 for any combination that doesn’t exist in the actual data table.
>> By way of example, I’d like the output for above to end up as:
>>
>>  Factor1       Factor2         Value
>> A       11/11/2009      5
>> A       11/12/2009      4
>> A       11/13/2009      0
>> B       11/11/2009      7
>> B       11/12/2009      0
>> B       11/13/2009      8
>>
>> Truly appreciate any thoughts.
>>
>>
>
> --
> Greg Hirson
> ghirson at ucdavis.edu
>
> Graduate Student
> Agricultural and Environmental Chemistry
>
> 1106 Robert Mondavi Institute North
> One Shields Avenue
> Davis, CA 95616
>



-- 
Gray Calhoun

Assistant Professor of Economics
Iowa State University



