[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

David Winsemius dwinsemius at comcast.net
Wed Jun 22 15:38:40 CEST 2011


On Jun 22, 2011, at 9:19 AM, Marius Hofert wrote:

> Hi,
>
> and what's the simplest way to obtain a *data.frame* with all years?
> The matching seems more difficult here because the years can/will  
> show up several times...
>
> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>                 block=c("a","a","a","b","c","c"), value=1:6))
> (df. <- data.frame(year=rep(2000:2002, 3),
>                  block=rep(c("a", "b", "c"), each=3), value=0))
> # how to fill in the given values?

These days I think most people would reach for melt() in either the
reshape or the reshape2 package (xtb is the xtabs object from my
earlier message, quoted below):

 > require(reshape2)
Loading required package: reshape2
 > melt(xtb)
   year block value
1 2000     a     1
2 2001     a     2
3 2002     a     3
4 2000     b     0
5 2001     b     4
6 2002     b     0
7 2000     c     5
8 2001     c     6
9 2002     c     0

It seems to do a good job of guessing what you want whereas the  
reshape function in my hands is very failure prone (... yes, the  
failings are mine.)
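
For anyone who would rather stay in base R, here is a minimal sketch of
two alternatives, not a definitive recipe. It assumes the df from the
question; the names grid, full and long are made up for illustration.
The first route builds every year/block pair with expand.grid() and
merges the observed values in; the second converts the xtabs object
back to long form with as.data.frame().

```r
# Data from the question (stringsAsFactors = FALSE keeps block as character).
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6, stringsAsFactors = FALSE)

# Route 1: all year/block combinations, then a left join on both keys.
grid <- expand.grid(year = 2000:2002, block = c("a", "b", "c"),
                    stringsAsFactors = FALSE)
full <- merge(grid, df, by = c("year", "block"), all.x = TRUE)
full$value[is.na(full$value)] <- 0          # pairs without data become 0
full <- full[order(full$block, full$year), ]
rownames(full) <- NULL

# Route 2: go through the contingency table and back to long form.
xtb <- xtabs(value ~ year + block, data = df)
long <- as.data.frame(xtb, responseName = "value", stringsAsFactors = FALSE)
long$year <- as.integer(long$year)          # dimnames come back as character

full
long
```

Both routes should give the same nine rows as the melt() output above.
Note that xtabs() sums values, so the second route only matches the
first when each year/block pair occurs at most once, as it does here.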
-- 
David
>
> Cheers,
>
> Marius
>
>
> On 2011-06-22, at 14:40 , Dennis Murphy wrote:
>
>> I saw it as an xtabs object - I didn't think to check whether it was
>> also a matrix object. Thanks for the clarification, David.
>>
>> Dennis
>>
>> On Wed, Jun 22, 2011 at 4:59 AM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>>>
>>> On Jun 21, 2011, at 6:51 PM, Dennis Murphy wrote:
>>>
>>>> Ahhh...you want a matrix. xtabs() doesn't easily allow coercion to a
>>>> matrix object, so try this instead:
>>>
>>> What am I missing? A contingency table already inherits from
>>> matrix-class and if you insisted on coercion it appears simple:
>>>
>>>> xtb <- xtabs(value ~ year + block, data = df)
>>>> is.matrix(xtb)
>>> [1] TRUE
>>>> as.matrix(xtb)
>>>     block
>>> year   a b c
>>> 2000 1 0 5
>>> 2001 2 4 6
>>> 2002 3 0 0
>>>
>>> --
>>> David.
>>>
>>>>
>>>> library(reshape)
>>>> as.matrix(cast(df, year ~ block, fill = 0))
>>>>   a b c
>>>> 2000 1 0 5
>>>> 2001 2 4 6
>>>> 2002 3 0 0
>>>>
>>>> Hopefully this is more helpful...
>>>> Dennis
>>>>
>>>> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy  
>>>> <djmuser at gmail.com> wrote:
>>>>>
>>>>> Hi:
>>>>>
>>>>> xtabs(value ~ year + block, data = df)
>>>>>    block
>>>>> year   a b c
>>>>> 2000 1 0 5
>>>>> 2001 2 4 6
>>>>> 2002 3 0 0
>>>>>
>>>>> HTH,
>>>>> Dennis
>>>>>
>>>>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert <m_hofert at web.de>  
>>>>> wrote:
>>>>>>
>>>>>> Dear expeRts,
>>>>>>
>>>>>> In the minimal example below, I have a data.frame containing three
>>>>>> "blocks" of years (the years are subsets of 2000 to 2002). For each
>>>>>> year and block a certain "value" is given. I would like to create a
>>>>>> matrix that has row names given by all years ("2000", "2001", "2002"),
>>>>>> and column names given by all blocks ("a", "b", "c"); the entries are
>>>>>> then given by the corresponding value or zero if no year-block
>>>>>> combination exists.
>>>>>>
>>>>>> What's a short way to achieve this?
>>>>>>
>>>>>> Of course one can set up a matrix and use for loops (see below)... but
>>>>>> that's not nice. The problem is that the years are not running from
>>>>>> 2000 to 2002 for all three "blocks" (the second block only has year
>>>>>> 2001, the third one has only 2000 and 2001). In principle, table()
>>>>>> nicely solves such a problem (see below) and fills in zeros. This is
>>>>>> what I would like in the end, but all non-zero entries should be given
>>>>>> by df$value, not (as table() does) by their counts.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Marius
>>>>>>
>>>>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>>>>                block=c("a","a","a","b","c","c"), value=1:6))
>>>>>> table(df[,1:2]) # complements the years and fills in 0
>>>>>>
>>>>>> year <- c(2000, 2001, 2002)
>>>>>> block <- c("a", "b", "c")
>>>>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
>>>>>> for(i in 1:3){ # year
>>>>>>  for(j in 1:3){ # block
>>>>>>      for(k in 1:nrow(df)){
>>>>>>          if(df[k,"year"]==year[i] && df[k,"block"]==block[j])
>>>>>>              res[i,j] <- df[k,"value"]
>>>>>>      }
>>>>>>  }
>>>>>> }
>>>>>> res # does the job; but seems complicated
>>>>
>>>
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
>>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
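
P.S. On the original matrix question quoted above: if one does want to
fill a pre-allocated matrix by hand, the triple loop can be replaced by
indexing with a two-column matrix of row/column positions. This is only
a sketch under the assumption that each year/block pair occurs at most
once in df; idx is a name made up for illustration.

```r
# Data and target dimensions from the original question.
df <- data.frame(year = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6)
year <- c(2000, 2001, 2002)
block <- c("a", "b", "c")

# Start with all zeros, so missing year/block pairs stay 0.
res <- matrix(0, nrow = length(year), ncol = length(block),
              dimnames = list(as.character(year), block))

# One (row, column) position per data row, then a single assignment.
idx <- cbind(match(df$year, year), match(df$block, block))
res[idx] <- df$value
res
```

If a pair appeared more than once, the last value would win, which is
also what the loop does; xtabs() would sum the values instead.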


