[R] Tricky (?) conversion from data.frame to matrix where not all pairs exist

Wed Jun 22 15:46:18 CEST 2011

Hi David,

thanks for the quick response. That's nice. Is there also a way without loading an additional package? I'd prefer loading fewer packages if possible.

Cheers,

Marius
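
A minimal base-R sketch of the package-free route asked about above, using the example data from the quoted messages below. It shows two options: converting the xtabs() result back to long format with as.data.frame(), and merging the observed values into a full year/block grid from expand.grid(). The names long1, long2 and grid are illustrative only and do not appear in the thread, and stringsAsFactors = FALSE is an added assumption so that the result does not depend on the R version.

```r
# Example data from the thread
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6, stringsAsFactors = FALSE)

# Option 1: cross-tabulate, then turn the table back into a long data.frame.
# responseName renames the default "Freq" column to "value".
xtb   <- xtabs(value ~ year + block, data = df)
long1 <- as.data.frame(xtb, responseName = "value", stringsAsFactors = FALSE)

# Option 2: build every year/block pair, left-join the known values,
# and replace the resulting NAs with 0.
grid  <- expand.grid(year = 2000:2002, block = c("a", "b", "c"),
                     stringsAsFactors = FALSE)
long2 <- merge(grid, df, all.x = TRUE)
long2$value[is.na(long2$value)] <- 0
long2 <- long2[order(long2$block, long2$year), ]  # same row order as long1
```

Note that option 1 returns year as character (it comes from the table's dimnames), while option 2 keeps it numeric. Option 1 also sums values if a year/block pair occurs more than once, whereas option 2 would return one row per duplicate.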

On 2011-06-22, at 15:38 , David Winsemius wrote:

> 
> On Jun 22, 2011, at 9:19 AM, Marius Hofert wrote:
> 
>> Hi,
>> 
>> and what's the simplest way to obtain a *data.frame* with all years?
>> The matching seems more difficult here because the years can/will show up several times...
>> 
>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>                block=c("a","a","a","b","c","c"), value=1:6))
>> (df. <- data.frame(year=rep(2000:2002, 3), block=rep(c("a", "b", "c"), each=3), value=0))
>> # how to fill in the given values?
> 
> These days I think most people would reach for melt() in either the reshape or the reshape2 package:
> 
> > require(reshape2)
> Loading required package: reshape2
> > melt(xtb)
>   year block value
> 1 2000     a     1
> 2 2001     a     2
> 3 2002     a     3
> 4 2000     b     0
> 5 2001     b     4
> 6 2002     b     0
> 7 2000     c     5
> 8 2001     c     6
> 9 2002     c     0
> 
> It seems to do a good job of guessing what you want whereas the reshape function in my hands is very failure prone (... yes, the failings are mine.)
> -- 
> David
>> 
>> Cheers,
>> 
>> Marius
>> 
>> 
>> On 2011-06-22, at 14:40 , Dennis Murphy wrote:
>> 
>>> I saw it as an xtabs object - I didn't think to check whether it was
>>> also a matrix object. Thanks for the clarification, David.
>>> 
>>> Dennis
>>> 
>>> On Wed, Jun 22, 2011 at 4:59 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>> 
>>>> On Jun 21, 2011, at 6:51 PM, Dennis Murphy wrote:
>>>> 
>>>>> Ahhh...you want a matrix. xtabs() doesn't easily allow coercion to a
>>>>> matrix object, so try this instead:
>>>> 
>>>> What am I missing? A contingency table already inherits from matrix-class
>>>> and if you insisted on coercion it appears simple:
>>>> 
>>>>> xtb <- xtabs(value ~ year + block, data = df)
>>>>> is.matrix(xtb)
>>>> [1] TRUE
>>>>> as.matrix(xtb)
>>>>    block
>>>> year   a b c
>>>> 2000 1 0 5
>>>> 2001 2 4 6
>>>> 2002 3 0 0
>>>> 
>>>> --
>>>> David.
>>>> 
>>>>> 
>>>>> library(reshape)
>>>>> as.matrix(cast(df, year ~ block, fill = 0))
>>>>>      a b c
>>>>> 2000 1 0 5
>>>>> 2001 2 4 6
>>>>> 2002 3 0 0
>>>>> 
>>>>> Hopefully this is more helpful...
>>>>> Dennis
>>>>> 
>>>>> On Tue, Jun 21, 2011 at 3:35 PM, Dennis Murphy <djmuser at gmail.com> wrote:
>>>>>> 
>>>>>> Hi:
>>>>>> 
>>>>>> xtabs(value ~ year + block, data = df)
>>>>>>   block
>>>>>> year   a b c
>>>>>> 2000 1 0 5
>>>>>> 2001 2 4 6
>>>>>> 2002 3 0 0
>>>>>> 
>>>>>> HTH,
>>>>>> Dennis
>>>>>> 
>>>>>> On Tue, Jun 21, 2011 at 3:13 PM, Marius Hofert <m_hofert at web.de> wrote:
>>>>>>> 
>>>>>>> Dear expeRts,
>>>>>>> 
>>>>>>> In the minimal example below, I have a data.frame containing three
>>>>>>> "blocks" of years
>>>>>>> (the years are subsets of 2000 to 2002). For each year and block a
>>>>>>> certain "value" is given.
>>>>>>> I would like to create a matrix that has row names given by all years
>>>>>>> ("2000", "2001", "2002"),
>>>>>>> and column names given by all blocks ("a", "b", "c"); the entries are
>>>>>>> then given by the
>>>>>>> corresponding value or zero if no year-block combination exists.
>>>>>>> 
>>>>>>> What's a short way to achieve this?
>>>>>>> 
>>>>>>> Of course one can set up a matrix and use for loops (see below)... but
>>>>>>> that's not nice.
>>>>>>> The problem is that the years are not running from 2000 to 2002 for all
>>>>>>> three "blocks"
>>>>>>> (the second block only has year 2001, the third one has only 2000 and
>>>>>>> 2001).
>>>>>>> In principle, table() nicely solves such a problem (see below) and fills
>>>>>>> in zeros.
>>>>>>> This is what I would like in the end, but all non-zero entries should be
>>>>>>> given by df$value,
>>>>>>> not (as table() does) by their counts.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Marius
>>>>>>> 
>>>>>>> (df <- data.frame(year=c(2000, 2001, 2002, 2001, 2000, 2001),
>>>>>>>               block=c("a","a","a","b","c","c"), value=1:6))
>>>>>>> table(df[,1:2]) # complements the years and fills in 0
>>>>>>> 
>>>>>>> year <- c(2000, 2001, 2002)
>>>>>>> block <- c("a", "b", "c")
>>>>>>> res <- matrix(0, nrow=3, ncol=3, dimnames=list(year, block))
>>>>>>> for(i in 1:3){ # year
>>>>>>>     for(j in 1:3){ # block
>>>>>>>         for(k in 1:nrow(df)){
>>>>>>>             if(df[k,"year"]==year[i] && df[k,"block"]==block[j]) res[i,j] <- df[k,"value"]
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>> res # does the job; but seems complicated
>>>>> 
>>>> 
>>>> 
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>> 
>>>> 
>> 
> 
>
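
For the matrix asked for in the first message of the thread, the triple loop can also be replaced by a single assignment with a two-column index matrix. This is only a sketch in base R: idx is an illustrative name that does not appear in the thread, and it assumes each year/block pair occurs at most once in df (with duplicates the last value would win, whereas xtabs() would sum them).

```r
# Example data from the thread
df <- data.frame(year  = c(2000, 2001, 2002, 2001, 2000, 2001),
                 block = c("a", "a", "a", "b", "c", "c"),
                 value = 1:6, stringsAsFactors = FALSE)

year  <- c(2000, 2001, 2002)
block <- c("a", "b", "c")
res <- matrix(0, nrow = length(year), ncol = length(block),
              dimnames = list(year, block))

# One (row, column) pair per row of df; match() gives the position of
# each year and block in the full vectors defined above.
idx <- cbind(match(df$year, year), match(df$block, block))
res[idx] <- df$value  # fills only the pairs that exist; the rest stay 0
res
```

Indexing a matrix by a two-column matrix treats each row of the index as one cell, so all existing year/block pairs are filled in one vectorized step.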