[R] Performance problems to fill up a dataframe

Adrian Dusa dusa.adrian at gmail.com
Mon Sep 24 12:43:42 CEST 2007


Try to assign some names to your initial variables:
dat <- data.frame(A=c(60001,60001,60050,60050,60050), B=c(27,129,618,27,1579))

And what you want is simply:
> table(dat)
       B
A       27 129 618 1579
  60001  1   1   0    0
  60050  1   0   1    1
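
One caveat: table() counts occurrences, so a duplicated (A, B) pair would show up as 2 rather than 1. If you only want presence/absence, as in your loop, something along these lines should do it. This is just a sketch, and the names tab and pa are placeholders of mine:

```r
dat <- data.frame(A = c(60001, 60001, 60050, 60050, 60050),
                  B = c(27, 129, 618, 27, 1579))

tab <- table(dat)      # cross-tabulated counts of each (A, B) pair
pa  <- (tab > 0) * 1   # 1 where the pair occurs at least once, 0 otherwise
```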

Why do you need it as a dataframe anyway?
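
If you really do need one, as.data.frame.matrix() keeps the wide layout, whereas plain as.data.frame() on a table returns the long format with a Freq column. A minimal sketch, assuming the same dat as above (wide is just a name I made up):

```r
dat <- data.frame(A = c(60001, 60001, 60050, 60050, 60050),
                  B = c(27, 129, 618, 27, 1579))

# One row per value of A, one column per value of B, no explicit loop
wide <- as.data.frame.matrix(table(dat))
```
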
Hth,
Adrian

On Monday 24 September 2007, Florian Jansen wrote:
> Dear Listmembers,
>
> I'm trying to fill up a dataframe depending on an arbitrary list of
> references:
>
> Here is my code, which works:
>
> dat <- data.frame(c(60001,60001,60050,60050,60050),c(27,129,618,27,1579))
> LR <- sort(unique(dat[,1]))
> LC <- sort(unique(dat[,2]))
> m <- as.data.frame(matrix(data=NA, nrow=length(LR), ncol=length(LC),
> dimnames=list(LR,LC)))
>
> for(i in 1:nrow(dat)){
>   m[as.character(dat[i,1]), as.character(dat[i,2])] <- 1
>   }
> m[is.na(m)] <- 0
>
> Now I'm trying to avoid the loop, because it takes ages for a list of
> 20000 entries, but I have run out of ideas.
> Should I inflate my list beforehand, and how? Can I address the dataframe
> fields more efficiently?
>
> Thanks for your help.



-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
          +40 21 3120210 / int.101


