[R] Loops and dataframes

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.ac.be
Fri Feb 25 13:10:46 CET 2005


or something like:

df <- data.frame(start=st, end=ed)
system.time(df$start <- df$end, gcFirst=TRUE)
# or in general, if you have a vectorized function `f()' and you wish to
# apply it to the ith and jth columns of the data frame, e.g.,
# f <- function(x, y) x*y
# df[[i]] <- f(df[[i]], df[[j]])
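A minimal sketch of the vectorized idea in this reply. It assumes the `st`/`ed` vectors from the thread (rebuilt here so the block runs on its own). It checks that one whole-column assignment gives the same data frame as the element-wise loop. The function `f` and the column indices `i`, `j` are illustrative only.

```r
# Rebuild the vectors used in the thread so this block runs on its own
st <- rep(1, 1e4)
ed <- rep(2, 1e4)

# Element-wise loop, as in the original question
df_loop <- data.frame(start = st, end = ed)
for (k in seq_len(nrow(df_loop))) df_loop$start[k] <- df_loop$end[k]

# Vectorized: replace the whole column in a single assignment
df_vec <- data.frame(start = st, end = ed)
df_vec$start <- df_vec$end

# General case: a vectorized f() applied to columns i and j
f <- function(x, y) x * y  # illustrative function only
i <- 1                     # illustrative column indices
j <- 2
df_f <- data.frame(start = st, end = ed)
df_f[[i]] <- f(df_f[[i]], df_f[[j]])

# Both approaches leave identical data frames
stopifnot(identical(df_loop, df_vec))
```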


Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "Liaw, Andy" <andy_liaw at merck.com>
To: "'Firas Swidan'" <firas at cs.technion.ac.il>; <r-help at stat.math.ethz.ch>
Sent: Friday, February 25, 2005 12:33 PM
Subject: RE: [R] Loops and dataframes


> An addendum: If you must use a data frame (e.g., you have mixed data
> types), the following might help:
>
>> df <- list(start=st, end=ed)
>> system.time({for (i in 1:length(df[[1]])) df$start[i] <- df$end[i];
> +              df <- as.data.frame(df)}, gcFirst=TRUE)
> [1] 0.14 0.01 0.15   NA   NA
>
> I.e., keep it as a list until all manipulations are done, then coerce it
> to a data frame.
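A minimal sketch of the "keep it as a list, coerce once" pattern. It assumes hypothetical mixed-type data (a character `label` column added for illustration) to show the case where a matrix would not do. The list is modified element by element and turned into a data frame only at the end.

```r
# Hypothetical mixed-type data: two numeric columns and a character column
n <- 1e4
lst <- list(start = rep(1, n), end = rep(2, n), label = rep("a", n))

# Element assignment on a plain list component does not dispatch to the
# data frame replacement methods on every iteration
for (k in seq_along(lst$start)) lst$start[k] <- lst$end[k]

# Coerce once, after all manipulations are done
df <- as.data.frame(lst, stringsAsFactors = FALSE)
str(df)
```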
>
>
> Andy
>
>
>> From: Liaw, Andy
>>
>> You are discovering part of the overhead of using a data frame. The way
>> you specify the subset of the data frame to replace matters, by nearly a
>> factor of two in the timings below:
>>
>> > st <- rep(1,1e4)
>> > ed <- rep(2,1e4)
>> > df <- data.frame(start=st, end=ed)
>> > system.time(for (i in 1:dim(df)[1]) df[i,1] <- df[i,2], gcFirst=TRUE)
>> [1] 35.96  0.10 36.37    NA    NA
>> > df <- data.frame(start=st, end=ed)
>> > system.time(for (i in 1:dim(df)[1]) df[[1]][i] <- df[[2]][i], gcFirst=TRUE)
>> [1] 22.63  0.17 22.88    NA    NA
>> > df <- data.frame(start=st, end=ed)
>> > system.time(for (i in 1:dim(df)[1]) df$start[i] <- df$end[i], gcFirst=TRUE)
>> [1] 19.29  0.13 19.46    NA    NA
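A related variant, not taken from the thread, sketched under the same `st`/`ed` setup: pull the columns out as plain vectors, loop on those, and write the result back with a single data frame assignment. This keeps the per-iteration work on ordinary vectors while leaving the data frame in place.

```r
st <- rep(1, 1e4)
ed <- rep(2, 1e4)
df <- data.frame(start = st, end = ed)

# Copy the columns into ordinary numeric vectors
s <- df$start
e <- df$end

# Element-wise work now uses the plain vector [<- on every iteration
for (i in seq_along(s)) s[i] <- e[i]

# A single data frame assignment at the end
df$start <- s
```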
>>
>>
>> If you have all numeric data, you might as well use a matrix instead of a
>> data frame:
>>
>> > m <- cbind(start=st, end=ed)
>> > str(m)
>>  num [1:10000, 1:2] 2 2 2 2 2 2 2 2 2 2 ...
>>  - attr(*, "dimnames")=List of 2
>>   ..$ : NULL
>>   ..$ : chr [1:2] "start" "end"
>> > system.time(for (i in 1:nrow(m)) m[i,1] <- m[i,2], gcFirst=TRUE)
>> [1] 0.06 0.00 0.08   NA   NA
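A minimal sketch of the matrix route, assuming the same `st`/`ed` vectors. It shows that the loop and a whole-column assignment give the same matrix, and that the result can be converted to a data frame afterwards if later code needs one.

```r
st <- rep(1, 1e4)
ed <- rep(2, 1e4)

# The loop, indexed by the matrix's own row count
m <- cbind(start = st, end = ed)
for (i in seq_len(nrow(m))) m[i, 1] <- m[i, 2]

# Equivalent whole-column assignment, no loop
m2 <- cbind(start = st, end = ed)
m2[, "start"] <- m2[, "end"]

# Convert to a data frame only if one is needed downstream
df <- as.data.frame(m)
```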
>>
>>
>> Andy
>>
>>
>> > From: Firas Swidan
>> >
>> > Hi,
>> > I am experiencing a long delay when using data frames inside loops and
>> > was wondering if this is a bug or not.
>> > Example code:
>> >
>> > > st <- rep(1,100000)
>> > > ed <- rep(2,100000)
>> > > for(i in 1:length(st)) st[i] <- ed[i] # works fine
>> > > df <- data.frame(start=st,end=ed)
>> > > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] # takes forever
>> >
>> > R: R 2.0.0 (2004-10-04)
>> > OS: Linux, Fedora Core 2
>> > kernel: 2.6.10-1.14_FC2
>> > cpu: AMD Athlon XP 1600.
>> > mem: 500MB.
>> >
>> > The example above is only to illustrate the problem. I need loops to
>> > apply some functions to pairs (not necessarily successive) of rows in a
>> > data frame.
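For the pairs-of-rows case, a minimal sketch assuming the pairwise function accepts vectors. The index vectors `from`/`to` and the function `g` are hypothetical. One vectorized call handles every pair at once. This only matches a row-by-row loop when no pair reads a row that an earlier pair has already overwritten.

```r
st <- rep(1, 1e4)
ed <- rep(2, 1e4)
df <- data.frame(start = st, end = ed)

# Hypothetical pairs of row indices (not necessarily successive)
from <- c(1, 5, 10, 2)
to   <- c(3, 4, 100, 9999)

# Hypothetical pairwise function; it must accept vectors
g <- function(a, b) a + b

# Row from[k] is paired with row to[k] for every k in one call
res <- g(df$start[from], df$end[to])

# Write the results back to the "from" rows in a single assignment
df$start[from] <- res
```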
>> >
>> > Thanks for any advice,
>> > Firas.
>> >
>> > ______________________________________________
>> > R-help at stat.math.ethz.ch mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide!
>> > http://www.R-project.org/posting-guide.html
>> >
>> >
>>
>>
>
>



