[R] index instead of loop?

Thu Mar 8 01:57:40 CET 2012

Hello again.

Ben quant wrote
> 
> Hello,
> 
> In case anyone is interested in a faster solution for lots of columns.
> This
> solution is slower if you only have a few columns.  If anyone has anything
> faster, I would be interested in seeing it.
> 
> ### some mockup data
> z.dates =
> c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")
> 
> nms = c("A","B","C","D") # add more columns to see how the code below is
> fsater
> # these are the report dates that are the real days the data was
> available,
> so show the data the day after this date ('after' is a design decision)
> rd1 = matrix(c("20070514","20070814","20071115",   "20080213",
> "20080514",  "20080814",  "20081114",  "20090217",
>                "20070410","20070709","20071009",   "20080109",
> "20080407",  "20080708",  "20081007",  "20090112",
>                "20070426","--","--","--","--","--","--","20090319",
>                "--","--","--","--","--","--","--","--"),
>              nrow=8,ncol=4)
> dimnames(rd1) = list(z.dates,nms)
> 
> # this is the unadjusted raw data, that always has the same dimensions,
> rownames, and colnames as the report dates
> ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
> 
> 2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
>               NaN, NaN,-98.426,190.304,180.894,183.220,172.520, 144.138,
>               NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
>             nrow=8,ncol=4)
> dimnames(ua) = list(z.dates,nms)
> 
> ################################ the fastest code I have found:
> 
> start_t_all = Sys.time()
> fix = function(x)
> {
>   year = substring(x, 1, 4)
>   mo = substring(x, 5, 6)
>   day = substring(x, 7, 8)
>   ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
> }
> 
> rd = apply(rd1, 2, fix)
> dimnames(rd) = dimnames(rd)
> 
> wd1 <- seq(from =as.Date(min(z.dates)), to = Sys.Date(), by = "day")
> #wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
> wd = sapply(wd1, as.character)
> 
> mat = matrix(NA,nrow=length(wd),ncol=ncol(ua))
> rownames(mat) = wd
> nms = as.Date(rownames(ua))
> 
> for(i in 1:length(wd)){
>   d = as.Date(wd[i])
>   diff = abs(nms - d)
>   rd_row_idx = max(which(diff == min(diff)))
>   rd_col_idx = which(rd[rd_row_idx,] < d)
> 
>   if((rd_row_idx - 1) > 0){
>     mat[i,] = ua[rd_row_idx - 1,]
>   }
>   if( length(rd_col_idx)){
>     mat[i,rd_col_idx] = ua[rd_row_idx,rd_col_idx]
>   }
> }
> colnames(mat)=colnames(ua)
> print(Sys.time()-start_t_all)
> 
> Regards,
> 
> Ben
> 
> On Tue, Mar 6, 2012 at 8:22 AM, Rui Barradas <rui1174@> wrote:
> 
>> Hello,
>>
>> > Just looking at this, but it looks like ix doesn't exist:
>> >        sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
>> > fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
>> >
>> >  Trying to sort it out now.
>>
>> Right, sorry.
>> I've changed the name from 'ix' to 'inxlist' to make it more readable
>> just
>> before posting.
>> And since the object 'ix' still existed in the R global environment it
>> didn't throw an error...
>>
>> Your correction in the post that followed is what I meant.
>>
>> Correction (full loop, tested):
>>
>> for(tkr in 1:ncol(ua)){
>>        x  <- c(rd1[, tkr], as.Date("9999-12-31"))
>>         ix <- lapply(1:nr, function(i)
>>                         which(x[i] <= dt1 & dt1 < x[i + 1]))
>>         sapply(1:length(ix), function(i)
>>                 if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i,
>> tkr])
>> }
>>
>> Rui Barradas
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4450186.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@ mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

Maybe I'm not understanding the problem very well, but let me describe what
I'm thinking it is.

You have two tables, 'rd1' and 'ua' and a vector of dates, 'z.dates'.  The
result is a table such that:
1. From 'z.dates' make a vector of daily dates.
2. Each column is filled with numbers from 'ua' based on dates in 'rd1',
starting at the day given in step 1.
My doubt is that your last posted code seems to give a special role to
column 'A'.

> mat[225:232, ]
                               A       B              C      D
2007-11-10 636.16 2645     NaN NaN
2007-11-11 636.16 2645     NaN NaN
2007-11-12 636.16 2645     NaN NaN
2007-11-13 636.16 2645     NaN NaN
2007-11-14 636.16 2645     NaN NaN
2007-11-15 655.91 2645 -98.426 NaN
2007-11-16 655.91 2645 -98.426 NaN
2007-11-17 655.91 2645 -98.426 NaN

The values in column 'C' change following the date in column 'A'. That is
the third date in 'rd1',
more exactly, rd1[3, 1] == "20071115".

Shouldn't the values in mat[, "C"] start at 2009-03-20? The corresponding
value in 'ua' would then be 144.138.

(I still believe this can be made much faster.)

Rui Barradas

--
View this message in context: http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4455223.html
Sent from the R help mailing list archive at Nabble.com.