[R] --max-vsize and --max-nsize linux?

Christian Schulz ozric at web.de
Tue Jul 20 15:40:58 CEST 2004


Many thanks for clearing up the vectorized approach for me,
which is indeed the advantage of R.

regards, christian


On Tuesday, 20 July 2004 15:24, Marc Schwartz wrote:
> On Tue, 2004-07-20 at 07:55, Christian Schulz wrote:
> > Hi,
> >
> > sometimes I have trivial recodings like this:
> > > dim(tt)
> >
> > [1] 252382     98
> >
> > system.time(for(i in 2:length(tt)){
> >               tt[,i][is.na(tt[,i])] <- 0
> >     })
> >
> > ...and a Win2000 machine (XP2000+, 1GB) finishes it in several minutes,
> > but my Linux notebook (XP2.6GHZ, 512MB) has not finished after some hours.
> >
> > I notice that the CPU load is relatively small most of the time, but the
> > hard disk is doing a lot of work.
> >
> > Is this a problem of --max-vsize and --max-nsize, and should I play with
> > those? I can't believe that the difference in RAM is the reason.
> >
> > Does anybody have experience with what an "optimal" setting is with, e.g.,
> > 512 MB RAM on Linux?
> >
> > Many thanks for help and comments
> > regards,christian
>
> Christian,
>
> I am unclear as to the nature of your loop above.
>
> Note that:
> > length(tt)
>
> [1] 24733436
>
> which is 252382 * 98. Your looping approach is both inefficient and
> incorrect.
>
> Note that when trying to run your loop 'as is', I get:
> > system.time(for(i in 2:length(tt)){
>
> +               tt[,i][is.na(tt[,i])] <- 0
> +     })
> Error: subscript out of bounds
> Timing stopped at: 3.54 1.81 5.5 0 0
>
> This is because 'i' eventually exceeds the number of columns (98) in
> 'tt', since you have 'i' going from 2 to 24733436.
>
>
> I am presuming that you simply want to set any 'NA' values in 'tt' to 0?
>
> Take note of using a vectorized approach:
>
>
> tt <- matrix(sample(c(1:10, NA), 252382 * 98, replace = TRUE),
>              ncol = 98)
>
> > dim(tt)
>
> [1] 252382     98
>
> > table(is.na(tt))
>
>    FALSE     TRUE
> 22484834  2248602
>
> Now use:
> > system.time(tt[is.na(tt)] <- 0)
>
> [1] 1.56 0.73 2.42 0.00 0.00
>
> > table(is.na(tt))
>
>    FALSE
> 24733436
>
>
> This is on a 3.2 GHz system with 2 GB of RAM.
>
> However, this is not a memory issue; it is an inefficient use of loops.
>
> HTH,
>
> Marc Schwartz
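
The points in the reply above can be checked on a small scale. The following is only a sketch on a hypothetical 4 x 3 matrix 'm' standing in for 'tt' (the names 'm', 'm_loop' and 'm_vec' are not from the thread). It assumes the goal is simply to replace every NA with 0. Note that length() counts cells for a matrix but columns for a data frame, so bounding a column loop with ncol() should be safe for either kind of object.

```r
# Hypothetical small stand-in for 'tt'; one NA in each of the 3 columns.
m <- matrix(c(1, NA, 3, 4,
              NA, 6, 7, 8,
              NA, 10, 11, 12), ncol = 3)

length(m)                 # cells in the matrix: 4 * 3
ncol(m)                   # columns in the matrix
length(as.data.frame(m))  # for a data frame, length() is the column count

# Column loop bounded by ncol(), so 'i' never runs past the last column.
m_loop <- m
for (i in seq_len(ncol(m_loop))) {
  m_loop[is.na(m_loop[, i]), i] <- 0
}

# Vectorized replacement over the whole object in one step.
m_vec <- m
m_vec[is.na(m_vec)] <- 0

# Both approaches should give the same result.
identical(m_loop, m_vec)
```

On an object of the size discussed in the thread, the vectorized form is the one timed in the reply; the bounded loop is shown only to make the indexing error concrete.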



