[R] --max-vsize and --max-nsize linux?

Marc Schwartz MSchwartz at MedAnalytics.com
Tue Jul 20 15:24:43 CEST 2004


On Tue, 2004-07-20 at 07:55, Christian Schulz wrote:
> Hi,
> 
> Sometimes I have trivial recodings like this:
> 
> > dim(tt)
> [1] 252382     98
> 
> system.time(for(i in 2:length(tt)){
>               tt[,i][is.na(tt[,i])] <- 0
>     })
>     
> ...and a Win2000 (XP2000+, 1 GB) machine finishes it in several minutes, but
> my Linux notebook (XP 2.6 GHz, 512 MB) still hasn't finished after several hours.
> 
> I notice that the CPU load is relatively low most of the time, but the hard
> disk is very busy.
> 
> Is this a problem with --max-vsize and --max-nsize that I should experiment
> with? I can't believe that the difference in RAM is the reason.
> 
> Does anybody have experience with an "optimal" setting for, e.g.,
> 512 MB of RAM on Linux?
> 
> Many thanks for help and comments
> regards, christian


Christian,

I am unclear as to the nature of your loop above. 

Note that:

> length(tt)
[1] 24733436

which is 252382 * 98. Your looping approach is both inefficient and
incorrect.
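A small example shows the distinction. For a matrix, length() counts every element, while ncol() (or dim()[2]) gives the number of columns that a column loop should run over. The 4 x 3 matrix `m` below is a hypothetical stand-in for 'tt'; the last line shows that a data frame behaves differently, since its length() is its column count.

```r
# Hypothetical 4 x 3 matrix standing in for 'tt'
m <- matrix(1:12, ncol = 3)

length(m)                 # total number of elements: 12
ncol(m)                   # number of columns: 3
dim(m)                    # rows and columns: 4 3
length(as.data.frame(m))  # for a data frame, length() is the column count: 3
```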

Note that when trying to run your loop 'as is', I get:

> system.time(for(i in 2:length(tt)){
+               tt[,i][is.na(tt[,i])] <- 0
+     })
Error: subscript out of bounds
Timing stopped at: 3.54 1.81 5.5 0 0 

This is because 'i' eventually exceeds the number of columns (98) in
'tt', since you have 'i' going from 2 to 24733436.
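If a column-by-column loop is really wanted, a minimal corrected sketch bounds 'i' by ncol() instead of length(). It keeps the original's start at column 2, so column 1 is left alone; `tt_small` is a hypothetical small matrix used only for illustration.

```r
# Hypothetical 3 x 3 matrix with NAs, standing in for 'tt'
# (filled column by column: col 1 = 1, NA, 3; col 2 = NA, 5, 6; col 3 = NA, 8, 9)
tt_small <- matrix(c(1, NA, 3, NA, 5, 6, NA, 8, 9), ncol = 3)

# seq_len(ncol(...))[-1] gives 2, ..., ncol; unlike 2:ncol(...), it is
# empty (rather than c(2, 1)) when the matrix has only one column
for (i in seq_len(ncol(tt_small))[-1]) {
  tt_small[, i][is.na(tt_small[, i])] <- 0
}

tt_small
```

Column 1 keeps its NA, while the NAs in columns 2 and 3 become 0.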


I am presuming that you simply want to set any 'NA' values in 'tt' to 0?

Consider a vectorized approach instead:


tt <- matrix(sample(c(1:10, NA), 252382 * 98, replace = TRUE), 
             ncol = 98)

> dim(tt)
[1] 252382     98

> table(is.na(tt))

   FALSE     TRUE 
22484834  2248602 


Now use:

> system.time(tt[is.na(tt)] <- 0)
[1] 1.56 0.73 2.42 0.00 0.00

> table(is.na(tt))

   FALSE 
24733436 


This is on a 3.2 GHz system with 2 GB of RAM.

However, this is not a memory issue; it is an inefficient use of loops.
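Two details are worth checking on a smaller copy of the problem before running it at full size. First, the original loop starts at column 2, whereas tt[is.na(tt)] <- 0 also changes column 1; if that column must stay as it is, the vectorized assignment can be applied to tt[, -1] instead. Second, the sample() data above are integers, and assigning a plain 0 (a double) converts the whole matrix to double storage, which takes twice the memory; 0L avoids that. The sketch below uses a hypothetical 10,000 x 98 matrix and hypothetical helpers `loop_fill()` and `vec_fill()` to time a corrected column loop against the vectorized form and to check that both give the same result. Timings depend on the machine and R version, so none are quoted here.

```r
# Hypothetical test data: 10,000 x 98 integer matrix with NAs
set.seed(42)
x <- matrix(sample(c(1:10, NA), 10000 * 98, replace = TRUE), ncol = 98)

# Corrected column loop over columns 2..ncol, as in the original post
loop_fill <- function(m) {
  for (i in seq_len(ncol(m))[-1]) {
    m[, i][is.na(m[, i])] <- 0L
  }
  m
}

# Vectorized equivalent: a single assignment over columns 2..ncol
vec_fill <- function(m) {
  sub <- m[, -1, drop = FALSE]
  sub[is.na(sub)] <- 0L  # 0L keeps integer storage; 0 would coerce to double
  m[, -1] <- sub
  m
}

print(system.time(a <- loop_fill(x)))
print(system.time(b <- vec_fill(x)))

identical(a, b)   # both approaches produce the same integer matrix
storage.mode(b)   # "integer"
```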

HTH,

Marc Schwartz




More information about the R-help mailing list