[R] caretNWS and training data set sizes

Max Kuhn mxkuhn at gmail.com
Mon Mar 10 17:41:12 CET 2008


What versions of caret and caretNWS are you using? Also, what versions
of the nws server and twisted are you using? What kind of machine is it
(number of processors, amount of physical memory, etc.)?

I haven't seen any real limitations with one exception: if you are
running P jobs on the same machine, you are replicating the memory
needs P times.
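
As a rough illustration of that replication (a back-of-the-envelope
sketch, not a measurement): the hypothetical helper below assumes a
purely numeric matrix at 8 bytes per value and ignores whatever extra
memory the model-fitting code itself allocates, so treat the result as
a lower bound. The name estimate_mb is made up for this example and is
not part of caret, caretNWS or nws.

```r
# Rough lower bound on the memory needed to hold one numeric training
# matrix, and the total when each of P workers keeps its own copy.
# Assumes 8 bytes per double; ignores model-fitting overhead.
estimate_mb <- function(n_obs, n_pred, n_workers) {
  per_copy <- n_obs * n_pred * 8 / 2^20  # megabytes for one copy
  c(per_copy = per_copy, all_workers = per_copy * n_workers)
}

# Example: 22000 observations, 347 numeric predictors, 5 workers
estimate_mb(n_obs = 22000, n_pred = 347, n_workers = 5)
```

Comparing the all_workers figure (plus the copy held by the master R
session) with the machine's physical memory gives a first sanity check
before looking for other causes.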

I've been running jobs with 4K to 90K samples and 1200 predictors
without issues, so I'll need a lot more information to help you.

Max


On Mon, Mar 10, 2008 at 12:04 PM, Tait, Peter <ptait at skura.com> wrote:
> Hi,
>
>  I am using the caretNWS package to train some supervised regression models (gbm, lasso, random forest and mars). The problem I have encountered started when my training data set increased in the number of predictors and the number of observations.
>
>  The training data set has 347 numeric columns. The problem is that when there are more than 2500 observations, the 5 sleigh objects start but do not use any CPU resources and do not process any data.
>
>  N=100                     CPU (%)       Memory (K)
>  Rgui.exe                  0             91737
>  5x sleighs (RTerm.exe)    15-25         ~27000
>
>  N=2500
>  Rgui.exe                  0             160000
>  5x sleighs (RTerm.exe)    15-25         ~74000
>
>  N=5000
>  Rgui.exe                  50            193000
>  5x sleighs (RTerm.exe)    0             ~19000
>
>
>  A 10% sample of my overall data is ~22000 observations.
>
>  Can someone give me an idea of the limitations of the nws and caretNWS packages in terms of the number of columns and rows of the training matrices? Also, are there other tuning/training functions that work faster on large datasets?
>
>  Thanks for your help.
>  Peter
>
>
>  > version
>                _
>  platform       i386-pc-mingw32
>  arch           i386
>  os             mingw32
>  system         i386, mingw32
>  status
>  major          2
>  minor          6.2
>  year           2008
>  month          02
>  day            08
>  svn rev        44383
>  language       R
>  version.string R version 2.6.2 (2008-02-08)
>
>  > memory.limit()
>  [1] 2047
>
>  ______________________________________________
>  R-help at r-project.org mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  and provide commented, minimal, self-contained, reproducible code.
>





