[R] e1071/svm: never finishes

Sam Steingold sds at gnu.org
Fri Feb 10 17:00:27 CET 2012


> * Sam Steingold <fqf at tah.bet> [2012-02-10 10:01:54 -0500]:
>
> When I tried to run svm on the same data frame, memory usage as reported
> by top(1) doubled to 4GB almost right away and the function never
> returned (has been running for ~15 hours now). ^C does not stop it.
> This is most unusual, libsvm has always seemed very fast.

It looks like the time _is_ being spent in libsvm. Two backtraces from the running process:

#0  0x00007ffff2aedc64 in Solver::select_working_set (this=0x7fffffff97f0, out_i=@0x7fffffff95a0, out_j=@0x7fffffff95b0) at svm.cpp:852
#1  0x00007ffff2aef91d in Solver::Solve (this=0x7fffffff97f0, l=285724, Q=..., p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1, 
    Cn=1, eps=<optimized out>, si=0x7fffffff9980, shrinking=1) at svm.cpp:573
#2  0x00007ffff2af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fffffff9980, alpha=0x6023fb60, param=<optimized out>, prob=0x7fffffff9c30) at svm.cpp:1444
#3  svm_train_one (prob=0x7fffffff9c30, param=<optimized out>, Cp=1, Cn=1) at svm.cpp:1641
#4  0x00007ffff2af4a8e in svm_train (prob=<optimized out>, param=0x7fffffff9d40) at svm.cpp:2179
#5  0x00007ffff2aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, c=<optimized out>, y=<optimized out>, rowindex=<optimized out>, 
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0, 
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50, 
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40, 
    index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8, 
    cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at Rsvm.c:275
#6  0x00007ffff792cefc in ?? () from /usr/lib/R/lib/libR.so
#7  0x00007ffff795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#8  0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#9  0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#11 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#13 0x00007ffff79ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#14 0x00007ffff79ada47 in ?? () from /usr/lib/R/lib/libR.so
#15 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#16 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#17 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#19 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x00007ffff795db9b in ?? () from /usr/lib/R/lib/libR.so
#21 0x00007ffff795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#23 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#25 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x00007ffff7998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#27 0x00007ffff79982e0 in ?? () from /usr/lib/R/lib/libR.so
#28 0x00007ffff7998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#29 0x000000000040078b in main ()
#30 0x00007ffff72d930d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#31 0x00000000004007bd in _start ()


#0  0x00007ffff2aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890) at svm.cpp:295
#1  0x00007ffff2af7a25 in Kernel::kernel_rbf (this=<optimized out>, i=<optimized out>, j=<optimized out>) at svm.cpp:239
#2  0x00007ffff2af782c in SVC_Q::get_Q (this=0x7fffffff9870, i=187701, len=208039) at svm.cpp:1271
#3  0x00007ffff2aef9ab in Solver::Solve (this=0x7fffffff97f0, l=285724, Q=..., p_=<optimized out>, y_=<optimized out>, alpha_=0x6023fb60, Cp=1,
    Cn=1, eps=<optimized out>, si=0x7fffffff9980, shrinking=1) at svm.cpp:591
#4  0x00007ffff2af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fffffff9980, alpha=0x6023fb60, param=<optimized out>, prob=0x7fffffff9c30) at svm.cpp:1444
#5  svm_train_one (prob=0x7fffffff9c30, param=<optimized out>, Cp=1, Cn=1) at svm.cpp:1641
#6  0x00007ffff2af4a8e in svm_train (prob=<optimized out>, param=0x7fffffff9d40) at svm.cpp:2179
#7  0x00007ffff2aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, c=<optimized out>, y=<optimized out>, rowindex=<optimized out>,
    colindex=<optimized out>, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0,
    nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50,
    shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40,
    index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8,
    cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at Rsvm.c:275
#8  0x00007ffff792cefc in ?? () from /usr/lib/R/lib/libR.so
#9  0x00007ffff795da1d in Rf_eval () from /usr/lib/R/lib/libR.so
#10 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#11 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#12 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#13 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#14 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#15 0x00007ffff79ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so
#16 0x00007ffff79ada47 in ?? () from /usr/lib/R/lib/libR.so
#17 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#18 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#19 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#20 0x00007ffff795f540 in ?? () from /usr/lib/R/lib/libR.so
#21 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#22 0x00007ffff795db9b in ?? () from /usr/lib/R/lib/libR.so
#23 0x00007ffff795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so
#24 0x00007ffff795f6c9 in ?? () from /usr/lib/R/lib/libR.so
#25 0x00007ffff795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so
#26 0x00007ffff7960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#27 0x00007ffff795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so
#28 0x00007ffff7998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#29 0x00007ffff79982e0 in ?? () from /usr/lib/R/lib/libR.so
#30 0x00007ffff7998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#31 0x000000000040078b in main ()
#32 0x00007ffff72d930d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x00000000004007bd in _start ()
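
For scale, here is a back-of-the-envelope calculation in R of what the full kernel matrix would need for the l=285724 rows seen in the Solver::Solve frames. It is only a sketch under two assumptions that were not read from this run: 4 bytes per cached kernel value (single-precision floats, which I believe is libsvm's default) and a 40 MB kernel cache (which I believe is the default `cachesize` in e1071's svm()).

```r
# Rough size of the full kernel matrix for the problem in the backtraces.
l <- 285724              # number of training rows, from Solver::Solve (l=285724)
bytes_per_entry <- 4     # assumption: kernel values cached as 4-byte floats
cache_mb <- 40           # assumption: kernel cache size in MB

# The full l x l kernel matrix, in GiB.
full_matrix_gib <- l^2 * bytes_per_entry / 2^30

# How many complete kernel rows fit in the cache at once.
row_bytes <- l * bytes_per_entry
rows_in_cache <- floor(cache_mb * 2^20 / row_bytes)

cat(sprintf("full kernel matrix: about %.0f GiB\n", full_matrix_gib))
cat(sprintf("kernel rows that fit in the cache: %.0f\n", rows_in_cache))
```

If those assumptions hold, only a few dozen of the 285724 kernel rows can be cached at any time, so nearly every get_Q call has to recompute its row with Kernel::dot. That would be consistent with the second backtrace, but it is a guess, not a measurement.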



> This is R version 2.13.1 (2011-07-08) (as distributed with ubuntu).
>
>> * Sam Steingold <fqf at tah.bet> [2012-02-09 21:43:30 -0500]:
>>
>> I did this:
>> nb <- naiveBayes(users, platform)
>> pl <- predict(nb,users)
>> nrow(users) ==> 314781
>> ncol(users) ==> 109
>>
>> 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
>> (tens of minutes).  Why?
>>
>> 2. the predict results were completely off the mark (quite the opposite
>> of the expected overfitting).  suffice it to show the tables:
>>
>> pl:
>>
>>    android blackberry       ipad     iphone         lg      linux        mac 
>>          3          5         11         14     312723          5         11 
>>     mobile      nokia    samsung    symbian    unknown    windows 
>>       1864         17         16        112          0          0 
>>
>> platform:
>>    android blackberry       ipad     iphone         lg      linux        mac 
>>      18013       1221       2647       1328          4       2936      34336 
>>     mobile      nokia    samsung    symbian    unknown    windows 
>>         18         88         39        103       2660     251388 
>>
>> i.e., nb classified nearly everything as "lg" while in the actual data
>> "lg" is virtually nonexistent.
>>
>> 3. when I print "nb", I see "A-priori probabilities" (which are what I
>> expected) and "Conditional probabilities" which are confusing because
>> there are only two of them, e.g.:
>>
>>              android    0.048464998 0.43946764
>>              blackberry 0.001638002 0.04045564
>>              ipad       0.322251606 1.84940588
>>              iphone     0.030873494 0.23250250
>>              lg         0.000000000 0.00000000
>>              linux      0.023501362 0.34698919
>>              mac        0.082653774 1.22535027
>>              mobile     0.000000000 0.00000000
>>              nokia      0.000000000 0.00000000
>>              samsung    0.000000000 0.00000000
>>              symbian    0.000000000 0.00000000
>>              unknown    0.003759398 0.08219078
>>              windows    0.021158528 0.32916970
>>
>> the predictors are integers.
>> is the first column for the 0 predictors and the second for all non-0?
>> Is there a way to ask naiveBayes to differentiate between non-0 values?
>>
>> thanks!
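
On question 3 above: as far as I can tell from the e1071 documentation, for a numeric predictor naiveBayes() stores the per-class mean and standard deviation, so the two columns would be those rather than "0" and "non-0". The base-R sketch below computes the same kind of two-column table by hand. The data frame `toy` and its values are made up for illustration and are not taken from `users`.

```r
# Toy data: one integer predictor and a class label (hypothetical values).
toy <- data.frame(
  clicks   = c(0L, 1L, 3L, 0L, 0L, 2L, 0L, 0L),
  platform = factor(c("mac", "mac", "mac", "windows", "windows",
                      "windows", "lg", "lg"))
)

# Per-class mean and standard deviation of the numeric predictor.
cond <- cbind(
  mean = tapply(toy$clicks, toy$platform, mean),
  sd   = tapply(toy$clicks, toy$platform, sd)
)
print(cond)

# Treating the integer as categorical instead: per-class relative
# frequencies of each distinct value (each row sums to 1).
freq <- prop.table(table(toy$platform, factor(toy$clicks)), margin = 1)
print(freq)
```

In the toy table the "lg" row has mean 0 and standard deviation 0, like several rows in the printout above. A class with zero spread gives a degenerate density, which may be related to the odd predictions, though I have not verified that. If the counts should be treated as categories, converting the columns with factor() before calling naiveBayes() would be one thing to try.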

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://pmw.org.il http://iris.org.il http://ffii.org
http://truepeace.org http://memri.org http://www.memritv.org
If a train station is a place where a train stops, what's a workstation?


