[R] Logistic Regression with 200K features in R?

Eik Vettorazzi E.Vettorazzi at uke.de
Thu Dec 12 13:08:26 CET 2013


Thanks, Duncan, for this clarification.
A double-precision matrix with 2e12 elements (200K predictors by 10M
observations, as the OP asked about) would need about 16 TB of memory,
which is more than a standard (Windows 64-bit) computer can handle.
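
A rough way to check this kind of estimate, assuming 8 bytes per double and ignoring R's small per-object overhead. The helper name dense_bytes is made up for this sketch; 200K predictors by 10M observations comes to 2e12 elements.

```r
# Hypothetical helper: bytes needed for a dense double-precision matrix
dense_bytes <- function(n_rows, n_cols) n_rows * n_cols * 8

n_obs  <- 1e7   # 10M observations
n_pred <- 2e5   # 200K predictors

n_elem <- n_obs * n_pred            # total number of matrix elements
bytes  <- dense_bytes(n_obs, n_pred)

cat("elements:   ", format(n_elem, scientific = TRUE), "\n")
cat("size in TB: ", bytes / 1e12, "\n")
cat("size in TiB:", round(bytes / 2^40, 1), "\n")
```

The same helper gives about 1.6 TB for a 2e11-element matrix, so the order of magnitude of the element count matters a lot here.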

Cheers.

Am 12.12.2013 13:00, schrieb Duncan Murdoch:
> On 13-12-12 6:51 AM, Eik Vettorazzi wrote:
>> I thought so (with all the limitations due to collinearity and so on),
>> but actually there is a limit for the maximum size of an array which is
>> independent of your memory size and is due to the way arrays are
>> indexed. You can't create an object with more than 2^31-1 = 2147483647
>> elements.
>>
>> https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html
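
For reference, the 2^31-1 figure quoted here is R's largest integer value, and object sizes are easy to compare against it. A small sketch; the idea that a formula with about 200K terms needs a terms-by-variables table of roughly 200K x 200K entries is an assumption about where the allocMatrix error comes from, not something confirmed in this thread.

```r
limit <- .Machine$integer.max      # 2147483647, i.e. 2^31 - 1

# Products are computed as doubles, so the arithmetic itself cannot overflow
data_elems  <- 150 * 200000        # the 150 x 200000 data matrix
terms_elems <- 200000 * 200000     # ASSUMED size of a terms-by-variables table

cat("data matrix within limit: ", data_elems  <= limit, "\n")
cat("terms table within limit: ", terms_elems <= limit, "\n")
```

Under that assumption the data itself is far below the limit, and it is the bookkeeping for the formula that is too large.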
> 
> That post is from 2007.  The limits were raised considerably when R
> 3.0.0 was released, and it is now 2^48 for disk-based operations, 2^52
> for working in memory.
> 
> Duncan Murdoch
> 
> 
>>
>> cheers
>>
>> Am 12.12.2013 12:34, schrieb Romeo Kienzler:
>>> OK, so 200K predictors and 10M observations would work?
>>>
>>>
>>> On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
>>>> It is simply because you can't do a regression with more predictors
>>>> than observations.
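
To illustrate on a toy scale (made-up data, 20 observations and 50 predictors): glm() still runs, but it cannot estimate more coefficients than there are observations and reports the rest as NA. This is only a sketch of the rank problem, not a recipe for the 200K-feature case.

```r
set.seed(1)
n <- 20   # observations
p <- 50   # predictors, deliberately more than n

# Simulated data: pure-noise predictors and a random binary response
x <- as.data.frame(matrix(rnorm(n * p), nrow = n))
d <- data.frame(y = rbinom(n, 1, 0.5), x)

# Warnings (non-convergence, fitted probabilities of 0 or 1) are expected
# here and are suppressed to keep the output short
fit <- suppressWarnings(glm(y ~ ., data = d, family = binomial))

n_est <- sum(!is.na(coef(fit)))   # coefficients that could be estimated
n_na  <- sum(is.na(coef(fit)))    # coefficients dropped as not estimable

cat("estimated:", n_est, " not estimable (NA):", n_na, "\n")
```

The number of estimated coefficients can never exceed n, because the model matrix has only n rows.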
>>>>
>>>> Cheers.
>>>>
>>>> Am 12.12.2013 09:00, schrieb Romeo Kienzler:
>>>>> Dear List,
>>>>>
>>>>> I'm quite new to R and want to do logistic regression with a 200K
>>>>> feature data set (around 150 training examples).
>>>>>
>>>>> I'm aware that I should use Naive Bayes but I have a more general
>>>>> question about the capability of R handling very high dimensional
>>>>> data.
>>>>>
>>>>> Please consider the following R code where "mygenestrain.tab" is a 150
>>>>> by 200000 matrix:
>>>>>
>>>>> traindata <- read.table('mygenestrain.tab');
>>>>> mylogit <- glm(V1 ~ ., data = traindata, family = "binomial");
>>>>>
>>>>> When executing this code I get the following error:
>>>>>
>>>>> Error in terms.formula(formula, data = data) :
>>>>>     allocMatrix: too many elements specified
>>>>> Calls: glm ... model.frame -> model.frame.default -> terms ->
>>>>> terms.formula
>>>>> Execution halted
>>>>>
>>>>> Is this because R can't handle 200K features or am I doing something
>>>>> completely wrong here?
>>>>>
>>>>> Thanks a lot for your help!
>>>>>
>>>>> best Regards,
>>>>>
>>>>> Romeo
>>>>>
>>>
>>
> 

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790
--
