[R] Logistic Regression with 200K features in R?

Romeo Kienzler romeo.kienzler at gmail.com
Thu Dec 12 12:55:03 CET 2013


Dear Eik,

thank you so much for your help!

Best regards,

Romeo

On 12/12/2013 12:51 PM, Eik Vettorazzi wrote:
> I thought so (with all the limitations due to collinearity and so on),
> but actually there is a limit for the maximum size of an array which is
> independent of your memory size and is due to the way arrays are
> indexed. You can't create an object with more than 2^31-1 = 2147483647
> elements.
>
> https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html
>
> cheers
>
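To put numbers on the limit described above, here is a short R sketch. It assumes the 2^31 - 1 element limit (`.Machine$integer.max`) is the binding constraint and compares two sizes against it: the original 150 x 200000 data set and the 10M x 200K case asked about in the next message. The variable names are illustrative only.

```r
# Largest element count for a standard R vector or matrix, per the
# limit cited above: 2^31 - 1
max_elems <- .Machine$integer.max
max_elems == 2^31 - 1          # TRUE

# The original training data: 150 observations x 200000 features
n_obs  <- 150
n_feat <- 200000
n_obs * n_feat                 # 3e+07
n_obs * n_feat <= max_elems    # TRUE: the data matrix itself fits

# The follow-up scenario: 10 million observations x 200K predictors
big_obs <- 1e7
big_obs * n_feat               # 2e+12
big_obs * n_feat <= max_elems  # FALSE: a dense model matrix would not fit
```

So under that limit the raw 150 x 200000 data are not the problem, while a dense 10M x 200K design matrix would be far too large.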
> On 12.12.2013 12:34, Romeo Kienzler wrote:
>> OK, so 200K predictors and 10M observations would work?
>>
>>
>> On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
>>> it is simply because you can't do a regression with more predictors than
>>> observations.
>>>
>>> Cheers.
>>>
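The predictors-versus-observations problem can be reproduced in miniature with simulated data. This is a sketch under stated assumptions: the data, seed, and sizes (`n`, `p`) are hypothetical, chosen only so that there are more features than rows, and the column layout (`V1` as response) mirrors the original post.

```r
# Hypothetical toy data: column V1 is the binary response, V2..V21 are
# features. 10 observations but 20 features (p > n), like 150 x 200000
# in miniature.
set.seed(1)
n <- 10
p <- 20
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(p) + 1)))
traindata <- data.frame(V1 = rep(0:1, length.out = n), X)

# Same call as in the original post. With p > n the classes are perfectly
# separable, so glm() warns about fitted probabilities of 0 or 1; the
# warnings are silenced here.
mylogit <- suppressWarnings(
  glm(V1 ~ ., data = traindata, family = "binomial")
)

# At most n = 10 of the 21 coefficients (intercept + 20 features) can be
# estimated; the aliased ones are reported as NA.
mylogit$rank
sum(is.na(coef(mylogit)))
```

With more columns than rows the design matrix is rank-deficient, so the aliased coefficients come back as `NA`, and the remaining estimates are not meaningful because the classes are perfectly separated.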
>>> On 12.12.2013 09:00, Romeo Kienzler wrote:
>>>> Dear List,
>>>>
>>>> I'm quite new to R and want to do logistic regression with a 200K
>>>> feature data set (around 150 training examples).
>>>>
>>>> I'm aware that I should use Naive Bayes but I have a more general
>>>> question about the capability of R handling very high dimensional data.
>>>>
>>>> Please consider the following R code where "mygenestrain.tab" is a 150
>>>> by 200000 matrix:
>>>>
>>>> traindata <- read.table('mygenestrain.tab')
>>>> mylogit <- glm(V1 ~ ., data = traindata, family = "binomial")
>>>>
>>>> When executing this code I get the following error:
>>>>
>>>> Error in terms.formula(formula, data = data) :
>>>>     allocMatrix: too many elements specified
>>>> Calls: glm ... model.frame -> model.frame.default -> terms ->
>>>> terms.formula
>>>> Execution halted
>>>>
>>>> Is this because R can't handle 200K features or am I doing something
>>>> completely wrong here?
>>>>
>>>> Thanks a lot for your help!
>>>>
>>>> Best regards,
>>>>
>>>> Romeo
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
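The error in the original post is raised inside `terms.formula()`, before any model fitting starts. One plausible explanation, offered as an assumption rather than a confirmed diagnosis: for `V1 ~ .`, R records a "factors" matrix with one row per variable (response included) and one column per term. For 200000 features that matrix alone would exceed the 2^31 - 1 element limit, even though the 150 x 200000 data do not. The small data frame below is hypothetical.

```r
# Hypothetical miniature of the original data: response V1 plus 5
# features (as.data.frame() on an unnamed matrix names columns V1..V6)
small <- as.data.frame(matrix(0, nrow = 3, ncol = 6))

# terms() expands `.` to every non-response column and stores a
# "factors" matrix: one row per variable, one column per model term
tt <- terms(V1 ~ ., data = small)
dim(attr(tt, "factors"))              # 6 5

# The same matrix for the original formula: 200001 variables x 200000 terms
nvar  <- 200001
nterm <- 200000
nvar * nterm                          # 40000200000 elements
nvar * nterm > .Machine$integer.max   # TRUE
```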


