[R] Logistic Regression with 200K features in R?

Romeo Kienzler romeo.kienzler at gmail.com
Thu Dec 12 12:34:25 CET 2013


OK, so 200K predictors and 10M observations would work?
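
A rough back-of-the-envelope check of that question, as a sketch only. It assumes the data would be held as a dense matrix of doubles (8 bytes per cell), which is what the formula interface of glm() ends up building; the variable names are made up for illustration.

```r
# Hypothetical sizes taken from the question above.
n_obs  <- 1e7   # 10M observations
n_pred <- 2e5   # 200K predictors

# Number of cells in a dense design matrix, and its size in bytes
# when every cell is stored as a double.
n_elem <- n_obs * n_pred
bytes  <- n_elem * 8
tb     <- bytes / 1e12            # decimal terabytes

# Compare against the largest value an R integer can hold (2^31 - 1).
exceeds_int_max <- n_elem > .Machine$integer.max

cat("cells:", format(n_elem, scientific = TRUE), "\n")
cat("dense size in TB:", tb, "\n")
cat("more cells than .Machine$integer.max:", exceeds_int_max, "\n")
```

So having more observations than predictors would address the statistical objection, but under these assumptions a dense 10M by 200K matrix is far beyond what fits in memory. If most entries are zero, a sparse representation (for example via the Matrix package) together with a penalized fitter might be worth looking at, though that is a suggestion and not something tested here.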


On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
> it is simply because you can't do a regression with more predictors than
> observations.
>
> Cheers.
>
> On 12.12.2013 09:00, Romeo Kienzler wrote:
>> Dear List,
>>
>> I'm quite new to R and want to do logistic regression with a 200K
>> feature data set (around 150 training examples).
>>
>> I'm aware that I should use Naive Bayes but I have a more general
>> question about the capability of R handling very high dimensional data.
>>
>> Please consider the following R code where "mygenestrain.tab" is a 150
>> by 200000 matrix:
>>
>> traindata <- read.table('mygenestrain.tab')
>> mylogit <- glm(V1 ~ ., data = traindata, family = "binomial")
>>
>> When executing this code I get the following error:
>>
>> Error in terms.formula(formula, data = data) :
>>    allocMatrix: too many elements specified
>> Calls: glm ... model.frame -> model.frame.default -> terms -> terms.formula
>> Execution halted
>>
>> Is this because R can't handle 200K features or am I doing something
>> completely wrong here?
>>
>> Thanks a lot for your help!
>>
>> best Regards,
>>
>> Romeo

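One plausible reading of the allocMatrix error in the original post, offered as a sketch and not as a confirmed diagnosis: terms() on a formula such as V1 ~ . produces a "factors" matrix with one row per variable and one column per term, so its size grows with the square of the number of columns. The toy data frame below is invented for illustration; only the 200000-column figure comes from the post. The last part illustrates the point about predictors versus observations using the rank of a small random design matrix.

```r
# Toy stand-in for the real data: 20 rows, response V1 plus 4 predictors.
set.seed(1)
small <- as.data.frame(matrix(rnorm(20 * 5), nrow = 20))  # columns V1..V5
small$V1 <- rbinom(20, 1, 0.5)

# The terms object carries a variables-by-terms "factors" matrix.
tt <- terms(V1 ~ ., data = small)
fac_dim <- dim(attr(tt, "factors"))
print(fac_dim)   # rows = variables (incl. response), cols = predictor terms

# Scale the same shape up to 200000 columns (1 response + 199999 predictors).
n_vars  <- 200000
n_terms <- n_vars - 1
n_cells <- n_vars * n_terms
too_big <- n_cells > .Machine$integer.max
cat("cells in factors matrix:", format(n_cells, scientific = TRUE),
    " exceeds integer limit:", too_big, "\n")

# Separate issue: with fewer rows than columns the design matrix cannot
# have full column rank, so not every coefficient is estimable.
n_small <- 10
p_small <- 15
X <- cbind(1, matrix(rnorm(n_small * p_small), nrow = n_small))
rank_X <- qr(X)$rank
cat("columns:", ncol(X), " rank:", rank_X, "\n")
```

If this reading is right, the formula step fails before any fitting starts, and even a successful fit on 150 rows could identify at most 150 coefficients without some form of penalization or feature selection.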