[R] How many samples ACTUALLY used in regression?

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Mar 18 16:07:33 CET 2013


On 18/03/2013 14:51, Cade, Brian wrote:
> Perhaps a crude but reliable way is to check the number of residuals, e.g.,
> length(my.model$resid).

Not very reliable (what about zero weights, for example?), and the 
component is usually 'residuals'.

No one has so far mentioned nobs(), which seems to me to be the closest.
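A minimal sketch of the zero-weight case, on made-up data. The data frame my.data, the weight vector wts and the binary response yb are hypothetical; the column names only mirror the question. Three rows get a missing predictor and one complete row gets a zero case weight. Counting residuals still includes that row, while nobs() leaves it out. The same generic also works on a glm fit.

```r
## Hypothetical data: 100 rows, response y and three predictors,
## with NAs planted in x for 3 rows (mirroring the question).
set.seed(42)
my.data <- data.frame(y = rnorm(100), x = rnorm(100),
                      w = rnorm(100), z = rnorm(100))
my.data$x[c(3, 17, 58)] <- NA

fit <- lm(y ~ x + w + z, data = my.data)
nobs(fit)                  # 97: the 3 incomplete rows were dropped

## Zero case weight on row 1 (a complete row, so it survives NA removal).
## The row keeps its residual but contributes nothing to the fit.
wts <- rep(1, 100)
wts[1] <- 0
fit.w <- lm(y ~ x + w + z, data = my.data, weights = wts)
length(residuals(fit.w))   # 97: still counts the zero-weight row
length(fit.w$resid)        # 97 as well ($resid partially matches $residuals)
nobs(fit.w)                # 96: zero-weight row excluded

## The same generic on a glm (binary response made up for illustration)
my.data$yb <- as.integer(my.data$y > 0)
fit.g <- glm(yb ~ x + w + z, data = my.data, family = binomial)
nobs(fit.g)                # 97
```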

> Brian
>
> Brian S. Cade, PhD
>
> U. S. Geological Survey
> Fort Collins Science Center
> 2150 Centre Ave., Bldg. C
> Fort Collins, CO  80526-8818
>
> email:  cadeb at usgs.gov <brian_cade at usgs.gov>
> tel:  970 226-9326
>
>
>
> On Mon, Mar 18, 2013 at 8:39 AM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
>>
>> On Mar 18, 2013, at 7:36 AM, Federico Calboli <f.calboli at imperial.ac.uk>
>> wrote:
>>
>>> Dear All,
>>>
>>> is there a simple way that covers all regression models to extract the
>> number of samples from a data frame/matrix actually used in a regression
>> model?
>>>
>>> For instance, I might have a data frame of 100 rows and 4 columns (1 response +
>> 3 explanatory variables). If 3 samples have one or more NAs in the
>> explanatory variable columns, these samples will be dropped in any model:
>>>
>>> my.model = lm(y ~ x + w + z, my.data)
>>> my.model = glm(y ~ x + w + z, my.data, family = binomial)
>>> my.model = polr(y ~ x + w + z, my.data)
>>>
>>> I don't seem to be able to find one single method that works in the
>> exact same way -- irrespective of the model type -- to interrogate my.model
>> to see how many samples of my.data were actually used. Is there such a
>> function, or do I need to hack something together?
>>>
>>> Best wishes
>>>
>>> Federico
>>
>>
>> I don't know that this would be universal to all possible R model
>> implementations, but it should work for those that at least abide by
>> "certain standards"[1] relative to the internal use of ?model.frame.
>>
>> In the case where model functions use 'model = TRUE' as the default in
>> their call (e.g. lm(), glm() and MASS::polr()), the returned model object
>> will have a component called 'model', such that:
>>
>>    nrow(my.model$model)
>>
>> returns the number of rows in the internally created data frame.
>>
>> Note that 'model = TRUE' is not the default for many functions, for
>> example Terry's coxph() in survival or Frank's lrm() in rms.
>>
>> Note also that the value of 'na.action' in the modeling function call may
>> affect whether the number of rows in the retained 'model' data frame
>> really is the number of samples used in the fit.
>>
>> You can also call model.frame() yourself, matching the arguments passed to
>> the model function, to replicate what takes place internally in many
>> modeling functions. The result of model.frame() will be a data frame,
>> again subject to similar limitations as above.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>> [1]: http://developer.r-project.org/model-fitting-functions.txt
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
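For the 'model' component and model.frame() route described in the quoted reply, here is a minimal sketch on made-up data (my.data and its columns y, x, w and z are hypothetical and only mirror the question). It also illustrates two of the caveats raised: with model = FALSE there is no 'model' component to count, and with na.action = na.exclude the residuals() extractor pads the dropped rows back in, so counting residuals overstates the sample size while nrow() of the model frame and nobs() do not.

```r
## Made-up data mirroring the question: 100 rows, 3 with a missing predictor.
set.seed(42)
my.data <- data.frame(y = rnorm(100), x = rnorm(100),
                      w = rnorm(100), z = rnorm(100))
my.data$x[c(3, 17, 58)] <- NA

## lm() keeps the model frame by default (model = TRUE).
my.model <- lm(y ~ x + w + z, data = my.data)
nrow(my.model$model)             # 97

## With model = FALSE the component is simply absent.
fit.nomf <- lm(y ~ x + w + z, data = my.data, model = FALSE)
is.null(fit.nomf$model)          # TRUE

## na.exclude: the model frame still has 97 rows, but residuals()
## pads the excluded rows back in as NA, giving 100 values.
fit.ex <- lm(y ~ x + w + z, data = my.data, na.action = na.exclude)
nrow(fit.ex$model)               # 97
length(residuals(fit.ex))        # 100
nobs(fit.ex)                     # 97

## Rebuilding the model frame independently, with the same
## formula, data and na.action the fitting call used.
mf <- model.frame(y ~ x + w + z, data = my.data, na.action = na.omit)
nrow(mf)                         # 97
```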


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595