[R] Possibly more coefficients?

Ben Bolker bbolker at gmail.com
Mon Apr 2 01:54:11 CEST 2012


Sarah Goslee <sarah.goslee <at> gmail.com> writes:

> 
> I saw this earlier; double-posting is discouraged. If you don't get a
> reply, it's more likely that you wrote a poorly-formed question than
> that nobody saw it.
> 
> For instance, this is not a reproducible example, and we know nothing
> about your data, so nobody can judge whether the results you're
> getting are reasonable, or if there's a way to get more information.
> 
> That makes your question unanswerable. If you want an answer, you'll
> need to follow the posting guide and provide the requested
> reproducible example.
> 
> Sarah
> 
> On Sun, Apr 1, 2012 at 7:30 PM,  <abigailclifton <at> me.com> wrote:
> > Hi there,
> >
> > I have this code:
> > Prepared_Data <-  na.omit(read.csv("Prepared_Data.csv", header=TRUE))
> > pd <- Prepared_Data[,-3]  ## data minus response variable
> >
> > lev <- sapply(pd,function(x) length(unique(x)))
> >
> > ## total parameters for n variables
> > par(las=1,bty="l")
> > plot(cumprod(lev),log="y")
> >
> > library(Matrix)
> > m <- sparse.model.matrix(~.^2,data=pd)
> > ncol(m)
> >
> > library(glmnet)
> > g1 <- glmnet(m,Prepared_Data$C3, family="binomial")
> > coef(g1)  ## coef(), lower-case: R is case-sensitive
> >
> >
> >
> > Which prints out the coefficients of g1. However, there are very
> > few numerical coefficients and many dots. Is there any way to get
> > numerical values for all factors/terms, making the model more
> > complete, without lots of gaps?
> >
> 

  I would add that this *is* reproducible, if you carried the history
of the question along with it; in an earlier version (possibly only
in a private e-mail to me) you posted a link to the data, a
description of what you were trying to do, etc.  By stripping the
description down and removing the history at each step, you are
dropping a lot of useful context that would help you get an answer
from the list.

  By the way, I strongly suspect that the reason you are getting all
the 'dots' is that the penalized regression is (by design) removing
variables that aren't doing anything: a dot in the sparse coefficient
matrix is a coefficient that the lasso penalty has shrunk exactly to
zero.  If you really want all the coefficients, you should just fit
the full, unpenalized model.  You can't have it both ways, though --
either you massively overfit a standard GLM and get parameters that
don't mean anything, or you use a penalized/shrinkage approach that
gets rid of a lot of junk in a disciplined way ...
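
  For concreteness, here is a minimal sketch of both options.  The
data frame below is a made-up stand-in, since your Prepared_Data.csv
isn't available; the column names, factor levels, and sample size are
assumptions, not your actual data.

library(Matrix)
library(glmnet)

## hypothetical stand-in for Prepared_Data: two factors plus a
## binary response C3
set.seed(1)
pd <- data.frame(f1 = factor(sample(letters[1:4], 200, replace = TRUE)),
                 f2 = factor(sample(letters[1:3], 200, replace = TRUE)))
C3 <- rbinom(200, 1, 0.5)

m  <- sparse.model.matrix(~ .^2, data = pd)
g1 <- glmnet(m, C3, family = "binomial")

## coefficients at the weakest penalty on the path: fewer terms are
## shrunk exactly to zero, so fewer dots appear
coef(g1, s = min(g1$lambda))

## the unpenalized alternative: an ordinary GLM reports a numeric
## estimate for every estimable term, at the cost of overfitting
full <- glm(C3 ~ .^2, data = data.frame(C3 = C3, pd),
            family = binomial)
coef(full)

Note that even the full GLM will return NA for interaction terms with
no supporting data, which is another sign that the model is asking
for more parameters than the data can estimate.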


