[R] GLM question

Andra Isan andra_isan at yahoo.com
Wed Aug 24 01:23:23 CEST 2011


Thanks a lot, Joshua.

That clearly solved my problem. I tried number 3 and it works perfectly. I used the prediction function as follows:

pred <- predict(glm.fit, newdata = dat, type = "response")  # predict.glm() takes new observations via 'newdata'

(glm.fit is my fitted model)

to see how it predicts on my whole data set. Obviously, though, I have to do cross-validation: train the model on one part of my data and predict on the other part. I searched and found the function cv.glm in the package boot, so I tried to use it as:

library(boot)
cv.err <- cv.glm(dat, glm.fit, cost, K = nrow(dat))$delta  # 'cost' must be a function(y, pi) defined beforehand

but I am not sure how to get predictions for the hold-out data. Is there a better way in R to do cross-validation, i.e. to learn a model on training data and test it on test data?
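A minimal sketch of K-fold cross-validation written by hand, assuming `dat` is a data frame whose 0/1 outcome column is named `Y` (as in suggestion #3 of the reply quoted below). Each fold's model is fit on the remaining rows, and `predict()` is called with `newdata =` on the held-out rows, which is how `predict.glm()` receives new observations. The helper `kfold_glm`, the simulated data, and the 0.5 threshold are hypothetical choices for illustration. The last lines show `boot::cv.glm` with a misclassification cost of the kind used in the boot documentation for binary outcomes; `cv.glm` returns only the error estimates (`delta`), not the per-row predictions.

```r
# Hypothetical example data: 200 rows, 5 named predictors, 0/1 outcome Y
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("X", 1:5)))
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))
dat <- as.data.frame(cbind(Y = y, x))

# Hypothetical helper: K-fold CV returning out-of-fold predicted probabilities
kfold_glm <- function(dat, K = 10) {
  folds <- sample(rep(seq_len(K), length.out = nrow(dat)))  # random fold labels
  pred <- numeric(nrow(dat))
  for (k in seq_len(K)) {
    test <- folds == k
    fit <- glm(Y ~ ., data = dat[!test, ], family = binomial)            # train on K-1 folds
    pred[test] <- predict(fit, newdata = dat[test, ], type = "response")  # predict held-out fold
  }
  pred
}

cv.pred <- kfold_glm(dat, K = 10)
cv.misclass <- mean((cv.pred > 0.5) != dat$Y)  # out-of-fold misclassification rate

# The same idea via boot::cv.glm, which reports only the CV error estimates
library(boot)
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)  # misclassification cost for 0/1 outcomes
full.fit <- glm(Y ~ ., data = dat, family = binomial)
cv.err <- cv.glm(dat, full.fit, cost, K = 10)$delta  # raw and adjusted CV error
```

Writing the loop yourself keeps the out-of-fold predictions in `cv.pred`, so any other performance measure can be computed from them afterwards.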

Thanks,
Andra




--- On Mon, 8/22/11, Joshua Wiley <jwiley.psych at gmail.com> wrote:

> From: Joshua Wiley <jwiley.psych at gmail.com>
> Subject: Re: [R] GLM question
> To: "Andra Isan" <andra_isan at yahoo.com>
> Cc: r-help at r-project.org
> Date: Monday, August 22, 2011, 9:54 PM
> Hi Andra,
> 
> There are several problems with what you are doing (by the way, I
> point them out so you can learn and improve, not to be harsh or rude).
> The good news is there is a solution (#3) that is easier than what
> you are doing right now!
> 
> 1) glm.fit() is a function (the fitting routine that glm() calls
> internally), so it is a good idea not to use it as a variable name.
> 
> 2) You are looping through your variables when you could avoid the
> loop and use:
>   paste(x, collapse = " + ")
> where x is a character vector of the variable names.
> 
> For example, with the first ten letters of the alphabet:
> 
> > paste(LETTERS[1:10], collapse = " + ")
> [1] "A + B + C + D + E + F + G + H + I + J"
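Applied to the original question, the string has to be built from the column names rather than from the column values (the loop in the original question pastes `x[,m]`, which is data). A minimal sketch, assuming `x` is a numeric matrix without column names, so hypothetical names X1 to X100 are assigned first; the simulated data are for illustration only.

```r
# Hypothetical data: n rows, 100 predictor columns, 0/1 labels
set.seed(1)
n <- 500
x <- matrix(rnorm(n * 100), n, 100)
y <- rbinom(n, 1, 0.5)

colnames(x) <- paste0("X", 1:100)            # give the predictors names
rhs <- paste(colnames(x), collapse = " + ")  # "X1 + X2 + ... + X100"
f <- as.formula(paste("y ~", rhs))           # convert the string into a formula

# The formula refers to columns of this data frame by name
fit <- glm(f, data = data.frame(y = y, x), family = binomial)
```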
> 
> 3) If you store your data in a data frame like:
> 
> dat <- as.data.frame(cbind(Y = y, x))
> 
> you do not need to do anything other than:
> 
> glm(Y ~ ., data = dat, family = binomial)
> 
> because R will expand the "." to every variable in the dataset that
> is not the outcome.  This would be my recommendation.
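A small check of the `.` shorthand, assuming simulated data with a hypothetical 10 named predictors: the fitted model has one coefficient per non-outcome column plus the intercept, and the outcome `Y` never appears on the right-hand side.

```r
# Hypothetical data: 300 rows, 10 named predictors, 0/1 outcome
set.seed(2)
n <- 300
x <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("X", 1:10)))
y <- rbinom(n, 1, plogis(0.5 * x[, 1]))

dat <- as.data.frame(cbind(Y = y, x))             # Y plus the 10 predictors
fit <- glm(Y ~ ., data = dat, family = binomial)  # "." = every column except Y
names(coef(fit))                                  # "(Intercept)" followed by X1 to X10
```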
> 
> 4) If you really want to use your pasted string, try it like this:
> 
> f <- "mpg ~ hp" # create formula as string
> lm(as.formula(f), data = mtcars) # convert to formula and use in model
> 
> although there are many variants of this, some of which may be better.
> Still, I would recommend #3 in your case over #4.
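One such variant, offered as a sketch rather than as the thread's recommendation: base R's `reformulate()` builds the formula directly from a character vector of term labels, so no manual pasting is needed. This assumes the built-in `mtcars` data set from the example above; the second predictor `wt` is an arbitrary choice for illustration.

```r
# Build "mpg ~ hp + wt" from a character vector of predictor names
preds <- c("hp", "wt")
f1 <- as.formula(paste("mpg ~", paste(preds, collapse = " + ")))  # paste, then convert
f2 <- reformulate(preds, response = "mpg")                        # base R helper

fit1 <- lm(f1, data = mtcars)
fit2 <- lm(f2, data = mtcars)
all.equal(coef(fit1), coef(fit2))  # same model either way
```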
> 
> I hope this helps,
> 
> Josh
> 
> On Mon, Aug 22, 2011 at 9:43 PM, Andra Isan <andra_isan at yahoo.com> wrote:
> > Hi All,
> >
> > I am trying to fit my data with a glm model. My data is a matrix of
> > size n*100, so I have n rows and 100 columns, and my vector y has
> > length n and contains the labels (0 or 1).
> >
> > My question is:
> > instead of manually typing the model as
> >  glm.fit = glm(y ~ x[,1] + x[,2] + ... + x[,100], family = binomial())
> >
> > I have a for loop that concatenates the x variables as follows:
> >
> > final_str=NULL
> > for (m in 1:100){
> > str = paste(x[,m],"+",sep="")
> > final_str= paste(final_str,str,sep="")
> > }
> >
> > glm.fit = glm(y~final_str,family=binomial())
> > but final_str is treated as a string and it does not work. Could you
> > please help me fix that?
> >
> > Thanks a lot,
> > Andra
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>


