[R] Troubles with subset()

Milan Bouchet-Valat nalimilan at club.fr
Tue Dec 11 17:33:53 CET 2012

On Tuesday, 11 December 2012 at 17:18 +0100, Virgile Capo-Chichi wrote:
> Hello all,
> I have the attached dataset which I read from SPSS and transformed a
> little bit using the script below. I am trying to run a logistic
> regression using glm() on a subset of my data. When I run the logistic
> regression on the whole dataset, it runs OK. As soon as I try to run on
> the subset, I get an error message related to different lengths of
> variables. Any idea why this might be so? Thanks for your inputs. V
This code doesn't work, because the MC0911 object does not exist.

It works fine, though, if I replace all occurrences of MC0911 with
MC0911_S, and CCU2 with ccu2.
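
As for the "different lengths" error itself, one plausible cause (an assumption on my part, not something checked against the original data) is that one term of the formula is picked up from the workspace while the others come from the subset. A minimal sketch with made-up data follows; `d`, `d_sub` and `y_stale` are hypothetical names, not objects from the posted script.

```r
# Toy data standing in for the real dataset (all names are hypothetical).
set.seed(1)
d <- data.frame(
  y    = rbinom(100, 1, 0.5),
  x    = rnorm(100),
  year = rep(2009:2010, each = 50)
)

# A full-length copy of the response left behind in the workspace,
# e.g. by an earlier assignment or a previous session.
y_stale <- d$y

# The subset only has 50 rows.
d_sub <- subset(d, year > 2009)

# 'y_stale' is not a column of d_sub, so it is found in the workspace
# (length 100), while 'x' is taken from d_sub (length 50).
msg <- tryCatch(
  {
    glm(y_stale ~ x, data = d_sub, family = binomial)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With attach() the same mismatch can happen silently, since a name that is missing from the attached data frame is simply looked up elsewhere on the search path.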

The moral of the story is: do not use attach() but pass the data frame
via the 'data' argument to glm(), and when something is weird, close the
R session and start with a clean environment.
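
To make this concrete, here is a small sketch of the attach()-free approach. It uses made-up data rather than the real SPSS file: the data frame `dat` and its columns are stand-ins I chose for illustration, and only some of the original predictors are mimicked.

```r
# Hypothetical stand-in for the SPSS data (lower-case column names).
set.seed(42)
n <- 200
dat <- data.frame(
  ccu2 = rbinom(n, 1, 0.4),
  age  = sample(18:60, n, replace = TRUE),
  a3   = factor(sample(1:3, n, replace = TRUE)),
  year = sample(2008:2011, n, replace = TRUE)
)

# Whole dataset: every term of the formula is looked up in 'dat'.
fit_all <- glm(ccu2 ~ age + a3, data = dat, family = binomial)

# Subset: build the smaller data frame and pass it via 'data' ...
dat_sub <- subset(dat, year > 2009)
fit_sub <- glm(ccu2 ~ age + a3, data = dat_sub, family = binomial)

# ... or leave the data frame alone and use glm()'s own 'subset' argument.
fit_sub2 <- glm(ccu2 ~ age + a3, data = dat, family = binomial,
                subset = year > 2009)

# Number of observations actually used in each fit.
print(c(all = nobs(fit_all), sub = nobs(fit_sub), sub2 = nobs(fit_sub2)))
```

Because all variables come from a single data frame in each call, they necessarily have the same length, and nothing left over in the workspace can interfere.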


> Here is my code. Not sure if the data itself got through.
> =======================
> library (foreign)
> library (Hmisc)
> MC0911_S <- read.spss("C:/Users/samsung/Desktop/R Pilot/MC0911_S.sav",
>   use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)
> colnames(MC0911) <- tolower(colnames(MC0911))
> MC0911_S <-transform (MC0911_S, age=2009-a1)
> MC0911_S <-transform (MC0911_S, b34=b4/b3) 
> MC0911_S <-transform (MC0911_S, CCU=b34+b7+b8)
> MC0911_S$CCU2 [MC0911_S$CCU==4] <-"1"
> MC0911_S$CCU2 [MC0911_S$CCU!=4] <-"0"
> MC0911_S$a3<-as.factor(MC0911_S$a3)
> MC0911_S$a6<-as.factor(MC0911_S$a6)
> MC0911_S$CCU2<-as.numeric(MC0911_S$CCU2)
> attach(MC0911_S)
> glm(CCU2~age+a3+a6+a8+a9, family=binomial)
> detach()
> MC0911_2<-subset(MC0911, year>2009)
> attach(MC0911_2)
> glm(CCU2~age+a3+a6+a8+a9, family=binomial)
> detach()
> 2012/12/11 Milan Bouchet-Valat <nalimilan at club.fr>
>         On Tuesday, 11 December 2012 at 15:09 +0100, Virgile Capo-Chichi wrote:
>         > All,
>         > I have the attached dataset which I read from SPSS and transformed a little
>         > bit using the attached script. I am trying to run a logistic regression
>         > using glm() on a subset of my data. When I run the logistic regression on
>         > the whole dataset, it runs OK. As soon as I try to run on the subset, I get
>         > an error message related to different lengths of variables. Any idea why
>         > this might be so? Thanks for your inputs. V
>         Please paste the code, as it was removed by the server.
>         Most likely, it comes from the fact that some terms in the formula refer
>         to variables in the 'data' argument, and some do not.
>         Regards
