[R] fitting a model for enumerated datatypes
Bill.Venables at cmis.csiro.au
Mon May 27 14:39:19 CEST 2002
Saket Joshi asks:
> -----Original Message-----
> From: Saket Joshi [mailto:joshi at engr.orst.edu]
> Sent: Monday, May 27, 2002 6:15 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] fitting a model for enumerated datatypes
> I have used the scan() function to read data from a file as follows:
> x <- scan("myfile", as.list(c(rep("", times = 9), class="", rep("", times
> = 3))), comment.char="")
[WNV] How about taking it in easily understood stages?
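what <- as.list(rep("", 13))
# Field 10 is the class; the V* names are placeholders for the other fields.
names(what) <- c(paste("V", 1:9, sep = ""), "Class", paste("V", 11:13, sep = ""))
# Note 'class' is not a really good name
x <- data.frame(scan("myfile", what = what))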
> This reads in all the 13 fields. These 13 fields are enumerated or user
> datatypes. I guess they are called factors in this context although I don't
> know what exactly they are or are for.
[WNV] As you have read them they are character string vectors, not
factors. Turning your list into a data frame will (usually) convert them
into factors. It's not a bad idea to sort out in your own mind just what a
factor is, by the way. It is an important concept.
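
As a minimal sketch of the distinction (the values are made up, not taken from the poster's file): a character vector only stores strings, while a factor stores a fixed set of levels plus an integer code per observation. Note that since R 4.0.0 data.frame() no longer converts strings to factors by default, so the conversion is requested explicitly here with stringsAsFactors = TRUE.

```r
# Hypothetical values, for illustration only
s <- c("low", "high", "medium", "low")
f <- factor(s)

is.character(s)   # TRUE: plain strings
is.factor(f)      # TRUE: categorical variable
levels(f)         # the distinct categories, sorted: "high" "low" "medium"
as.integer(f)     # integer codes into levels(f): 2 1 3 2

# Ask for string-to-factor conversion explicitly (not the default in R >= 4.0.0)
d <- data.frame(s = s, stringsAsFactors = TRUE)
is.factor(d$s)    # TRUE
```

Modelling functions such as lm() expand each factor predictor into indicator columns, which is why the character/factor distinction matters here.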
> Now I tried to use the lm()
> to fit a linear model as follows:
> y <- lm(x$class ~ ., x)
> But this is the error message I am getting:
> Error in eval(expr, envir, enclos) : Object
> "c..1....1....1....1....1....2....2....2....2....3....3....3..." not found
[WNV] I can't explain that, but as you have it x is not a data
frame and the components have no names that the lm function can use. This
might be the explanation, but it still looks odd to me. (Are you telling us
the whole story?) If you have made x into a data frame, though, all you
should need to do is
y <- lm(Class ~ ., x) # no need for x$Class on the lhs
and it should work fine, though with 12 factors you are probably
going to fit a main effects model with a fair number of degrees of freedom.
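
To see where the degrees of freedom come from, here is a small sketch on made-up data (three hypothetical factors A, B and C, not the poster's 12). Under R's default treatment contrasts, a main effects model uses one coefficient for the intercept plus (number of levels - 1) for each factor.

```r
# Hypothetical data: three factors with 3, 2 and 4 levels
d <- data.frame(
  A = factor(rep(c("a1", "a2", "a3"), length.out = 24)),
  B = factor(rep(c("b1", "b2"), each = 12)),
  C = factor(rep(c("c1", "c2", "c3", "c4"), each = 3, length.out = 24))
)

n_levels <- sapply(d, nlevels)       # 3, 2, 4
expected <- 1 + sum(n_levels - 1)    # intercept + (levels - 1) per factor

# The main effects design matrix has exactly that many columns
X <- model.matrix(~ ., data = d)
ncol(X) == expected                  # TRUE
```

With 12 factors the same sum can grow quickly, so it is worth comparing it with the number of rows in the data before fitting.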
> I looked at the lm() function code but could not find any eval() function
[WNV] The first thing to do is to issue a call:
which should give you a better clue of what went wrong (but in this
case you would probably have needed some advice anyway).
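
The call itself is missing from this copy of the message. One likely candidate, and this is an assumption rather than something the text confirms, is traceback(), which at an interactive prompt prints the chain of calls that led to the most recent error. The sketch below records the same kind of call stack inside a script, using withCallingHandlers() and sys.calls(); inner() and outer() are made-up functions for illustration.

```r
# Hypothetical functions: outer() calls inner(), which fails
inner <- function(x) stop("something went wrong")
outer <- function(x) inner(x)

calls <- NULL
res <- try(
  withCallingHandlers(
    outer(1),
    # Runs while the failing calls are still on the stack
    error = function(e) calls <<- sys.calls()
  ),
  silent = TRUE
)

# Was inner() among the calls active when the error was raised?
saw_inner <- any(vapply(
  calls,
  function(cl) identical(cl[[1]], as.name("inner")),
  logical(1)
))
saw_inner
```

At the prompt, simply typing traceback() right after the error message gives a comparable listing without any of this scaffolding.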
> My guess is that the problem occurs because none of the variables
> (either response or otherwise) are numeric, they are all user defined. Can
> someone help by telling me how to fit a linear model to these
> "non-numeric" fields or otherwise explain what the problem could be.
[WNV] You were nearly there, actually, but in this case "a miss is
as good as a mile" as my grandmother used to say...
> Thanks in advance.
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch