[R] Problem with data conversion

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Dec 14 14:29:48 CET 2003


The message probably means that the variable is a character variable, 
neither numeric (as you intended) nor a factor.

Although you said there was a trip to epiinfo, you never said where the 
data came from.  Try dumping out the data, editing the file, and reading 
it with read.table.  There are other ways, but one of your steps has a bug 
and we have no idea what the steps actually were.
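
As a rough sketch of that dump / edit / re-read round trip (the data frame 
below is a made-up stand-in, since the real data were not posted, and the 
temporary file name is arbitrary):

```r
# Hypothetical stand-in for the poster's data; the real values are unknown.
mydf <- data.frame(
  age    = c(34, 51, 47),
  sex    = c("male", "female", "female"),
  status = c("case", "0", "control"),
  mma    = c("1.2", "0.8", "2.5"),     # numbers held as character strings
  stringsAsFactors = FALSE
)

# Dump to a tab-delimited text file; stray entries such as the "0" in
# status can then be corrected in any text editor.
tmp <- tempfile(fileext = ".txt")
write.table(mydf, file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and let read.table() work out the column types.
# stringsAsFactors is set explicitly because its default differs
# between R versions.
mydf2 <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = TRUE)
str(mydf2)
```

If a column that should be numeric still comes back as a factor, it 
probably contains at least one non-numeric entry that needs fixing in the 
file.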

When you are finished, try

sapply(mydf, class)

on your dataframe `mydf'.  You should see only numeric or factor 
variables.
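
For illustration only, here is one way the conversion could be done inside 
R and then checked.  The data are invented, and recoding the stray "0" as 
missing is an assumption about what you want, not something your message 
settles:

```r
# Invented example data with the same kind of problem.
mydf <- data.frame(
  status = c("case", "control", "0", "case"),
  mma    = c("1.2", "0.8", "2.5", "1.9"),
  stringsAsFactors = FALSE
)

sapply(mydf, class)   # both columns are character at this point

# Going through as.character() also covers the case where the column is
# a factor: as.numeric() applied directly to a factor returns the
# internal level codes, not the values.
mydf$mma <- as.numeric(as.character(mydf$mma))

# Assumption: the stray "0" is a data-entry error and is treated as missing.
mydf$status[mydf$status == "0"] <- NA
mydf$status <- factor(mydf$status, levels = c("control", "case"))

sapply(mydf, class)   # should now show only factor and numeric
```

With the columns in that state, a call such as 
glm(status ~ mma, family = binomial, data = mydf) should get past 
model.frame.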

On Sun, 14 Dec 2003 arinbasu at softhome.net wrote:

> Hi All: 
> 
> I came across the following problem while working with a dataset, and 
> wondered if there could be a solution I sought here. 
> 
> 
> My dataset consists of information on 402 individuals with the following five 
> variables (age,sex, status = a binary variable with levels "case" or 
> "control", mma, dma). 
> 
> During data check, I found that in the raw data, the data entry operator had 
> mistakenly put a "0" for one participant, so now, the levels show 
> 
> > levels(status) 
> [1] "0" "control" "case" 
> 
> The variables mma, and dma are actually numerical variables but in the 
> dataframe, they are represented as "characters". I tried to change the type 
> of the variables (from character to numeric) using the edit function (and 
> bringing up the data grid where then I made changes), but the changes were 
> not saved. I tried 
> 
> mma1 <- as.numeric(mma) 
> 
> but I was not successful in converting mma from a character variable to a 
> numeric variable. 
> 
> So, to edit and "clean" the data, I exported the dataset as a text file to 
> Epi Info 2002 (version 2, Windows). I used the following code: 
> 
> mysubset <- subset(workingdat, select = c(age,sex,status, mma, dma))
> write.table(mysubset, file="mysubset.txt", sep="\t", col.names=NA) 
> 
> After I made changes in the variables using Epi Info (I created a new 
> variable called "statusrec" containing values "case" and "control"), I 
> exported the file as a ".rec" file (filename "mydata.rec"). I used the 
> following code to read the file in R: 
> 
> require(foreign)
> myData <- read.epiinfo("mydata.rec", read.deleted=NA) 
> 
> Now, the problem is this, when I want to run a logistic regression, R 
> returns the following error message: 
> 
> > glm(statusrec~mma, family=binomial(link=logit))
> Error in model.frame(formula, rownames, variables, varnames, extras, 
> extranames,  :
>        invalid variable type 
> 
> 
> I cannot figure out the solution. I want to run a logistic regression now 
> with the variable statusrec (which is a binary variable containing values 
> "case" and "control"), and another
> variable (say mma, which is now a numeric variable). What does the above 
> error message mean and what could be a possible solution? 
> 
> Would greatly appreciate your insights and wisdom. 
> 
>  -Arin Basu
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



