[R] Converting factors back to numbers. Trouble with SPSS import data

Paul Johnson pauljohn32 at gmail.com
Sun Feb 19 21:16:53 CET 2006


I'm using Fedora Core 4, R-2.2.

The basic question is: can one recover the numerical values used in
SPSS after importing data into R with read.spss from the foreign
package?  Here's why I ask.

My colleague sent an SPSS data set. I must replicate some results she
calculated in SPSS and one problem is that the numbers used in SPSS
for variable values are not easily recovered in R.

I'm comparing two imported datasets: "eldat" (read.spss without
conversion to factors) and "eldatfac" (read.spss with conversion to
factors).

If I bring in the data without conversion to factors:

library(foreign)
eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=FALSE,
                        to.data.frame=TRUE)

I can see the variable HAPPY is coded 0, 1, 2, 3.  Those are the
numbers that SPSS uses as contrast values when it runs a regression
with HAPPY.

In contrast, I can let R translate the variables that have only a few
value labels into factors:

library(foreign)
eldatfac <- read.spss("18CitySCBSsorted.sav", max.value.labels=7,
                        to.data.frame=TRUE)

Consider the first 50 observations on the variable HAPPY:

> f<- eldatfac$HAPPY[1:50]
> f
 [1] Happy          Happy          Very happy     Happy          Very happy
 [6] Very happy     Happy          Very happy     Happy          Very happy
[11] Happy          Happy          Not very happy Very happy     Very happy
[16] Happy          Happy          Very happy     Happy          Happy
[21] Not very happy Happy          Happy          Very happy     Happy
[26] Happy          Happy          Happy          Happy          Happy
[31] Happy          Happy          Happy          Happy          Happy
[36] Happy          Very happy     Very happy     Happy          Very happy
[41] Very happy     Very happy     Happy          Very happy     Very happy
[46] Happy          Happy          Happy          Very happy     Very happy
6 Levels: Not happy at all Not very happy Happy Very happy ... Refused

> levels(f)
[1] "Not happy at all" "Not very happy"   "Happy"            "Very happy"
[5] "Don't know"       "Refused"


I need the numerical values back in order to have a regression like
SPSS.  Isn't this what ?factor says one ought to do? Why are these all
missing?

> as.numeric(levels(f))[f]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


> as.numeric(f)
 [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
[39] 3 4 4 4 3 4 4 3 3 3 4 4
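
A minimal sketch with made-up data (not the SPSS file) seems to
reproduce both results. The factors f and h below are invented for
illustration. It suggests the as.numeric(levels(f))[f] idiom only
helps when the levels are themselves numbers stored as text, while
as.numeric(f) returns positions within levels(f) rather than the
original codes.

```r
# Toy data, not the SPSS file: a factor whose levels are text labels.
f <- factor(c("Happy", "Very happy", "Not very happy"),
            levels = c("Not happy at all", "Not very happy",
                       "Happy", "Very happy"))

# Text labels cannot be parsed as numbers, so every element becomes NA.
# R normally emits a coercion warning here; it is suppressed for brevity.
from_labels <- suppressWarnings(as.numeric(levels(f))[f])

# The same idiom does work when the levels are numbers stored as text.
h <- factor(c("10", "30", "10", "70"))
from_digits <- as.numeric(levels(h))[h]

# as.integer() returns the internal codes: positions within levels(f).
positions <- as.integer(f)

print(from_labels)
print(from_digits)
print(positions)
```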

Comparing against the values from the import without factor conversion
(a plain numeric variable, not a factor), I can see that each factor
code is exactly one higher than the SPSS value.

> g <- eldat$HAPPY[1:50]
> as.numeric(g)
 [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
[39] 2 3 3 3 2 3 3 2 2 2 3 3

I'm more worried about the kinds of variables that are coded
irregularly in the SPSS scheme, for example 1, 3, 7, 11, where
subtracting one from the factor code would not give the SPSS value.
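
One possible workaround for the irregular case, assuming the mapping
from value labels to SPSS codes is known (say, from the SPSS
codebook), is a named lookup vector. This is only a sketch: the
labels, the codes, and the names codebook and x are hypothetical, not
taken from the data set above.

```r
# Hypothetical codebook: the SPSS code for each value label.
codebook <- c("Never" = 1, "Rarely" = 3, "Often" = 7, "Always" = 11)

# Toy factor, as it might look after import with labels converted.
x <- factor(c("Often", "Never", "Always", "Rarely", "Often"),
            levels = names(codebook))

# Factor codes are just positions 1..4, not the SPSS values.
positions <- as.integer(x)

# Look up each observation's label in the codebook to get the SPSS code.
spss_codes <- unname(codebook[as.character(x)])

print(positions)
print(spss_codes)
```

Indexing by as.character(x) rather than by x itself keeps the lookup
tied to the labels, so it should not depend on the order of the factor
levels.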

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



