[R] Issues with factors with duplicate (empty) levels

Frederik Elwert frederik.elwert at rub.de
Wed Aug 26 11:55:50 CEST 2009


Hello!

I imported a DJI survey[1] from an SPSS file. When looking at some of
the variables, I noticed problems with the `table` function and similar
functions. The cause seems to be duplicate levels, which are generated
from the value labels. Not all values have labels, so those that don’t
get an empty string as their level, which leads to duplicates.

I hope the code and output below illustrate the problem. Is it possible
to prevent this? I’d still like to use the labels, so using numeric
vectors instead of factors is not the best solution.

Regards,
Frederik


> library(foreign)
> Data <- read.spss("js2003_16_29_db.sav", to.data.frame=TRUE,
+                   reencode="latin1")
> table(Data$J203_A)

überhaupt nicht wichtig                                                 
                     35                    2256                       0 
                                                                        
                      0                       0                       0 
           sehr wichtig         Mehrfachnennung 
                   4660                       0 
> table(as.numeric(Data$J203_A))

   1    2    3    4    5    6    7 
  35   39   84  227  626 1280 4660 
> is.factor(Data$J203_A)
[1] TRUE
> levels(Data$J203_A)
[1] "überhaupt nicht wichtig" " "                      
[3] " "                       " "                      
[5] " "                       " "                      
[7] "sehr wichtig"            "Mehrfachnennung"        
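
One possible workaround, sketched here under assumptions rather than taken
from the survey file itself: import the raw codes (read.spss() accepts
use.value.labels=FALSE for this) and build the factor by hand, using the
value label where one exists and the code itself where none does, so that
every level name is unique. The vectors `codes` and `value_labels` and the
helper `label_codes` are hypothetical stand-ins for illustration; they are
not part of the foreign package or of the data set.

```r
# Hypothetical stand-in for one survey item: the raw numeric codes.
codes <- c(1, 2, 2, 3, 4, 5, 6, 7, 7, 7)

# Hypothetical value labels (label = code); only some codes are labelled.
value_labels <- c("überhaupt nicht wichtig" = 1,
                  "sehr wichtig"            = 7,
                  "Mehrfachnennung"         = 8)

# Build a factor with one unique level per code: use the value label
# where one exists, otherwise fall back to the code as the level name.
label_codes <- function(codes, value_labels) {
  all_codes   <- sort(unique(c(codes, value_labels)))
  level_names <- as.character(all_codes)
  hit         <- match(all_codes, value_labels)
  labelled    <- !is.na(hit)
  level_names[labelled] <- names(value_labels)[hit[labelled]]
  factor(codes, levels = all_codes, labels = level_names)
}

item <- label_codes(codes, value_labels)
table(item)
```

With unique level names, table() reports each code in its own column, and
the labelled end points keep their text. For a factor that has already been
imported, the table(as.numeric(...)) output above suggests the underlying
integer codes are still intact, so they could probably be fed to a helper
like this one in the same way.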




[1] http://213.133.108.158/surveys/index.php?m=msw,0&sID=54