[R] Issues with factors with duplicate (empty) levels

Frederik Elwert frederik.elwert at rub.de
Thu Aug 27 14:44:36 CEST 2009


Hello again,

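Just for your information, I think I found a way to work around the
problem described below. The loop after the quoted message replaces each
blank level with its numeric position, so that all levels become unique.
I don’t know if it’s the most elegant way, but it seems to work.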

On Wednesday, 26.08.2009, at 11:55 +0200, Frederik Elwert wrote:
> Hello!
> 
> I imported a DJI survey[1] from an SPSS file. When looking at some of
> the variables, I noticed problems with the `table` function and similar.
> It seems to be caused by duplicate levels which are generated from the
> value labels. Not all values have labels, so those that don’t get a
> blank string (" ") as the level, which leads to duplicates.
> 
> I hope the code and output below illustrate the problem. Is it possible
> to prevent this? I’d still like to use the labels, so using numeric
> vectors instead of factors is not the best solution.
> 
> Regards,
> Frederik
> 
> 
> > library(foreign)
> > Data <- read.spss("js2003_16_29_db.sav", to.data.frame=TRUE,
> +     reencode="latin1")
> > table(Data$J203_A)
> 
> überhaupt nicht wichtig                                                 
>                      35                    2256                       0 
>                                                                         
>                       0                       0                       0 
>            sehr wichtig         Mehrfachnennung 
>                    4660                       0 
> > table(as.numeric(Data$J203_A))
> 
>    1    2    3    4    5    6    7 
>   35   39   84  227  626 1280 4660 
> > is.factor(Data$J203_A)
> [1] TRUE
> > levels(Data$J203_A)
> [1] "überhaupt nicht wichtig" " "                      
> [3] " "                       " "                      
> [5] " "                       " "                      
> [7] "sehr wichtig"            "Mehrfachnennung"        

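	# Replace blank factor levels with their numeric position so that
	# every level is unique; labelled levels are left untouched.
	for (i in seq_len(ncol(Data))) {
	    if (is.factor(Data[[i]])) {
	        lvl <- levels(Data[[i]])
	        if (" " %in% lvl) {
	            empty <- lvl == " "
	            lvl[empty] <- seq_along(lvl)[empty]
	            levels(Data[[i]]) <- lvl
	        }
	    }
	}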
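
The same idea could also be wrapped in a small function, so it can be
reused on other imported files. This is only a minimal sketch: the names
`fill_blank_levels` and `fix_blank_levels` are hypothetical, and treating
every whitespace-only label as blank is an assumption that goes slightly
beyond the single-space check in the loop.

```r
# Hypothetical helper: replace blank labels by their position in the
# level vector, e.g. c("a", " ", "b") becomes c("a", "2", "b").
fill_blank_levels <- function(lvl) {
    blank <- trimws(lvl) == ""   # assumption: whitespace-only means "no label"
    lvl[blank] <- as.character(seq_along(lvl))[blank]
    lvl
}

# Hypothetical wrapper: apply the helper to every factor column of a
# data frame and return the modified copy.
fix_blank_levels <- function(df) {
    is_fac <- vapply(df, is.factor, logical(1))
    df[is_fac] <- lapply(df[is_fac], function(f) {
        levels(f) <- fill_blank_levels(levels(f))
        f
    })
    df
}

# Small made-up example (not the survey data)
toy <- data.frame(x = factor(c("a", " ", "b"), levels = c("a", " ", "b")),
                  y = 1:3)
levels(fix_blank_levels(toy)$x)
```

With the survey data, this would be called as `Data <- fix_blank_levels(Data)`.
Non-factor columns are passed through unchanged.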

> table(Data$J203_A)

überhaupt nicht wichtig                       2                       3 
                     35                      39                      84 
                      4                       5                       6 
                    227                     626                    1280 
           sehr wichtig         Mehrfachnennung 
                   4660                       0



