[R] Study Missing Data, differentiate between a percent within the valid answers and within the different missing answers

Ericka Lundström e at it.dk
Mon Mar 3 16:00:08 CET 2008


On Mon, 03 Mar 2008 22:02:17 +1300, James Reilly wrote
> On 3/3/08 8:21 PM, Ericka Lundström wrote:
> > I'm trying to migrate from SPSS to R, though I have some problems
> > with getting R to distinguish between the different kinds of
> > missing. ...
> > Is there a smart way in R to differentiate between missing and
> > valid and at the same time treat both the categories within
> > missing and valid as answers (like SPSS did above)?
> 
> The Hmisc package has some support for special missing values, 
> for instance when reading in SAS datasets using sas.get. I 
> don't believe spss.get offers the same facility, though.
> 
> You can define special missing values for a variable manually, 
> which might seem a bit involved, but this could easily be 
> automated. For your example, try:
> 
> special <- dataFrame$TWO %in% c("?","X")
> attr(dataFrame$TWO, "special.miss") <-
>      list(codes=as.character(dataFrame$TWO[special]),
>      obs=(1:length(dataFrame$TWO))[special])
> class(dataFrame$TWO) <- c("factor", "special.miss")
> is.na(dataFrame$TWO) <- special
> 
> # Then describe gives new percentages
> 
> describe(dataFrame$TWO)
> dataFrame$TWO
>        n missing       ?       X  unique
>        3       4       2       2       2
> 
> No (2, 67%), yes (1, 33%)
> 
Dear James Reilly,

Thanks a lot for your answer. Now I can get, or make, ‘metacategories’ for
my data, which is wonderful! I actually only needed two
‘metacategories’, though: one for missing answers and one for valid
answers. Anyhow, it looks like R is treating “X” and “?” as missing, or
as subcategories of missing.

One thing I still need is for R to give me a percentage within the valid
answers (or unique) and a percentage overall. Is that in any way possible?
With special.miss I don’t get overall percentages; I only get the
distribution within n [No (2, 67%), yes (1, 33%)]. I don’t get a
percentage overall [? (2, 29%), No (2, 29%), X (2, 29%), yes (1, 14%)].
Hasn’t someone developed a package for this feature?
Karsten Mueller asked about this 10 years ago:
 
https://stat.ethz.ch/pipermail/r-help/1998-October/002942.html
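
A minimal base-R sketch of the two percentages asked about here, not an
Hmisc feature. It assumes the seven answers implied by the describe()
output quoted above; the names answers, missing_codes and is_missing are
illustrative and do not come from the original data frame.

```r
# Hypothetical data: 7 answers, reconstructed from the counts quoted above.
# "?" and "X" are the special missing codes; "No" and "yes" are valid.
answers <- c("No", "yes", "?", "X", "No", "?", "X")
missing_codes <- c("?", "X")

# TRUE where the answer is one of the special missing codes
is_missing <- answers %in% missing_codes

# Percent over all answers, with the missing codes kept as categories
overall_pct <- round(100 * prop.table(table(answers)))

# Percent within the valid answers only
valid_pct <- round(100 * prop.table(table(answers[!is_missing])))

# Percent within the missing answers only
missing_pct <- round(100 * prop.table(table(answers[is_missing])))

# The two 'metacategories': valid versus missing
meta_pct <- round(100 * prop.table(table(ifelse(is_missing, "missing", "valid"))))

print(overall_pct)  # ? 29, No 29, X 29, yes 14
print(valid_pct)    # No 67, yes 33
print(missing_pct)  # ? 50, X 50
print(meta_pct)     # missing 57, valid 43
```

Keeping the codes as ordinary values and carrying a separate logical
vector avoids turning them into NA, so both the overall table and the
within-valid table can be computed from the same column.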

I hope someone has the time to help me. And again, thanks to James
Reilly for his answer!

All the best

Ericka Lundström


