[R] Survey Design / Rake questions

Farley, Robert FarleyR at metro.net
Fri Aug 29 03:04:13 CEST 2008


I'm feeling like I just don't get it.  My attempt at rake now fails
with:
Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]],  : 
  Stratifying variables don't match

The factors in the data frame look fine.  Should I see the same
structure in the design?
> str(EBDesign$lineon)
 NULL
> str(EBSurvey$lineon)
 Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> str(ByEBOn$StnName)
 Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> all(levels(EBSurvey$lineon)==StnName)
[1] TRUE
> #
> str(EBDesign$NumStn)
 NULL
> str(EBSurvey$NumStn)
 Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> str(ByEBNum$StnTraveld)
 Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> all(levels(EBSurvey$NumStn)==StnTraveld)
[1] TRUE

A complete listing is below:
**************************************************
**************************************************
**************************************************
> sessionInfo()        # List loaded packages
R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] graphics  grDevices utils     datasets  stats     methods   base


other attached packages:
[1] survey_3.8-1   fortunes_1.3-5 moonsun_0.1    prettyR_1.3-2  foreign_0.8-29
> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
> #===============================================================================
> temp <- sub(' +$', '', SurveyData$direction_) 
> SurveyData$direction_ <- temp
> #===============================================================================
> # Calc. # stations traversed from StnOn/StnOff
> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
> #################################################### Kludge
> mean(SurveyData$NumStn)
[1] 6.785276
> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
> mean(SurveyData$NumStn)
[1] 6.789877
> ####################################################
> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
> #===============================================================================
> # Adjust one direction at a time.  Start W/ EB {learn subsetting later}
> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
> #===============================================================================
> # New Marginals {start w/ 2 dimensions: StnOn X Distance}
> StnName <- as.factor(c( "Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley College", "Laurel Canyon", "North Hollywood"))
> EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300, 100, 123.65, 0)
> ByEBOn  <- data.frame(StnName, Freq=EBOnNewTots)
> #
> StnTraveld <- as.factor(1:12)
> EBNumStn   <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)
> ByEBNum    <- data.frame(StnTraveld, Freq=EBNumStn)
> #
> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) )
Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]],  : 
  Stratifying variables don't match
> #
> str(EBDesign$lineon)
 NULL
> str(EBSurvey$lineon)
 Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
> str(ByEBOn$StnName)
 Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
> all(levels(EBSurvey$lineon)==StnName)
[1] TRUE
> #
> str(EBDesign$NumStn)
 NULL
> str(EBSurvey$NumStn)
 Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
> str(ByEBNum$StnTraveld)
 Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> all(levels(EBSurvey$NumStn)==StnTraveld)
[1] TRUE
> #
**************************************************
**************************************************
**************************************************
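
One likely cause of the error in the listing, offered as an assumption rather than a confirmed diagnosis: rake() hands each margin to postStratify(), which appears to expect every population data frame to carry a column named after the variable in its formula (lineon, NumStn) plus Freq. In the listing the columns are called StnName and StnTraveld instead. The sketch below rebuilds the two margins with matching column names using base R only. StnLevels and margin_ok are hypothetical names introduced for illustration, and the rake() call is left as a comment because it needs the survey package and the EBDesign object from the listing.

```r
# Station names in route order (copied from the listing).
StnLevels <- c("Warner Center", "De Soto", "Pierce College", "Tampa",
               "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys",
               "Woodman", "Valley College", "Laurel Canyon",
               "North Hollywood")
EBOnNewTots <- c(1000, 600, 1200, 500, 1000, 500, 200, 250, 1000, 300,
                 100, 123.65, 0)
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)

# Name the stratifying column after the design variable, and set the
# levels explicitly so they are not re-sorted alphabetically.
ByEBOn  <- data.frame(lineon = factor(StnLevels, levels = StnLevels),
                      Freq = EBOnNewTots)
ByEBNum <- data.frame(NumStn = factor(1:12), Freq = EBNumStn)

# Hypothetical helper: does a margin carry the formula's variable and Freq?
margin_ok <- function(formula, margin) {
  all(c(all.vars(formula), "Freq") %in% names(margin))
}
margin_ok(~lineon, ByEBOn)
margin_ok(~NumStn, ByEBNum)

# Both margins should describe the same population total.
all.equal(sum(ByEBOn$Freq), sum(ByEBNum$Freq))

# With the survey package loaded and EBDesign built as in the listing:
# RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum))
```

As for str(EBDesign$lineon) returning NULL: a survey design object keeps the data frame in its variables component, so EBDesign$variables$lineon is the place to look.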

Robert Farley
Metro
www.Metro.net 


-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
Sent: Thursday, August 28, 2008 11:43
To: Farley, Robert
Cc: r-help at r-project.org
Subject: Re: [R] Survey Design / Rake questions

On Mon, 25 Aug 2008, Farley, Robert wrote:

> I see a number of things that bother me.
>  1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
>         Even though "StnTraveld  <- c(as.factor(1:12))"

You don't want the c()
> a<-as.factor(1:12)
> str(a)
  Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(c(a))
  int [1:12] 1 2 3 4 5 6 7 8 9 10 ...

As the help for c() says  "all attributes except names are removed.", 
which includes the factor levels.
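
A small sketch of the same point, written so that it should behave identically on any R version. One caveat: from R 4.1.0 onward, c() applied to factors returns a factor, so the str(c(a)) output above reflects the R 2.7.2 session in this thread. as.integer() shows the underlying codes either way. The Freq values here are placeholders, not the thread's totals.

```r
a <- as.factor(1:12)
is.factor(a)            # the factor keeps its levels

# An explicit conversion drops the levels and leaves the integer codes.
codes <- as.integer(a)
is.factor(codes)
attributes(codes)

# Build the margin from the factor itself, without wrapping it in c().
ByEBNum <- data.frame(StnTraveld = a, Freq = rep(1, 12))  # placeholder Freq
is.factor(ByEBNum$StnTraveld)
```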

>  2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the
data.  Where would they have come from?

No, that's just R printing things in columns
> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
> a
  [1] 1                            2
  [3] 3                            4
  [5] 5                            6
  [7] 7                            8
  [9] 9                            10
[11] 11                           antidisestablishmentarianism
Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
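
To check directly whether the levels contain real blanks, rather than judging from the printed columns, something like the following sketch may help. The direction vector is made-up data standing in for a padded fixed-width import.

```r
a <- factor(1:12, labels = c(1:11, "antidisestablishmentarianism"))

# No level starts or ends with a space; the gaps in the printout are padding.
any(grepl("^ | $", levels(a)))

# Genuine trailing blanks (hypothetical values) are detectable and removable.
direction <- c("EASTBOUND   ", "WESTBOUND")
grepl(" +$", direction)
trimmed <- sub(" +$", "", direction)
nchar(direction)
nchar(trimmed)
```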


>  3) I'd like to verify that the order (value) of "EBSurvey$lineon" 
> matches my definition in "StnName"

all(levels(EBSurvey$lineon)==StnName)
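
One thing to keep in mind with this check, shown here as a sketch with a made-up three-station example: when StnName is itself a factor, == compares against its values in the order they were typed, not against its levels. The check can therefore return TRUE while the two factors still have their levels in different orders, because as.factor() sorts levels alphabetically. The names survey_levels, lineon and StnName2 are illustrative only.

```r
survey_levels <- c("Warner Center", "De Soto", "Balboa")  # hypothetical route order
lineon  <- factor(c("Balboa", "Warner Center", "De Soto"), levels = survey_levels)
StnName <- as.factor(survey_levels)   # levels get sorted alphabetically

# Compares levels(lineon) with the *values* of StnName, element by element.
all(levels(lineon) == StnName)

# Compares the two level sets, including their order.
identical(levels(lineon), levels(StnName))

# Giving the levels explicitly keeps the intended order.
StnName2 <- factor(survey_levels, levels = survey_levels)
identical(levels(lineon), levels(StnName2))
```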

 	-thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


