[R] Survey Design / Rake questions

Thomas Lumley tlumley at u.washington.edu
Thu Aug 28 20:42:52 CEST 2008


On Mon, 25 Aug 2008, Farley, Robert wrote:

> I see a number of things that bother me.
>  1) str(ByEBNum$StnTraveld) says "int [1:12] 1 2 3 4 5 6 7 8 9 10 ..."
>         Even though "StnTraveld  <- c(as.factor(1:12))"

You don't want the c()
> a<-as.factor(1:12)
> str(a)
  Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(c(a))
  int [1:12] 1 2 3 4 5 6 7 8 9 10 ...

As the help for c() says, "all attributes except names are removed", and
that includes the factor levels and class, so only the integer codes are left.
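
As a minimal sketch of the fix, the margin table can be built from the factor directly, with no c() around it. The names StnTraveld, EBNumStn and ByEBNum are taken from the quoted session below. Avoiding c() altogether also sidesteps any difference between R versions in how c() treats a factor.

```r
# Build the factor directly; no c() wrapper, so levels and class are kept
StnTraveld <- factor(1:12)

# Totals copied from the quoted session
EBNumStn <- c(673.65, 800, 1000, 1000, 800, 700, 600, 500, 400, 200, 50, 50)

ByEBNum <- data.frame(StnTraveld, Freq = EBNumStn)

# The column should now be a factor with levels "1" to "12" in numeric order
str(ByEBNum$StnTraveld)
levels(ByEBNum$StnTraveld)
```

If str() still reports an int column here, something between the factor() call and the data.frame() call is dropping the attributes.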

>  2) ByEBOn$StnName[1:5] seems to imply I have extra spaces in the data.  Where would they have come from?

No, that's just R padding the printed values so that they line up in columns;
the padding is not part of the data.
> a<-factor(1:12, labels=c(1:11,"antidisestablishmentarianism"))
> a
  [1] 1                            2
  [3] 3                            4
  [5] 5                            6
  [7] 7                            8
  [9] 9                            10
[11] 11                           antidisestablishmentarianism
Levels: 1 2 3 4 5 6 7 8 9 10 11 antidisestablishmentarianism
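
One way to confirm that the spaces belong to the printed layout and not to the data is to inspect the levels themselves. This is only a sketch on the toy factor above; the same checks could be applied to levels(ByEBOn$StnName).

```r
a <- factor(1:12, labels = c(1:11, "antidisestablishmentarianism"))

# The levels are stored without any padding
levels(a)

# TRUE if any level starts or ends with whitespace
any(grepl("^\\s|\\s$", levels(a)))   # FALSE

# The length of each level as stored, padding excluded
nchar(levels(a))
```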


>  3) I'd like to verify that the order (value) of "EBSurvey$lineon" 
> matches my definition in "StnName"

This is TRUE only if the levels equal StnName element by element, in the same order:

all(levels(EBSurvey$lineon)==StnName)
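
The same check can be tried on a small stand-in, since the survey file is not available here. In this sketch `lineon` is a hypothetical substitute for EBSurvey$lineon with made-up values; only StnName comes from the quoted session. It also shows one possible way to give the margin factor the survey factor's level order, which is an assumption about what is wanted and not something tested against rake().

```r
StnName <- c("Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda",
             "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman",
             "Valley College", "Laurel Canyon", "North Hollywood")

# Hypothetical stand-in for EBSurvey$lineon: levels in route order
lineon <- factor(c("Pierce College", "Warner Center", "De Soto"),
                 levels = StnName)

# factor() on a character vector sorts the levels alphabetically by default
alphabetical <- factor(StnName)
all(levels(alphabetical) == StnName)      # FALSE

# Taking the levels from the survey factor keeps the two in step
in_order <- factor(StnName, levels = levels(lineon))
all(levels(in_order) == levels(lineon))   # TRUE
```

Pulling the levels from the survey factor, as in the last step, avoids retyping the station list when the margins are rebuilt.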

 	-thomas

>
> Thanks for helping...
>
>
> ***************************************************************************
> ***************************************************************************
>> library(survey)
>> SurveyData <- read.spss("C:/Data/R/orange_delivery.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
>> #===============================================================================
>> temp <- sub(' +$', '', SurveyData$direction_)
>> SurveyData$direction_ <- temp
>> #===============================================================================
>> SurveyData$NumStn=abs(as.numeric(SurveyData$lineon)-as.numeric(SurveyData$lineoff))
>> mean(SurveyData$NumStn)
> [1] 6.785276
>> ### Kludge
>> SurveyData$NumStn <- pmax(1,SurveyData$NumStn)
>> mean(SurveyData$NumStn)
> [1] 6.789877
>> SurveyData$NumStn <- as.factor(SurveyData$NumStn)
>> ###
>> EBSurvey <- subset(SurveyData, direction_ == "EASTBOUND" )
>> XTTable <- xtabs(~direction_ , EBSurvey)
>> XTTable
> direction_
> EASTBOUND
>      345
>> WBSurvey <- subset(SurveyData, direction_ == "WESTBOUND" )
>> XTTable <- xtabs(~direction_ , WBSurvey)
>> XTTable
> direction_
> WESTBOUND
>      307
>> #
>> EBDesign <- svydesign(id=~sampn, weights=~expwgt, data=EBSurvey)
>> #   svytable(~lineon+lineoff, EBDesign)
>> StnName     <- c( "Warner Center", "De Soto", "Pierce College", "Tampa", "Reseda", "Balboa", "Woodley", "Sepulveda", "Van Nuys", "Woodman", "Valley College", "Laurel Canyon", "North Hollywood")
>> EBOnNewTots <- c(            1000,       600,             1200,     500,     1000,      500,       200,         250,       1000,       300,              100,          123.65,                0 )
>> StnTraveld  <- c(as.factor(1:12))
>> EBNumStn    <- c(673.65,     800, 1000, 1000,  800,  700,  600, 500, 400, 200,  50, 50 )
>> ByEBOn  <- data.frame(StnName,   Freq=EBOnNewTots)
>> ByEBNum <- data.frame(StnTraveld, Freq=EBNumStn)
>> RakedEBSurvey <- rake(EBDesign, list(~lineon, ~NumStn), list(ByEBOn, ByEBNum) )
> Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]],  :
>  Stratifying variables don't match
>>
>> str(EBSurvey$lineon)
> Factor w/ 13 levels "Warner Center",..: 3 1 1 1 2 13 1 5 1 5 ...
>> EBSurvey$lineon[1:5]
> [1] Pierce College Warner Center  Warner Center  Warner Center  De Soto
> 13 Levels: Warner Center De Soto Pierce College Tampa Reseda Balboa ... North Hollywood
>> str(ByEBOn$StnName)
> Factor w/ 13 levels "Balboa","De Soto",..: 11 2 5 8 6 1 12 7 10 13 ...
>> ByEBOn$StnName[1:5]
> [1] Warner Center  De Soto        Pierce College Tampa          Reseda
> 13 Levels: Balboa De Soto Laurel Canyon North Hollywood ... Woodman
>>
>> str(EBSurvey$NumStn)
> Factor w/ 12 levels "1","2","3","4",..: 10 12 4 12 8 1 8 8 12 4 ...
>> EBSurvey$NumStn[1:5]
> [1] 10 12 4  12 8
> Levels: 1 2 3 4 5 6 7 8 9 10 11 12
>> str(ByEBNum$StnTraveld)
> int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
>> ByEBNum$StnTraveld[1:5]
> [1] 1 2 3 4 5
>>
> ********************************************************************************************************************************************************
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> Sent: Saturday, August 23, 2008 09:38
> To: Farley, Robert
> Cc: r-help at r-project.org
> Subject: Re: [R] Survey Design / Rake questions
>
> On Fri, 22 Aug 2008, Farley, Robert wrote:
>
>> I *think* I'm making progress, but I'm still failing at the same step.  My rake call fails with:
>> Error in postStratify.survey.design(design, strata[[i]], population.margins[[i]],  :
>>  Stratifying variables don't match
>>
>> To my naïve eyes, it seems that my factors are "in the wrong order".  If so,
>> how do I "assert" an ordering in my survey dataframe, or copy an "image" from
>> the survey dataframe to my marginals dataframes?  I'd prefer to "pull" the
>> original marginals dataframe(s) from the survey dataframe so that I can
>> automate that in production.
>
> It looks like a problem with the NumStn factor. One copy has been converted to character and then factor, giving levels in alphabetical order; the other copy has been converted directly to factor, giving levels in numerical order.
>
> If you use as.factor(1:12) rather than as.character(1:12) it should work.
>
>      -thomas
>
>
>
>> If that's not my problem, where might I look for enlightenment?  Neither "?why" nor ?whatamimissing return citations.  :-)
>>
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

