[R] subset problem

Farley, Robert FarleyR at metro.net
Thu Mar 15 22:05:10 CET 2012


str() found the problem: there were 2 blanks at the end of the name.

Why does this occur when I read the file in with trim.factor.names = TRUE?



> ####
> OBDataSumm <- read.spss("P:/Data/OBSurveys/OBSurvey-2010-2011/Final Delivery, Metro On-Board O-D Survey/LAMTA_OD_WEIGHTED_DATA_SETS_012512/LAMTA_OD_SUMMARY_WEIGHTED_012512.SAV", use.value.labels=TRUE, trim_values = TRUE,trim.factor.names = TRUE, max.value.labels=Inf, to.data.frame=TRUE)
Warning message:
In read.spss("P:/Data/OBSurveys/OBSurvey-2010-2011/Final Delivery, Metro On-Board O-D Survey/LAMTA_OD_WEIGHTED_DATA_SETS_012512/LAMTA_OD_SUMMARY_WEIGHTED_012512.SAV",  :
  P:/Data/OBSurveys/OBSurvey-2010-2011/Final Delivery, Metro On-Board O-D Survey/LAMTA_OD_WEIGHTED_DATA_SETS_012512/LAMTA_OD_SUMMARY_WEIGHTED_012512.SAV: Unrecognized record type 7, subtype 18 encountered in system file
> ####
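
For anyone hitting the same symptom, here is a minimal sketch of one workaround, assuming the trailing blanks live in the factor levels. OBToy and its values are made up for illustration and are not the survey data. It trims the levels after import, whatever the reason read.spss() left them in place.

```r
# Toy stand-in for the imported data (hypothetical values, blanks deliberate).
OBToy <- data.frame(ROUTE = factor(c("MT-802  ", "MT-801  ", "MT-802  ")),
                    SAMPN = c(101, 102, 103))

# Exact comparison fails because every level still ends in two blanks.
before <- nrow(subset(OBToy, ROUTE == "MT-802"))

# Strip trailing whitespace from the factor levels in place.
levels(OBToy$ROUTE) <- sub("[[:space:]]+$", "", levels(OBToy$ROUTE))

# The same subset call now finds the matching rows.
after <- nrow(subset(OBToy, ROUTE == "MT-802"))
print(c(before = before, after = after))
```

Assigning to levels() keeps the column a factor and changes each level once, rather than converting the whole column to character and back.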



Robert Farley
LACMTA

-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com] 
Sent: Wednesday, 14 March, 2012 20:36
To: Farley, Robert
Cc: R-help at r-project.org
Subject: Re: [R] subset problem

Supply an 'str' of your dataframe so we can see what its structure is.
Do you have leading/trailing blanks in your ROUTE values?  Print
them out and check their number of characters (nchar).  Are they factors?  You have not supplied enough information, such as a small subset.  I bet that when you create a small subset that includes the ROUTE you are using in your subset call, you will see the problem, especially if you use 'dput' to create it.
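
The checks suggested above can be sketched as follows. This is only an illustration on a made-up data frame (df and its values are hypothetical), with trailing blanks added on purpose so the diagnostics have something to find.

```r
# Toy stand-in for the real data; the trailing blanks are deliberate.
df <- data.frame(ROUTE = factor(c("MT-802  ", "MT-801  ", "MT-802  ")),
                 SAMPN = c(101, 102, 103))

str(df)                          # shows that ROUTE is a factor and lists its levels
lens <- nchar(levels(df$ROUTE))  # character count of each level
print(lens)                      # longer than the 6 visible characters
dput(levels(df$ROUTE))           # quoted output makes the blanks visible

# The comparison is exact, so the padded levels never equal "MT-802".
n_match <- sum(df$ROUTE == "MT-802")
print(n_match)
```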

On Wed, Mar 14, 2012 at 9:02 PM, Farley, Robert <FarleyR at metro.net> wrote:
> I'm having a simple problem with subset.  I'm choosing what I think is a valid selection, but I either get everything or an empty dataframe.  What am I doing wrong?
>
>>
>> describe(OBDataSumm)
> Description of OBDataSumm
>
> Numeric
>                      mean    median       var        sd   valid.n 
> SAMPN            2.155e+05 1.733e+05 1.151e+10 1.073e+05 3.378e+04 
> PATTERN              62.78        35      5285      72.7 3.378e+04 
> ASSN                  8139      8116 3.378e+07      5812 3.378e+04 
> ~~Clip~~~
> ROUTE Value                       Count  Percent
> MT-802                             2297      6.8
> MT-801                             1552     4.59
> MT-804                             1159     3.43
> MT-803                              776      2.3
> MT-805                              744      2.2
> MT-.51                              504     1.49
> MT-.53                              413     1.22
> MT-.35                              377     1.12
> MT-.18                              363     1.07
> MT-252                              361     1.07
> mode = MT-802
> Valid n = 33782   165 categories - only first 1
> ~~Clip~~~
>> dim(OBDataSumm)
> [1] 33782   130
>> SubRed <- subset(OBDataSumm, ROUTE == "MT-802")
>> dim(SubRed)
> [1]   0 130
>>
>
>
>
>
> Robert Farley
> LACMTA
> 1 Gateway Plaza
> Los Angeles, CA 90012-2952
> (213)922-2532
> FarleyR at Metro.net
>
>



--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


