[R] recode problem - unexplained values

Marc Schwartz (via MN) mschwartz at mn.rr.com
Thu Sep 28 17:12:38 CEST 2006


On Thu, 2006-09-28 at 12:27 +1000, bgreen at dyson.brisnet.org.au wrote:
> I am hoping for some advice regarding the difficulties I have been having
> recoding variables which are contained in a csv file.  Table 1 (below) 
> shows there are two types of blanks - as reported in the first two
> columns. I am using Windows XP & the latest version of R.
> 
> When blank cells are replaced with a value of n using the syntax:
> 
>   affect[affect == ""] <- "n"
> 
> there are still 3 blank values (Table 2).   When as.numeric is applied,
> this also causes problems because values of 2,3 & 4 are generated rather
> than just 1 & 2.
> 
> TABLE 1
> 
> table(group,actions)
>      actions
> group           n   y
>     1 100   2   0   3
>     2  30   1   1   0
>     3  24   0   0   0
> 
> 
> 
> TABLE 2
> 
> >  table(group,actions)
>      actions
> group           n   y
>     1   0   2 100   3
>     2   0   1  31   0
>     3   0   0  24   0
> 
> 
> Below is another example - for some reason there are 2 types of 'aobh'
> values.
> 
> 
> > table(group, type)
>      type
> group aobh aobh   gbh   m  uw
>     1  104      1   0   0   0
>     2    0      0  15   0  17
>     3    0      0   0  24   0
> 
> 
> Any assistance is much appreciated,
> 
> 
> Bob Green

Bob,

A quick heads-up: I presume that "aobh" and "aobh  " are treated as
different values simply because of leading/trailing spaces within the
delimited fields of the source data file. This is also the likely
reason for there being multiple missing/blank values in your imported
data set.
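As a minimal illustration (the values below are hypothetical, chosen only to mimic the 'type' and 'actions' columns), each whitespace variant becomes its own factor level. Because as.numeric() on a factor returns level positions rather than the labels, extra blank levels can also shift the codes assigned to "n" and "y", which may explain the 2, 3 & 4 values. The trimws() call is an after-the-fact fix and assumes R >= 3.2.0, which is later than this thread.

```r
# Hypothetical values mimicking the imported 'type' and 'actions' columns
type <- factor(c("aobh", "aobh  ", "aobh", "gbh"))
actions <- factor(c("", " ", "n", "y"))

# Each whitespace variant is counted as a separate level
nlevels(type)      # 3 levels, although only 2 are intended
nlevels(actions)   # 4 levels, two of them blank

# as.numeric() on a factor gives level positions, not the labels,
# so the extra blank level shifts the codes given to "n" and "y"
as.numeric(actions)

# Post-import fix: trim whitespace, then rebuild the factor
type_clean <- factor(trimws(as.character(type)))
nlevels(type_clean)  # 2
```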

Presuming that you used one of the read.table() family of functions
(i.e. read.csv()), take note of the 'strip.white' argument in
?read.table, which defaults to FALSE. If you set it to TRUE, the
function will strip leading and trailing blanks from unquoted character
fields, likely resolving this issue.
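A small sketch of the effect, assuming a hypothetical CSV supplied inline through read.csv()'s 'text' argument (available in R >= 2.14.0): the same data is read with and without strip.white, and only the stripped version collapses "aobh  " into "aobh" and the whitespace-only field into "".

```r
# Hypothetical CSV content: trailing spaces after "aobh" on one line,
# an empty field, and a whitespace-only field on the last line
csv_text <- "group,type\n1,aobh\n1,aobh  \n2,gbh\n2,\n3,  \n"

# Default behaviour (strip.white = FALSE): whitespace is kept
raw <- read.csv(text = csv_text, strip.white = FALSE)

# strip.white = TRUE trims unquoted character fields on import
clean <- read.csv(text = csv_text, strip.white = TRUE)

unique(raw$type)    # "aobh  " and "  " survive as distinct values
unique(clean$type)  # whitespace variants collapse

table(clean$group, clean$type)
```

For a file on disk the call would be along the lines of read.csv("yourfile.csv", strip.white = TRUE), where "yourfile.csv" is a placeholder name.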

HTH,

Marc Schwartz


