[R] Patterns on postal codes

Wed Jan 8 17:12:56 CET 2014

Hi,
You may also try:
library(gsubfn)
library(plyr)

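## `zipcode` is defined in Frede's message below.
## Named lookup list: each letter maps to "A", each digit to "N"; gsubfn()
## leaves characters that are not in the list (space, "-") unchanged.
repl <- setNames(as.list(rep(c("A","N"), c(52,10))), c(LETTERS, letters, 0:9))
dat1 <- data.frame(zipcode=zipcode, pattern=gsubfn(".", repl, zipcode), stringsAsFactors=FALSE)
dat2 <- data.frame(country=rep(c("US","Canada"),each=2), pattern=c("NNNNN-NNNN","NNNNN","ANAAAN","ANA NAN"), stringsAsFactors=FALSE)
join(dat1, dat2, by="pattern")  ## left join: unmatched codes get country NA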
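For comparison, here is a minimal base-R sketch of the same lookup that needs neither gsubfn nor plyr. It is an illustration, not A.K.'s code: it swaps in chartr() for the character-by-character translation and merge() for join(). The names `to_pattern`, `old`, `new` and `res` are hypothetical. chartr() only translates the ASCII letters and digits listed, so accented letters (which [[:alpha:]] would catch) pass through unchanged.

```r
## Sample codes from Frede's message: 4 valid + 4 invalid.
zipcode <- c("22942-0173", "32601", "N9YZE6", "S7V 1J9",
             "0022942-0173", "32-601", "NN9YZE6", "S7V  1J9")

## chartr() maps each character of `old` to the character at the same
## position in `new` (62 characters each); anything else is kept as is.
old <- paste(c(LETTERS, letters, 0:9), collapse = "")
new <- paste(rep(c("A", "N"), c(52, 10)), collapse = "")
to_pattern <- function(x) chartr(old, new, as.character(x))

dat1 <- data.frame(zipcode = zipcode, pattern = to_pattern(zipcode),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(country = rep(c("US", "Canada"), each = 2),
                   pattern = c("NNNNN-NNNN", "NNNNN", "ANAAAN", "ANA NAN"),
                   stringsAsFactors = FALSE)

## all.x = TRUE keeps unmatched codes with country NA (a left join);
## merge() sorts the result by `pattern`, so row order differs from dat1.
res <- merge(dat1, dat2, by = "pattern", all.x = TRUE)
res
```

Rows with an NA country are the codes whose pattern matches neither the US nor the Canadian formats.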

A.K.

On Wednesday, January 8, 2014 1:44 AM, Frede Aakmann Tøgersen <frtog at vestas.com> wrote:
Hi

Something like this.

## 4 valid zips + 4 invalid zips
zipcode <- c("22942-0173", "32601", "N9YZE6", "S7V 1J9", "0022942-0173", "32-601", "NN9YZE6", "S7V  1J9")

tmp <- gsub("[[:space:]]", "_", zipcode)
tmp <- gsub("[[:alpha:]]", "A", tmp)
tmp <- gsub("[[:digit:]]", "N", tmp)

tmp
## [1] "NNNNN-NNNN"   "NNNNN"        "ANAAAN"       "ANA_NAN"      "NNNNNNN-NNNN"
## [6] "NN-NNN"       "AANAAAN"      "ANA__NAN"    

patterns <- c("NNNNN-NNNN", "NNNNN", "ANAAAN", "ANA_NAN")

zipcode[tmp %in% patterns]
## [1] "22942-0173" "32601"      "N9YZE6"     "S7V 1J9"  
zipcode[!tmp %in% patterns]
## [1] "0022942-0173" "32-601"       "NN9YZE6"      "S7V  1J9"    
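Jeff's actual goal was to count how many codes share each pattern so that the outliers stand out. The sketch below shows one way to do that, assuming the field is a factor (as str() reported). It computes the pattern once per level and expands it to the rows, which avoids running gsub() on every row when the data contain many repeated codes. The helper `zip_pattern` and the sample `postal_code` data are hypothetical. Unlike Frede's version, this keeps a space as a space, matching the notation Jeff asked for.

```r
## Hypothetical sample data: a factor, as in Jeff's postal_code field.
postal_code <- factor(c("22942-0173", "32601", "32601", "N9YZE6",
                        "S7V 1J9", "32-601", "S7V  1J9", "32601"))

## Letters -> "A" first, then digits -> "N". The order matters: doing the
## digits first would turn the inserted "N"s into "A"s.
zip_pattern <- function(x) {
  x <- gsub("[[:alpha:]]", "A", x)
  gsub("[[:digit:]]", "N", x)
}

## One pattern per distinct level, then index it by each row's level code.
level_pattern <- zip_pattern(levels(postal_code))
row_pattern <- level_pattern[as.integer(postal_code)]

## Frequency of each pattern, most common first; rare patterns are outliers.
counts <- sort(table(row_pattern), decreasing = TRUE)
counts
## barplot(counts, las = 2)  # quick visual check of the distribution
```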

Yours sincerely / Med venlig hilsen

Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Jeff Johnson
> Sent: 8. januar 2014 00:11
> To: r-help at r-project.org
> Subject: [R] Patterns on postal codes
> 
> Hi all,
> 
> I'm pretty new to R and have a question. I have a postal_code field which
> can have a variety of values such as:
> For US postal codes: 22942-0173 or 32601
> For Canadian postal codes: N9YZE6 or S7V 1J9
> 
> What I want to do is represent these as patterns, such as:
> US: NNNNN-NNNN or NNNNN
> Canada: ANAAAN or ANA NAN
> where N = any digit and A = any alphabetic character, a space stays a space,
> and so on (other characters, such as ', should be represented as themselves).
> 
> Ultimately I want to count these to see how many have a pattern of
> NNNNN-NNNN, ANA NAN, etc so that I can visualize the outliers.
> 
> Does anyone know if there is a built-in function in R to do this?
> Currently, str() on the postal_code field shows a factor with
> 90,993 levels, which isn't particularly helpful.
> 
> Thanks in advance!
> 
> --
> Jeff
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.