[R] Extracting numbers from a character variable of different types

David dwinsemius at comcast.net
Mon Mar 19 00:05:40 CET 2012



On Mar 18, 2012, at 3:17 PM, Daniel Malter <daniel at umd.edu> wrote:

> Assume your year value is 
> 
> x<-"007/A"
> 
> You want to replace all non-numeric characters (i.e. letters and
> punctuation) and all zeros with nothing.
> 
> gsub('[[:alpha:]]|[[:punct:]]|0','',x)
> 
> Let's say you have a vector with both month and year values (you can
> separate them). Now we need to identify the cells that have a month or year
> indicator:
> 
> x<-c("007/A","007/a","003/M","003/m")
> 
> grep("/A|/a",x) #cells in x with year information
> grep("/M|/m",x) #cells in x with month information
> 
> To remove all letters, punctuation, and 0s from x, do:
> 
> gsub('[[:alpha:]]|[[:punct:]]|0','',x)
> 
> which you can also do specifically for the cells that identify months and
> years, respectively:
> 
> years<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/A|/a",x)])

The problem with this approach is that the years vector becomes disjoint from the months vector: each is shorter than x and no longer lines up with it, so it doesn't lend itself well to data.frame operations.
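
A minimal sketch of one way to keep everything aligned, using the example x from the quoted message. It assumes every value has the form digits/letter with A or a marking years and M or m marking months; the names value, unit and dat are only illustrative.

```r
# Example input from the quoted message
x <- c("007/A", "007/a", "003/M", "003/m")

# Numeric part: drop the slash and everything after it, then convert.
# as.numeric() discards leading zeros only ("007" -> 7).
value <- as.numeric(sub("/.*$", "", x))

# Unit indicator: whatever follows the slash, folded to upper case
unit <- toupper(sub("^.*/", "", x))

# One row per element of x; NA where the unit does not apply
dat <- data.frame(
  x      = x,
  years  = ifelse(unit == "A", value, NA),
  months = ifelse(unit == "M", value, NA)
)
dat
```

Because years and months have the same length as x, they can sit side by side in a data.frame. Converting with as.numeric() also avoids deleting every 0, which would turn a value such as "010/A" into 1.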

-- 
David
Sent from my iPhone


> #years
> years
> months<-gsub('[[:alpha:]]|[[:punct:]]|0','',x[grep("/M|/m",x)]) #months
> months
> 
> Convert the resulting character vectors into numeric vectors with
> as.numeric(years), for example.
> 
> HTH,
> Daniel
