[R] Character (1a, 1b) to numeric

Fox, John jfox using mcmaster.ca
Fri Jul 10 22:21:06 CEST 2020


Dear Bert,

Wouldn't you know it, your contribution arrived just after I pressed "send" on my last message. Here's how your solution compares:

> microbenchmark(John = John <- xn[x], 
+                Rich = Rich <- xn[match(x, xc)], 
+                Jeff = Jeff <- {
+                   n <- as.integer( sub( "[a-i]$", "", x ) )
+                   d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+                   d[ is.na( d ) ] <- 0
+                   n + d / 10
+                },
+                David = David <- as.numeric(gsub("a", ".3", 
+                                      gsub("b", ".5", 
+                                           gsub("c", ".7", x)))),
+                Bert = Bert <- {
+                   nums <- sub("[[:alpha:]]+","",x)  
+                   alph <- sub("\\d+","",x)  
+                   as.numeric(nums) + ifelse(alph == "",0, vals[alph])
+                },
+                times=1000L
+                )
Unit: microseconds
  expr       min         lq       mean    median         uq       max neval  cld
  John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a   
  Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a   
  Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b  
 David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
  Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c 
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243" 
> all.equal(John, Bert)
[1] "names for target but not for current"
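
A minimal sketch of how the all.equal() messages above can be read, using a small made-up input rather than the benchmark data (the four-element x below is an assumption for illustration). The "names for target but not for current" message only says that the lookup result carries names and the other result does not; dropping the names with unname() compares the values alone. The second part reruns the letter-to-offset mapping from Jeff's code, which gives .1, .2, .3 rather than .3, .5, .7, and that appears to account for the reported numeric difference.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc

x <- c("2b", "1", "1c", "2a")   # small illustrative input (assumption)

John  <- xn[x]                  # named numeric vector
David <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))

all.equal(John, David)          # reports only the missing names
all.equal(unname(John), David)  # TRUE: the values agree

# Jeff's mapping: letter position / 10, so a -> .1, b -> .2, c -> .3
n <- as.integer(sub("[a-i]$", "", x))
d <- match(sub("^\\d+", "", x), letters[1:9])
d[is.na(d)] <- 0
Jeff <- n + d / 10
Jeff                            # 2.2 1.0 1.3 2.1
```

Passing check.attributes = FALSE to all.equal() is another way to ignore the names.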

To make the comparison fair, I moved the parts of the solutions that don't depend on the length of the data outside the benchmark. Your solution does have the virtue of providing the right answer.
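
For anyone who wants to rerun the comparison, here is a sketch of the setup that presumably sits outside the timed expressions: the lookup table xn with its names, the vals vector from Bert's solution, and the data x. The size of x and the seed are not stated in this message, so both values below are assumptions.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
names(xn) <- xc                                # used by John's xn[x]
vals <- setNames(c(.3, .5, .7), letters[1:3])  # used by Bert's solution

set.seed(123)                                  # assumed seed
x <- sample(xc, 1e4, replace = TRUE)           # assumed data size

# Only the work that scales with length(x) stays inside the timing
John <- xn[x]
Rich <- xn[match(x, xc)]
```

With the setup hoisted like this, the timings reflect the per-element cost of each approach rather than the one-off cost of building the lookup tables.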

Best,
 John

> On Jul 10, 2020, at 3:54 PM, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> 
> ... and continuing with this cute little thread...
> 
> I found the OP's specification a little imprecise -- are your values always a string that begins with "some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters? Probably not, and the existing solutions you have been provided are almost certainly all you need. But for fun, assuming this more general specification, here is a general way to split your alphanumeric codes up into numeric and alpha parts and then convert them, using a couple of sub() calls.
> 
> > set.seed(131)
> > xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE) 
> > nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> > alph <- sub("\\d+","",xc)   ## extract alpha part
> > codes <- letters[1:3] ## whatever alpha codes are used
> > vals <- setNames(c(.3,.5,.7), codes) ## whatever numeric values to convert codes to
> > xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> > data.frame (xc = xc, xnew = xnew)
>    xc xnew
> 1  1a  1.3
> 2   2  2.0
> 3  1c  1.7
> 4  1c  1.7
> 5  1b  1.5
> 6  1a  1.3
> 7   2  2.0
> 8   2  2.0
> 9  1a  1.3
> 10 1a  1.3
> 11 2c  2.7
> 12 1b  1.5
> 13 1b  1.5
> 14  1  1.0
> 15 1c  1.7
> 
> Echoing others, no claim for optimality in any sense.
> 
> Cheers,
> Bert
> 
> 
> On Fri, Jul 10, 2020 at 12:28 PM David Carlson <dcarlson using tamu.edu> wrote:
> Here is a different approach:
> 
> xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> xn
> # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
> 
> David L Carlson
> Professor Emeritus of Anthropology
> Texas A&M University
> 
> On Fri, Jul 10, 2020 at 1:10 PM Fox, John <jfox using mcmaster.ca> wrote:
> 
> > Dear Jean-Louis,
> >
> > There must be many ways to do this. Here's one simple way (with no claim
> > of optimality!):
> >
> > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > >
> > > set.seed(123) # for reproducibility
> > > x <- sample(xc, 20, replace=TRUE) # "data"
> > >
> > > names(xn) <- xc
> > > z <- xn[x]
> > >
> > > data.frame(z, x)
> >      z  x
> > 1  2.5 2b
> > 2  2.5 2b
> > 3  1.5 1b
> > 4  2.3 2a
> > 5  1.5 1b
> > 6  1.3 1a
> > 7  1.3 1a
> > 8  2.3 2a
> > 9  1.5 1b
> > 10 2.0  2
> > 11 1.7 1c
> > 12 2.3 2a
> > 13 2.3 2a
> > 14 1.0  1
> > 15 1.3 1a
> > 16 1.5 1b
> > 17 2.7 2c
> > 18 2.0  2
> > 19 1.5 1b
> > 20 1.5 1b
> >
> > I hope this helps,
> >  John
> >
> >   -----------------------------
> >   John Fox, Professor Emeritus
> >   McMaster University
> >   Hamilton, Ontario, Canada
> >   Web: http://socserv.mcmaster.ca/jfox
> >
> > > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol using sent.com> wrote:
> > >
> > > Dear All
> > >
> > > I have a character vector, representing histology stages, such as for example:
> > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > >
> > > and this goes on to 3, 3a etc in various order for each patient. I do have of course a pre-established classification available which does change according to the histology criteria under assessment.
> > >
> > > I would want to convert xc, for plotting reasons, to a numeric vector such as
> > >
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > >
> > > Unfortunately I have no clue on how to do that.
> > >
> > > Thanks for any help and apologies if I am missing the obvious way to do it.
> > >
> > > JL
> > > --
> > > Verif30042020
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> 


