[R] Character (1a, 1b) to numeric

Sat Jul 11 10:52:35 CEST 2020

Agreed, I meant to add this line (for unclassed factor levels 1-through-8):

> ((1:8 - 1)*(0.25))+1
[1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75
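A minimal sketch of the same idea, assuming the stage labels are supplied in the intended order and that each major stage has exactly four sub-levels. That assumption is what makes the 0.25 step land "2" on a whole number. `as.integer()` on a factor returns the same level codes as `unclass()`, but without the `levels` attribute.

```r
# Stage labels in the intended order (the vector from the original question)
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
stage <- factor(xc, levels = xc)

# Level codes 1..8; as.integer() drops the "levels" attribute that unclass() keeps
codes <- as.integer(stage)

# Assumes four sub-levels per major stage, so each step is 1/4
xn <- (codes - 1) * 0.25 + 1
print(xn)
# [1] 1.00 1.25 1.50 1.75 2.00 2.25 2.50 2.75
```

If the number of sub-levels differs between stages, a fixed step no longer lines up, and an explicit lookup table (as in the earlier replies) is the safer choice.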

Depending on the circumstance, you can also consider using dummy
factors or even "NA" as a level; see the "factor" help page for
details.

Best, Bill.

W. Michels, Ph.D.

On Sat, Jul 11, 2020 at 12:16 AM Jean-Louis Abitbol <abitbol using sent.com> wrote:
>
> Hello Bill,
>
> Thanks.
>
> That indeed has the advantage of keeping the histology classification on the plot instead of an arbitrary numeric scale.
>
> Best wishes, JL
>
> On Sat, Jul 11, 2020, at 8:25 AM, William Michels wrote:
> > Hello Jean-Louis,
> >
> > Noting the subject line of your post, I thought the first answer would
> > have been encoding histology stages as factors and "unclass-ing" them
> > to obtain integers that can then be manipulated mathematically. You
> > can get a lot of work done with all the commands listed on the
> > "factor" help page:
> >
> > ?factor
> > samples <- 1:36
> > values <- runif(length(samples), min=1, max=length(samples))
> > hist <- rep(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), times=1:8)
> > data1 <- data.frame("samples" = samples, "values" = values, "hist" = hist )
> > (data1$hist <- factor(data1$hist, levels=c("1", "1a", "1b", "1c", "2",
> > "2a", "2b", "2c")) )
> > unclass(data1$hist)
> >
> > library(RColorBrewer); pal_1 <- brewer.pal(8, "Pastel2")
> > barplot(data1$values, beside=TRUE, col=pal_1[data1$hist])
> > plot(data1$hist, data1$values, col=pal_1)
> > pal_2 <- brewer.pal(8, "Dark2")
> > plot(unclass(data1$hist)/4, data1$values, pch=19, col=pal_2[data1$hist] )
> > group <- c(rep(0,10),rep(1,26)); data1$group <- group
> > library(lattice); dotplot(hist ~ values | group, data=data1, xlim=c(0,36) )
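Jean-Louis's point about keeping the histology labels on the plot can also be had with numeric x positions in base graphics: plot at the numeric positions, suppress the default x axis with `xaxt = "n"`, and label the ticks with the stage names via `axis()`. This is a sketch assuming eight ordered stages and the 0.25 step from Bill's follow-up; the data are simulated, and the names `stage`, `pos` and `at` are illustrative only.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Simulated data (illustrative only): 36 samples with a random stage each
set.seed(1)
stage  <- factor(sample(xc, 36, replace = TRUE), levels = xc)
values <- runif(36, min = 1, max = 36)

# Numeric x position for each sample, 0.25 apart within a major stage
pos <- (as.integer(stage) - 1) * 0.25 + 1
# Tick positions for all eight levels, including any absent from the sample
at  <- (seq_along(levels(stage)) - 1) * 0.25 + 1

# Plot on the numeric scale, but label the x axis with the stage names
plot(pos, values, pch = 19, xaxt = "n", xlab = "Histology stage", ylab = "Value")
axis(1, at = at, labels = levels(stage))
```

The numeric scale controls spacing (for example, a visible gap between "1c" and "2"), while the axis still shows the classification itself.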
> >
> > HTH, Bill.
> >
> > W. Michels, Ph.D.
> >
> > On Fri, Jul 10, 2020 at 1:41 PM Jean-Louis Abitbol <abitbol using sent.com> wrote:
> > >
> > > Many thanks to all. This help-list is wonderful.
> > >
> > > I used Rich Heiberger's solution using match() and found something to learn in each answer.
> > >
> > > Off topic: I also very much enjoyed his 2008 paper on the graphical presentation of safety data.
> > >
> > > Best wishes.
> > >
> > >
> > > On Fri, Jul 10, 2020, at 10:02 PM, Fox, John wrote:
> > > > Hi,
> > > >
> > > > We've had several solutions, and I was curious about their relative
> > > > efficiency. Here's a test with a moderately large data vector:
> > > >
> > > > > library("microbenchmark")
> > > > > set.seed(123) # for reproducibility
> > > > > x <- sample(xc, 1e4, replace=TRUE) # "data"
> > > > > microbenchmark(John = John <- xn[x],
> > > > +                Rich = Rich <- xn[match(x, xc)],
> > > > +                Jeff = Jeff <- {
> > > > +                 n <- as.integer( sub( "[a-i]$", "", x ) )
> > > > +                 d <- match( sub( "^\\d+", "", x ), letters[1:9] )
> > > > +                 d[ is.na( d ) ] <- 0
> > > > +                 n + d / 10
> > > > +                 },
> > > > +                David = David <- as.numeric(gsub("a", ".3",
> > > > +                                      gsub("b", ".5",
> > > > +                                           gsub("c", ".7", x)))),
> > > > +                times=1000L
> > > > +                )
> > > > Unit: microseconds
> > > >   expr       min        lq       mean     median         uq       max neval cld
> > > >   John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
> > > >   Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
> > > >   Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
> > > >  David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> > > > > all.equal(John, Rich)
> > > > [1] TRUE
> > > > > all.equal(John, David)
> > > > [1] "names for target but not for current"
> > > > > all.equal(John, Jeff)
> > > > [1] "names for target but not for current" "Mean relative difference: 0.1498243"
> > > >
> > > > Of course, efficiency isn't the only consideration, and aesthetically
> > > > (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH,
> > > > Jeff's solution is more general in that it generates the correspondence
> > > > between letters and numbers. The argument for Jeff's solution would,
> > > > however, be stronger if it gave the desired answer.
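As written, Jeff's version divides each letter's position in `letters` by 10, so "a", "b" and "c" become 0.1, 0.2 and 0.3 rather than the requested 0.3, 0.5 and 0.7. That is where the mean relative difference comes from. The "names for target but not for current" messages only reflect that `xn[x]` carries names (`unname()` removes them). One way to keep the parse-the-label idea while getting the requested values is to replace the `letters[1:9]` position with an explicit suffix-to-offset table. In this sketch the function `stage_to_num` and the `offset` table are hypothetical, and the offsets are taken from the `xn` in the original question.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Hypothetical suffix-to-offset table, taken from the desired xn in the question
offset <- c(a = 0.3, b = 0.5, c = 0.7)

stage_to_num <- function(x, offset) {
  n <- as.integer(sub("[a-z]$", "", x))      # major stage, e.g. "1b" -> 1
  suffix <- sub("^[0-9]+", "", x)            # letter part, "" if none
  d <- ifelse(suffix == "", 0, offset[suffix])
  # Fail loudly on letters missing from the table instead of silently using 0
  if (anyNA(d)) stop("unknown stage suffix: ",
                     paste(unique(suffix[is.na(d)]), collapse = ", "))
  n + d
}

stage_to_num(xc, offset)
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
```

Because the offsets live in one small table, a different histology classification only needs a different `offset` vector, not new code.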
> > > >
> > > > Best,
> > > >  John
> > > >
> > > > > On Jul 10, 2020, at 3:28 PM, David Carlson <dcarlson using tamu.edu> wrote:
> > > > >
> > > > > Here is a different approach:
> > > > >
> > > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > > xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> > > > > xn
> > > > > # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
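Because the substitutions act on the text of each label, this approach needs no lookup table and extends unchanged to later stages such as "3" and "3a". Here is a sketch wrapping it in a function. The name `to_num` is hypothetical, and `fixed = TRUE` simply makes the patterns literal. It assumes every label is digits followed by at most one of a, b or c. Any other label becomes NA, with a coercion warning from `as.numeric()`.

```r
# Hypothetical helper wrapping the nested gsub() calls from the post
to_num <- function(x) {
  x <- gsub("c", ".7", x, fixed = TRUE)
  x <- gsub("b", ".5", x, fixed = TRUE)
  x <- gsub("a", ".3", x, fixed = TRUE)
  as.numeric(x)
}

to_num(c("3", "3a", "3c", "10b"))
# [1]  3.0  3.3  3.7 10.5
```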
> > > > >
> > > > > David L Carlson
> > > > > Professor Emeritus of Anthropology
> > > > > Texas A&M University
> > > > >
> > > > > On Fri, Jul 10, 2020 at 1:10 PM Fox, John <jfox using mcmaster.ca> wrote:
> > > > > Dear Jean-Louis,
> > > > >
> > > > > There must be many ways to do this. Here's one simple way (with no claim of optimality!):
> > > > >
> > > > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > > > >
> > > > > > set.seed(123) # for reproducibility
> > > > > > x <- sample(xc, 20, replace=TRUE) # "data"
> > > > > >
> > > > > > names(xn) <- xc
> > > > > > z <- xn[x]
> > > > > >
> > > > > > data.frame(z, x)
> > > > >      z  x
> > > > > 1  2.5 2b
> > > > > 2  2.5 2b
> > > > > 3  1.5 1b
> > > > > 4  2.3 2a
> > > > > 5  1.5 1b
> > > > > 6  1.3 1a
> > > > > 7  1.3 1a
> > > > > 8  2.3 2a
> > > > > 9  1.5 1b
> > > > > 10 2.0  2
> > > > > 11 1.7 1c
> > > > > 12 2.3 2a
> > > > > 13 2.3 2a
> > > > > 14 1.0  1
> > > > > 15 1.3 1a
> > > > > 16 1.5 1b
> > > > > 17 2.7 2c
> > > > > 18 2.0  2
> > > > > 19 1.5 1b
> > > > > 20 1.5 1b
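The named-vector lookup above and the `match()` form that Rich Heiberger used (`xn[match(x, xc)]` in the benchmark) give the same values. This small sketch compares them, including a label that is missing from the table (the hypothetical "3a"). Both return NA for it rather than failing, and `unname()` drops the names that `xn[x]` carries.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
x  <- c("2b", "1", "1c", "3a")   # "3a" is deliberately absent from xc

# Named-vector lookup; unname() drops the labels carried over from xc
z_names <- unname(setNames(xn, xc)[x])

# match() returns each element's position in xc (NA if absent)
z_match <- xn[match(x, xc)]

identical(z_names, z_match)
# [1] TRUE
z_match
# [1] 2.5 1.0 1.7  NA
```

Checking `anyNA()` on the result is a cheap way to catch stages that the pre-established classification does not cover.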
> > > > >
> > > > > I hope this helps,
> > > > >  John
> > > > >
> > > > >   -----------------------------
> > > > >   John Fox, Professor Emeritus
> > > > >   McMaster University
> > > > >   Hamilton, Ontario, Canada
> > > > >   Web: http://socserv.mcmaster.ca/jfox
> > > > >
> > > > > > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol using sent.com> wrote:
> > > > > >
> > > > > > Dear All
> > > > > >
> > > > > > I have a character vector representing histology stages, for example:
> > > > > > xc <-  c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > > > >
> > > > > > and this goes on to 3, 3a, etc., in varying order for each patient. Of course, I do have a pre-established classification available, which changes according to the histology criteria under assessment.
> > > > > >
> > > > > > I would want to convert xc, for plotting reasons, to a numeric vector such as
> > > > > >
> > > > > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > > > > >
> > > > > > Unfortunately, I have no idea how to do that.
> > > > > >
> > > > > > Thanks for any help and apologies if I am missing the obvious way to do it.
> > > > > >
> > > > > > JL
> > > > > > --
> > > > > > Verif30042020
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > > https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-help__;!!KwNVnqRv!V7p9rtNSgBWmF3KJ3U_01fR7vP_I7y-OnWHiTFxwRZ6bVJ3-emOwkBtcU3rSW6I$
> > > > > > PLEASE do read the posting guide https://urldefense.com/v3/__http://www.R-project.org/posting-guide.html__;!!KwNVnqRv!V7p9rtNSgBWmF3KJ3U_01fR7vP_I7y-OnWHiTFxwRZ6bVJ3-emOwkBtcg7nzsmk$
> > > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > >
> > > >
> > > >
> > >
> >
>