[R] How to convert category (or range/group) into continuous?

Leonard Mada |eo@m@d@ @end|ng |rom @yon|c@eu
Thu Jan 20 19:44:38 CET 2022


Dear Marna,


I have revisited your initial mail and I am still unsure what your true 
statistical intention was. Unfortunately, you did not say whether any 
of the solutions helped.


Looking at the data more carefully, I see that you are trying to plot 
the cumulative frequency.


There is an easy way to do this in R.


# generate some continuous data
x = runif(1000, 0, 2.5);
# let's partially discretize it:
x = round(x, 2);

# Cumulative frequency (empirical cumulative distribution function):
x.cum = ecdf(x);
plot(x.cum, do.points=FALSE, lwd=2, xlim=c(0, 3));

# adding some vertical lines at the middle of each group,
# using mid.factor() from my previous mail (quoted below);
# daT is the data frame used in the previous mails:
daT$group = as.factor(daT$group);
v = mid.factor(daT$group);
# the groups seem to be on a logarithmic scale:
# [Note: a geometric mean may have been more appropriate]
# draw one line per distinct midpoint:
abline(v=unique(v$mid), col="red");
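If only the grouped counts are available, the same cumulative frequencies can be computed by hand: count the observations in each interval, take the cumulative sum and divide by the total. A minimal sketch on simulated data; the breaks are arbitrary and chosen only for illustration, and the intervals are assumed to be right-closed, as cut() produces by default. The values should agree with ecdf() evaluated at the upper bound of each interval:

```r
# simulated data, as above
set.seed(1)
x = round(runif(1000, 0, 2.5), 2)

# illustrative right-closed intervals (breaks chosen arbitrarily);
# include.lowest = TRUE keeps x == 0 in the first interval
brk = c(0, 0.5, 1, 1.5, 2, 2.5)
grp = cut(x, breaks = brk, include.lowest = TRUE)

# cumulative relative frequency per interval
cnt = table(grp)
cum.freq = cumsum(cnt) / length(x)
print(cum.freq)

# the same values from the empirical CDF at the upper bounds
Fn = ecdf(x)
stopifnot(isTRUE(all.equal(as.numeric(cum.freq), Fn(brk[-1]))))
```

The points (brk[-1], cum.freq) could then be added to the ecdf plot with points(), to check that they lie on the step function.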


I hope this helps,


Leonard


On 1/19/2022 4:39 AM, Leonard Mada wrote:
> Dear Marna,
>
>
> If you want to extract the middle of those intervals, please find 
> below an improved variant of Rui's code.
[edit: corrected name]
>
>
> Note:
> - it is more efficient to process the levels of a factor, instead of 
> all the individual strings;
> - I expect the benefits to show in large data frames (> 1 
> million rows), although I have not explicitly checked this;
> - the code also handles open/closed intervals better;
> - the returned data structure may require some tweaking (it currently 
> returns a data.frame).
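The first note can be checked directly. A minimal sketch on simulated data, using a hypothetical helper mid.of() that parses a single interval label: parsing every string and parsing only the levels (then indexing by the integer level codes) should give identical midpoints, while the printed timings show the difference on your own machine:

```r
set.seed(42)
# simulated grouped data: 100,000 values in 5 right-closed intervals
x = cut(runif(1e5, 0, 2.5), breaks = seq(0, 2.5, by = 0.5))

# hypothetical helper: midpoint of one label such as "(0.5,1]"
mid.of = function(s) {
  s = sub("^[(\\[]", "", s)   # drop the opening bracket
  s = sub("[])]$", "", s)     # drop the closing bracket
  mean(as.numeric(strsplit(s, ",")[[1]]))
}

# variant 1: parse each of the 100,000 strings
t1 = system.time(
  m1 <- vapply(as.character(x), mid.of, numeric(1), USE.NAMES = FALSE))
# variant 2: parse the 5 levels once, then index by the level codes
t2 = system.time(
  m2 <- vapply(levels(x), mid.of, numeric(1), USE.NAMES = FALSE)[as.integer(x)])

stopifnot(identical(m1, m2))  # same result either way
print(rbind(per.string = t1[1:3], per.level = t2[1:3]))
```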
>
>
>
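> ### Middle of an Interval
> # x: a factor with levels such as "(0,0.5]" or "[1,2)";
> # inf.to: optional c(low, high) values used as the "middle"
> #   of intervals extending to -Inf / Inf;
> mid.factor = function(x, inf.to = NULL, split.str=",") {
>     lvl0 = levels(x); lvl = lvl0;
>     # strip the opening "(" or "[" and the closing ")" or "]":
>     lvl = sub("^[(\\[]", "", lvl);
>     lvl = sub("[])]$", "", lvl); # "]" must come first inside [...];
>     # split into lower & upper bounds:
>     lvl = strsplit(lvl, split.str);
>     lvl = lapply(lvl, as.numeric);
>     if( ! is.null(inf.to)) {
>         # 1 = extends to +Inf; -1 = extends to -Inf; 0 = finite;
>         FUN = function(x) {
>             if(any(x == Inf)) 1
>             else if(any(x == - Inf)) -1
>             else 0;
>         }
>         whatInf = sapply(lvl, FUN);
>         # TODO: more advanced;
>         lvl[whatInf == -1] = inf.to[1];
>         lvl[whatInf ==  1] = inf.to[2];
>     }
>     mid = sapply(lvl, mean);
>     # index by the level codes: keeps the original row order
>     # (merge() would sort the rows by level);
>     data.frame(lvl=x, mid=mid[as.integer(x)]);
> }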
>
>
> # uses the daT data frame;
> # requires a factor:
> # - this is probably the case with the original data;
> daT$group = as.factor(daT$group);
> mid.factor(daT$group);
>
>
> I have also uploaded this code to my GitHub collection of useful data tools:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R
>
>
> Sincerely,
>
>
> Leonard
>
>
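Regarding the note that the groups seem to be on a logarithmic scale: for such intervals, the geometric mean of the bounds, sqrt(lower * upper), is often a more representative midpoint than the arithmetic mean. A minimal sketch, assuming positive, finite bounds and labels in the usual cut() format; the helper interval.bounds() is hypothetical and the data and breaks are made up for illustration:

```r
# hypothetical helper: parse labels like "(1,10]" or "[0.1,1)"
# into a 2-column matrix of lower/upper bounds
interval.bounds = function(lvl, split.str = ",") {
  lvl = sub("^[(\\[]", "", lvl)   # drop the opening bracket
  lvl = sub("[])]$", "", lvl)     # drop the closing bracket
  b = do.call(rbind, lapply(strsplit(lvl, split.str), as.numeric))
  colnames(b) = c("lower", "upper")
  b
}

# illustrative data grouped on a log scale
grp = cut(c(0.5, 3, 7, 40, 90), breaks = c(0.1, 1, 10, 100))
b = interval.bounds(levels(grp))

mid.arith = rowMeans(b)                        # arithmetic midpoints
mid.geom  = sqrt(b[, "lower"] * b[, "upper"])  # geometric midpoints (positive bounds only)

# one midpoint per observation, in the original order
res = data.frame(group = grp,
                 arith = mid.arith[as.integer(grp)],
                 geom  = mid.geom[as.integer(grp)])
print(res)
```

On an ecdf plot, these could then be drawn with abline(v = unique(res$geom), col = "red").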


