[R] Partition vector of strings into lines of preferred width

Andrew Simmons @kw@|mmo @end|ng |rom gm@||@com
Fri Oct 28 23:51:15 CEST 2022


I would suggest using strwrap(); the documentation at ?strwrap has
plenty of details and examples.
For paragraphs, I would usually do something like:

strwrap(x = text, width = 80, indent = 4)
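A short illustration of strwrap() on text in the style of the question. It uses width = 30 to match the example there. The sample strings and the variable names `out`, `words` and `wrapped` are made up for illustration. As far as I know, strwrap() treats a single newline inside a paragraph as ordinary whitespace. A vector of words such as `str` can be joined with paste(collapse = " ") before wrapping.

```r
# Example text in the style of the question; the single newline inside the
# paragraph is treated as ordinary whitespace by strwrap().
text <- "What is the best way to split/cut a vector of strings into lines of
preferred width? I have come up with a simple solution, albeit naive."

# One string per output line, wrapped at a target width of 30
out <- strwrap(text, width = 30)
writeLines(out)

# A vector of words (like `str` in the question) can be pasted together first
words <- c("What", "is", "the", "best", "way", "to", "split", "a", "vector")
writeLines(strwrap(paste(words, collapse = " "), width = 12))

# For several input strings, simplify = FALSE returns a list holding one
# character vector of wrapped lines per input element
wrapped <- strwrap(c("a first short paragraph", "a second one"),
                   width = 12, simplify = FALSE)
print(wrapped)
```

The indent and exdent arguments control the indentation of the first line and of the following lines of each paragraph.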

On Fri, Oct 28, 2022 at 5:42 PM Leonard Mada via R-help
<r-help using r-project.org> wrote:
>
> Dear R-Users,
>
> text = "
> What is the best way to split/cut a vector of strings into lines of
> preferred width?
> I have come up with a simple solution, albeit naive, as it involves many
> arithmetic divisions.
> I have an alternative idea which avoids this problem.
> But I may miss some existing functionality!"
>
> # Long vector of strings:
> str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
> lenWords = nchar(str);
>
> # simple, but naive solution:
> # - it involves many divisions;
> cut.character.int = function(n, w) {
>      ncm = cumsum(n);
>      nwd = ncm %/% w;
>      count = rle(nwd)$lengths;
>      pos = cumsum(count);
>      posS = pos[ - length(pos)] + 1;
>      posS = c(1, posS);
>      pos = rbind(posS, pos);
>      return(pos);
> }
>
> npos = cut.character.int(lenWords, w=30);
> # let's print the results;
> for(id in seq(ncol(npos))) {
>     len = npos[2, id] - npos[1, id];
>     cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n"));
> }
>
>
> The first solution performs an integer division on every cumulative
> length. Alternatively, one can divide only the total length (the last
> element of the cumsum) and let cut() do the binning. Something like this
> should work (although it has not been properly tested):
>
>
> w = 30;
> cumlen = cumsum(lenWords);
> max = tail(cumlen, 1) %/% w + 1;
> pos = cut(cumlen, seq(0, max) * w);
> count = rle(as.numeric(pos))$lengths;
> # everything else is the same;
> pos = cumsum(count);
> posS = pos[ - length(pos)] + 1;
> posS = c(1, posS);
> pos = rbind(posS, pos);
>
> npos = pos; # then print
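There is one detail to watch when swapping %/% for cut(). By default, cut() uses right-closed intervals (a, b], whereas x %/% w places x in [k*w, (k+1)*w). The two groupings therefore differ whenever a cumulative length is an exact multiple of w. Passing right = FALSE should reproduce the %/% grouping. Adding labels = FALSE returns integer codes directly instead of a factor. Below is a small check with made-up cumulative lengths:

```r
w <- 10
cumlen <- c(10, 20, 30, 31)              # made-up cumulative lengths
nmax <- tail(cumlen, 1) %/% w + 1        # number of width-w blocks
breaks <- seq(0, nmax) * w               # 0 10 20 30 40

by_div   <- cumlen %/% w                                          # 1 2 3 3
by_right <- cut(cumlen, breaks, labels = FALSE)                   # 1 2 3 4
by_left  <- cut(cumlen, breaks, labels = FALSE, right = FALSE)    # 2 3 4 4

rle(by_div)$lengths    # 1 1 2
rle(by_right)$lengths  # 1 1 1 1  (30 and 31 are no longer grouped together)
rle(by_left)$lengths   # 1 1 2    (same grouping as %/%)
```

The variable is called nmax here, so that it does not mask base::max().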
>
>
> The cut() call could probably be optimized as well, since the cumsum is
> sorted in ascending order. I have not evaluated the efficiency of the
> code, either.
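One way to exploit the sorted cumsum is base R's findInterval(). The cumsum of non-negative word lengths is non-decreasing, so it can serve as the sorted second argument. One binary search per block boundary k*w then finds the last word of each line directly. That is one search per block rather than binning every word. This is only a sketch and has not been benchmarked. The name cut_sorted() is hypothetical. It assumes at least one word, and it keeps the cumsum(n) %/% w grouping, so the line-width caveat of that approach still applies.

```r
# Hypothetical helper: same grouping as cumsum(lenWords) %/% w, but the line
# ends are found by binary search in the (sorted) cumulative lengths.
# Assumes length(lenWords) >= 1 and non-negative lengths.
cut_sorted <- function(lenWords, w) {
  cumlen <- cumsum(lenWords)
  nblocks <- tail(cumlen, 1) %/% w + 1
  # With left.open = TRUE, findInterval(k * w, cumlen) counts the cumulative
  # lengths strictly below k * w, i.e. the index of the last word whose
  # cumlen %/% w is smaller than k.
  ends <- findInterval(seq_len(nblocks) * w, cumlen, left.open = TRUE)
  ends <- unique(ends[ends > 0])   # drop empty blocks
  starts <- c(1L, head(ends, -1) + 1L)
  rbind(posS = starts, pos = ends)
}

lenWords <- c(4L, 2L, 3L, 4L, 3L, 2L, 5L, 1L, 6L)   # made-up word lengths
cut_sorted(lenWords, w = 10)   # posS: 1 4 7 9; pos: 3 6 8 9
```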
>
> But am I missing some existing functionality?
>
>
> Note:
>
> - Technically, the cut() function should probably return a vector of
> indices (something like rep(seq_along(count), count)), but it was more
> practical to have both the start and end positions.
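The two representations in the note convert into each other easily, and the group-index form plugs straight into split(). Below is a sketch with illustrative run lengths. In the code above, count would come from rle(...)$lengths. The words vector is made up.

```r
words <- c("What", "is", "the", "best", "way", "to", "split", "a", "vector")
count <- c(3L, 3L, 2L, 1L)            # illustrative run lengths, as from rle()

# Group-index form: one line number per word
grp <- rep(seq_along(count), count)   # 1 1 1 2 2 2 3 3 4

# split() collects the words of each line; paste() joins them with spaces
lines_out <- unname(vapply(split(words, grp), paste, character(1),
                           collapse = " "))

# Start/end form, equivalent to rbind(posS, pos) in the code above
ends <- cumsum(count)
starts <- ends - count + 1L
rbind(posS = starts, pos = ends)
```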
>
>
> Many thanks,
>
>
> Leonard
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
