[Rd] R.sh and argument escaping

Ivan Krylov
Tue Mar 16 18:32:28 CET 2021


Hello R-devel!

The following sequence of commands results in an error message on a
POSIX system:

tab="$(printf '\t')" # printf is portable; 'echo -ne' is not
LC_ALL=C Rscript -e " $tab 1"
# ARGUMENT '~+~1' __ignored__

Tabs can sneak into the -e argument from indented multi-line arguments
in shell scripts:

Rscript -e '
	foo()
	bar()
	...
'

R.sh does a good job of escaping spaces and newlines, but since shells
are also supposed to split on a tab [*], it's a good idea to escape
tabs too:

Index: src/scripts/R.sh.in
===================================================================
--- src/scripts/R.sh.in	(revision 80090)
+++ src/scripts/R.sh.in	(working copy)
@@ -192,7 +192,7 @@
     -e)
       if test -n "`echo ${2} | ${SED} 's/^-.*//'`"; then
 	a=`(echo "${2}" && echo) | ${SED} -e 's/ /~+~/g' | \
-          ${SED} -e :a -e N -e '$!ba' -e 's/\n/~n~/g' -e 's/~n~$//g'`
+          ${SED} -e :a -e N -e '$!ba' -e 's/\n/~n~/g' -e 's/~n~$//g' -e 's/\t/~t~/g'`
         shift
       else
 	error "option '${1}' requires a non-empty argument"
Index: src/unix/system.c
===================================================================
--- src/unix/system.c	(revision 80090)
+++ src/unix/system.c	(working copy)
@@ -170,6 +170,9 @@
 	} else if(*q == '~' && *(q+1) == 'n' && *(q+2) == '~') {
 	    q += 2;
 	    *p++ = '\n';
+	} else if(*q == '~' && *(q+1) == 't' && *(q+2) == '~') {
+	    q += 2;
+	    *p++ = '\t';
 	} else *p++ = *q;
     }
     return p;

I have verified that with the patch above, Rscript -e " $tab 1" no
longer fails.
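
To make the failure mode concrete, here is a minimal sketch of the
field splitting involved. It assumes that the escaped argument ends up
in an unquoted expansion (as ${args} appears to in R.sh) and that IFS
has its default value; count_fields is a hypothetical helper, not
something from the R sources.

```shell
#!/bin/sh
# Illustration only: a simplified model of what happens to an escaped
# argument once it is expanded without quotes.

# Hypothetical helper: print the number of fields it receives.
count_fields() { echo "$#"; }

tab=$(printf '\t')
arg=" ${tab}1"    # space, tab, "1"

# Escape spaces only, as the unpatched sed command does for this input.
spaces_only=$(printf '%s\n' "$arg" | sed -e 's/ /~+~/g')

# Escape tabs as well, which is the effect of the extra sed expression.
# A literal tab is used because '\t' in a sed pattern is not portable.
with_tabs=$(printf '%s\n' "$arg" | sed -e 's/ /~+~/g' -e "s/${tab}/~t~/g")

# Unquoted expansions are split on space, tab and newline by default.
fields_before=$(count_fields $spaces_only)
fields_after=$(count_fields $with_tabs)

echo "spaces escaped only:     ${fields_before} field(s)"
echo "spaces and tabs escaped: ${fields_after} field(s)"
```

With only spaces escaped, the remaining tab splits the argument into
two fields; with tabs escaped too, it stays in one piece.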

While we're at it, perhaps it could be a good idea to replace the magic
number 10000 with the size of the character array above it:

Index: src/unix/system.c
===================================================================
--- src/unix/system.c	(revision 80090)
+++ src/unix/system.c	(working copy)
@@ -429,7 +432,7 @@
 	    } else if(!strcmp(*av, "-e")) {
 		ac--; av++;
 		Rp->R_Interactive = FALSE;
-		if(strlen(cmdlines) + strlen(*av) + 2 <= 10000) {
+		if(strlen(cmdlines) + strlen(*av) + 2 <= sizeof(cmdlines)) {
 		    char *p = cmdlines+strlen(cmdlines);
 		    p = unescape_arg(p, *av);
 		    *p++ = '\n'; *p = '\0';

It might also be a good idea to make the transformation fully
reversible, so that the escape sequences themselves can be represented
in the unescaped stream ('~' <-> '~~~', ' ' <-> '~+~', '\n' <-> '~n~',
'\t' <-> '~t~'). That would make it possible to round-trip character
sequences like '~+~' through the escaping and unescaping process
(thankfully, '~+~' is not frequently needed in R programs), though
expressing that as a sed command is beyond me. Right now,
Rscript -e '"~+~"' doesn't print "~+~".
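
For what it's worth, the escaping direction of such a reversible
mapping can be sketched in sed, provided '~' is rewritten first. The
unescaping direction has to scan left to right (as unescape_arg() in
system.c does), so it is written below as a plain shell loop purely for
demonstration. The shell functions escape_arg and unescape_arg are
hypothetical names, and none of this is the code currently in R.sh.

```shell
#!/bin/sh
# Sketch of a reversible variant of the escaping; not the code in R.sh.

tab=$(printf '\t')
nl='
'

# Escape '~' first, so that tildes introduced by the later
# substitutions are never escaped a second time. The extra empty line
# guarantees that sed sees at least two lines; the '~n~' it produces is
# removed again by the last expression.
escape_arg() {
    (printf '%s\n' "$1" && echo) |
        sed -e :a -e N -e '$!ba' \
            -e 's/~/~~~/g' \
            -e 's/ /~+~/g' \
            -e "s/${tab}/~t~/g" \
            -e 's/\n/~n~/g' \
            -e 's/~n~$//'
}

# Unescape with a single left-to-right scan: every '~' that starts a
# known three-character token is consumed together with that token.
# Anything else, including a stray '~', is copied unchanged.
unescape_arg() {
    src=$1
    dst=
    while [ -n "$src" ]; do
        case $src in
            '~~~'*) dst="$dst~";      src=${src#???} ;;
            '~+~'*) dst="$dst ";      src=${src#???} ;;
            '~t~'*) dst="$dst${tab}"; src=${src#???} ;;
            '~n~'*) dst="$dst${nl}";  src=${src#???} ;;
            *)  tail=${src#?}
                dst="$dst${src%"$tail"}"   # first character of src
                src=$tail ;;
        esac
    done
    printf '%s' "$dst"
}

# Demonstration; note that $(...) strips trailing newlines, so the
# samples deliberately do not end in one.
for s in '"~+~"' "x <- 1${tab}# comment" '~n~ is not a newline'; do
    e=$(escape_arg "$s")
    d=$(unescape_arg "$e")
    if [ "$d" = "$s" ]; then
        echo "round trip ok: $e"
    else
        echo "round trip FAILED: $e"
    fi
done
```

A sequence of global sed substitutions would not work for the reverse
direction: in '~~~+~~~' (the escaped form of '~+~'), s/~+~/ /g would
match in the middle of the string instead of honouring token
boundaries.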

Perhaps the bigger question to ask is whether this escaping is
unavoidable. Is it documented? Since the args variable is only appended
(not prepended), it is likely possible to rewrite the
'while test -n "${1}"; do' loop in terms of 'set -- "$@" ...', which is
POSIX-compatible and doesn't require any escaping:

set -- "${@}" dummy # append one argument to skip it later

for arg in "${@}"; do # it's safe to modify $@ in the for loop [**]
  # TODO: on first iteration only, empty the $@ and don't check $prev_arg
  case "${prev_arg}" in
# ...
    -g|--gui)
      if test -n "`echo "${arg}" | ${SED} 's/^-.*//'`"; then
        gui="${arg}"
        set -- "${@}" "${prev_arg}" "${arg}"
      else
        error "option '${prev_arg}' requires an argument"
      fi
      ;;
# ...
    -e)
      if ! test -n "`echo "${arg}" | ${SED} 's/^-.*//'`"; then
        error "option '${prev_arg}' requires a non-empty argument"
      fi
      set -- "${@}" -e "${arg}"
      ;;
# ...
  esac
  prev_arg="${arg}"
  # no shift needed
done

# Later: use "${@}" instead of ${args}
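
A reduced but runnable variant of the same idea, handling only '-e',
might look like the sketch below. It tracks a "value expected" flag
instead of ${prev_arg}, which avoids the dummy argument. forward_args
is a hypothetical name and the final printf merely stands in for the
exec call; the real script would need all of its other options handled
too.

```shell
#!/bin/sh
# Sketch only: rebuild "$@" in place so that no escaping is needed.

tab=$(printf '\t')

forward_args() {
    first=yes
    need_value=
    for arg in "$@"; do
        # The word list of 'for' has already been expanded, so "$@"
        # can be emptied on the first iteration and rebuilt.
        if test -n "${first}"; then
            set --
            first=
        fi
        if test -n "${need_value}"; then
            # Same check as in R.sh: non-empty and not starting with '-'.
            case "${arg}" in
                ''|-*)
                    echo "option '-e' requires a non-empty argument" >&2
                    return 1 ;;
            esac
            need_value=
        elif test "${arg}" = "-e"; then
            need_value=yes
        fi
        set -- "$@" "${arg}"    # appended verbatim, no escaping
    done
    if test -n "${need_value}"; then
        echo "option '-e' requires a non-empty argument" >&2
        return 1
    fi
    # Stand-in for: exec "${R_HOME}/bin/exec${R_ARCH}/R" "$@"
    printf '[%s]\n' "$@"
}

# The argument containing a space and a tab arrives as one word.
forward_args -q -e " ${tab}1"
```

Because every argument stays a separate positional parameter, spaces,
tabs and newlines never pass through an unquoted expansion.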

Or is it documented behaviour that arguments following an empty
argument are not escaped by the shell script but are passed to
"${R_HOME}/bin/exec${R_ARCH}/R"?

LC_ALL=C R -q -e " $tab 1"
# ARGUMENT '~+~1' __ignored__
# 
# >
# >
# >
LC_ALL=C R '' -q -e " $tab 1"
# ARGUMENT '' __ignored__
# 
# > 	1
# [1] 1
# >
# >


-- 
Best regards,
Ivan

[*]
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

[**]
"First, the list of words following in shall be expanded to generate a
list of items..."
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_03


