[R] Inefficiency of SAS Programming

Fri Feb 27 12:48:01 CET 2009

2009/2/27 Peter Dalgaard <p.dalgaard at biostat.ku.dk>:

> Presumably, something like
>
>     IF &N. =  1 THEN SUB_N = 1;
>     ELSE IF &N. < 5 THEN SUB_N = &N.-1;
>     ELSE IF &N. < 16 THEN SUB_N = &N.-2;
>     ELSE SUB_N = &N.-3;
>
> would work, provided that 2, 5, 16 are impossible values. Problem is that it
> actually makes the code harder to grasp, so experienced SAS programmers go
> for the dumb but readable code like the above.

I'm not sure which is easier to grasp. When I first saw the original
version I took it for an odd way of writing "SUB_N = &N.". Only on a
closer look did I spot the missing 2, 5, and 16. A comment would have
been very enlightening, but there was nothing relevant.

> In R, the cleanest I can think of is
>
> subn <- match(n, setdiff(1:19, c(2,5,16)))
>
> or maybe just
>
> subn <- match(n, c(1, 3:4, 6:15, 17:19))
>
> although
>
> subn <- factor(n, levels = c(1, 3:4, 6:15, 17:19))
>
> might be what is really wanted

I think the important thing in any programming is to make sure that
what you want is expressed in words somewhere: if not in the code,
then in the comments. Operations like this should also be abstracted
into functions.
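
As a rough sketch of what I mean, Peter's match() version could be
wrapped up something like this. The name sub_n and its arguments are
made up for illustration, and I'm assuming N only ranges over 1 to 19
with 2, 5 and 16 never occurring:

```r
# Hypothetical helper: renumber n as its position among the values
# that can actually occur (1..max_n, leaving out the skipped ones).
# The defaults reflect the assumed case: 1..19 without 2, 5 and 16.
sub_n <- function(n, skipped = c(2, 5, 16), max_n = 19) {
  valid <- setdiff(seq_len(max_n), skipped)
  # match() gives NA for any n that is not a valid value, so an
  # "impossible" input shows up instead of passing silently.
  match(n, valid)
}

sub_n(c(1, 3, 4, 6, 15, 17, 19))
```

Anything outside the valid set, e.g. sub_n(2), comes back as NA
rather than as a plausible-looking number.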

All the examples of SAS code I've seen fall into the old practice of
writing great long 'scripts', with minimal code reuse and little
encapsulation of useful functionality. If these SAS scripts are then
given to new SAS programmers, the chances are they will follow the
same bad practices. Show them well-written R code (or C, or Python)
and maybe they can apply those good practices in their SAS work.
Assuming SAS can do that. I'm not sure.

Barry