[R] subsetting like in SAS

Mon Jan 17 22:01:22 CET 2005

I want to thank Petr Pikal, Robert Balshaw and Na Li for suggesting the 
use of "unique" or "!duplicated" on a subset of my data where unwanted 
variables have been removed. This worked perfectly.
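
For anyone finding this thread later, here is a minimal sketch of that approach. The toy data and the column names "depth" and "weight" are invented for illustration; only the five identifying variables come from my original question. Neither option needs the data to be sorted first.

```r
# Hypothetical data: two samples from the first site, one from the second.
alldata <- data.frame(
  origin   = c("A", "A", "B"),
  ship_cat = c(1, 1, 2),
  ship_nb  = c(10, 10, 20),
  trip     = c(1, 1, 1),
  set      = c(1, 1, 3),
  depth    = c(50, 50, 120),   # site characteristic (invented)
  weight   = c(2.3, 4.1, 3.7)  # sample characteristic (invented)
)

# Variables that identify a site, plus the site-level variables to keep.
site_keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")
site_vars <- c(site_keys, "depth")

# Option 1: drop the sample-level variables, then keep the distinct rows.
sites <- unique(alldata[, site_vars])

# Option 2: keep the first row of each key combination,
# which is closest to "if first.set;" in SAS.
sites2 <- alldata[!duplicated(alldata[, site_keys]), site_vars]
```

The two options agree as long as the site variables are constant within a site. If they are not, unique() keeps every distinct row, while !duplicated() on the keys keeps only the first row per site.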

Denis Chabot

On 13 Jan 2005 at 11:52, Denis Chabot wrote:

> Hi,
>
> Being in the process of translating some of my SAS programs to R, I
> encountered one difficulty. I have a solution, but it is not elegant
> (and not pleasant to implement).
>
> I have a large dataset with many variables needed to identify the
> origin of a sample, many to describe sample characteristics, others to
> describe site characteristics.
>
> I want only a (shorter) list of sites and their characteristics.
>
> If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
> identify a site, in SAS you'd sort on those variables, then read the
> data with:
>
> data sites;
>  set alldata;
>  by origin ship_cat ship_nb trip set;
>  if first.set;
>  keep list-of-variables-detailing-sites;
> run;
>
> In R I did this with the Lag function of Hmisc, and the original data
> set also needs to be sorted first:
>
> oL <- Lag(origin)
> scL <- Lag(ship_cat)
> snL <- Lag(ship_nb)
> tL <- Lag(trip)
> sL <- Lag(set)
> same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL
> sites <- subset(alldata, !same,
>                 select=c(list-of-variables-detailing-sites))
>
> Could I do better than this?