> See, we just have different expectations of what is to be seen in the
> help system, and are used to different formats. Yes, Stata thinks of
> data as a rectangular array (although it stores it in memory, unlike
> SAS). The inputs to -egen-, as well as the values produced, depend on
> the particular function -fcn- and are described in subsections on
> those individual functions. That is mentioned at the top of the page.
> There is a fairly standard syntax for most Stata commands (the command
> name, followed by the variables it is applied to or the expression to be
> computed, followed by -if- conditions on observations, followed by a comma
> and options), and -egen- more or less satisfies that syntax. A Stata user
> equipped with the basic concepts of the assignment command -generate-
> (which -egen- is said to extend) and variable lists (-varlist- here
> and there in the help file) would be able to make sense of this all.
> I would rather translate R's ave() to Stata's -by- expression. Not all
> of the -egen- functionality can be implemented via ave().

R has a by() function, which is a convenience wrapper for tapply(). It
will not necessarily produce an object with the same number of rows as
the input, and preserving the row count is what I thought egen was doing.
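
A small sketch of the difference, using made-up data (the data frame and its column names g and x are only for illustration). I take the ave() call to be roughly the analogue of something like -egen m = mean(x), by(g)-, but that is an assumption on my part as a non-user of Stata:

```r
# Toy data: two groups of unequal size (values are invented for illustration).
df <- data.frame(g = c("a", "a", "b", "b", "b"),
                 x = c(1, 3, 10, 20, 30))

# tapply() returns one value per group, so its length is the number of groups.
group_means <- tapply(df$x, df$g, mean)

# ave() returns one value per input row, aligned with x.
row_means <- ave(df$x, df$g)             # FUN defaults to mean
row_max   <- ave(df$x, df$g, FUN = max)  # FUN must be named; it is not limited to mean

length(group_means)  # number of groups
length(row_means)    # number of rows in df
```

The ave() results can be assigned straight back as new columns, e.g. df$m <- row_means, because they line up with the rows of df.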

> Looks like terseness is a prerequisite to doing anything in R though.
> If I am telling you I am a newbie, the book abbreviations, although
> standard to everybody on this list, may not mean much to me. I could
> figure out "Regression Modeling Strategies" (although I was not
> thinking about it as a book on R -- I probably did not read it far
> enough :) ), and V&R is Venables & Ripley. Right?

Yes, and Chambers and Hastie wrote "Statistical Models in S".

The VR bundle is the way to get the MASS package (and IIRC three  

The documentation and contributed pages are here:

Harrell probably does not think of RMS as an R book either.

>> Terse is OK by me as long as I get told what goes in (allowable data
>> types, argument names and effects) and what comes out. What seemed to be
>> lacking in that Stata doc for egen was a description of the purpose or
>> behavior, and then I could find no description of the values produced.
>> Perhaps it is because Stata has an approach that everything is a
>> rectangular array? Is everything assumed to create a new column of data
>> as in SAS?
>> At any rate it looked to this casual non-user, reading that document,
>> that egen creates a new variable aligned with its argument variables by
>> applying various functions within groupings. That is pretty much what
>> ave does. "ave" is not restricted to mean as a functional argument. As I
>> said, it was a guess.
>> The texts I used to get up to speed in R are several downloaded from the
>> Contributed documents (including anything written by Venables), V&R MASS
>> v 2, Harrell's RMS, Sarkar's Lattice, Chambers&Hastie SMiS and reading a
>> lot of Q&A on this list.
>>> http://www.stata.com/help.cgi?egen -- it creates new variables dealing
>>> with some special relatively non-standard tasks that don't boil down
>>> to one-line arithmetic expressions. For that reason, there will be
>>> no equivalent to -egen- in general, as it has so many functions that
>>> are so different. -rowtotal- is of course just a shorthand for sum(),
>>> except for the treatment of missing values ( ifelse(is.na(x), 0, x) ).
>>> But -anycount- is a moderately complicated double cycle over variables
>>> and list of values (40 lines of underlying Stata code, including parsing
>>> and labeling the resulting variables)... which will probably become a
>>> triple R cycle including the cycle over observations, although the
>>> latter can probably be avoided.
>>> Yes, R documentation looks extremely terse to me as a regular Stata
>>> user. I am used to seeing the concepts explained well, even in the
>>> help files, and certainly more so in the shelved books. As every
>>> option and every part of the syntax is given at least three to five
>>> sentences, and the most common uses are exemplified, I can usually
>>> figure out how to run a particular task relatively quickly. (The data
>>> management tricks, which is what Peter was asking about above, are
>>> probably an exception: you either know them, or you don't. In this
>>> example, I don't know the corresponding R tricks, although I can
>>> probably brute force the solution if I needed to.) The fraction of
>>> commands in R that I personally have been coming across that are
>>> comparably well documented is about a quarter. For the others, it is
>>> either guesswork+CRANning+googling around or "Forget it, I'll just go
>>> back to Stata to do it" after a few futile attempts. Maybe I just don't
>>> know where to look for the good stuff, but it is certainly outside R
>>> as a package+its documentation.
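
For what it is worth, here is a rough sketch of how those two tasks might look in R, going only by the descriptions quoted above (row totals with missing values counted as zero, and a per-row count of variables matching a list of values). The data frame, its column names and the values vector are invented for illustration, and I am not claiming this reproduces every detail of the Stata functions:

```r
# Made-up data with missing values (column names are illustrative only).
df <- data.frame(q1 = c(1, NA, 3),
                 q2 = c(2, 2, NA),
                 q3 = c(NA, 5, 3))

# Row totals with NA treated as 0, as -rowtotal- is described above.
row_total <- rowSums(df, na.rm = TRUE)

# Per row, count how many variables take one of the listed values.
# sapply() loops over the columns; %in% is vectorised over observations
# and never matches NA, so no explicit loop over rows is needed.
values <- c(2, 3)
any_count <- rowSums(sapply(df, function(col) col %in% values))
```

So the cycle over observations can indeed be avoided, and the cycle over the list of values is absorbed by %in%, leaving only the pass over variables.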
