[R] two questions for R beginners

Duncan Murdoch murdoch at stats.uwo.ca
Tue Mar 2 18:55:21 CET 2010


On 02/03/2010 11:53 AM, William Dunlap wrote:
> > -----Original Message-----
> > From: r-help-bounces at r-project.org 
> > [mailto:r-help-bounces at r-project.org] On Behalf Of John Sorkin
> > Sent: Tuesday, March 02, 2010 3:46 AM
> > To: Karl Ove Hufthammer; r-help at stat.math.ethz.ch
> > Subject: Re: [R] two questions for R beginners
> > 
> > Please take what follows not as an ad hominem statement, but 
> > rather as an attempt to improve what is already an excellent 
> > program, which has been built as a result of many, many hours 
> > of dedicated work by many, many unpaid, unsung volunteers.
> > 
> > It troubles me a bit that when a confusing aspect of R is 
> > pointed out the response is not to try to improve the 
> > language so as to avoid the confusion, but rather to state 
> > that the confusion is inherent in the language. I understand 
> > that to make changes that would avoid the confusing aspect of 
> > the language that has been discussed in this thread would 
> > take time and effort by an R wizard (which I am not), time 
> > and effort that would not be compensated in the traditional 
> > sense. This does not mean that we should not acknowledge the 
> > confusion. If we want R to be the de facto lingua franca of 
> > statistical analysis, doesn't it make sense to strive for 
> > syntax that is as straightforward and consistent as possible? 
>
> Whenever one changes the language that way old code
> will break. 
I think in this case not much code would break.  Mostly when people have 
a matrix M and ask for M$column they'll get an error; the proposal is 
that they'll get the requested column.  (It is possible to have a list 
with names that is also a matrix with dimnames, but I think that is a 
pretty unusual construction.)  But I haven't been convinced that the 
proposal is a net improvement to the language. 
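
A minimal sketch of the situation described above, assuming an ordinary integer matrix with column names (M and D are illustrative names). `$` fails on the matrix because a matrix is an atomic vector with a "dim" attribute. Name-based indexing works on both the matrix and the equivalent data frame.

```r
# An integer matrix with column names, and a data frame with the same contents
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
D <- as.data.frame(M)

# '$' works on lists (a data frame is a list) but not on atomic vectors,
# and a matrix is an atomic vector with a "dim" attribute
msg <- tryCatch(M$b, error = function(e) conditionMessage(e))
print(msg)        # "$ operator is invalid for atomic vectors"

# Name-based column indexing works on both
print(M[, "b"])   # 3 4
print(D[, "b"])   # 3 4
print(D$b)        # 3 4
```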

Duncan Murdoch

>  The developers can, with a lot of effort,
> fix their own code, and perhaps even user-written code
> on CRAN, but code that thousands of users have written
> will break.  There is a lot of code out there that was
> written by trial and error and by folks who no longer
> work at an institution: the code works but no one knows
> exactly why it works.  Telling folks they need to change
> that code because we have a cleaner but different syntax
> now is not good.  Why would one spend time writing a
> package that might stop working when R is "upgraded"?
>
> I think the solution is not to change current semantics
> but to write functions that behave better and encourage
> users to use them, gradually abandoning the old constructs.
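
One way to illustrate that approach is a hypothetical helper, get_column(). It is not part of base R or any package. It extracts a column by name from either a matrix or a data frame, so the caller does not need to know which kind of object it has. This is a sketch only, not a proposal for R itself.

```r
# Hypothetical helper, not part of base R: return the column called `name`
# from a matrix or a data frame as a plain vector
get_column <- function(x, name) {
  if (!is.matrix(x) && !is.data.frame(x)) {
    stop("'x' must be a matrix or a data frame")
  }
  # colnames() works for both matrices and data frames
  if (!name %in% colnames(x)) {
    stop("no column named '", name, "'")
  }
  x[, name, drop = TRUE]
}

M <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("x", "y")))
print(get_column(M, "y"))                 # 3 4
print(get_column(as.data.frame(M), "y"))  # 3 4
```

Both the existing `[` and `$` semantics are left untouched, which is the point of the suggestion: old code keeps working, and new code can opt in to the friendlier function.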
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com 
>
> > 
> > Again, please understand that my comment is made with deepest 
> > respect for the many people who have unselfishly contributed 
> > to the R project. Many thanks to each and every one of you.
> > 
> > John
> > 
> > 
> > >>> Karl Ove Hufthammer <karl at huftis.org> 3/2/2010 4:00 AM >>>
> > On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch <murdoch at stats.uwo.ca>
> > wrote:
> > > Suppose X is a dataframe or a matrix.  What would you expect to get from
> > > X[1]?  What about as.vector(X), or as.numeric(X)?
> > 
> > All this of course depends on the type of object one is speaking of. There
> > are plenty of surprises available, and it's best to use the most logical
> > way of extracting. E.g., to extract the top-left element of a 2D
> > structure (data frame or matrix), use 'X[1,1]'.
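
The following sketch illustrates the difference behind Duncan's question, using illustrative objects M and D. Single-index subsetting treats a matrix as a vector of elements but treats a data frame as a list of columns. The two-index form 'X[1,1]' gives the same element for both.

```r
M <- matrix(1:4, nrow = 2)          # a matrix: a vector with dimensions
D <- data.frame(u = 1:2, v = 3:4)   # a data frame: a list of columns

print(M[1])            # 1: the first element of the underlying vector
print(D[1])            # a one-column data frame holding column u
print(M[1, 1])         # 1
print(D[1, 1])         # 1: the same top-left element as for the matrix
print(as.numeric(M))   # 1 2 3 4: the elements in column-major order
```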
> > 
> > Luckily, R provides some shortcuts. For example, you can write 'X[2,3]'
> > on a data frame, just as if it were a matrix, even though the underlying
> > structure is completely different. (This doesn't work on a normal list;
> > there you have to select the component first and then the element, e.g.
> > 'X[[3]][2]'.)
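
The sketch below shows the equivalence with illustrative objects. On a data frame, 'X[2,3]' reaches the same element as the list-style 'X[[3]][2]', which selects the column first and then the row. The two-index form fails on a plain list because a list has no dimensions.

```r
D <- data.frame(a = 1:3, b = 4:6, c = 7:9)
L <- list(a = 1:3, b = 4:6, c = 7:9)   # the same columns as a plain list

print(D[2, 3])    # 8: row 2 of column 3
print(D[[3]][2])  # 8: the same element, column first, then row
print(L[[3]][2])  # 8: the list-style form works on the plain list

# A plain list has no "dim" attribute, so two-index subsetting fails
res <- tryCatch(L[2, 3], error = function(e) conditionMessage(e))
print(res)        # "incorrect number of dimensions"
```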
> > 
> > The behaviour of the 'as.' functions may sometimes be surprising, at 
> > least for me. For example, 'as.data.frame' on a named vector gives a 
> > single-column data frame, instead of a single-row data frame.
> > 
> > (I'm not sure what the recommended way of converting a named vector to a
> > single-row data frame is, but 'as.data.frame(t(X))' works, even though
> > both 'X' and 't(X)' look like a row of numbers.)
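
The sketch below shows both conversions with an illustrative named vector x. It also shows 'as.data.frame(as.list(x))' as an alternative that yields a single row as well. Nothing here settles which form is the recommended one.

```r
x <- c(alpha = 1, beta = 2, gamma = 3)   # an illustrative named vector

one_col  <- as.data.frame(x)           # 3 rows x 1 column; names become row names
one_row  <- as.data.frame(t(x))        # t() gives a 1 x 3 matrix, hence 1 row
via_list <- as.data.frame(as.list(x))  # a named list also gives 1 row

print(dim(one_col))    # 3 1
print(dim(one_row))    # 1 3
print(dim(via_list))   # 1 3
print(names(one_row))  # "alpha" "beta" "gamma"
```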
> > 
> > > The point is that a dataframe is a list, and a matrix isn't.  If users
> > > don't understand that, then they'll be confused somewhere.  Making
> > > matrices more list-like in one respect will just move the confusion
> > > elsewhere.  The solution is to understand the difference.
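
The list/atomic distinction can be checked directly. Here is a sketch with illustrative objects D and M:

```r
D <- data.frame(n = 1:2, s = c("a", "b"))
M <- matrix(1:4, nrow = 2)

print(is.list(D))   # TRUE: a data frame is a list of equal-length columns
print(is.list(M))   # FALSE: a matrix is an atomic vector with a "dim" attribute
print(typeof(D))    # "list"
print(typeof(M))    # "integer"
print(length(D))    # 2: the number of columns (list components)
print(length(M))    # 4: the number of elements
```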
> > 
> > My main problem is not understanding the difference, which is easy, but
> > knowing which type of object I have when I get the output of a function
> > in a package. If I know the object is a named vector or a matrix with
> > column names, it's easy enough to type 'X["name"]' or 'X[,"colname"]',
> > respectively, and if it's a data frame one may use the shortcut
> > 'X$colname'.
> > 
> > Usually, it *is* documented what the return value of a function is, but
> > just looking at the output is much faster, and *usually* gives the
> > correct answer.
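
Rather than inferring the type from printed output, one can ask R directly. In the sketch below, cor() on a two-column matrix stands in for "the output of a function in a package". The choice of cor() is only an example.

```r
# cor() on a two-column matrix stands in for "some package function"
out <- cor(cbind(a = c(1, 2, 3), b = c(2, 4, 7)))

print(class(out))          # "matrix" "array" (R >= 4.0.0; just "matrix" before)
print(is.matrix(out))      # TRUE
print(is.data.frame(out))  # FALSE
str(out)                   # compact summary: storage type, dimensions, dimnames
```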
> > 
> > For example, 'mean' applied to a data frame gives a named vector, not a
> > data frame, which is somewhat surprising (given that the columns of a
> > data frame may be of different types, while the elements of a vector may
> > not). (And yes, I know that it's *documented* that it returns a named
> > vector.) On the other hand, perhaps it is surprising that 'mean' works
> > on data frames at all. :-)
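
The data frame method of mean() discussed here was later removed from R. In current versions, mean() on a data frame returns NA with a warning. The sketch below shows the usual column-wise alternatives, colMeans() and sapply(). Like the old behaviour, they return a named vector.

```r
D <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

print(colMeans(D))                   # named numeric vector: x = 2, y = 20
print(sapply(D, mean))               # same result, applying mean() per column
print(vapply(D, mean, numeric(1)))   # like sapply(), but checks each result
```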
> > 
> > -- 
> > Karl Ove Hufthammer
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html 
> > and provide commented, minimal, self-contained, reproducible code.
> > 


