[Rd] 'identical' and the warning "ignoring non-pairlist attributes"

Sun Dec 12 10:49:58 CET 2010

On Dec 12, 2010, at 07:49 , Niels Richard Hansen wrote:

> Peter, thanks for looking into this. I know very little about the R
> implementation at the level you are talking about. For the record
> it is pretty easy to avoid the warning by checking and _not_
> doing the inefficient subsetting of an empty data frame ...
> 
> - Niels

It's not something that you'd be expected to know about, but this is r-devel and sometimes we think aloud, hoping that it rings a bell with some other reader...

I don't think it is the empty data frame per se that tickles the bug. Rather, it is an issue of having generated so much activity creating character constants that there are nontrivial hash chains, plus maybe the fact that you are using identical() to compare language-level objects. The attributes in question were

(gdb) p Rf_PrintValue(ax)
<CHARSXP: "NA.43436">

(gdb) p Rf_PrintValue(ay)
<CHARSXP: "NA.64694">

and the (sub-) objects that were being compared at the time were

(gdb) p Rf_PrintValue(y)
<CHARSXP: "...">

(gdb) p Rf_PrintValue(x)
<CHARSXP: "a">

"a" and "..." are from the argument lists of the functions that you are comparing, and I would assume that "NA.43436" and "NA.64694" come from the rownames of the million-row data frame that you were creating.

(For the uninitiated: a hash table is used for by-name lookup. It works by computing a numerical "hash-index" based on the name, hoping to replace a linear search by a simple indexed lookup. If two or more names have the same hash index, a final linear search through a chained list of names is necessary.) 
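For concreteness, the scheme just described can be sketched in a few lines of Python. This is only a toy illustration and not R's actual implementation: the hash function (sum of character codes), the table size, and the names hash_index and ChainedTable are all made up for the example.

```python
# Toy chained hash table; illustrative only, NOT how R implements its cache.

def hash_index(name, n_buckets):
    """Hypothetical hash: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in name) % n_buckets


class ChainedTable:
    """Hypothetical by-name lookup table with one chain (list) per bucket."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _chain(self, name):
        # The hash index replaces a search over all names by one indexed step.
        return self.buckets[hash_index(name, len(self.buckets))]

    def insert(self, name, value):
        chain = self._chain(name)
        for i, (key, _) in enumerate(chain):
            if key == name:
                chain[i] = (name, value)  # name already present: overwrite
                return
        chain.append((name, value))  # new name: extend the chain

    def lookup(self, name):
        # Names sharing a hash index force a final linear search of the chain.
        for key, value in self._chain(name):
            if key == name:
                return value
        return None

    def chain_length(self, name):
        return len(self._chain(name))


table = ChainedTable(n_buckets=8)
table.insert("ab", 1)
table.insert("ba", 2)  # same characters, so same toy hash index as "ab"
table.insert("a", 3)
```

With this toy hash, "ab" and "ba" land in the same bucket, so looking up either one has to walk a chain of length two. Creating very many distinct strings in a fixed-size table makes such chains long, which is the "nontrivial hash chains" situation mentioned above.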

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com