[Rd] Suggestion: Dimension-sensitive attributes

Bengoechea Bartolomé Enrique (SIES 73) enrique.bengoechea at credit-suisse.com
Thu Jul 9 11:14:12 CEST 2009


> If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. 

My proposed design would be that:

	* "objattr" would be a list of attributes (just preserved on subsetting)
	* "dimattr" would be a list with as many elements as array dimensions. Each element can be any object whose length matches the length of the corresponding array dimension and that can itself be subsetted with "[": so it could be a vector, a list, a data frame...
	* "cellattr" would be any object whose dimensions match the array dimensions: another array, a data frame...
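
A minimal sketch of how this could behave for a matrix, under stated assumptions: subset_with_meta() is a hypothetical helper (it is not part of base R or of any package mentioned here), it only handles the two-dimensional case, and it treats a data frame in "dimattr" as having one row per index of the dimension. Indices are applied with plain "[", so character indices only work on elements that carry names.

```r
# Hypothetical helper: subset a matrix and keep the three proposed
# attributes ("objattr", "dimattr", "cellattr") synchronized with it.
subset_with_meta <- function(x, i = seq_len(nrow(x)), j = seq_len(ncol(x))) {
  stopifnot(is.matrix(x))

  # Plain "[" keeps only dim and dimnames, so the metadata is re-attached below.
  out <- x[i, j, drop = FALSE]

  # "objattr": whole-object metadata, copied unchanged.
  attr(out, "objattr") <- attr(x, "objattr")

  # "dimattr": one element per dimension, each subsetted along its own dimension.
  da <- attr(x, "dimattr")
  if (!is.null(da)) {
    sub_one <- function(a, idx) {
      if (is.null(a)) {
        NULL
      } else if (is.data.frame(a)) {
        a[idx, , drop = FALSE]   # assumption: one row per dimension index
      } else {
        a[idx]                   # vectors and lists
      }
    }
    attr(out, "dimattr") <- list(sub_one(da[[1]], i), sub_one(da[[2]], j))
  }

  # "cellattr": same shape as the data, so it is subsetted exactly like the data.
  ca <- attr(x, "cellattr")
  if (!is.null(ca)) {
    attr(out, "cellattr") <- ca[i, j, drop = FALSE]
  }

  out
}

# Usage with made-up example data
x <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("u", "v", "w")))
attr(x, "objattr")  <- list(source = "example")
attr(x, "dimattr")  <- list(c("row one", "row two"), c("kg", "m", "s"))
attr(x, "cellattr") <- matrix(c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE), nrow = 2)

y <- subset_with_meta(x, i = 2, j = c(1, 3))
attributes(y)
```

A real implementation would more likely be a "[" method on a new class, so that ordinary subsetting syntax triggers the synchronization; the standalone function above only illustrates the bookkeeping.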

> In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.

Indeed, that's the objective: attaching user-defined metadata that is automatically synchronized with subsetting operations to the actual data.

I've had dozens of use cases in my own R programs that needed this type of pattern, and have seen it implemented in different ways in several classes (xts, timeSeries, AnnotatedDataFrame, etc.). As you point out, this could offer a unified design for a common need.

Enrique

-----Original Message-----
From: Heinz Tuechler [mailto:tuechler at gmx.at] 
Sent: Thursday, July 09, 2009 10:56
To: Bengoechea Bartolomé Enrique (SIES 73); Tony Plate; r-devel at r-project.org
Cc: Henrik Bengtsson
Subject: Re: [Rd] Suggestion: Dimension-sensitive attributes

At 10:01 09.07.2009, SIES 73 wrote:
>I've also had several use cases where I needed "cell-like" attributes, 
>that is, attributes that have the same dimensions as the original array 
>and are subsetted in the same way --along all its dimensions.
>
>So we're talking about a way to add metadata to matrices/arrays at 3 
>possible levels:
>
>         1) at the "whole object" level: attributes that are not
>            dropped on subsetting
>         2) at the "dimension" level: attributes that behave like
>            "dimnames", i.e. subsetted along each dimension
>         3) at the "cell" level: attributes that are subsetted in the
>            same way as the original array
>
>My proposal would be simpler than Tony's
>suggestion: like "dimnames", just have reserved attribute names for 
>each case, say "objdata", "dimdata", and "celldata" (or "objattr", 
>"dimattr" and "cellattr").

If "objattr", "dimattr" and "cellattr" are lists, they would offer safe places for all attributes that should be kept on subsetting. In my view this would be very useful, because that way a general solution for data description, like variable names, variable labels, units, ... could be reached.


>On the other hand, Tony's pattern would allow as many attributes of 
>each type as necessary (some multiplicity is already possible with the 
>simpler design as dimdata or celldata could be lists of lists), at the 
>cost of a more complex scheme of attributes that needs to be "parsed" 
>each time.
>
>On Tony's suggestion, "attr.keep.on.subset" and "attr.dimname.like" 
>(and possibly
>"attr.cell.like") could be kept on a single list with 3 elements, 
>something like:
>
> > attr(x, "attr.subset.with") <- list(object=..., dims=..., cells=...)
>
>Would something like this make sense for R-core --either for standard 
>arrays or as a new class-- or would it be better implemented in a 
>package?
>
>Enrique
>


