[Rd] Holding a large number of SEXPs in C++

Simon Urbanek simon.urbanek at r-project.org
Fri Oct 17 17:10:38 CEST 2014


On Oct 17, 2014, at 7:31 AM, Simon Knapp <sleepingwell at gmail.com> wrote:

> Background:
> I have an algorithm which produces a large number of small polygons (of the
> spatial kind) which I would like to use within R using objects from sp. I
> can't predict the exact number of polygons a priori, the polygons will be
> grouped into regions, and each region will be filled sequentially, so an
> appropriate C++ 'framework' (for the point of illustration) might be:
> 
> typedef std::pair<double, double> Point;
> typedef std::vector<Point> Polygon;
> typedef std::vector<Polygon> Polygons;
> typedef std::vector<Polygons> Regions;
> 
> struct Holder {
>    void notifyNewRegion(void) {
>        regions.push_back(Polygons());
>    }
> 
>    template<typename Iter>
>    void addSubPoly(Iter b, Iter e) {
>        regions.back().push_back(Polygon(b, e));
>    }
> 
> private:
>    Regions regions;
> };
> 
> where the reference_type of Iter is convertible to Point. In practice I use
> pointers in a couple of places to avoid resizing in push_back becoming too
> expensive.
> 
> To construct the corresponding sp::Polygon, sp::Polygons and
> sp::SpatialPolygons at the end of the algorithm, I iterate over the result
> turning each Polygon into a two column matrix and calling the C functions
> corresponding to the 'constructors' for these objects.
> 
> This is all working fine, but I could cut my memory consumption in half if
> I could construct the sp::Polygon objects in addSubPoly, and the
> sp::Polygons objects in notifyNewRegion. My vector typedefs would then all
> be:
> 
> typedef std::vector<SEXP>
> 
> 
> 
> 
> Question:
> What I'm not sure about (and finally my question) is: I will have datasets
> where I have more than 10,000 SEXPs in the Polygon and Polygons objects for
> a single region, and possibly more than 10,000 regions, so how do I PROTECT
> all those SEXPs (noting that the protection stack is limited to 10,000 and
> bearing in mind that I don't know how many there will be before I start)?
> 
> I am also interested in this just out of general curiosity.
> 
> 
> 
> 
> Thoughts:
> 
> 1) I could create an environment and store the objects themselves in there
> while keeping pointers in the vectors, but am not sure if this would be
> that efficient (guidance would be appreciated), or
> 
> 2) Just keep them in R vectors and grow these myself (as push_back is doing
> for me in the above), but that sounds like a pain and I'm not sure if the
> objects or just the pointers would be copied when I reassigned things
> (guidance would be appreciated again). Bear in mind that I keep pointers in
> the vectors, but omitted that for the sake of clarity.
> 
> 
> 
> 
> Is there some other R type that would be suited to this, or a general
> approach?
> 

Pairlists in R (LISTSXP) are well suited to appending, which is fast and trivial as long as you keep a pointer to the last cell, and to sequential processing. The only issue is that random access into a pairlist is slow, since reaching element i means walking i cells. If you only want to load the polygons and finalize, you can hold them in a pairlist and, if random access is expected, copy them to a generic vector at the end. DB applications typically use a hybrid approach: allocate vector blocks and keep them in pairlists. That's probably overkill for your use (if you really cared about performance you wouldn't use sp objects for this ;))
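
To make the hybrid layout concrete, here is a minimal sketch in plain C++ with no R API involved. The class name BlockList, its BlockSize parameter and its flatten() method are made up for illustration; a std::list of pre-reserved std::vector blocks stands in for "vector blocks kept in a pairlist". Appending never moves elements that are already stored, and the copy into one contiguous vector happens once at the end.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical append-only store: fixed-capacity blocks chained in a list.
// Unlike a single std::vector, appending never relocates existing elements.
template <typename T, std::size_t BlockSize = 1024>
class BlockList {
public:
    void push_back(const T& value) {
        // Start a new block when there is none yet or the last one is full.
        if (blocks_.empty() || blocks_.back().size() == BlockSize) {
            blocks_.emplace_back();
            blocks_.back().reserve(BlockSize);
        }
        blocks_.back().push_back(value);
        ++size_;
    }

    std::size_t size() const { return size_; }

    // Final step: copy everything into one contiguous vector,
    // which gives constant-time random access.
    std::vector<T> flatten() const {
        std::vector<T> out;
        out.reserve(size_);
        for (const auto& block : blocks_)
            out.insert(out.end(), block.begin(), block.end());
        return out;
    }

private:
    std::list<std::vector<T>> blocks_;  // each block holds at most BlockSize items
    std::size_t size_ = 0;
};
```

Usage would be along the lines of BlockList<Point> store; store.push_back(p); ... std::vector<Point> all = store.flatten();. The block size trades per-block overhead against the amount of memory reserved but unused in the last block.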

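Note that you only have to protect the top-level object: anything reachable from a protected object is safe from garbage collection, so you don't need to protect the individual elements and the 10,000 limit on the protection stack doesn't apply to them.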
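
A rough sketch of the pairlist approach with R's C API follows. The entry point name collect_polygons and the 3 x 2 coordinate matrices are placeholders for illustration, not sp's actual constructors, and the code needs R's headers (for example, built with R CMD SHLIB and called via .Call). The point is that only the dummy head cell stays on the protection stack for the whole loop, however many elements are appended.

```cpp
#define R_NO_REMAP
#include <R.h>
#include <Rinternals.h>

// Hypothetical .Call entry point: builds n small matrices, collects them in a
// pairlist, then copies them into a generic vector (an R list).
extern "C" SEXP collect_polygons(SEXP n_) {
    int n = Rf_asInteger(n_);
    if (n == NA_INTEGER || n < 0) Rf_error("n must be a non-negative integer");

    // Dummy head cell: the only long-lived entry on the protection stack.
    SEXP head = PROTECT(Rf_cons(R_NilValue, R_NilValue));
    SEXP tail = head;

    for (int i = 0; i < n; ++i) {
        // Placeholder for building one polygon: a 3 x 2 coordinate matrix.
        SEXP coords = PROTECT(Rf_allocMatrix(REALSXP, 3, 2));
        double* p = REAL(coords);
        for (int j = 0; j < 6; ++j) p[j] = i + j;

        // Append in O(1) using the tail pointer. Once linked, the new cell
        // is reachable from head, so coords no longer needs its own PROTECT.
        SEXP cell = Rf_cons(coords, R_NilValue);
        SETCDR(tail, cell);
        tail = cell;
        UNPROTECT(1);  // coords
    }

    // Finalize: copy to a generic vector for random access.
    SEXP out = PROTECT(Rf_allocVector(VECSXP, n));
    SEXP cur = CDR(head);  // skip the dummy head
    for (int i = 0; i < n; ++i, cur = CDR(cur))
        SET_VECTOR_ELT(out, i, CAR(cur));

    UNPROTECT(2);  // head, out
    return out;
}
```

The protection stack never holds more than two entries here, so the number of polygons is limited only by memory. If the list has to outlive a single .Call, R_PreserveObject and R_ReleaseObject on the head cell would be one option instead of PROTECT.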

Cheers,
Simon


> Cheers and thanks in advance,
> Simon Knapp