[Rd] Holding a large number of SEXPs in C++

Simon Knapp sleepingwell at gmail.com
Mon Nov 3 04:55:36 CET 2014


Thanks Simon, and sorry for taking so long to give this a go. I had thought
of pairlists but got confused about how to protect only the top-level
object, as it seems that appending requires creating a new "top-level
object". The following example seems to work (full example at
https://gist.github.com/Sleepingwell/8588c5ee844ce0242d05). Is this the way
you would do it (or at least 'a correct' way)?



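struct PolyHolder {
    PolyHolder(void) {
        // Reserve two protect-stack slots; both lists start out empty.
        PROTECT_WITH_INDEX(currentRegion = R_NilValue, &icr);
        PROTECT_WITH_INDEX(regions = R_NilValue, &ir);
    }

    ~PolyHolder(void) {
        // Pops the top two entries, so the protect stack must be back at the
        // level it had just after construction.
        UNPROTECT(2);
    }

    void notifyEndRegion(void) {
        // Prepend the finished region, then re-protect the new head of the list.
        REPROTECT(regions = CONS(makePolygonsFromPairList(currentRegion), regions), ir);
        // Start a new, empty region.
        REPROTECT(currentRegion = R_NilValue, icr);
    }

    template<typename Iter>
    void addSubPolygon(Iter b, Iter e) {
        // Prepend the new polygon to the current region.
        REPROTECT(currentRegion = CONS(makePolygon(b, e), currentRegion), icr);
    }

    SEXP getPolygons(void) {
        // Still protected only while this PolyHolder is alive.
        return regions;
    }

private:
    PROTECT_INDEX
        ir,
        icr;

    SEXP
        currentRegion,
        regions;
};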
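
One thing worth noting about the struct above: CONS prepends, so the
finished pairlists hold the regions (and the polygons within each region) in
reverse order of insertion. The following is only a minimal sketch in plain
C++ with no R headers; Cell, cons and toVector are hypothetical stand-ins
for a pairlist cell, CONS and the final copy to a generic vector. It shows
the prepend-then-copy pattern and one way to restore insertion order during
the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an R pairlist cell (CAR/CDR); this is not R API.
struct Cell {
    int car;                    // payload (would be an SEXP in the real code)
    std::shared_ptr<Cell> cdr;  // rest of the list, empty pointer for the end
};
using List = std::shared_ptr<Cell>;

// Analogue of CONS: an O(1) prepend that yields a new head cell.
List cons(int car, List cdr) {
    return std::make_shared<Cell>(Cell{car, std::move(cdr)});
}

// Copy the list into a vector, filling from the back so that the result
// is in insertion order rather than in (reversed) list order.
std::vector<int> toVector(const List& head) {
    std::size_t n = 0;
    for (const Cell* c = head.get(); c; c = c->cdr.get()) ++n;  // count cells
    std::vector<int> out(n);
    for (const Cell* c = head.get(); c; c = c->cdr.get()) out[--n] = c->car;
    assert(n == 0);  // every slot has been filled
    return out;
}
```

In the real code the same back-to-front fill could presumably be done while
copying the pairlist into a list allocated at its final length, but I have
not tested that here.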



Thanks again,
Simon Knapp



On Sat, Oct 18, 2014 at 2:10 AM, Simon Urbanek <simon.urbanek at r-project.org>
wrote:

>
> On Oct 17, 2014, at 7:31 AM, Simon Knapp <sleepingwell at gmail.com> wrote:
>
> > Background:
> > I have an algorithm which produces a large number of small polygons (of the
> > spatial kind) which I would like to use within R using objects from sp. I
> > can't predict the exact number of polygons a priori, the polygons will be
> > grouped into regions, and each region will be filled sequentially, so an
> > appropriate C++ 'framework' (for the point of illustration) might be:
> >
> > typedef std::pair<double, double> Point;
> > typedef std::vector<Point> Polygon;
> > typedef std::vector<Polygon> Polygons;
> > typedef std::vector<Polygons> Regions;
> >
> > struct Holder {
> >    void notifyNewRegion(void) {
> >        regions.push_back(Polygons());
> >    }
> >
> >    template<typename Iter>
> >    void addSubPoly(Iter b, Iter e) {
> >        regions.back().push_back(Polygon(b, e));
> >    }
> >
> > private:
> >    Regions regions;
> > };
> >
> > where the reference_type of Iter is convertible to Point. In practice I use
> > pointers in a couple of places to avoid resizing in push_back becoming too
> > expensive.
> >
> > To construct the corresponding sp::Polygon, sp::Polygons and
> > sp::SpatialPolygons at the end of the algorithm, I iterate over the result
> > turning each Polygon into a two column matrix and calling the C functions
> > corresponding to the 'constructors' for these objects.
> >
> > This is all working fine, but I could cut my memory consumption in half if
> > I could construct the sp::Polygon objects in addSubPoly, and the
> > sp::Polygons objects in notifyNewRegion. My vector typedefs would then all
> > be:
> >
> > typedef std::vector<SEXP>
> >
> >
> >
> >
> > Question:
> > What I'm not sure about (and finally my question) is: I will have datasets
> > where I have more than 10,000 SEXPs in the Polygon and Polygons objects for
> > a single region, and possibly more than 10,000 regions, so how do I PROTECT
> > all those SEXPs (noting that the protection stack is limited to 10,000 and
> > bearing in mind that I don't know how many there will be before I start)?
> >
> > I am also interested in this just out of general curiosity.
> >
> >
> >
> >
> > Thoughts:
> >
> > 1) I could create an environment and store the objects themselves in there
> > while keeping pointers in the vectors, but am not sure if this would be
> > that efficient (guidance would be appreciated), or
> >
> > 2) Just keep them in R vectors and grow these myself (as push_back is doing
> > for me in the above), but that sounds like a pain and I'm not sure if the
> > objects or just the pointers would be copied when I reassigned things
> > (guidance would be appreciated again). Bear in mind that I keep pointers in
> > the vectors, but omitted that for the sake of clarity.
> >
> >
> >
> >
> > Is there some other R type that would be suited to this, or a general
> > approach?
> >
>
> Lists in R (LISTSXP aka pairlists) are suited to appending (since that is
> fast and trivial) and sequential processing. The only issue is that
> pairlists are slow for random access. If you only want to load the polygons
> and finalize, then you can hold them in a pairlist and at the end copy to a
> generic vector (if random access is expected). DB applications typically
> use a hybrid approach - allocate vector blocks and keep them in pairlists,
> but that's probably overkill for your use (if you really cared about
> performance you wouldn't use sp objects for this ;))
>
> Note that you only have to protect the top-level object, so you don't need
> to protect the individual elements.
>
> Cheers,
> Simon
>
>
> > Cheers and thanks in advance,
> > Simon Knapp
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>
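
For completeness, the hybrid approach mentioned in the quoted reply (vector
blocks kept in a list) can be modelled roughly as follows. This is again
only a sketch in plain C++ rather than R API: BlockList and its members are
hypothetical names, a std::list of std::vector stands in for a pairlist of
R vectors, and the block size is an arbitrary tuning parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chunked container: fixed-capacity blocks kept in a list.
// Appending never moves existing elements, and only the list of blocks
// would need protecting in the R analogue.
template <typename T>
class BlockList {
public:
    explicit BlockList(std::size_t blockSize)
        : blockSize_(blockSize), size_(0) {
        assert(blockSize > 0);  // a zero-sized block could never hold anything
    }

    void push_back(const T& value) {
        // Open a new block when there is none yet or the last one is full.
        if (blocks_.empty() || blocks_.back().size() == blockSize_) {
            blocks_.emplace_back();
            blocks_.back().reserve(blockSize_);
        }
        blocks_.back().push_back(value);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t blockCount() const { return blocks_.size(); }

    // Final copy into one contiguous vector, for when random access is needed.
    std::vector<T> flatten() const {
        std::vector<T> out;
        out.reserve(size_);
        for (const auto& block : blocks_)
            out.insert(out.end(), block.begin(), block.end());
        return out;
    }

private:
    std::size_t blockSize_;
    std::size_t size_;
    std::list<std::vector<T>> blocks_;
};
```

Compared with one cell per element, this trades a little bookkeeping for far
fewer list nodes, which is presumably why it only pays off at larger scales.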
