[Rd] C function with unknown output length

Herve Pages hpages at fhcrc.org
Wed Jun 6 21:20:31 CEST 2007

Vincent Goulet wrote:
> Hi all,
> Could anyone point me to one or more examples in the R sources of a C  
> function that is called without knowing in advance what will be the  
> length (say) of the output vector?
> To make myself clearer, we have a C function that computes  
> probabilities until their sum gets "close enough" to 1. Hence, the  
> number of probabilities is not known in advance.

Hi Vincent,

Let's say you want to write a function get_matches(const char * pattern, const char * x)
that will find all the occurrences of string 'pattern' in string 'x' and "return"
their positions in the form of an array of integers.
Of course you don't know in advance how many occurrences you're going to find.

One possible strategy is to:

  - Add an extra arg to 'get_matches' for storing the positions and make
    'get_matches' return the number of matches (i.e. the length of *pos_ptr):

      int get_matches(int **pos_ptr, const char * pattern, const char * x)

    Note that pos_ptr is a pointer to an int pointer.

  - In get_matches(): use a local array of ints and start with an arbitrary
    initial size for it:

      int get_matches(...)
        int *tmp_pos, tmp_size, old_size, npos = 0;

        tmp_size = 16; /* arbitrary initial guess of the number of matches */
        tmp_pos = (int *) S_alloc((long) tmp_size, sizeof(int));

    Then start searching for matches and every time you find one, store its
    position in tmp_pos[npos] and increase npos.
    When tmp_pos is full (npos == tmp_size), realloc with:

        old_size = tmp_size;
        tmp_size = 2 * old_size; /* there are many different strategies for this */
        tmp_pos = (int *) S_realloc((char *) tmp_pos, (long) tmp_size,
                                    (long) old_size, sizeof(int));

    Note that there is no need to check that the calls to S_alloc() or S_realloc()
    were successful because these functions will raise an error and end the call
    to .Call if they fail. In that case the memory allocated so far is reclaimed
    automatically (as it is on any error or user interrupt).

    When you are done, just return with:

        *pos_ptr = tmp_pos;
        return npos;

  - Call get_matches with:

      int *pos, npos;

      npos = get_matches(&pos, pattern, x);

    Note that memory allocation took place in 'get_matches' but now you need
    to decide how and when the memory pointed to by 'pos' will be freed.
    In the R environment, this can be addressed by using exclusively transient
    storage allocation (http://cran.r-project.org/doc/manuals/R-exts.html#Transient)
    as we did in get_matches() so the allocated memory will be automatically
    reclaimed at the end of the call to .C or .Call.
    Of course, the integers stored in pos have to be moved to a "safe" place
    before .Call returns. Typically this will be done with something like:

      SEXP Call_get_matches(...)
        int *pos, npos;
        SEXP pos_sxp;

        npos = get_matches(&pos, pattern, x);
        PROTECT(pos_sxp = NEW_INTEGER(npos));
        memcpy(INTEGER(pos_sxp), pos, npos * sizeof(int));
        UNPROTECT(1); /* balance the PROTECT above before returning */
        return pos_sxp; /* end of call to .Call */
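
To make the growth strategy above concrete, here is a minimal sketch of
get_matches() that compiles outside of R. It is only an illustration under
a few assumptions of mine: standard malloc()/realloc() stand in for
S_alloc()/S_realloc(), so the NULL checks, the -1 error return and the
caller's free() are needed here (none of them would be with transient
storage in R). The initial size of 4, the 0-based positions and the choice
to report overlapping matches are also arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Finds every (possibly overlapping) occurrence of 'pattern' in 'x'.
 * The 0-based start positions are returned through 'pos_ptr' in a newly
 * allocated array; the return value is the number of matches, or -1 if
 * memory runs out. On success the caller must free(*pos_ptr). */
int get_matches(int **pos_ptr, const char *pattern, const char *x)
{
    assert(pos_ptr != NULL && pattern != NULL && x != NULL);

    size_t plen = strlen(pattern);
    int tmp_size = 4;  /* arbitrary initial guess of the number of matches */
    int npos = 0;
    int *tmp_pos = malloc((size_t) tmp_size * sizeof(int));

    if (tmp_pos == NULL)
        return -1;

    if (plen > 0) {  /* an empty pattern is treated as having no matches */
        const char *hit = strstr(x, pattern);
        while (hit != NULL) {
            if (npos == tmp_size) {
                /* Buffer is full: double it (one strategy among many). */
                int new_size = 2 * tmp_size;
                int *grown = realloc(tmp_pos, (size_t) new_size * sizeof(int));
                if (grown == NULL) {
                    free(tmp_pos);
                    return -1;
                }
                tmp_pos = grown;
                tmp_size = new_size;
            }
            tmp_pos[npos++] = (int) (hit - x);
            hit = strstr(hit + 1, pattern);  /* resume one char further */
        }
    }

    *pos_ptr = tmp_pos;
    return npos;
}
```

Doubling keeps the number of reallocations logarithmic in the final number
of matches, at the cost of up to half of the buffer being unused at the end.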

There are many variations on this. One of them is to "share" pos and npos between
get_matches and its caller by making them global variables (in this case it is
recommended to declare them 'static', which requires that get_matches
and its caller be in the same .c file).
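
The 'static' variation could look roughly like the sketch below. Again this
is only an illustration: it uses standard realloc() instead of S_realloc()
so that it compiles outside of R, and the names pos_size, push_pos,
find_matches, count_matches and match_at are hypothetical helpers of mine,
not anything from the R sources.

```c
#include <stdlib.h>
#include <string.h>

/* File-scope state shared by find_matches() and its callers. 'static'
 * keeps these names private to this .c file. */
static int *pos = NULL;
static int npos = 0;
static int pos_size = 0;

/* Appends one position, doubling the buffer when it is full.
 * Returns 0 on success, -1 if memory runs out. */
static int push_pos(int p)
{
    if (npos == pos_size) {
        int new_size = (pos_size == 0) ? 4 : 2 * pos_size;
        int *grown = realloc(pos, (size_t) new_size * sizeof(int));
        if (grown == NULL)
            return -1;
        pos = grown;
        pos_size = new_size;
    }
    pos[npos++] = p;
    return 0;
}

/* Fills the shared buffer with the 0-based positions of 'pattern' in 'x'.
 * Returns 0 on success, -1 if memory runs out. */
static int find_matches(const char *pattern, const char *x)
{
    npos = 0;  /* reuse the buffer left over from any previous call */
    if (*pattern == '\0')
        return 0;  /* an empty pattern is treated as having no matches */
    for (const char *hit = strstr(x, pattern); hit != NULL;
         hit = strstr(hit + 1, pattern)) {
        if (push_pos((int) (hit - x)) != 0)
            return -1;
    }
    return 0;
}

/* Callers in the same file read the shared variables directly. */
int count_matches(const char *pattern, const char *x)
{
    return find_matches(pattern, x) == 0 ? npos : -1;
}

/* Returns the i-th position found by the last search (0 <= i < npos). */
int match_at(int i)
{
    return pos[i];
}
```

One side effect of this design is that the buffer survives between calls, so
it is reused rather than reallocated, but the code is no longer reentrant.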

Hope this helps.


> I would like to have an idea what is the best way to handle this  
> situation in R.
> Thanks in advance!
> ---
>    Vincent Goulet, Associate Professor
>    École d'actuariat
>    Université Laval, Québec
>    Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
