[R] R C API resize matrix

King Jiefei szwjf08 at gmail.com
Sat Jun 15 17:45:38 CEST 2019


Hi Morgan,

Thanks for the context. It seems that you want to compress your matrix, and
you expect the "new" matrix to contain the same amount of information as
the "old" one; is that correct?

If this is the case, since you are using C++ code, a safer but imperfect
solution is to build your result in a C++ data structure. To use the
result in R without paying for another memory allocation, you can then
expose it as a matrix via ALTREP. ALTREP is a set of APIs introduced in
R 3.5.0. The idea of ALTREP is to wrap a non-R object (e.g. a std::vector)
and use it as a vector in R (in your case, a vector with a dim attribute).
Your result will therefore behave like a matrix in R while actually being
a C++ object. The only costs you pay are allocating the ALTREP object and
its dim attribute.
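
To make the idea concrete, here is a toy analogy in plain C. It is not R's
ALTREP API: the names `VecMethods`, `AltVec`, `Buffer` and `altvec_sum` are
made up for illustration. It only mimics the core idea that a vector
"class" supplies a length method and an element method (ALTREP classes
register similar Length and Elt methods), so generic code can read the
elements without knowing that they live in a buffer owned by someone else.

```c
#include <assert.h>
#include <stddef.h>

/* Toy method table, loosely analogous to an ALTREP class. */
typedef struct {
    size_t (*length)(const void *self);
    double (*elt)(const void *self, size_t i);
} VecMethods;

/* A "vector" is a method table plus a pointer to data it does not own. */
typedef struct {
    const VecMethods *methods;
    const void *data;
} AltVec;

/* Hypothetical wrapped object: a plain C buffer standing in for a
   std::vector owned by other code. */
typedef struct {
    const double *values;
    size_t n;
} Buffer;

static size_t buffer_length(const void *self) {
    return ((const Buffer *)self)->n;
}

static double buffer_elt(const void *self, size_t i) {
    const Buffer *b = (const Buffer *)self;
    assert(i < b->n); /* bounds check */
    return b->values[i];
}

static const VecMethods buffer_methods = { buffer_length, buffer_elt };

/* Generic code: goes only through the methods, never touches storage. */
double altvec_sum(const AltVec *v) {
    double total = 0.0;
    size_t n = v->methods->length(v->data);
    for (size_t i = 0; i < n; i++) {
        total += v->methods->elt(v->data, i);
    }
    return total;
}
```

Wrapping an existing array costs only the two small structs: for example,
`Buffer buf = { storage, 4 }; AltVec v = { &buffer_methods, &buf };` lets
`altvec_sum(&v)` read `storage` in place, without copying it.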

Unfortunately, ALTREP is still under development and its documentation is
limited. Here are some very useful resources:

https://purrple.cat/blog/2018/10/14/altrep-and-cpp/

https://github.com/ALTREP-examples

Here is my shameless self-promotion of the package AltWrapper, which
provides users the ability to use ALTREP from pure R code. It is still at
an early stage, so you need to load it from its source directory with
`devtools::load_all()` instead of installing it.

https://github.com/Jiefei-Wang/AltWrapper

Here is a simple example showing how to use this package to resize a
matrix without copying any data:

Preparing the functions:

```
## report the length of the "new" matrix
length_func <- function(x) {
  return(x$length)
}
## Get an element from the data.
## Since the index i refers to the "new" matrix,
## recompute it as an index into the old matrix
## to fetch the data.
get_element_func <- function(x, i) {
  trueDim = x$trueDim
  targetDim = x$targetDim
  ## Find the correct coordinate
  ind_y = floor((i - 1) / targetDim[1]) + 1
  ind_x = i - (ind_y - 1) * targetDim[1]
  ## Recompute the ith element
  i_new = ind_x + (ind_y - 1) * trueDim[1]
  return(x$data[i_new])
}

## Changing an attribute of an ALTREP object will cause
## a duplication of the object; this might be fixed in the future.
## Here is a quick workaround: instead of copying the data,
## we wrap the same data `x` in a new ALTREP object.
duplicate_func <- function(x, deep) {
  C_create_altrep("compressedMatrix", x)
}

C_set_altrep_class("compressedMatrix", "real")
C_set_altrep_length_method("compressedMatrix", length_func)
C_set_altrep_get_element_method("compressedMatrix", get_element_func)
## get_subset_func is not defined in this example, so no subset
## method is registered here:
## C_set_altrep_subset_method("compressedMatrix", get_subset_func)
C_set_altrep_duplicate_method("compressedMatrix", duplicate_func)
```
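
For reference, the index arithmetic in `get_element_func` can be written in
plain C as below. This is a sketch, not part of AltWrapper; `remap_index`
and `view_elt` are hypothetical names. C indices start at 0, so the
`- 1`/`+ 1` adjustments of the R version disappear: the column is
`i / target_nrow`, the row is `i % target_nrow`, and the same (row, column)
cell in the original column-major matrix sits at `row + col * true_nrow`.

```c
#include <assert.h>
#include <stddef.h>

/* Map a 0-based linear index i of the smaller "target" matrix
   (target_nrow rows, column-major) to the linear index of the same
   (row, col) cell in the original column-major matrix with true_nrow rows. */
size_t remap_index(size_t i, size_t target_nrow, size_t true_nrow) {
    assert(target_nrow > 0 && target_nrow <= true_nrow);
    size_t col = i / target_nrow; /* ind_y - 1 in get_element_func */
    size_t row = i % target_nrow; /* ind_x - 1 in get_element_func */
    return row + col * true_nrow; /* i_new - 1 in get_element_func */
}

/* Read element i of the smaller view straight from the original data,
   without copying anything. */
int view_elt(const int *data, size_t i, size_t target_nrow, size_t true_nrow) {
    return data[remap_index(i, target_nrow, true_nrow)];
}
```

With the 3-by-3 matrix `1:9` viewed as 2-by-2, indices 0 to 3 map to the
elements 1, 2, 4 and 5, matching the `A_compressed[, ]` output shown below.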

Usage:
```
A = matrix(1:9, 3, 3)
targetDim = c(2, 2)
A_compressed = C_create_altrep(
  "compressedMatrix",
  list(
    data = A,
    trueDim = dim(A),
    targetDim = targetDim,
    length = targetDim[1] * targetDim[2]
  )
)
attr(A_compressed, "dim") = targetDim
```

Results:
```
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> A_compressed[, ]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
```

The variable `A_compressed` has no data of its own: it relies on the data
of the variable `A` and works like a 2-by-2 matrix. However, because ALTREP
is still incomplete, you cannot print `A_compressed` directly (which is why
the example above uses `A_compressed[, ]`); this may be fixed in a future R
release.
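
If you would rather end up with an ordinary R matrix, the copy you asked
about in your first message is cheap, because R stores a matrix column by
column: the rows you keep from each column form one contiguous block. The
plain C sketch below (`copy_top_rows` is a hypothetical helper, not an R
API function) copies the first `keep_rows` rows of an `nrow`-by-`ncol`
column-major buffer with one `memcpy` per column. It assumes both matrices
are integer (INTSXP): `src` would be `INTEGER(a)` and `dst` would be
`INTEGER(b)` for a matrix `b` allocated with
`allocMatrix(INTSXP, keep_rows, ncol)`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the top keep_rows rows of a column-major nrow x ncol matrix (src)
   into a column-major keep_rows x ncol matrix (dst). Within each column
   the kept rows are contiguous, so one memcpy per column replaces an
   element-by-element loop. src and dst must not overlap. */
void copy_top_rows(const int *src, size_t nrow, size_t ncol,
                   int *dst, size_t keep_rows) {
    assert(keep_rows <= nrow);
    for (size_t j = 0; j < ncol; j++) {
        memcpy(dst + j * keep_rows, src + j * nrow,
               keep_rows * sizeof *src);
    }
}
```

For your original 10-by-2 to 5-by-1 example, that would be
`copy_top_rows(INTEGER(a), 10, 1, INTEGER(b), 5)`.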

Please let me know if you have any questions.

Best,
Jiefei


On Sat, Jun 15, 2019 at 7:19 AM Morgan Morgan <morgan.emailbox using gmail.com>
wrote:

> Hi Jiefei,
>
> Thank you for your reply.
>
> To give you a bit more context, I wrote a function that finds the
> positions (indices) of all small matrices inside a larger matrix. At the
> beginning I pre-allocate, let's say, a 100-by-2 matrix. However, many
> values might remain empty in this matrix of positions, so I have to resize
> it down to keep only the relevant values. Does that make sense?
>
> Please let me know what you think and whether there is a safer way to do it.
>
> Please let me know if you want more information or have any questions.
>
> Best regards
> Morgan
>
> On Sat, 15 Jun 2019 00:15 King Jiefei, <szwjf08 using gmail.com> wrote:
>
>> Hi,
>>
>> I don't think there is a native R API to do what you want here, but if
>> the matrix is only used by you and not exported to other users, you can
>> hack R's data structures to achieve that goal.
>>
>> Because your question gives little context, I will assume the whole
>> point of resizing the matrix is to avoid the overhead of a new memory
>> allocation, not to represent the same matrix with different dimensions,
>> since your 'new' matrix has a different number of elements.
>>
>> Roughly speaking, a matrix in R is nothing but a vector with a dim
>> attribute (its elements are stored column by column); you can verify
>> this in R:
>> ```
>> > A=matrix(1:6,2,3)
>> > A
>>      [,1] [,2] [,3]
>> [1,]    1    3    5
>> [2,]    2    4    6
>> > attributes(A)
>> $dim
>> [1] 2 3
>>
>> > attributes(A)=NULL
>> > A
>> [1] 1 2 3 4 5 6
>> ```
>> Therefore, to resize the matrix you need to change the dim attribute to
>> a smaller size. R does its best to prevent such a dangerous operation
>> (for example, `dim(A) <- c(2, 2)` fails with an error, and you should
>> know that this is *not correct!*), so you have to go to the C level and
>> hack R's internal data structures. Let's say you want to resize the
>> matrix A to a 2-by-2 matrix; here is what you need to do:
>>
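>> C++ code (for Rcpp):
>> The code sets the second value of the dim attribute (the number of
>> columns) to 2.
>> ```
>> #include <Rcpp.h>
>>
>> // [[Rcpp::export]]
>> void I_know_it_is_not_correct(SEXP x, SEXP attrName) {
>>   // Overwrite ncol in place; length(x) is NOT changed, so the object
>>   // is left in an inconsistent state.
>>   INTEGER(Rf_getAttrib(x, attrName))[1] = 2;
>> }
>> ```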
>>
>> R code:
>> ```
>> > a=matrix(1:6,2,3)
>> > I_know_it_is_not_correct(a,as.symbol("dim"))
>> > a
>>      [,1] [,2]
>> [1,]    1    3
>> [2,]    2    4
>> > attributes(a)
>> $dim
>> [1] 2 2
>> ```
>>
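>> You get what you want, but please use it with caution: `length(a)` still
>> returns 6, and because R stores matrices column by column, this trick
>> only keeps the intended values when you drop trailing columns (or shrink
>> to a single column). Dropping rows from a multi-column matrix would
>> require moving the data first.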
>>
>> Best,
>> Jiefei
>>
>>
>> On Fri, Jun 14, 2019 at 2:41 PM Morgan Morgan <morgan.emailbox using gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Is there a way to resize a matrix defined as follows:
>>>
>>> SEXP a = PROTECT(allocMatrix(INTSXP, 10, 2));
>>> int *pa = INTEGER(a);
>>>
>>> To row = 5 and col = 1, or do I have to allocate a second matrix "b" with
>>> pointer *pb and use a "for" loop to transfer the values of a to b?
>>>
>>> Thank you
>>> Best regards
>>> Morgan
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
