[R] R C API resize matrice

King Jiefei szwjf08 using gmail.com
Sat Jun 15 17:51:27 CEST 2019


Hi Morgan,

In the example, please ignore
`C_set_altrep_subset_method("compressedMatrix", get_subset_func)`. You do
not have to define it to run the example.

Best,
Jiefei

On Sat, Jun 15, 2019 at 11:45 AM King Jiefei <szwjf08 using gmail.com> wrote:

> Hi Morgan,
>
> Thanks for the context. It seems like you want to compress your matrix, and
> you expect the "new" matrix to contain the same amount of information
> as the "old" one. Is that correct?
>
> If this is the case, since you are using C++ code, a safer but imperfect
> solution is to find a C++ data structure to achieve your goal. To use
> the result in R without the cost of a memory allocation, you can then
> create a matrix via ALTREP. ALTREP is a set of new APIs provided by R since
> 3.5. The idea of ALTREP is to wrap a non-R object (e.g. std::vector) and to
> use it as a vector in R (in your case, a vector with a dim attribute).
> Therefore, your result will behave like a matrix in R but is
> actually a C++ object. The only cost you have to pay is the allocation of
> an ALTREP object and an additional dim attribute.
>
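To make the first half of this suggestion concrete, here is a minimal sketch of collecting positions in a growable C++ container and only afterwards laying them out as a k-by-2, column-major block, so that nothing has to be resized. The name `match_positions` and the simplification to matching a single value (rather than a small matrix) are assumptions for illustration, not the original function.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: find every cell of a column-major matrix `m`
// (with `nrow` rows) that equals `target`, and return the 1-based
// (row, col) positions as a k-by-2 column-major block:
// first all k row indices, then all k column indices.
std::vector<int> match_positions(const std::vector<int>& m,
                                 std::size_t nrow, int target) {
    assert(nrow > 0 && m.size() % nrow == 0);

    // The container grows as hits are found, so no guess about k is needed.
    std::vector<std::pair<int, int>> hits;
    for (std::size_t idx = 0; idx < m.size(); ++idx) {
        if (m[idx] == target) {
            hits.emplace_back(static_cast<int>(idx % nrow) + 1,
                              static_cast<int>(idx / nrow) + 1);
        }
    }

    // k is now known, so the result is allocated exactly once.
    const std::size_t k = hits.size();
    std::vector<int> out(2 * k);
    for (std::size_t r = 0; r < k; ++r) {
        out[r] = hits[r].first;       // column 1: row index
        out[r + k] = hits[r].second;  // column 2: column index
    }
    return out;
}
```

The same idea carries over to the R side: once k is known, a k-by-2 result can be allocated at its final size instead of trimming a 100-by-2 buffer.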
> Unfortunately, ALTREP is still under development and there is only limited
> documentation. Here are some very useful resources:
>
> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/
>
> https://github.com/ALTREP-examples
>
> Here is my shameless self-promotion of the package AltWrapper, which
> gives users the ability to use ALTREP in pure R. It is still at an
> early stage, so you need to use `devtools::load_all()` to load it.
>
> https://github.com/Jiefei-Wang/AltWrapper
>
> Here is a simple example to show how to use this package to resize a
> matrix without doing any copy:
>
> Functions preparation:
>
> ```
> ## report the length of the "new" matrix
> length_func <- function(x) {
>   return(x$length)
> }
> ## Get an element from the data.
> ## The index refers to the "new" matrix, so
> ## it has to be recomputed to fetch the
> ## element from the old matrix.
> get_element_func <- function(x, i) {
>   trueDim = x$trueDim
>   targetDim = x$targetDim
>   ## Find the correct coordinate
>   ind_y = floor((i - 1) / targetDim[1]) + 1
>   ind_x = i - (ind_y - 1) * targetDim[1]
>   ## Recompute the ith element
>   i_new = ind_x + (ind_y - 1) * trueDim[1]
>   return(x$data[i_new])
> }
>
> ## Changing an attribute of an ALTREP object triggers a duplication
> ## of the object; this might be fixed in the future.
> ## Here is a quick workaround: we just return the same object.
> duplicate_func <- function(x, deep) {
>   C_create_altrep("compressedMatrix", x)
> }
>
> C_set_altrep_class("compressedMatrix", "real")
> C_set_altrep_length_method("compressedMatrix", length_func)
> C_set_altrep_get_element_method("compressedMatrix", get_element_func)
> ## Optional; `get_subset_func` is not defined in this example.
> ## C_set_altrep_subset_method("compressedMatrix", get_subset_func)
> C_set_altrep_duplicate_method("compressedMatrix", duplicate_func)
> ```
>
> Usage:
> ```
> A = matrix(1:9, 3, 3)
> targetDim = c(2, 2)
> A_compressed = C_create_altrep(
>   "compressedMatrix",
>   list(
>     data = A,
>     trueDim = dim(A),
>     targetDim = targetDim,
>     length = targetDim[1] * targetDim[2]
>   )
> )
> attr(A_compressed, "dim") = targetDim
> ```
>
> Results:
> ```
> > A
>      [,1] [,2] [,3]
> [1,]    1    4    7
> [2,]    2    5    8
> [3,]    3    6    9
> > A_compressed[, ]
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> ```
>
> The variable `A_compressed` does not have its own data; it relies on the
> data of the variable `A` and works like a 2-by-2 matrix. However, because
> ALTREP is still incomplete, you cannot yet use `A_compressed` directly
> (hence `A_compressed[, ]` above). This can be expected to be fixed in a
> future R release.
>
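The index arithmetic in `get_element_func` can also be written as a small standalone C++ sketch, which may be closer to what a C++-backed version would use. The names `remap_index` and `view_element` are hypothetical, and `std::vector<int>` merely stands in for the data of the old matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Map a 1-based linear index into the "new" (target) matrix to the
// 1-based linear index of the same cell in the "old" matrix.
// Both matrices are column-major, so only the row counts are needed.
std::size_t remap_index(std::size_t i, std::size_t true_nrow,
                        std::size_t target_nrow) {
    assert(i >= 1 && target_nrow >= 1 && target_nrow <= true_nrow);
    std::size_t col = (i - 1) / target_nrow;  // 0-based column in the target
    std::size_t row = (i - 1) % target_nrow;  // 0-based row in the target
    return row + col * true_nrow + 1;
}

// Read element i of the target view without copying any data.
int view_element(const std::vector<int>& data, std::size_t i,
                 std::size_t true_nrow, std::size_t target_nrow) {
    return data[remap_index(i, true_nrow, target_nrow) - 1];
}
```

For the 3-by-3 matrix `1:9` viewed as 2-by-2, indices 1 to 4 map to elements 1, 2, 4 and 5, matching the output of `A_compressed[, ]` above.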
> Please let me know if you have any questions.
>
> Best,
> Jiefei
>
>
> On Sat, Jun 15, 2019 at 7:19 AM Morgan Morgan <morgan.emailbox using gmail.com>
> wrote:
>
>> Hi Jiefei,
>>
>> Thank you for your reply.
>>
>> To give you a bit more context, I wrote a function that finds the
>> positions (indices) of all small matrices inside a larger matrix. At the
>> beginning I pre-allocate, let's say, a 100 by 2 matrix. However, a lot of
>> values might remain empty in this matrix of positions, so I have to resize
>> it down to keep only the relevant values. Does that make sense?
>>
>> Please let me know what you think and if there is a safer way to do it?
>>
>> Please let me know if you want more information or have any questions.
>>
>> Best regards
>> Morgan
>>
>> On Sat, 15 Jun 2019 00:15 King Jiefei, <szwjf08 using gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I don't think there is a native R API to do what you want here, but if
>>> the matrix is only used by you and is not exported to other users, you
>>> can hack R's data structure to achieve that goal.
>>>
>>> Because your question does not give much context, I will assume that
>>> the whole point of resizing a matrix is to avoid the overhead of memory
>>> allocation, not to represent the same matrix with different dimensions,
>>> since your 'new' matrix has a different number of elements.
>>>
>>> Roughly speaking, a matrix in R is nothing but a vector with a dim
>>> attribute. You can verify this with R code:
>>> ```
>>> > A=matrix(1:6,2,3)
>>> > A
>>>      [,1] [,2] [,3]
>>> [1,]    1    3    5
>>> [2,]    2    4    6
>>> > attributes(A)
>>> $dim
>>> [1] 2 3
>>>
>>> > attributes(A)=NULL
>>> > A
>>> [1] 1 2 3 4 5 6
>>> ```
>>> Therefore, in order to resize the matrix, you need to change the dim
>>> attribute (to a smaller size). Unfortunately, R does its best to prevent
>>> you from doing such a dangerous operation (and you should know this is
>>> *not correct!*), so you have to go to the C level and hack R's internal
>>> data structure. Let's say you want to resize the matrix A to a 2-by-2
>>> matrix; here is what you need to do:
>>>
>>> C code:
>>> The code sets the second value of the dim attribute to 2.
>>> ```
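>>> #include <Rcpp.h>
>>>
>>> // Overwrite the second entry (the number of columns) of the attribute in place.
>>> // [[Rcpp::export]]
>>> void I_know_it_is_not_correct(SEXP x, SEXP attrName) {
>>>   INTEGER(Rf_getAttrib(x, attrName))[1] = 2;
>>> }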
>>> ```
>>>
>>> R code:
>>> ```
>>> > a=matrix(1:6,2,3)
>>> > I_know_it_is_not_correct(a,as.symbol("dim"))
>>> > a
>>>      [,1] [,2]
>>> [1,]    1    3
>>> [2,]    2    4
>>> > attributes(a)
>>> $dim
>>> [1] 2 2
>>> ```
>>>
>>> You get what you want. Please use it with caution.
>>>
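For comparison, here is a sketch of the copy-based alternative from the original question (allocate a second buffer and loop), with `std::vector<int>` standing in for the data of an R integer matrix. `shrink_copy` is a hypothetical helper, not part of the R API. Because the data are stored column by column, dropping trailing columns keeps a prefix of the vector, which is why the dim hack above appears to work; dropping rows instead requires the remaining elements to be compacted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the top-left new_nrow x new_ncol block of a column-major matrix
// with `nrow` rows into a freshly allocated, correctly sized buffer.
std::vector<int> shrink_copy(const std::vector<int>& data, std::size_t nrow,
                             std::size_t new_nrow, std::size_t new_ncol) {
    assert(nrow > 0 && data.size() % nrow == 0);
    assert(new_nrow <= nrow && new_ncol <= data.size() / nrow);

    std::vector<int> out;
    out.reserve(new_nrow * new_ncol);
    for (std::size_t j = 0; j < new_ncol; ++j) {      // walk the kept columns
        for (std::size_t i = 0; i < new_nrow; ++i) {  // keep the leading rows
            out.push_back(data[i + j * nrow]);
        }
    }
    return out;
}
```

With the 2-by-3 matrix `1:6`, `shrink_copy(a, 2, 2, 2)` returns the first four elements, the same values the hack exposes. A 10-by-2 matrix reduced to 5-by-1 keeps the first five elements of the first column.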
>>> Best,
>>> Jiefei
>>>
>>>
>>> On Fri, Jun 14, 2019 at 2:41 PM Morgan Morgan <morgan.emailbox using gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there a way to resize a matrix defined as follows:
>>>>
>>>> SEXP a = PROTECT(allocMatrix(INTSXP, 10, 2));
>>>> int *pa = INTEGER(a);
>>>>
>>>> To row = 5 and col = 1 or do I have to allocate a second matrix "b" with
>>>> pointer *pb and do a "for" loop to transfer the value of a to b?
>>>>
>>>> Thank you
>>>> Best regards
>>>> Morgan
