[Rd] Confusion regarding allocating Matrices.

Simon Urbanek simon.urbanek at r-project.org
Sat Oct 24 23:25:46 CEST 2009


On Oct 24, 2009, at 2:58 PM, Abhijit Bera wrote:

> OK, I get it. So every time, it does an alloc and copy.
>
> I haven't finished the design yet. I'm just thinking about how
> randomly the data might arrive; it's real-time data. So I will
> allocate a large chunk of memory and keep track of when it fills up;
> once the data exceeds it, I will alloc and copy the data (provided the
> size is within system limits). In this manner I should be able to
> reduce the number of expensive alloc-and-copy operations.
>
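The "allocate a large chunk, grow when full" idea Abhijit describes can be sketched in plain C as a growable buffer. The sketch assumes the capacity is doubled on each growth. Doubling is a common choice, but the thread does not specify a growth policy. The names dbuf, dbuf_init, dbuf_append and dbuf_free are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer of doubles (R stores reals as double). */
typedef struct {
    double *data; /* storage; realloc may move it to a new address */
    size_t len;   /* number of values stored so far */
    size_t cap;   /* number of values that fit before growing */
} dbuf;

/* Start with room for initial_cap values; returns 0 on success. */
int dbuf_init(dbuf *b, size_t initial_cap)
{
    if (initial_cap == 0)
        initial_cap = 1;
    b->data = malloc(initial_cap * sizeof *b->data);
    if (b->data == NULL)
        return -1;
    b->len = 0;
    b->cap = initial_cap;
    return 0;
}

/* Append n values, doubling the capacity (assumed policy) until they fit. */
int dbuf_append(dbuf *b, const double *vals, size_t n)
{
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap;
        while (new_cap - b->len < n) {
            if (new_cap > SIZE_MAX / (2 * sizeof(double)))
                return -1; /* request exceeds what size_t can address */
            new_cap *= 2;
        }
        /* realloc copies the old contents only if it must move the block */
        double *p = realloc(b->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1; /* the old block is still valid and unchanged */
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, vals, n * sizeof *vals);
    b->len += n;
    return 0;
}

/* Release the storage and reset the bookkeeping. */
void dbuf_free(dbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

With doubling, the total number of element copies stays proportional to the final size. Individual appends can still pay for a full copy, which may matter for real-time data.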

Many smart people have thought about these things before you, and it's
worthwhile to read about them. I would suggest reading a bit about
data structures and programming in C. What you describe is usually
tackled by allocating additional (usually linked) buffers as you go,
since that means you don't have to copy anything (except for the last
step, where you create the R object). It's also trivial to
implement.
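Simon's suggestion of linked buffers can be sketched as a list of fixed-size chunks. Values are appended into the last chunk, and a new chunk is linked in when it fills, so nothing is copied while data arrives. At the end, everything is copied once into a single contiguous array. In an R extension, that array would be the storage of the R object created at that last step. CHUNK_SIZE and the cl_* names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 1024 /* arbitrary chunk length (assumption) */

/* One fixed-size block of values, linked to the next block. */
typedef struct chunk {
    struct chunk *next;
    size_t used;
    double vals[CHUNK_SIZE];
} chunk;

typedef struct {
    chunk *head, *tail;
    size_t total; /* values stored across all chunks */
} chunk_list;

void cl_init(chunk_list *cl)
{
    cl->head = cl->tail = NULL;
    cl->total = 0;
}

/* Append one value; existing chunks are never moved or copied. */
int cl_push(chunk_list *cl, double v)
{
    if (cl->tail == NULL || cl->tail->used == CHUNK_SIZE) {
        chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return -1;
        c->next = NULL;
        c->used = 0;
        if (cl->tail != NULL)
            cl->tail->next = c;
        else
            cl->head = c;
        cl->tail = c;
    }
    cl->tail->vals[cl->tail->used++] = v;
    cl->total++;
    return 0;
}

/* The single copy at the end: dest must hold cl->total doubles
   (in an R extension, e.g. REAL() of a freshly allocated vector). */
void cl_copy_out(const chunk_list *cl, double *dest)
{
    for (const chunk *c = cl->head; c != NULL; c = c->next) {
        memcpy(dest, c->vals, c->used * sizeof *dest);
        dest += c->used;
    }
}

/* Free every chunk and reset the list. */
void cl_free(chunk_list *cl)
{
    chunk *c = cl->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    cl_init(cl);
}
```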

Cheers,
Simon


>
>
> Abhijit Bera
>
> On Sat, Oct 24, 2009 at 10:06 PM, Douglas Bates  
> <bates at stat.wisc.edu> wrote:
>
>> On Fri, Oct 23, 2009 at 2:02 PM, Abhijit Bera <abhibera at gmail.com>  
>> wrote:
>>> Sorry, I made a mistake while writing the code. The declaration of
>>> Data should have been first.
>>
>>> I still have some doubts:
>>
>> Because you are making some sweeping and incorrect assumptions about
>> the way that the internals of R operate. R allows arrays to be
>> dynamically resized, but this is accomplished internally by allocating
>> new storage, copying the current contents to the new location and
>> installing the values of the new elements. It is an expensive
>> operation, which is why it is discouraged.
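To see why this is expensive, here is a small C sketch that only counts element copies. It assumes that every growth step allocates new storage and copies all current elements. It compares growing one element at a time with doubling the capacity. This is a cost model, not R's actual code.

```c
#include <stddef.h>

/* Grow by exactly one element each time: 0 + 1 + ... + (n-1) copies. */
size_t copies_grow_by_one(size_t n)
{
    size_t copies = 0;
    for (size_t len = 0; len < n; len++)
        copies += len; /* copy the len existing elements to new storage */
    return copies;
}

/* Double the capacity (starting at 1) only when it is exhausted. */
size_t copies_doubling(size_t n)
{
    size_t copies = 0, cap = 1;
    for (size_t len = 0; len < n; len++) {
        if (len == cap) { /* full: allocate twice the room and copy */
            copies += len;
            cap *= 2;
        }
    }
    return copies;
}
```

Reaching 1000 elements costs 499500 copies when growing one element at a time, but only 1023 when doubling. The first grows quadratically, the second linearly.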
>>
>> Your design is deeply flawed.  Go back to the drawing board.
>>
>
>
>
>>> When you say calloc and realloc, are you talking about R's C interface
>>> Calloc and Realloc or the regular calloc and realloc?
>>
>> Either one.
>>
>>> I want to feed data directly into an R matrix and grow it as required.
>>> So at one time I might have 100 rows coming in from a data source. The
>>> next time I might have 200 rows coming in from a data source. I want to
>>> be able to expand the R matrix instead of creating a regular C float
>>> matrix and then making an R matrix based on the new size. I just want
>>> to have one R object and be able to expand its size dynamically.
>>
>> R stores floating-point numbers as the C data type double, not float.
>> It may seem pedantic to point out distinctions like that but not when
>> you are writing programs.  Compilers are the ultimate pedants - they
>> are real sticklers for getting the details right.
>>
>> As I said, it just doesn't work the way that you think it does. The
>> fact that there is an R object with a certain name before and after an
>> operation doesn't mean it is the same R object.
>>
>>> I was reading the language specs. It says that one could declare an
>>> object in R like this:
>>>
>>> m=matrix(nrow=10,ncol=10)
>>>
>>> and then one could assign
>>>
>>> m[101]=1.00
>>>
>>> to expand the object.
>>>
>>> but this has one problem when I do a
>>>
>>> dim(m)
>>>
>>> I get
>>>
>>> NULL instead of 10 10
>>>
>>> So what is happening here?
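For reference, in R an out-of-range assignment such as m[101]=1.00 on a 10 x 10 matrix enlarges the underlying vector to length 101 and fills the new slots with NA. Because 101 elements no longer fit 10 x 10 dimensions, the result is a plain vector without a dim attribute. That is why dim(m) returns NULL. The C sketch below models this with a hypothetical toy_vec struct. It illustrates the behaviour only and is not how R represents objects internally.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical toy model of an R numeric vector that may carry a
   dim attribute; this is not R's real internal layout. */
typedef struct {
    double *data;
    size_t len;
    int has_dim; /* 1 while nrow * ncol == len */
    size_t nrow, ncol;
} toy_vec;

/* Mimics x[i] <- v for a 1-based index i. Assigning past the end
   enlarges the vector (new slots get NAN, standing in for R's NA) and,
   because len no longer equals nrow * ncol, drops the dimensions. */
int toy_assign(toy_vec *x, size_t i, double v)
{
    if (i == 0)
        return -1; /* indices are 1-based, as in R */
    if (i > x->len) {
        double *p = realloc(x->data, i * sizeof *p);
        if (p == NULL)
            return -1;
        for (size_t k = x->len; k < i; k++)
            p[k] = NAN;
        x->data = p; /* possibly a new address: a different object */
        x->len = i;
        x->has_dim = 0; /* dim(x) would now be NULL */
    }
    x->data[i - 1] = v;
    return 0;
}
```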
>>>
>>>
>>> I am aware that R matrices are stored in column-major order.
>>>
>>> Thanks for the tip on using float *dat= REAL(Data);
>>>
>>> Regards
>>>
>>> Abhijit Bera
>>>
>>>
>>>
>>> On Fri, Oct 23, 2009 at 7:27 PM, Douglas Bates <bates at stat.wisc.edu>
>>> wrote:
>>>>
>>>> On Fri, Oct 23, 2009 at 9:23 AM, Douglas Bates <bates at stat.wisc.edu>
>>>> wrote:
>>>>> On Fri, Oct 23, 2009 at 8:39 AM, Abhijit Bera <abhibera at gmail.com>
>>>>> wrote:
>>>>>> Hi
>>>>>>
>>>>>> I'm a bit confused.
>>>>>
>>>>> Indeed.
>>>>>
>>>>>> I plan to grow/realloc a matrix depending on the data available in a
>>>>>> C program.
>>>>>
>>>>>> Here is what I tried to do:
>>>>>
>>>>>> Data=allocMatrix(REALSXP,3,4);
>>>>>> SEXP Data;
>>>>>
>>>>> Those lines should be in the other order, shouldn't they?
>>>>>
>>>>> Also, you need to PROTECT Data or bad things will happen.
>>>>>
>>>>>> REAL(Data)[8]=0.001123;
>>>>>> REAL(Data)[200000]=0.001125;
>>>>>> printf("%f %f\n\n\n\n",REAL(Data)[8],REAL(Data)[200000]);
>>>>
>>>> And I forgot to mention, it is not a good idea to write REAL(Data)
>>>> many times like this.  REAL is a function, not a macro and you are
>>>> calling the same function over and over again unnecessarily.  It is
>>>> better to write
>>>>
>>>> double *dat = REAL(Data);
>>>>
>>>> and use the dat pointer instead of REAL(Data).
>>>>
>>>>>> Here is my confusion:
>>>>>
>>>>>> Do I always need to allocate the exact number of data elements in an
>>>>>> R matrix?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> In the above code segment I have clearly exceeded the number of
>>>>>> elements that have been allocated but my program doesn't crash.
>>>>>
>>>>> Remember that when programming in C you have a lot of rope with  
>>>>> which
>>>>> to hang yourself.   You have corrupted a memory location beyond  
>>>>> that
>>>>> allocated to the array but nothing bad has happened  - yet.
>>>>>
>>>>>> I can't find any specific R function for reallocation in case my
>>>>>> data set grows. How do I reallocate?
>>>>>
>>>>> You allocate a new matrix, copy the contents of the current  
>>>>> matrix to
>>>>> the new matrix, then release the old one.  It gets tricky in  
>>>>> that you
>>>>> should unprotect the old one and protect the new one but you  
>>>>> need to
>>>>> watch the order of those operations.
>>>>>
>>>>> This approach is not a very good one. If you really need to grow an
>>>>> array, it is better to allocate and reallocate the memory within your
>>>>> C code using calloc and realloc and then, at the end of the
>>>>> calculations, allocate an R matrix and copy the results over.
>>>>>
>>>>> Also, you haven't said whether you are growing the matrix by row  
>>>>> or by
>>>>> column or both.  If you are adding rows then you can't just  
>>>>> reallocate
>>>>> storage because R stores matrices in column-major order. The  
>>>>> positions
>>>>> of the elements in a matrix with n+1 rows are different from  
>>>>> those in
>>>>> a matrix with n rows.
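Douglas's point about column-major storage can be made concrete. Element (i, j), 0-based, of an nrow x ncol column-major matrix sits at offset i + j * nrow, so every offset depends on nrow. For example, element (2, 1) is at offset 5 with 3 rows but at offset 6 with 4 rows. A sketch, assuming rows arrive in batches and are first appended row by row to a C buffer (hypothetical names cm_index and rows_to_colmajor), then transposed once into column-major order when the final row count is known:

```c
#include <stddef.h>

/* Offset of element (i, j), 0-based, in an nrow x ncol matrix stored
   in column-major order, the layout R uses for matrices. */
size_t cm_index(size_t i, size_t j, size_t nrow)
{
    return i + j * nrow;
}

/* src holds nrow rows appended in row-major order: row r occupies
   src[r * ncol] .. src[r * ncol + ncol - 1]. Copy them into dest in
   column-major order (e.g. the storage of a freshly allocated R
   matrix with nrow rows and ncol columns). */
void rows_to_colmajor(const double *src, double *dest,
                      size_t nrow, size_t ncol)
{
    for (size_t i = 0; i < nrow; i++)
        for (size_t j = 0; j < ncol; j++)
            dest[cm_index(i, j, nrow)] = src[i * ncol + j];
}
```

Appending rows to a row-major buffer never disturbs the values already stored. The reordering R needs happens exactly once, at the final copy.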
>>>>>
>>>>>> Is it necessary to reallocate or is R handling
>>>>>> the memory management for the matrix that I have allocated?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Abhijit Bera
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>>
>>>
>>>
>>
>


