[R] [FORGED] Newbie Question on R versus Matlab/Octave versus C

Alan Feuerbacher alanf00 using comcast.net
Wed Jan 30 17:16:52 CET 2019


On 1/29/2019 11:50 PM, Jeff Newmiller wrote:

Thanks very much for providing these coding examples! I think this is a 
good way to learn some R.

Alan

> On Tue, 29 Jan 2019, Alan Feuerbacher wrote:
> 
>> On 1/28/2019 7:51 PM, Jeff Newmiller wrote:
>>> If you forge on with your preconceptions of how such a simulation 
>>> should be implemented then you will be able to reproduce your failure 
>>> just as spectacularly using R as you did using Octave.
>>
>> I think I've come to the same conclusion. :-)
>>
>>> It is crucial to employ vectorization of your algorithms if you want 
>>> good performance with either Octave or R. That vectorization may 
>>> either be over time or over separate simulations.
>>
>> Please explain further, if you don't mind. My background is not in 
>> programming, but in analog microchip circuit design (I'm now retired). 
>> Thus I'm a user of circuit simulators, not a programmer of them. Also, 
>> I'm running this stuff on my home computers, either Linux or Windows 
>> machines.
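
As a rough illustration of what vectorization means here, the sketch below adds one year to the age of every person twice: once with an explicit loop, the way it would be written in C, and once as a single expression on the whole vector. The variable names and the population size are made up for the example.

```r
# Hypothetical example: add one year to the age of every person.
n <- 100000
age <- rep(20, n)

# Element-by-element loop, the way one would write it in C
age_loop <- age
for (i in seq_len(n)) {
  age_loop[i] <- age_loop[i] + 1
}

# Vectorized form: one expression updates the whole vector
age_vec <- age + 1

# Both forms produce the same values
identical(age_loop, age_vec)
```

Both give the same result. The vectorized form hands the whole loop to compiled code inside R, which is usually where the speed difference comes from.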
>>
>>> I am running simulations of a million cases of power plant 
>>> performance over 25 years in about a minute. I know someone who used 
>>> R to simulate a CFD river flow problem in a class in a few minutes, 
>>> while others using Fortran or Matlab were struggling to get 
>>> comparable runs completed in many hours. I believe the difference was 
>>> in how the data were structured and manipulated more than the 
>>> language that was being used. I think the strong capabilities for 
>>> presenting results using R makes using it advantageous over Octave, 
>>> though.
>>
>> After my failed attempt at using Octave, I realized that most likely 
>> the main contributing factor was that I was not able to figure out an 
>> efficient data structure to model one person. But C lent itself 
>> perfectly to my idea of how to go about programming my simulation. So 
>> here's a simplified pseudocode sort of example of what I did:
> 
> Don't model one person... model an array of people.
> 
>> To model a single reproducing woman I used this C construct:
>>
>> typedef struct woman {
>>  int isAlive;
>>  int isPregnant;
>>  double age;
>>  . . .
>> } WOMAN;
> 
> # e.g.
> Nwomen <- 100
> women <- data.frame( isAlive = rep( TRUE, Nwomen )
>                     , isPregnant = rep( FALSE, Nwomen )
>                     , age = rep( 20, Nwomen )
>                     )
> 
>> Then I allocated memory for a big array of these things, using the C 
>> malloc() function, which gave me the equivalent of this statement:
>>
>> WOMAN women[NWOMEN];  /* An array of NWOMEN woman-structs */
>>
>> After some initialization I set up two loops:
>>
>> for( j=0; j<numberOfYears; j++) {
>>  for( i=0; i<numberOfWomen; i++) {  /* C arrays start at index 0 */
>>    updateWomen( &women[i] );        /* update one woman in place */
>>  }
>> }
> 
> for ( j in seq_len( numberOfYears ) ) {
>    # let vectorized data storage automatically handle the other for loop
>    women <- updateWomen( women )
> }
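
A minimal sketch of what such an updateWomen() could look like, assuming the data frame layout shown earlier. The function body and the 1% yearly mortality rate are invented for illustration and are not taken from anyone's actual simulation.

```r
# Hypothetical yearly update; the 1% mortality rate is an assumption
# made up for illustration, not a figure from this thread.
updateWomen <- function(women, pDeath = 0.01) {
  n <- nrow(women)
  # One random draw per row decides who dies this year
  dies <- women$isAlive & (runif(n) < pDeath)
  women$isAlive <- women$isAlive & !dies
  # ifelse() picks a value per row: the living age by one year
  women$age <- ifelse(women$isAlive, women$age + 1, women$age)
  women
}

Nwomen <- 100
numberOfYears <- 25
women <- data.frame(isAlive = rep(TRUE, Nwomen),
                    isPregnant = rep(FALSE, Nwomen),
                    age = rep(20, Nwomen))

set.seed(42)  # make the random draws repeatable
for (j in seq_len(numberOfYears)) {
  # Each call updates every row at once; there is no inner loop
  women <- updateWomen(women)
}
table(women$isAlive)
```

The only explicit loop left is the one over time. Each call to updateWomen() handles all rows with a few whole-column operations.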
> 
>> The function updateWomen() figures out things like whether the woman 
>> becomes pregnant or gives birth on a given day, dies, etc.
> 
> You can use your "fixed size" allocation strategy with flags indicating 
> whether specific rows are in use, or you can work only with valid rows 
> and add rows as needed for children. It is best to compute a logical 
> vector that identifies all of the birthing mothers as a subset of the 
> data frame, build a set of children rows using the birthing mothers 
> data frame as input, and then rbind the new rows to the updated women 
> data frame as appropriate.
> 
> The clearest approach for individual decision calculations is the 
> vectorized "ifelse" function, though under certain circumstances 
> putting an indexed subset on the left side of an assignment can modify 
> memory "in place". (The functional-programming restriction against 
> this is probably a foreign idea to a dyed-in-the-wool C programmer, but 
> R usually prevents you from modifying the variable that was input to a 
> function, automatically making a local copy of the input as needed in 
> order to prevent such backwash into the caller's context.)
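
The birth step described above might be sketched as follows. The function name addChildren() and the newborn defaults are assumptions for the example. The column names follow the data frame earlier in this message.

```r
# Hypothetical birth step; column names follow the earlier data frame,
# everything else (function name, newborn defaults) is assumed.
addChildren <- function(women) {
  # Logical vector marking the rows that give birth this step
  birthing <- women$isAlive & women$isPregnant
  mothers <- women[birthing, ]
  # One newborn row per birthing mother
  children <- data.frame(isAlive = rep(TRUE, nrow(mothers)),
                         isPregnant = rep(FALSE, nrow(mothers)),
                         age = rep(0, nrow(mothers)))
  # Indexed assignment on the left side updates only the selected rows
  women$isPregnant[birthing] <- FALSE
  rbind(women, children)
}

women <- data.frame(isAlive = c(TRUE, TRUE, FALSE),
                    isPregnant = c(TRUE, FALSE, FALSE),
                    age = c(25, 30, 60))
updated <- addChildren(women)

nrow(updated)            # one row was appended for the single birth
women$isPregnant[1]      # the caller's copy is not changed by the function
```

Note that the assignment inside addChildren() only changes the function's local copy. The caller sees the change only because the result is returned and assigned to a new name.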
> 
>> I added other refinements that are not relevant here, such as random 
>> variations of various parameters, using the GNU Scientific Library 
>> random number generator functions.
> 
> R has quite sophisticated random number generation by default.
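
For reference, the default generator in R is the Mersenne Twister, and the draw functions are themselves vectorized. A small sketch, in which the distributions and parameter values are assumptions chosen only for illustration:

```r
RNGkind()[1]        # name of the generator currently in use

set.seed(2019)      # fix the seed so a run can be repeated
n <- 5
lifespan <- rnorm(n, mean = 80, sd = 10)            # assumed distribution
conceives <- rbinom(n, size = 1, prob = 0.2) == 1   # assumed rate

# Resetting the seed reproduces exactly the same draws
set.seed(2019)
lifespan2 <- rnorm(n, mean = 80, sd = 10)
identical(lifespan, lifespan2)
```

Each call returns n draws at once, so random variation can be applied to the whole population without a loop.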
> 
>> If you can suggest a data construct in R or Octave that does something 
>> like this, and uses your idea of vectorization, I'd like to hear it. 
>> I'd like to implement it and compare results with my C implementation.
>>
>>> If your problems truly need a compiled language, the Rcpp package 
>>> lets you mix C++ with R quite easily and then you get the best of 
>>> both worlds. (C and Fortran are supported, but they are a bit more 
>>> finicky to set up than C++.)
>>
>> I don't know the answer to that, but perhaps you can help decide.
>>
>> Alan
>>
>>
>>> On January 28, 2019 4:00:07 PM PST, Alan Feuerbacher 
>>> <alanf00 using comcast.net> wrote:
>>>> On 1/28/2019 4:20 PM, Rolf Turner wrote:
>>>>>
>>>>> On 1/29/19 10:05 AM, Alan Feuerbacher wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently learned of the existence of R through a physicist friend
>>>>>> who uses it in his research. I've used Octave for a decade, and C
>>>>>> for 35 years, but would like to learn R. These all have advantages
>>>>>> and disadvantages for certain tasks, but as I'm new to R I hardly
>>>>>> know how to evaluate them. Any suggestions?
>>>>>
>>>>> * C is fast, but with a syntax that is (to my mind) virtually
>>>>>     incomprehensible.  (You probably think differently about this.)
>>>>
>>>> I've been doing it long enough that I have little problem with it,
>>>> except for pointers. :-)
>>>>
>>>>> * In C, you essentially have to roll your own for all tasks; in R,
>>>>>     practically anything (well ...) that you want to do has already
>>>>>     been programmed up.  CRAN is a wonderful resource, and there's
>>>>>     more on GitHub.
>>>>>
>>>>> * The syntax of R meshes beautifully with *my* thought patterns;
>>>>>     YMMV.
>>>>>
>>>>> * Why not just bog in and try R out?  It's free, it's readily
>>>>>     available, and there are a number of good online tutorials.
>>>>
>>>> I just installed R on my Linux Fedora system, so I'll do that.
>>>>
>>>> I wonder if you'd care to comment on my little project that prompted
>>>> this? As part of another project, I wanted to model population growth
>>>> starting from a handful of initial individuals. This is exponential
>>>> in the long run, of course, but I wanted to see how a few basic
>>>> parameters affected the outcome. Using Octave, I modeled a single
>>>> person as a "cell", which in Octave has a good deal of overhead. The
>>>> program basically looped over the entire population, and updated each
>>>> person according to the parameters, which included random statistical
>>>> variations. So with a total population of, say, 10,000 and an update
>>>> time of 1 day, the program had to execute 10,000 x 365 update
>>>> operations for each year of growth. For large populations, say
>>>> 100,000, the program did not return even after 24 hours of run time.
>>>>
>>>> So I switched to C, and used its "struct" declaration and an array of
>>>> structs to model the population. This allowed the program to complete
>>>> in under a minute as opposed to 24 hours+. So in line with your
>>>> comments, C is far more efficient than Octave.
>>>>
>>>> How do you think R would fare in this simulation?
>>>>
>>>> Alan
> 
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                        Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------


