[R] Query about computational demand

jgarcia at ija.csic.es jgarcia at ija.csic.es
Mon Oct 6 16:46:39 CEST 2008


Hi all,
I've programmed a couple of C libraries which are loaded dynamically into
R (Linux). With one of these, I'm conducting a Monte Carlo analysis, but
each individual execution of my model takes about 15 minutes, so 1000
executions take about 11 days.
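
As a rough check on those figures, the arithmetic can be sketched in R. The function name `days_needed` and its `n_cores` argument are made up for illustration, and the multi-core figure assumes perfect scaling, which real runs rarely reach. The 50000 is the run count asked for further down in this message.

```r
# Wall-clock estimate for a batch of independent model runs.
# minutes_per_run comes from the post; n_cores is a hypothetical knob.
days_needed <- function(n_runs, minutes_per_run = 15, n_cores = 1) {
  batches <- ceiling(n_runs / n_cores)    # runs each core does back to back
  batches * minutes_per_run / (60 * 24)   # minutes -> days
}

days_needed(1000)                 # about 10.4 days, in line with the ~11 reported
days_needed(50000)                # about 521 days on a single core
days_needed(50000, n_cores = 8)   # about 65 days, assuming ideal scaling
```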

This is not enough for my needs, as I need about 50000 executions for a
sensible analysis of the parameter space. I'm not an expert programmer,
so I have several questions:

a) I leave all memory management to R. Would similar C code run faster
if it were executed as a standalone program outside R?

b) I've been offered two options for running the Monte Carlo runs remotely:
Xeon x86_64:  with 8   processors
Itanium:      with 128 processors and 500 GB of RAM

With ordinary R programming (nothing written specifically for parallel
computing), would my dynamically loaded C library and R benefit from
having several processors?
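
For context on question b): a plain R session runs in a single process on one core, and compiled code called from it also uses one core unless it is written to be threaded. Monte Carlo runs are independent of each other, though, so they can be spread across worker processes. Below is a minimal sketch using the `parallel` package (bundled with R since 2.14.0, so not available when this was posted). `run_model`, the parameter sets and `n_cores` are hypothetical stand-ins, not the poster's code.

```r
library(parallel)

# Hypothetical stand-in for one model execution; in the real setup this
# would call the compiled routine from the dynamically loaded library.
run_model <- function(params) {
  sum(params^2)
}

set.seed(1)
# One parameter vector per Monte Carlo run (3 made-up parameters, 16 runs).
param_sets <- replicate(16, runif(3), simplify = FALSE)

n_cores <- 2  # assumption; set to the number of cores actually granted

# mclapply forks worker processes (Linux/Unix only) and hands each worker
# a share of the runs; results come back in the original order.
results <- unlist(mclapply(param_sets, run_model, mc.cores = n_cores))
length(results)  # one result per run
```

This only pays off if each run is independent and does not write to the same files or shared state as the others.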

c) How can I tell the people who may offer me the computational
resources the maximum amount of memory I need?
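
One way to get a first number for question c) is R's own `gc()` bookkeeping, sketched below. Here `run_once` is a hypothetical placeholder for a single model execution. Note that `gc()` only sees memory allocated through R, so memory obtained with `malloc` inside the C library is not counted. For the whole process, GNU time (`/usr/bin/time -v Rscript script.R`, where `script.R` is a made-up file name) reports the "Maximum resident set size".

```r
# Hypothetical stand-in for one model execution.
run_once <- function() {
  x <- numeric(1e6)
  sum(x)
}

invisible(gc(reset = TRUE))      # reset the "max used" counters
out <- run_once()
mem <- gc()                      # matrix with rows Ncells and Vcells

# The last column is "max used" in Mb since the reset; add both rows.
peak_mb <- sum(mem[, ncol(mem)])
peak_mb
```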

Thanks, and sorry if these questions are too basic.

Javier
------------------


> On 03/10/2008 7:19 PM, Tomas Lanczos wrote:
>> Thank You for Your answer, Duncan,
>>
>> Duncan Murdoch wrote:
>>> On 03/10/2008 4:33 AM, Tomas Lanczos wrote:
>>>> hello,
>>>>
>>>> I wish to create some 3D scatter diagrams visualising data grouped
>>>> by a given field in the database. I tried the scatterplot3d package,
>>>> as well as the plot3d and scatter3d functions (from the rgl and Rcmdr
>>>> packages, respectively). My first question is whether it is possible
>>>> to group data in scatterplot3d and plot3d, because I did not succeed
>>>> in using the groups = ... argument.
>>> There is no groups argument to plot3d, but you can set characteristics
>>> of each point separately.  So if you can calculate a colour for each
>>> point yourself, you can do something like
>>>
>>> plot3d(br_scatter[,c("cl", "br", "hco3")], col=colour)
>> I see, but this is new to me. So, if I understood you correctly, you
>> advise preparing another column containing colour codes (colour names?)
>> for each point?
>
> Yes, though it needn't be a column of br_scatter.  A vector of the right
> length will work.
>
> Duncan Murdoch
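
A minimal sketch of building such a colour vector from the grouping field. The five rows are copied from the data listed further down in the thread. The choice of `rainbow()` and the name `palette_cols` are illustrative assumptions.

```r
# A few rows in the shape of the data shown in the thread.
br_scatter <- data.frame(
  stratigraphy = c("sarmat", "sarmat", "baden", "panon", "karpat"),
  br   = c(0.2327793352, 0.3741990388, 0.3354024830, 0.0450540649, 0.4655586704),
  hco3 = c(507.006513, 1021.788317, 1268.847582, 81.590560, 16.019351),
  cl   = c(262.781114, 214.254486, 253.639356, 274.980467, 179.537807),
  stringsAsFactors = FALSE
)

groups       <- factor(br_scatter$stratigraphy)
palette_cols <- rainbow(nlevels(groups))        # one colour per group
colour       <- palette_cols[as.integer(groups)]  # one colour per point

# With rgl installed, the call from the thread then becomes:
# rgl::plot3d(br_scatter[, c("cl", "br", "hco3")], col = colour)
```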
>>
>>> If you want different sizes for each point, you have to plot each
>>> group separately; the size= attribute can't be a vector.  You could
>>> also use text3d to plot character labels, e.g.
>>>
>>> plot3d(br_scatter[,c("cl", "br", "hco3")], type="n")
>>> text3d(br_scatter[,c("cl", "br", "hco3")],
>>> text=br_scatter$stratigraphy)
>> In some cases that would be nice, but I have hundreds of points and no
>> space left for labels. I can use it later, though.
>>
>> Tomas
>>> Duncan Murdoch
>>>
>>>> The scatter3d function behaves a bit weirdly with the groups argument:
>>>> it works well with data imported from a CSV file, but when I tried to
>>>> apply it to data imported from a PostgreSQL database (using the
>>>> Rdbi and RdbiPgSQL packages) it gives me this error message:
>>>>
>>>> ERROR:
>>>>    groups variable must be a factor.
>>>>
>>>> To be clearer, here is the command I used with scatter3d (exactly
>>>> the same for both datasets):
>>>>
>>>> scatter3d(br_scatter$cl, br_scatter$br, br_scatter$hco3,
>>>> fit="linear", residuals=TRUE, bg="white", axis.scales=TRUE,
>>>> grid=TRUE, ellipsoid=FALSE, xlab="cl", ylab="br", zlab="hco3", groups
>>>> = br_scatter$stratigraphy)
>>>>
>>>> The dataset I used (the same data whether imported from the CSV
>>>> file or from the PostgreSQL table) looks like this (a part of it):
>>>>
>>>>      stratigraphy           br        hco3          cl
>>>> 1         sarmat 0.2327793352  507.006513  262.781114
>>>> 2         sarmat 0.3741990388 1021.788317  214.254486
>>>> 3          baden 0.3354024830 1268.847582  253.639356
>>>> 4         sarmat 0.0938626352   46.514244   38.995620
>>>> 5         sarmat 0.1163896676   18.300686   72.984568
>>>> 6         sarmat 0.2090008010   77.777917  131.989947
>>>> 7         sarmat 0.2815879055   53.802018  146.804052
>>>> 8          panon 0.0450540649   81.590560  274.980467
>>>> 9          baden 0.5619243092   61.752316  275.978980
>>>> 10        karpat 0.4655586704   16.019351  179.537807
>>>> 11    mezozoikum 0.6244993993  133.442504  152.986938
>>>> 12         panon 0.1539347217  132.679975   65.994974
>>>> 13        sarmat 0.0375450541   19.825743   24.996686
>>>> 14        sarmat 0.0375450541   20.588272   26.280086
>>>> 15         baden 0.0463055667   19.063215   26.494456
>>>> 16         baden 0.1864737685   40.414016   93.992841
>>>> 17        sarmat 0.9236083300   90.740903  597.954458
>>>> 18         panon 0.8022126552   57.189645  499.961921
>>>> 19         panon 0.4830796956   68.627574  280.001241
>>>> 20         panon 0.1163896676   73.202745   53.995887
>>>>
>>>> So why is exactly the same "stratigraphy" field a factor in the
>>>> dataset imported from a CSV file, but not a factor in the
>>>> dataset imported from a PostgreSQL table?
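
A likely explanation, stated as an assumption about the database driver: at the time, `read.csv` converted text columns to factors by default (`stringsAsFactors = TRUE`, changed in R 4.0.0), whereas a database interface typically hands text back as a plain character vector. Converting the column explicitly with `factor()` should satisfy the check. The data frame `from_db` below is a made-up stand-in for the imported table.

```r
# Stand-in for a table fetched from the database: text arrives as character.
from_db <- data.frame(
  stratigraphy = c("sarmat", "baden", "sarmat", "panon"),
  stringsAsFactors = FALSE
)

is.factor(from_db$stratigraphy)   # FALSE: a character vector

# Convert explicitly before passing it as the groups argument.
from_db$stratigraphy <- factor(from_db$stratigraphy)

is.factor(from_db$stratigraphy)   # TRUE
levels(from_db$stratigraphy)      # "baden" "panon" "sarmat"
```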
>>>>
>>>> Many thanks in advance
>>>>
>>>> Tomas
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


