[R] Splitting Area under curve into equal portions

Nathan S. Watson-Haigh nathan.watson-haigh at csiro.au
Thu Mar 26 06:59:03 CET 2009



Hi Milton,

Not quite: that gives an equal number of data points in each colour group.
What I want is an unequal number of points in each group, chosen so that
sum(work[group.members]) is approximately the same for every group.

In the meantime, I came up with the following, borrowing your colouring idea
for the example:

<code>
n <- 2002
n_slaves <- 4   # number of slaves; not set in the original snippet
work <- vector()
for(x in 1:(n-2)) {
  work[x] <- ((n-1-x)*(n-x))/2
}
plot(work)

tasks <- vector('list')
tasks_per_slave <- 1
# Target amount of work per task
work_per_task <- sum(work) / (n_slaves * tasks_per_slave)

# Now define ranges of x of approximately equal "work"
block_start <- 1
for(x in seq_along(work)) {
  if(x == length(work)) {
    # this will be the last block
    tasks[[length(tasks)+1]] <- list(x=block_start:length(work))
    break
  }
  work_in_block_to_x <- sum(work[block_start:x])

  if(work_in_block_to_x > work_per_task) {
    # use this value of x as the block end
    tasks[[length(tasks)+1]] <- list(x=block_start:x)

    # start the next block just after x
    block_start <- x+1
  }
}

# Colour each x by the task it belongs to
colours <- vector()
for(i in seq_along(tasks)) {
  colours <- append(colours, rep(i, length(tasks[[i]]$x)))
}

plot(work, col=colours)
</code>

Essentially, the area under the line for each coloured group (i.e. the total
work for those values of x) should be approximately equal, and I believe the
code above achieves this. I have also just found the cumsum() function, which
gives another way to look at it:

<code>
plot(cumsum(work), col=colours)
</code>

The coloured groups correspond to splitting the cumulative total (y-axis)
into 4 approximately equal parts.
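
The cumsum() view also suggests a loop-free version. This is a sketch of a slightly different rule, not a drop-in copy of the loop: each x is assigned to the slice of the cumulative total that its running sum falls into, so the block boundaries need not match the loop's exactly (the loop closes a block only after it exceeds the target). n_slaves is assumed to be 4, and task_of_x and task_totals are illustrative names:

```r
n <- 2002
n_slaves <- 4          # assumption: 4 slaves, as in the original question
tasks_per_slave <- 1
n_tasks <- n_slaves * tasks_per_slave

x <- 1:(n - 2)
work <- ((n - 1 - x) * (n - x)) / 2   # same values as the loop version

# Scale the running total to (0, n_tasks] and round up: x gets task j when
# its running total lies between (j-1)/n_tasks and j/n_tasks of sum(work).
# pmin() guards against floating-point rounding pushing the last x past n_tasks.
cum_work <- cumsum(work)
task_of_x <- pmin(n_tasks, ceiling(n_tasks * cum_work / sum(work)))

# Same list-of-ranges structure as `tasks` built by the loop
tasks <- unname(lapply(split(seq_along(work), task_of_x),
                       function(ix) list(x = ix)))

# Total work per task; each should be close to sum(work) / n_tasks
task_totals <- sapply(split(work, task_of_x), sum)
task_totals
```

plot(cum_work, col = task_of_x) then gives the same kind of picture as the cumsum() plot above. Each task's total differs from the target by at most one element of work, so the imbalance shrinks as n grows.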

There must be a nicer way to do this!
Nathan


milton ruser wrote:
> Hi Nathan,
>  
> I am not sure I understood what you need, and I
> know it is not an elegant solution, but it may
> do the job.
>  
> n <- 1991
> work <- vector()
> for(x in 1:n) {
>  work[x] <- sum(1:(n-x+1))
> }
> plot(work)
> 
> number.groups <- 5
> last.i<-0
> number.groups.list<-NULL
> for (i in 1:(number.groups-1))
>  {
>  number.groups.list<-c(number.groups.list, rep(i, round(length(work)/number.groups,0)))
>  }
> number.groups.list<-c(number.groups.list, rep(number.groups, (length(work)-length(number.groups.list))))
> aggregate(work, list(number.groups.list), sum)
> plot(work, col=number.groups.list)
>  
> Regards a lot,
>  
> miltinho
> brazil
> 
> On Wed, Mar 25, 2009 at 9:48 PM, Nathan S. Watson-Haigh
> <nathan.watson-haigh at csiro.au> wrote:
> 
> I have some data generated as follows:
> 
> <code>
> n <- 2000
> work <- vector()
> for(x in 1:n) {
>  work[x] <- sum(1:(n-x+1))
> }
> plot(work)
> </code>
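
For reference, the loop has a closed form: sum(1:m) = m*(m+1)/2, so the same vector can be built without a loop. A minimal sketch, with a check against the loop (work_loop is only an illustrative name):

```r
n <- 2000
x <- 1:n
# sum(1:m) == m * (m + 1) / 2, with m = n - x + 1
work <- (n - x + 1) * (n - x + 2) / 2

# Sanity check against the original loop
work_loop <- vector()
for (i in 1:n) work_loop[i] <- sum(1:(n - i + 1))
stopifnot(isTRUE(all.equal(work, work_loop)))
```
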
> 
> What I want to do
> -----------------
> I want to split work into a number of chunks of unequal length such that the
> sum of the values in each chunk is approximately equal.
> 
> The numbers in "work" are proportional to the amount of work to be performed
> by a function I've written for each value of x, i.e. for each value of x there
> are work[x] * y calculations to be performed (where y is a constant).
> 
> I've written a parallel version of my function where I simply assign z
> values of x to each slave. This is not ideal, since a slave that gets the
> z smallest values of x (1:z) will take longer to compute than one that gets
> the z largest ((n-z+1):n). For example, if I have 4 slaves available:
> 
> slave 1 processes x in 1:500
> slave 2 processes x in 501:1000
> slave 3 processes x in 1001:1500
> slave 4 processes x in 1501:2000
> 
> This means the total work performed by each slave is:
> 
> slave 1 sum(work[1:500])     = 771708500
> slave 2 sum(work[501:1000])  = 396458500
> slave 3 sum(work[1001:1500]) = 146208500
> slave 4 sum(work[1501:2000]) = 20958500
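
These four totals can be reproduced with split() and sapply(). This sketch assumes the vectorised form of work and the equal-count assignment of 500 consecutive x values per slave described above:

```r
n <- 2000
n_slaves <- 4
x <- 1:n
work <- (n - x + 1) * (n - x + 2) / 2   # same values as sum(1:(n-x+1))

# Equal-count assignment: slave 1 gets x in 1:500, slave 2 gets 501:1000, ...
slave <- rep(1:n_slaves, each = n / n_slaves)
per_slave <- sapply(split(work, slave), sum)
per_slave
```
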
> 
> Splitting work manually into chunks where the sum of the values in each chunk
> is approximately equal, I get the following:
> 
> > sum(work[1:184])
> [1] 335533384
> > sum(work[185:415])
> [1] 334897871
> > sum(work[416:745])
> [1] 334672085
> > sum(work[746:2000])
> [1] 330230660
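
Hand-picked boundaries like these can be checked with a small helper. block_sums() is a hypothetical name, not an existing R function; ends holds the last x of each chunk, and its last entry must be n:

```r
# Hypothetical helper: total work in each chunk, given the last index of each chunk
block_sums <- function(work, ends) {
  starts <- c(1, head(ends, -1) + 1)
  mapply(function(s, e) sum(work[s:e]), starts, ends)
}

n <- 2000
x <- 1:n
work <- (n - x + 1) * (n - x + 2) / 2
manual <- block_sums(work, c(184, 415, 745, n))
manual
```

With ends = c(500, 1000, 1500, 2000) the same helper gives the equal-count per-slave totals.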
> 
> I need to be able to do this automatically for any value of n. I think I
> should be able to do this by calculating the area under the curve and slicing
> it into equally sized regions, but I don't really know how to get there from
> what I've said above!
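
For this particular work vector the area can be computed in closed form, which is one way to make the "slice the area under the curve" idea exact. With m = n - x + 1, work[x] = sum(1:m) is a triangular number, and a run of consecutive triangular numbers sums to a difference of tetrahedral numbers T(M). Here k (the number of chunks) and b_j (the last x of chunk j) are notation introduced for this sketch:

```latex
\mathrm{work}[x] = \sum_{i=1}^{n-x+1} i = \frac{(n-x+1)(n-x+2)}{2}

T(M) = \sum_{m=1}^{M} \frac{m(m+1)}{2} = \frac{M(M+1)(M+2)}{6}

C(b) = \sum_{x=1}^{b} \mathrm{work}[x] = T(n) - T(n-b)

b_j = \min\left\{ b : C(b) \ge \frac{j}{k}\, T(n) \right\}, \qquad j = 1, \dots, k-1
```

For n = 2000, T(2000) = 1335334000, which equals the total of the four per-slave sums above, and C(184) = T(2000) - T(1816) = 335533384, the first manual chunk.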
> 
> Cheers,
> Nathan
> 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
--------------------------------------------------------
Dr. Nathan S. Watson-Haigh
OCE Post Doctoral Fellow
CSIRO Livestock Industries
Queensland Bioscience Precinct
St Lucia, QLD 4067
Australia

Tel: +61 (0)7 3214 2922
Fax: +61 (0)7 3214 2900
Web: http://www.csiro.au/people/Nathan.Watson-Haigh.html
--------------------------------------------------------
