[R] Optimization problem: selecting independent rows to maximizethe mean

Berton Gunter gunter.berton at gene.com
Wed Mar 1 22:07:07 CET 2006


This sounds either easy via a greedy algorithm or NP-hard. Moreover, it is
not clear to me that

1) A subset of 500 indpendent rows exists, where I presume "independent"
means pairwise nonoverlapping;

2) That the mean and sd can be simultaneously optimized as you describe--
what if the subset with maximum mean also has bigger than minimal sd?


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mark
> Sent: Wednesday, March 01, 2006 12:40 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Optimization problem: selecting independent rows 
> to maximizethe mean
> 
> Dear R community,
> 
> I have a dataframe with 500,000 rows and 102 columns. The rows
> represent spatial polygons, some of which overlap others (i.e., not
> all rows are independent of each other).
> 
> Given a particular row, the first column contains a unique "RowID".
> The second column contains the "Variable" of interest. The remaining
> 100 columns ("Overlap1" ... "Overlap100") each contain a row ID that
> overlaps this row (but if this row overlaps fewer than 100 other rows
> then the remainder of the columns "OL1...OL100" contain NA).
> 
> Here's the problem: I need to select the subset of 500 independent
> rows that maximizes the mean and minimizes the stdev of "Variable".
> 
> Clearly this requires iterative selection and comparison of rows,
> because each newly-selected row must be compared to rows already
> selected to ensure it does not overlap them. At each step, a row
> already selected might be removed from the subset if it can be
> replaced with another that increases the mean and/or reduces the
> stdev.
> 
> The above description is a simplification of my problem, but 
> it's a start.
> 
> As I am new to R (and programming in general) I'm not sure how to
> start thinking about this, or even where to look. I'd appreciate any
> ideas that might help.
> 
> Thank you, Mark
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list