[R] plotting percent of incidents within different 'bins'

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Jan 6 02:45:04 CET 2005


I was doing something very similar. 

I found it tricky to work out how to find a confidence interval for the
'percentage' of the outcome (I call it proportion).

Some of my bins had all zeroes or all ones, so I couldn't work out how to
make a variance that was sensible. Also some bins had few values.

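To each bin I decided to add one 0 and one 1 'outcome'. This pulls the
mean outcome towards 50% for low counts (this is kinda intuitively correct),
and it also pushes the (binomial) variance, p(1-p), towards its maximum at
p = 0.5, so a bin of all zeroes or all ones no longer gets a variance of zero.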
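
For what it's worth, here is a minimal sketch of the add-one idea, using the
example data from the original question quoted below. The names n, s, p.raw,
p.adj and se.adj are just ones I made up for illustration, and using n + 2 as
the denominator of the standard error is my assumption, not an established
recipe.

```r
# Example data from the original question quoted further down the thread
outcome   <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
predictor <- c(1, 2, 2, 3, 3, 2, 3, 4, 4, 4, 4, 4)

n <- tapply(outcome, predictor, length)  # observations per bin
s <- tapply(outcome, predictor, sum)     # number of 1s per bin

p.raw <- s / n               # raw proportion; a value of 0 or 1 gives zero variance
p.adj <- (s + 1) / (n + 2)   # proportion after adding one 0 and one 1 to each bin

# binomial standard error of the adjusted proportion (n + 2 is an assumption)
se.adj <- sqrt(p.adj * (1 - p.adj) / (n + 2))

round(cbind(n, s, p.raw, p.adj, se.adj), 3)
```

Bin 1 has a single 0 in it, so p.raw is 0 but p.adj is 1/3. If an exact
interval is preferred, binom.test(s, n)$conf.int for a single bin also copes
with all-zero or all-one bins, though I have not compared the two.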

Do you have a similar problem? / Have you decided what to do?


Also I decided to bin my bins... I had about 50 distinct bins over a
'predictor' range of about 800, so I broke that 800 into 20 'ranges' and
binned the bins in each range.

(like your 'predictor' my original bins are ordered integer values).
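
Roughly, the 'binning the bins' step can be sketched with cut() plus the
tapply() call suggested below. This is only an illustration on the small
example from the original question; the break points and the names band and
pct are my own choices.

```r
outcome   <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
predictor <- c(1, 2, 2, 3, 3, 2, 3, 4, 4, 4, 4, 4)

# Two ranges here, (0,2] and (2,4]. For a predictor spanning about 800,
# something like breaks = seq(0, 800, by = 40) would give 20 ranges.
band <- cut(predictor, breaks = c(0, 2, 4))

# percent of 1s within each range
pct <- 100 * tapply(outcome, band, mean)
pct
```

Something like barplot(pct) would then plot the percentages against the
ordered ranges.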

Now I don't know how best to calculate the variance, as I can do this at
at least two levels in the data: the original bins, or the pooled ranges.

I don't know if I should pool the original bins and recalculate the pooled
binomial variance, or calculate (bootstrap) the variance in the
proportions already calculated.
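
To make the two options concrete, here is a rough sketch for one range (the
original bins 3 and 4 of the example data from the question below). It is not
meant as the right answer, only as a way to see what each option computes;
the number of bootstrap replicates (2000) and all the variable names are
arbitrary choices of mine.

```r
set.seed(1)  # only so the bootstrap is reproducible

outcome   <- c(0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
predictor <- c(1, 2, 2, 3, 3, 2, 3, 4, 4, 4, 4, 4)

# take the range covering the original bins 3 and 4
in.range <- predictor %in% c(3, 4)
y   <- outcome[in.range]
bin <- predictor[in.range]

# (a) pool the raw outcomes, then use the binomial variance of the pooled proportion
p.pool   <- mean(y)
var.pool <- p.pool * (1 - p.pool) / length(y)

# (b) bootstrap the per-bin proportions that were already calculated
p.bin    <- as.vector(tapply(y, bin, mean))
boot     <- replicate(2000, mean(sample(p.bin, replace = TRUE)))
var.boot <- var(boot)

c(pooled = var.pool, bootstrap = var.boot)
```

Note that (b) gives every original bin the same weight regardless of how many
values it holds, whereas (a) weights by count, so the two need not agree.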

Let me know if you think my code would be useful (it's quite simple).

Dan.



On Wed, 5 Jan 2005, BXC (Bendix Carstensen) wrote:

>You want:
>
>tapply( Outcome, predictor, mean )
>
>Bendix Carstensen
>----------------------
>Bendix Carstensen
>Senior Statistician
>Steno Diabetes Center
>Niels Steensens Vej 2
>DK-2820 Gentofte
>Denmark
>tel: +45 44 43 87 38
>mob: +45 30 75 87 38
>fax: +45 44 43 07 06
>bxc at steno.dk
>www.biostat.ku.dk/~bxc
>----------------------
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch 
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
>> Stephen Choularton
>> Sent: Wednesday, January 05, 2005 8:35 PM
>> To: R Help
>> Subject: [R] plotting percent of incidents within different 'bins'
>> 
>> 
>> Hi
>>  
>> Say I have some data, two columns in a table being a binary 
>> outcome plus a predictor and I want to plot a graph that 
>> shows the percentage positives of the binary outcome within 
>> bands of the predictor, e.g.
>>  
>>  
>> Outcome           predictor
>>  
>> 0                      1
>> 1                      2
>> 1                      2
>> 0                      3          
>> 0                      3
>> 0                      2          
>> 1                      3
>> 1                      4
>> 1                      4
>> 0                      4
>> 0                      4
>> 0                      4
>> etc
>>  
>> In this case there are 4 cases in the band 1 - 2 of the 
>> predictor, 2 of them are true so the percent is 50% and there 
>> are 8 cases in the band 3 - 4, 3 of which are true making the
>> percentage 38%.
>>  
>> Is there some function in R that will sum these outcomes by 
>> bands of predictor and produce a one by two  data set with 
>> the percentages in one column and the ordered bands in the 
>> other, or alternately is there some sort of special plot.???? 
>> that does it all for you?
>>  
>> Thanks
>>  
>> Stephen
>>  
>>  
>> 
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read 
>> the posting guide! http://www.R-project.org/posting-guide.html
>>
>



