[R] histogram

Tue Feb 7 06:10:14 CET 2012

On Feb 6, 2012, at 9:46 PM, Francis Keyes wrote:

> ok here are two 1-column data files.  sample1 and sampleRef (the
> reference distribution)
>

 > dat1 <- read.table(file="~/Downloads/sample1")
 > datRef <- read.table(file="~/Downloads/sampleRef")
 > str(dat1)
'data.frame':	11378 obs. of  1 variable:
  $ V1: num  -2.15 -2.87 1.79 -1.8 -1.77 ...
 > str(datRef)
'data.frame':	10000 obs. of  1 variable:
  $ V1: num  0.3 0.3 -2.15 -2.15 -2.28 ...
 > range(dat1)
[1] -3.10634  3.10214
 > range(datRef)
[1] -3.10634  3.10214

 > hdat1 <- hist(dat1$V1)
 > hdatRef <- hist(datRef$V1)
 > str(hdat1)
List of 7
  $ breaks     : num [1:15] -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 ...
  $ counts     : int [1:14] 180 922 1007 897 726 812 717 878 811 791 ...
  $ intensities: num [1:14] 0.0316 0.1621 0.177 0.1577 0.1276 ...
  $ density    : num [1:14] 0.0316 0.1621 0.177 0.1577 0.1276 ...
  $ mids       : num [1:14] -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 ...
  $ xname      : chr "dat1$V1"
  $ equidist   : logi TRUE
  - attr(*, "class")= chr "histogram"
 > hdatRef$breaks
  [1] -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5
 > hdat1$breaks
  [1] -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5

They end up having the same breaks because their ranges are the same,
so you didn't even need to go to the trouble of forcing the breaks
arguments to be the same. So you can use either the ratio of the
counts or the ratio of the densities:

 > barplot(hdat1$counts/hdatRef$counts, names.arg=hdat1$mids, las=3)
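
When the two ranges differ, the breaks do have to be forced to agree. A minimal sketch of that case follows, also scaling the ratio to percent as asked. The data are simulated stand-ins (the attached files are not reproduced here), and the names x, ref, brks and pct are made up for illustration:

```r
# Simulated stand-ins for the two samples (assumption: uniform angles in
# [-pi, pi]; the real attachments are not available here).
set.seed(1)
x   <- runif(11378, -pi, pi)
ref <- runif(10000, -pi, pi)

# One shared set of breaks, so bin i covers the same interval in both.
brks <- seq(-3.5, 3.5, by = 0.5)
h    <- hist(x,   breaks = brks, plot = FALSE)
hRef <- hist(ref, breaks = brks, plot = FALSE)

# Count in each bin as a percentage of the reference count.
# Bins that are empty in the reference are set to NA instead of Inf/NaN.
pct <- ifelse(hRef$counts > 0, 100 * h$counts / hRef$counts, NA)

barplot(pct, names.arg = h$mids, las = 3,
        xlab = "angle", ylab = "percent of reference")
```

With equal-width bins the density ratio differs from the count ratio only by the constant factor length(ref)/length(x), so either one gives a barplot of the same shape.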

You could have used cut and table as well but hist automatically  
returns a list with components that can be used, so it's just easier  
to use them.
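
For comparison, a rough sketch of the cut and table route, again on simulated stand-in data with illustrative names (x, ref, brks, tab, tabRef, ratio are not from the original files):

```r
# Simulated stand-ins for the two samples (assumption, as the real data
# are not available here).
set.seed(1)
x   <- runif(11378, -pi, pi)
ref <- runif(10000, -pi, pi)
brks <- seq(-3.5, 3.5, by = 0.5)

# cut() assigns each value to an interval; table() counts per interval.
# The intervals are right-closed, (a, b], as in hist()'s default. Add
# include.lowest = TRUE to match hist() for values equal to the lowest break.
tab    <- table(cut(x,   breaks = brks))
tabRef <- table(cut(ref, breaks = brks))

# Per-bin ratio, labelled with the interval names produced by cut().
ratio <- as.vector(tab) / as.vector(tabRef)
names(ratio) <- names(tab)

barplot(ratio, las = 3)
```

This gives the same per-bin counts as hist() here, but the bin labels are interval strings rather than midpoints.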

-- 
David.

> On Mon, Feb 6, 2012 at 7:18 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Feb 6, 2012, at 5:26 PM, Francis Keyes wrote:
>>
>>> Hi David,
>>>
>>> I have 2 tables, each with several columns and rows of data.  I am
>>> only interested in the data from column 6, which contains values in
>>> the range -PI to PI.  I want to plot the data from tableD with the y
>>> axis denoting percentage with respect to tableR.  So if data points
>>> in the break 2 - 3 appear half as often in tableD as in tableR, the
>>> y axis should show 50 percent.  Does that make sense?
>>> I've been plotting the data like this to date:
>>>
>>> hist(tableD[,6],ylab="frequency", xlab="angle")
>>
>>
>> It all makes sense (and it made sense before), but your
>> responsibility is to provide data.
>>
>> (Congrats on plain text lesson successfully met.)
>>
>>
>>>
>>> Thanks a lot for your help
>>>
>>>
>>>
>>> On Mon, Feb 6, 2012 at 1:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>>
>>>>
>>>> On Feb 6, 2012, at 12:23 PM, Francis Keyes wrote:
>>>>
>>>>> Thanks.  How do you suggest I use the reference population?
>>>>> Sorry, I'm new to R and just don't see it.  If I can get a plot
>>>>> that is counts or density relative to my reference data it would
>>>>> be ideal.
>>>>
>>>>
>>>>
>>>> It is difficult to specify "how" when we have no "what". The "what"
>>>> is your responsibility, not ours. My thought was to use the ratio of
>>>> the results of hist() on the two populations, which would then be
>>>> offered back to hist or barplot ... which (of course) requires that
>>>> the 'breaks' be the same. Provide an example of your R
>>>> representations of the reference population and tested population
>>>> and all will become clear.
>>>>
>>>> (And learn to post in plain text, please.)
>>>>
>>>> --
>>>> David Winsemius, MD
>>>> West Hartford, CT
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 6, 2012 at 1:12 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>
>>>>> On Feb 5, 2012, at 8:31 PM, Francis Keyes wrote:
>>>>>
>>>>> With R and the hist function, is there a way to make a histogram
>>>>> in which the y axis denotes proportion with respect to a separate
>>>>> sample dataset of the same range instead of frequency?
>>>>>
>>>>> hist() returns an object with both "counts" and "density". If you
>>>>> had a reference population it should be a fairly simple matter to
>>>>> use one or the other of those.
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
> <sample1><sampleRef>

David Winsemius, MD
West Hartford, CT