[R] Getting the values out of histogram (lattice)

Deepayan Sarkar deepayan.sarkar at gmail.com
Mon Sep 5 11:26:10 CEST 2011


On Thu, Sep 1, 2011 at 10:29 AM, Rolf Turner <rolf.turner at xtra.co.nz> wrote:
>
> 'Scuse me, but I don't see anything in your example relating to what the OP
> asked for. She wanted to get at the ``actual data defining the histogram'',
> which I interpret as meaning the bar heights (the percentages, density values,
> or counts, depending on "type"). These do not appear to be stored in the
> object returned by histogram().

A couple of additional comments:

1. The `official' way to get panel arguments is trellis.panelArgs(); e.g.,

> p <- histogram(~rnorm(100) | gl(2, 50), type = "density")
> str(trellis.panelArgs(p, 2))
List of 5
 $ x           : num [1:50] 0.277 1.144 1.13 -0.912 -0.892 ...
 $ breaks      : num [1:9] -2.561 -1.979 -1.398 -0.816 -0.234 ...
 $ type        : chr "density"
 $ equal.widths: logi TRUE
 $ nint        : num 8
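trellis.panelArgs() returns the arguments for one packet at a time. Below is a minimal sketch, assuming a plot with a single conditioning variable as in the example above, that collects them for every packet. The name all.args is purely illustrative, and set.seed() is added only so the example is reproducible.

```r
library(lattice)

set.seed(1)  # assumption: any seed will do; only for reproducibility
p <- histogram(~ rnorm(100) | gl(2, 50), type = "density")

# p$panel.args has one entry per packet; trellis.panelArgs() merges each
# entry with the arguments shared by all panels (p$panel.args.common)
all.args <- lapply(seq_along(p$panel.args),
                   function(i) trellis.panelArgs(p, i))

length(all.args)                              # 2 packets
sapply(all.args, function(a) length(a$x))     # 50 points in each
```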

2. hist.constructor() is needed for technical reasons, and can be
considered to be the same as hist() for this purpose. So the
computations performed by panel.histogram() can be reduced to


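library(lattice)  # for do.breaks()

histogram.computations <-
    function(x, breaks, equal.widths = TRUE,
             type = "density", nint, ...)
{
    ## default breaks, chosen the same way as in panel.histogram()
    if (is.null(breaks))
    {
        breaks <-
            if (is.factor(x)) seq_len(1 + nlevels(x)) - 0.5
            else if (equal.widths) do.breaks(range(x, finite = TRUE), nint)
            else quantile(x, 0:nint/nint, na.rm = TRUE)
    }
    ## hist() only accepts numeric data, so convert factors (after the
    ## breaks have been computed from their levels); 'type' is accepted
    ## but ignored, since the result holds both counts and densities
    x <- as.numeric(x)
    hist(x, breaks = breaks, plot = FALSE)
}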

which may be used as follows to get the ``actual data defining the histogram'':

> a <- trellis.panelArgs(p, 2)
> h <- do.call(histogram.computations, a)
> str(h)
List of 7
 $ breaks     : num [1:9] -2.561 -1.979 -1.398 -0.816 -0.234 ...
 $ counts     : int [1:8] 1 4 6 14 7 8 6 4
 $ intensities: num [1:8] 0.0344 0.1375 0.2062 0.4812 0.2406 ...
 $ density    : num [1:8] 0.0344 0.1375 0.2062 0.4812 0.2406 ...
 $ mids       : num [1:8] -2.2704 -1.6885 -1.1065 -0.5246 0.0573 ...
 $ xname      : chr "x"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"
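The object returned by hist() holds counts and densities but no percentages. The sketch below uses a hypothetical helper, histogram.heights(), to recover the bar height for each "type" in the way panel.histogram() appears to choose them: counts for "count", 100 * counts / n for "percent", and densities (equal to the intensities in the output above) for "density". It assumes the breaks are stored in the panel arguments, as they are in this example. It also checks the definition hist() uses for density, count / (n * bin width): the bar areas sum to 1, so a single bar can exceed 1 when the bins are narrower than one unit.

```r
library(lattice)

## hypothetical helper: bar heights for one packet's panel arguments
histogram.heights <- function(a) {
    if (is.null(a$breaks))
        stop("breaks are not stored in these panel arguments")
    x <- as.numeric(a$x)
    h <- hist(x, breaks = a$breaks, plot = FALSE)
    heights <- switch(a$type,
                      count   = h$counts,
                      percent = 100 * h$counts / length(x),
                      density = h$density)
    list(breaks = h$breaks, heights = heights)
}

set.seed(1)  # assumption: only for reproducibility
p <- histogram(~ rnorm(100) | gl(2, 50), type = "density")
a <- trellis.panelArgs(p, 2)
res <- histogram.heights(a)

# density = count / (n * bin width), so the bar areas sum to 1
sum(res$heights * diff(res$breaks))

# percent = 100 * count / n, independent of the bin width
a$type <- "percent"
sum(histogram.heights(a)$heights)
```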

-Deepayan

>
> cheers,
>
> Rolf Turner
>
> On 01/09/11 10:59, Duncan Mackay wrote:
>>
>> Hi Monica
>>
>> An example abbreviated from ?histogram
>>
>> x = histogram( ~ height, data = singer)
>>
>> names(x)
>> # to see what is there
>> str(x)
>>
>> # information
>> x$panel.args.common
>> $breaks
>> [1] 59.36 61.28 63.20 65.12 67.04 68.96 70.88 72.80 74.72 76.64
>>
>> $type
>> [1] "percent"
>>
>> $equal.widths
>> [1] TRUE
>>
>> $nint
>> [1] 9
>>
>> # x$panel.args, accessed here by position rather than by name
>> x[[35]]
>> [[1]]
>> [[1]]$x
>> [1] 64 62 66 65 60 61 65 66 65 63 67 65 62 65 68 65 63 65 62 65 66 62 65 63 65 66 65 62 65 66 65 61 65 66 65 62 63 67 60 67 66 62 65 62
>> [45] 61 62 66 60 65 65 61 64 68 64 63 62 64 62 64 65 60 65 70 63 67 66 65 62 68 67 67 63 67 66 63 72 62 61 66 64 60 61 66 66 66 62 70 65
>> [89] 64 63 65 69 61 66 65 61 63 64 67 66 68 70 65 65 65 64 66 64 70 63 70 64 63 67 65 63 66 66 64 64 70 70 66 66 66 69 67 65 69 72 71 66
>> [133] 76 74 71 66 68 67 70 65 72 70 68 64 73 66 68 67 64 68 73 69 71 69 76 71 69 71 66 69 71 71 71 69 70 69 68 70 68 69 72 70 72 69 73 71
>> [177] 72 68 68 71 66 68 71 73 73 70 68 70 75 68 71 70 74 70 75 75 69 72 71 70 71 68 70 75 72 66 72 70 69 72 75 67 75 74 72 72 74 72 72 74
>> [221] 70 66 68 75 68 70 72 67 70 70 69 72 71 74 75
>>
>> etc. to suit your requirements
>>
>> HTH
>>
>> Regards
>>
>> Duncan
>>
>>
>> Duncan Mackay
>> Department of Agronomy and Soil Science
>> University of New England
>> ARMIDALE NSW 2351
>> Email: home mackay at northnet.com.au
>>
>> At 23:50 31/08/2011, you wrote:
>>
>>> Hi,
>>>
>>> I have a relatively big dataset and I want to construct some histograms
>>> using the histogram function in lattice. One thing I am interested in is
>>> looking at the differences between density and percent. I know I can use
>>> the hist function, but it seems that this function sometimes gives wrong
>>> answers, and the density is actually a percent, since it is calculated as
>>> the count in the bin divided by the total no. of points. Let me explain.
>>>
>>> If I let the hist function decide the breaks, or I use a small number, or
>>> one of the pre-determined methods to select breaks, then everything seems
>>> to be in order. But if I use, for example, 100 as breaks (I have over
>>> 90000 data points, so the number of breaks is not necessarily too large,
>>> I would think), the density for the first bin is over 1, although for all
>>> the other bins the density is actually a percent, since it is the count
>>> for that bin divided by the total no. of points I have. So... either
>>> something is wrong here or, most probably, I am doing something wrong.
>>>
>>> If I use the histogram function from lattice, it is obvious that there
>>> is a difference between the percent param and the density param. I
>>> looked at the function code and, to be honest, I didn't understand it.
>>> It seems to call the hist function internally, or a slightly modified
>>> variant of hist. Reading about the trellis object, I saw I can access
>>> different info about the graph it generates, but nothing about the
>>> actual data that goes into defining the histogram. How can I access the
>>> data from it?
>>>
>>> I am not sure if my problem is platform specific (it should not be), but
>>> I have R x64 2.13.1 on a Windows machine, in case it counts.
>>>
>>> I appreciate your help, thanks,
>>>
>>> Monica
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>