[R] Histogram omitting/collapsing groups

Aren Cambre aren at arencambre.com
Mon Jan 2 00:53:21 CET 2012


Thanks. That did it!

And I get it now--in your original example, aes(x = x, y = Freq), x
refers to the column name in as.data.frame(table(x)), not the x
vector(?) you created.

Aren
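
A minimal base-R sketch of the naming point above. The vector name `hours` is made up for illustration. as.data.frame(table(hours)) returns a data frame whose first column is named after the object passed to table() and whose second column is Freq; those column names, not the raw vector, are what aes() looks up in the data.

```r
set.seed(10)
# Hypothetical raw vector of hours, simulated the same way as in the thread
hours <- sample(0:23, 10000, TRUE, prob = sin(0:23) + 1)

tabled <- as.data.frame(table(hours))

# First column takes the name of the tabulated object, second is Freq
colnames(tabled)

# One row per distinct hour that was observed, not one row per observation
nrow(tabled)
sum(tabled$Freq)   # adds back up to the number of observations
```

Because the tabulated column shares its name with the original vector, aes(x = hours, y = Freq) would resolve `hours` inside `tabled` first, which is why reusing the same name works but is easy to misread.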

On Sun, Jan 1, 2012 at 4:44 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
> Sorry, that was probably a really confusing example...too many xs
> floating around.
>
> set.seed(10)
> rawdata <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)
>
> ## do this step first for your data
> tableddata <- as.data.frame(table(rawdata))
> ## use these names in ggplot
> colnames(tableddata)
>
> require(ggplot2)
> ## Freq already holds the counts, so plot the values as-is
> p <- ggplot(tableddata, aes(x = rawdata, y = Freq)) +
>  geom_bar(stat = "identity")
>
> Cheers,
>
> Josh
>
> On Sun, Jan 1, 2012 at 2:36 PM, Aren Cambre <aren at arencambre.com> wrote:
>> This is helpful, although I can't seem to adapt it to my own data.
>>
>> If I run your sample as is, I do get the nice graphs.
>>
>> However, this doesn't work:
>> (Assume you already have a data frame "dallas" with 2057980 rows. It
>> has column "offense_hour", and each row has a value between 0 and 23,
>> inclusive.)
>>> p <- ggplot(as.data.frame(table(dallas$offense_hour)), aes(x = dallas$offense_hour, y = Freq)) + geom_bar()
>>> print(p)
>> Error in data.frame(x = c(9, 8, 10, 9, 10, 15, 11, 13, 0, 16, 13, 20,  :
>>   arguments imply differing number of rows: 2057980, 24
>>
>> Seems like dallas$offense_hour corresponds to x in your example. I'm
>> confused why yours works even though your x has 10,000 values, yet
>> mine fails complaining that the row count is way off. Either way, the
>> length of x or dallas$offense_hour grossly exceeds 24.
>>
>> Aren
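
A hedged sketch of why the call above fails, using a small simulated stand-in for the "dallas" data frame (the real data is not available here). The data handed to ggplot() is the tabled data frame, with one row per hour, while aes(x = dallas$offense_hour) supplies the full-length raw column, so the lengths cannot match. Note also that table() cannot infer a name from an expression like dallas$offense_hour, so the first column comes out as Var1.

```r
# Hypothetical stand-in for the "dallas" data frame described in the message
set.seed(1)
dallas <- data.frame(offense_hour = sample(0:23, 5000, replace = TRUE))

tabled <- as.data.frame(table(dallas$offense_hour))

colnames(tabled)              # "Var1" "Freq"
length(dallas$offense_hour)   # length of the raw column
nrow(tabled)                  # rows actually available to plot

# Map aesthetics to columns of the tabled data, not to the raw vector.
# ggplot2 is optional here; the block still runs without it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tabled, aes(x = Var1, y = Freq)) +
    geom_bar(stat = "identity")
  # print(p) draws the plot
}
```

Tabulating into a named object first, or renaming Var1 afterwards, avoids having to remember the default column name.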
>>
>> On Sun, Jan 1, 2012 at 10:34 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>>>
>>> Hi Aren,
>>>
>>> I was busy thinking about how to make what you wanted, and I missed
>>> that you were working with hours from a day.  That being the case, you
>>> may think about a circular graph.  The attached plots show two
>>> different ways of working with the same data.
>>>
>>> Cheers,
>>>
>>> Josh
>>>
>>> set.seed(10)
>>> x <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)
>>>
>>> require(ggplot2) # graphing package
>>>
>>> ## regular barplot
>>> ## Freq already holds the counts, so plot the values as-is
>>> p <- ggplot(as.data.frame(table(x)), aes(x = x, y = Freq)) +
>>>  geom_bar(stat = "identity")
>>>
>>> ## using circular coordinates
>>> p2 <- p + coord_polar()
>>>
>>> ## print them
>>> print(p)
>>> print(p2)
>>>
>>>
>>> ## just if you're interested, the code to
>>> ## put the two plots side by side
>>> require(grid)
>>>
>>> dev.new(height = 6, width = 12)
>>> grid.newpage()
>>> pushViewport(vpList(
>>>  viewport(x = 0, width = .5,  just = "left", name = "barplot"),
>>>  viewport(x = .5, width = .5, just = "left", name="windrose")))
>>> seekViewport("barplot")
>>> grid.draw(ggplotGrob(p))
>>> seekViewport("windrose")
>>> grid.draw(ggplotGrob(p2))
>>>
>>>
>>> On Sun, Jan 1, 2012 at 7:59 AM, Aren Cambre <aren at arencambre.com> wrote:
>>> > On Sun, Jan 1, 2012 at 5:29 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>>> >> Exactly. If what you want is a barplot, make a barplot; histograms are for continuous data.   Just remember that you may need to set the levels explicitly in case of empty groups: barplot(table(factor(x,levels=0:23))). (This is irrelevant with 100K data samples, but not with 100 of them).
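
The point about empty groups can be sketched with a deliberately tiny sample (the values below are invented for illustration). Without explicit levels, table() only reports the hours that occur; wrapping the data in factor(x, levels = 0:23) keeps a zero-count entry for every hour, so the bars stay aligned with the clock.

```r
# Small hypothetical sample in which most hours never occur
x <- c(0, 1, 1, 5, 23, 23, 23)

observed <- table(x)                         # only hours that appear
counts   <- table(factor(x, levels = 0:23))  # all 24 hours, zeros included

length(observed)
length(counts)

# Draw only in an interactive session so the sketch also runs in batch mode
if (interactive()) {
  barplot(counts, xlab = "Hour", ylab = "Count")
}
```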
>>> >>
>>> >> That being said, the fact that hist() tends to create breakpoints which coincide with data points due to discretization is arguably a bit of a design error, but it is age-old and hard to change now. One way out is to use truehist() from MASS, another is to explicitly set the breaks to intermediate values, as in hist(x, breaks=seq(-.5, 23.5, 1))
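
A minimal sketch of the explicit-breaks suggestion, reusing the simulated data from elsewhere in the thread. With breaks at -0.5, 0.5, ..., 23.5, every integer hour sits in the middle of its own bin, so the histogram counts should agree with a plain tabulation. plot = FALSE is used only to keep the example non-graphical.

```r
set.seed(10)
x <- sample(0:23, 10000, TRUE, prob = sin(0:23) + 1)

# Breaks placed halfway between the integer values
h <- hist(x, breaks = seq(-0.5, 23.5, 1), plot = FALSE)

h$mids            # bin midpoints, one per hour
length(h$counts)  # one bin per hour

# Per-hour counts from a factor with explicit levels, for comparison
tab <- as.vector(table(factor(x, levels = 0:23)))
all(h$counts == tab)

# For contrast, inspect the breaks hist() picks by default
h0 <- hist(x, plot = FALSE)
h0$breaks
```

When the default breaks land on integer values, some bins cover more distinct hours than others, which is the collapsing effect discussed in this thread.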
>>> >
>>> > Thanks, everybody. I'll definitely switch to barplot.
>>> >
>>> > As for continuous, it's all relative. Even the most continuous dataset
>>> > at a scale that looks pretty to humans may have gaps between the
>>> > values when you "zoom in" a lot.
>>> >
>>> > Aren
>>>
>>>
>>>
>>> --
>>> Joshua Wiley
>>> Ph.D. Student, Health Psychology
>>> Programmer Analyst II, Statistical Consulting Group
>>> University of California, Los Angeles
>>> https://joshuawiley.com/


