[R] Histogram omitting/collapsing groups

Joshua Wiley jwiley.psych at gmail.com
Sun Jan 1 23:44:32 CET 2012


Sorry, that was probably a really confusing example...too many xs
floating around.

set.seed(10)
rawdata <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)

## do this step first for your data
tableddata <- as.data.frame(table(rawdata))
## use these names in ggplot
colnames(tableddata)

require(ggplot2)
p <- ggplot(tableddata, aes(x = rawdata, y = Freq)) +
  geom_bar(stat = "identity") # Freq already holds the counts
print(p)
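
Adapted to your case, it might look like the sketch below. The "dallas" data frame here is simulated as a stand-in (I do not have your data), and the levels = 0:23 part is just Peter's suggestion for keeping empty hours. Your version failed because as.data.frame(table(dallas$offense_hour)) has only 24 rows with columns named Var1 and Freq, while aes(x = dallas$offense_hour) reaches back to the raw vector with one value per original row.

```r
## Hypothetical stand-in for the real "dallas" data frame (simulated)
set.seed(10)
dallas <- data.frame(
  offense_hour = sample(0:23, 10000, TRUE, prob = sin(0:23) + 1))

## Tabulate first; factor(..., levels = 0:23) keeps hours with zero counts,
## and naming the argument names the resulting column
tabled <- as.data.frame(
  table(offense_hour = factor(dallas$offense_hour, levels = 0:23)))
colnames(tabled)

## Inside aes(), use the bare column names of 'tabled',
## not dallas$offense_hour (one value per raw row)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tabled, aes(x = offense_hour, y = Freq)) +
    geom_bar(stat = "identity")
  ## print(p) draws the plot
}
```

The point is that the data frame handed to ggplot() and the names used in aes() have to match, so both refer to the 24-row table.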

Cheers,

Josh

On Sun, Jan 1, 2012 at 2:36 PM, Aren Cambre <aren at arencambre.com> wrote:
> This is helpful, although I can't seem to adapt it to my own data.
>
> If I run your sample as is, I do get the nice graphs.
>
> However, this doesn't work:
> (Assume you already have a data frame "dallas" with 2057980 rows. It
> has column "offense_hour", and each row has a value between 0 and 23,
> inclusive.)
>> p <- ggplot(as.data.frame(table(dallas$offense_hour)), aes(x = dallas$offense_hour, y = Freq)) + geom_bar()
>> print(p)
> Error in data.frame(x = c(9, 8, 10, 9, 10, 15, 11, 13, 0, 16, 13, 20,  :
>   arguments imply differing number of rows: 2057980, 24
>
> Seems like dallas$offense_hour corresponds to x in your example. I'm
> confused why yours works even though your x has 10,000 values, yet
> mine fails complaining that the row count is way off. Either way, the
> length of x or dallas$offense_hour grossly exceeds 24.
>
> Aren
>
> On Sun, Jan 1, 2012 at 10:34 AM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>>
>> Hi Aren,
>>
>> I was busy thinking about how to make what you wanted, and I missed
>> that you were working with hours from a day.  That being the case, you
>> may think about a circular graph.  The attached plots show two
>> different ways of working with the same data.
>>
>> Cheers,
>>
>> Josh
>>
>> set.seed(10)
>> x <- sample(0:23, 10000, TRUE, prob = sin(0:23)+1)
>>
>> require(ggplot2) # graphing package
>>
>> ## regular barplot
>> p <- ggplot(as.data.frame(table(x)), aes(x = x, y = Freq)) +
>>  geom_bar(stat = "identity")
>>
>> ## using circular coordinates
>> p2 <- p + coord_polar()
>>
>> ## print them
>> print(p)
>> print(p2)
>>
>>
>> ## just if you're interested, the code to
>> ## put the two plots side by side
>> require(grid)
>>
>> dev.new(height = 6, width = 12)
>> grid.newpage()
>> pushViewport(vpList(
>>  viewport(x = 0, width = .5,  just = "left", name = "barplot"),
>>  viewport(x = .5, width = .5, just = "left", name="windrose")))
>> seekViewport("barplot")
>> grid.draw(ggplotGrob(p))
>> seekViewport("windrose")
>> grid.draw(ggplotGrob(p2))
>>
>>
>> On Sun, Jan 1, 2012 at 7:59 AM, Aren Cambre <aren at arencambre.com> wrote:
>> > On Sun, Jan 1, 2012 at 5:29 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>> >> Exactly. If what you want is a barplot, make a barplot; histograms are for continuous data.   Just remember that you may need to set the levels explicitly in case of empty groups: barplot(table(factor(x,levels=0:23))). (This is irrelevant with 100K data samples, but not with 100 of them).
>> >>
>> >> That being said, the fact that hist() tends to create breakpoints which coincide with data points due to discretization is arguably a bit of a design error, but it is age-old and hard to change now. One way out is to use truehist() from MASS, another is to explicitly set the breaks to intermediate values, as in hist(x, breaks=seq(-.5, 23.5, 1))
>> >
>> > Thanks, everybody. I'll definitely switch to barplot.
>> >
>> > As for continuous, it's all relative. Even the most continuous dataset
>> > at a scale that looks pretty to humans may have gaps between the
>> > values when you "zoom in" a lot.
>> >
>> > Aren
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


