[R] frequencies of a discrete numeric variable, including zeros

Michael Friendly friendly at yorku.ca
Wed Sep 3 14:52:43 CEST 2014


Thanks to all who replied to this thread.

To summarize, John Fox and William Dunlap's suggestion amounts to this 
plot, where it
becomes *crucial* to eliminate the zeros (otherwise they would not be 
distinguishable
from the counts of 1, with points()):

# Fox/Dunlap plot, using plot.table method
art.tab0 <- table(art)
plot(art.tab0, ylab="Frequency", xlab="Number of articles")
points(as.numeric(names(art.tab0)), art.tab0, pch=16)

Here, I actually prefer the barplot, using factor() to retain the zeros:

# coerce to a factor, then use table()
art.fac <- factor(art, levels = 0:19)
art.tab <- table(art.fac)
barplot(art.tab, ylab="Frequency", xlab="Number of articles")

However, the frequencies for small values of art dominate the display, 
and I'm contemplating a
Poisson regression anyway, so why not plot on a log scale:

# plot on log scale, but start at 1 to avoid log(0)
barplot(art.tab+1, ylab="log(Frequency+1)", xlab="Number of articles", 
log="y")

# plot on log scale, directly
barplot(log(art.tab+1), ylab="log(Frequency+1)", xlab="Number of articles")

The first method, using log="y" gives axis labels on the scale of frequency.

best,
-Michael


On 9/2/2014 3:49 PM, John Fox wrote:
> Hi Michael,
>
> I think that histograms are intrinsically misleading for discrete data, and
> that while bar graphs are an improvement, they also invite
> misinterpretation. I generally do something like this:
>
> f <- table(factor(art, levels=0:19))
> plot(as.numeric(names(f)), as.numeric(f), type="h",
>      xlab="art", ylab="frequency", axes=FALSE)
> axis(1, pos=0, at=0:19)
> axis(2)
> points(as.numeric(names(f)), f, pch=16)
> abline(h=0)
>
>
> Actually, I prefer omitting the points corresponding to 0 counts, which is
> even simpler:
>
> f <- table(art)
> plot(as.numeric(names(f)), as.numeric(f), type="h",
>      xlab="art", ylab="frequency", axes=FALSE)
> axis(1, pos=0, at=min(art):max(art))
> axis(2)
> points(as.numeric(names(f)), f, pch=16)
> abline(h=0)
>
>
> Best,
>   John
>
> -----------------------------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.socsci.mcmaster.ca/jfox/
>
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>> project.org] On Behalf Of Michael Friendly
>> Sent: Tuesday, September 02, 2014 1:29 PM
>> To: R-help
>> Subject: [R] frequencies of a discrete numeric variable, including
>> zeros
>>
>> The data vector, art, given below using dput(),  gives a set of
>> discrete
>> numeric values for 915 observations,
>> in the range of 0:19.  I want to make some plots of the frequency
>> distribution, but the standard
>> tools (hist, barplot, table) don't give me what I want to make a custom
>> plot due to 0 frequencies
>> for some of the 0:19 counts.
>>
>> table() excludes the values of art that occur with zero frequency, and
>> these are excluded in
>> barplot()
>>   > table(art)
>> art
>>     0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
>> 275 246 178  84  67  27  17  12   1   2   1   1   2   1   1
>>   > barplot(table(art))
>>
>>
>> A direct calculation, using colSums of outer() gives me the values I
>> want, but this seems unnecessarily
>> complicated for this simple task.
>>
>>   > (art.freq <- colSums(outer(art, 0:19, `==`)))
>>    [1] 275 246 178  84  67  27  17  12   1   2   1   1   2   0   0 0
>> 1   0   0   1
>>   >  barplot(art.freq, names.arg=0:19)
>>
>>
>> Moreover, I was surprised by the result of hist() on this data, because
>> the 0 & 1 counts from
>> the above were combined in this call:
>>
>>   > art.hist <- hist(art, breaks=0:19, plot=FALSE)
>>   > art.hist$breaks
>>    [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
>>   > art.hist$counts
>>    [1] 521 178  84  67  27  17  12   1   2   1   1   2   0   0   0 1
>> 0   0   1
>>
>> Is there some option I missed here?
>>
>> The data:
>>
>>   > dput(art)
>> c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>> 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>> 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
>> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L,
>> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 10L,
>> 11L, 12L, 12L, 16L, 19L)
>>
>> --
>> Michael Friendly     Email: friendly AT yorku DOT ca
>> Professor, Psychology Dept. & Chair, Quantitative Methods
>> York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
>> 4700 Keele Street    Web:http://www.datavis.ca
>> Toronto, ONT  M3J 1P3 CANADA
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.


-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



More information about the R-help mailing list