[R] how to show percentage of individuals for two groups on histogram?

Eric Berger er|cjberger @end|ng |rom gm@||@com
Fri May 22 07:18:27 CEST 2020


Hi Ana,
This is a very common question about ggplot.
A quick search turns up lots of hits that answer your question. Here
are a couple:
https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999
https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables

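From reading those discussions, the following should work (untested).
Density has to be multiplied by the bin width to give the proportion of
each group that falls in a bin:

ggplot(a, aes(x = HBA1C, fill = pheno)) +
      geom_histogram(aes(y = after_stat(density * width)), binwidth = 0.5,
                     position = "dodge") +
      scale_y_continuous(labels = scales::percent_format())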
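
A quick base R check of what the y-axis values mean, as a minimal sketch
only. The data below are made up to stand in for your data frame `a`
(only HBA1C and pheno are reproduced), and the names `breaks` and
`per_group` are my own, not from your code. It bins each group separately
and compares the histogram density with the within-group proportion.

```r
# Fake data standing in for the poster's data frame `a` (an assumption:
# only the HBA1C and pheno columns are reproduced, with made-up values).
set.seed(1)
a <- data.frame(
  HBA1C = c(rnorm(892, mean = 7.4, sd = 1.1), rnorm(848, mean = 7.5, sd = 2)),
  pheno = rep(c("control", "case"), times = c(892, 848))
)

binwidth <- 0.5
# Common breaks that cover every observation in both groups.
breaks <- seq(floor(min(a$HBA1C)), ceiling(max(a$HBA1C)), by = binwidth)

# For each group: bin midpoints, counts, density, and within-group proportion.
per_group <- lapply(split(a$HBA1C, a$pheno), function(x) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  data.frame(
    mid        = h$mids,
    count      = h$counts,
    density    = h$density,
    proportion = h$counts / sum(h$counts)
  )
})

# density * binwidth recovers the proportion; density alone does not.
for (g in names(per_group)) {
  d <- per_group[[g]]
  cat(g, ": proportions sum to", sum(d$proportion),
      "; densities sum to", sum(d$density), "\n")
}
```

Within each group the proportions add up to 1, whereas the densities add
up to 1/binwidth, so a density axis labelled as a percentage overstates
each bar when the bin width is below 1. Multiplying `proportion` by 100
gives values that can go straight into barplot().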

HTH,
Eric


On Fri, May 22, 2020 at 7:18 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Ana,
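> Just noticed a typo from a hasty cut-paste. The two table() lines should
> assign to casehist and controlhist and hold proportions, not counts:
>
> casehist<-prop.table(table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)))
> controlhist<-prop.table(table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)))
>
> with names.arg=names(casehist) and names.arg=names(controlhist) in the
> two barplot() calls.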
>
> Jim
>
> On Fri, May 22, 2020 at 2:08 PM Jim Lemon <drjimlemon using gmail.com> wrote:
> >
> > Hi Ana,
> > My apologies for the pedestrian graphics, but it may help.
> >
> > # a bit of fake data
> > aafd<-data.frame(FID=paste0("fam",1000:2739),
> >  IID=paste0("G",1000:2739),FLASER=rep(1,1740),
> >  PLASER=c(rep(1,892),rep(2,848)),
> >  DIABDUR=sample(10:50,1740,TRUE),
> >  HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
> >  pheno=c(rep("control",892),rep("case",848)))
> > par(mfrow=c(2,1))
> > # proportion of each group falling in each unit-wide HBAIC bin
> > casehist<-prop.table(table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)))
> > controlhist<-prop.table(table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)))
> > par(mar=c(0,4,1,2))
> > barpos=barplot(100*casehist,names.arg=names(casehist),col="orange",
> >  space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
> > text(mean(barpos),23,
> >  "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
> > box()
> > par(mar=c(3,4,0,2))
> > barplot(100*controlhist,names.arg=names(controlhist),
> >  space=0,ylab="Percentage",col="orange",ylim=c(0,25))
> > text(mean(barpos),23,
> >  "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
> > box()
> >
> > Jim
> >
> > On Fri, May 22, 2020 at 9:08 AM Ana Marija <sokovic.anamarija using gmail.com> wrote:
> > >
> > > The result would basically look something like the one in the
> > > attachment, or like an overlay of those two plots.
> > >
> > >
> > > On Thu, May 21, 2020 at 5:23 PM Ana Marija <sokovic.anamarija using gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have a data frame like this:
> > > > > head(a)
> > > >          FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
> > > > 1 fam1000-03 G1000      1      1      38  10.2    1 control
> > > > 2 fam1001-03 G1001      1      1      15   7.3    1 control
> > > > 3 fam1003-03 G1003      1      2      17   7.0    1    case
> > > > 4 fam1005-03 G1005      1      1      36   7.7    1 control
> > > > 5 fam1009-03 G1009      1      1      23   7.6    1 control
> > > > 6 fam1052-03 G1052      1      1      32   7.3    1 control
> > > >
> > > > > dim(a)
> > > > [1] 1698    8
> > > >
> > > > I am plotting a histogram with:
> > > > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
> > > > position="dodge")
> > > >
> > > > There are 848 individuals with "case" in the pheno column and 892
> > > > with "control".
> > > >
> > > > I would like the y-axis to show, instead of the count, the percentage
> > > > of individuals within each group ("case" or "control").
> > > >
> > > > Please advise,
> > > > Ana
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.