[R] Add Gauss normal curve ?

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Sat Apr 11 17:00:45 CEST 2020

```On 4/11/20 7:00 AM, varin sacha via R-help wrote:
> Dear R-experts,
>
> Here below my reproducible example. I would like to fit/add the Gauss normal curve to this data.
> I don't get it. There is no error message but I don't get what I am looking for.
> Many thanks for your help.
>
> ############################################################
> mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))
>
> nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)
>
> plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))
>
> x <- seq(min(mydates), max(mydates), 0.1)
>
> curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)

(I infer) The values in the `nc` vector are not taken from observations
that are interpretable as independent sampling from a continuous random
variable. They are counts, i.e. "new cases".

Furthermore, the "x" value in your plot is not the `nc` vector; `nc` is
the "y"-vector. So even if it were appropriate to use a Normal curve for
fitting, you would need to take the `nc` vector as corresponding to a
density along the time axis.
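
To make the axis mismatch concrete, here is a small check. It is only a
sketch: I am assuming the dates can be rebuilt with seq() instead of being
typed out, and `x_num` is just a name made up for illustration.

    # same 27 days and counts as in the original example
    mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
    nc <- c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,
            608,546,1069,1264,1340,813,608)

    x_num <- as.numeric(mydates)   # what the x axis really holds: days since 1970-01-01
    range(x_num)                   # 18336 18362
    mean(nc)                       # about 296, nowhere near the date values

    # the density curve() was asked to draw, evaluated at the plotted dates
    range(dnorm(x_num, mean(nc), sd(nc)))   # 0 at both ends, hence the flat line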

You could probably do as well by "eyeballing" where you want the
"normal" curve to sit, since there would be no theoretical support for
more refined curve fitting efforts. You might also need to scale the
density values so they would appear as something other than a flat line.
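
As a rough illustration of why scaling is needed: a normal density peaks at
1/(sd*sqrt(2*pi)), which is well below 1 when the spread is several days,
while the counts run into the hundreds. The spreads tried here are guesses,
not estimates, and `peak_height` is a made-up helper name.

    # peak height of a normal density for a given spread (in days)
    peak_height <- function(s) dnorm(0, mean = 0, sd = s)

    sds <- c(4, 7)               # two guessed spreads
    peaks <- peak_height(sds)
    peaks                        # about 0.100 and 0.057

    max_count <- 1340            # largest value in nc
    max_count / peaks            # multiplier needed for the peak to reach max_count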

And the `curve` function does need an expression, but here it would be
plotting a density centered at mean(nc), far to the left of your current
plotting range, which is set by the integer values of those dates, i.e.,
values in the tens of thousands. Use the `lines` function for better control.

lines( x = as.numeric(mydates),
       # 3000 was eyeball guess as to a scaling factor that might work
       # but needed a larger number to make the curves commensurate
       y = 10000 * dnorm( x = as.numeric(mydates),   # set a proper x scale
                          mean = as.numeric( mydates[ which.max(nc) ]),   # use location of max
                          sd = 7) )

Might need to use a smaller value for the "standard deviation" and a higher
scaling factor to improve the eyeball fit. You might like a value of
sd=4, but it would remain an unsupportable effort from a statistical
viewpoint.
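
For what it is worth, a self-contained version with sd=4 might look like the
following. It is a sketch, not a fit: choosing the scaling factor so that the
peak matches max(nc) is my assumption, as is the finer grid `xs` used to get a
smoother line.

    mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
    nc <- c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,
            608,546,1069,1264,1340,813,608)

    plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))

    sd_guess <- 4                                    # eyeballed spread in days
    centre   <- as.numeric(mydates[which.max(nc)])   # date of the largest count
    scale    <- max(nc) / dnorm(0, 0, sd_guess)      # assumed: peak height equals max(nc)

    # finer numeric grid over the same date range
    xs <- seq(min(as.numeric(mydates)), max(as.numeric(mydates)), by = 0.1)
    ys <- scale * dnorm(xs, mean = centre, sd = sd_guess)

    lines(x = xs, y = ys, col = "red", lwd = 2)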

--

David

> ############################################################