[R] Add Gauss normal curve ?

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Sat Apr 11 17:00:45 CEST 2020


On 4/11/20 7:00 AM, varin sacha via R-help wrote:
> Dear R-experts,
>
> Here below my reproducible example. I would like to fit/add the Gauss normal curve to this data.
> I don't get it. There is no error message but I don't get what I am looking for.
> Many thanks for your help.
>
> ############################################################
> mydates <- as.Date(c("2020-03-15", "2020-03-16","2020-03-17","2020-03-18","2020-03-19","2020-03-20","2020-03-21","2020-03-22","2020-03-23","2020-03-24","2020-03-25","2020-03-26","2020-03-27","2020-03-28","2020-03-29","2020-03-30","2020-03-31","2020-04-01","2020-04-02","2020-04-03","2020-04-04","2020-04-05","2020-04-06","2020-04-07","2020-04-08","2020-04-09","2020-04-10"))
>
> nc<-c(1,1,2,7,3,6,6,20,17,46,67,71,56,70,85,93,301,339,325,226,608,546,1069,1264,1340,813,608)
>
> plot(as.Date(mydates),nc,pch=16,type="o",col="blue",ylim=c(1,1400), xlim=c(min(as.Date(mydates)),max(as.Date(mydates))))
>
> x <- seq(min(mydates), max(mydates), 0.1)
>
> curve(dnorm(x, mean(nc), sd(nc)), add=TRUE, col="red", lwd=2)


(I infer that) the values in the `nc` vector are not observations that
can be interpreted as independent samples from a continuous random
variable. They are counts, i.e. "new cases".


Furthermore, `nc` is not the "x" variable in your plot; it is the "y"
vector. So even if it were appropriate to use a Normal curve for
fitting, you would need to treat the `nc` values as a density along the
time axis.
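To make "a density along the time axis" concrete, here is a minimal sketch that uses the `nc` counts as weights over the days and matches a Normal curve's mean and SD to them. It is a descriptive moment-matching illustration only, not a statistically supported model. The names `days`, `w`, `mu`, `sigma` and `fit` are hypothetical, and scaling by `sum(nc)` assumes exactly one count per day.

```r
# Same dates as `mydates` in the question: one per day, 2020-03-15 to 2020-04-10
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93, 301, 339,
        325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

days <- as.numeric(mydates)  # days since 1970-01-01
w <- nc / sum(nc)            # treat the counts as a density over the days

mu    <- sum(w * days)                   # weighted mean day
sigma <- sqrt(sum(w * (days - mu)^2))    # weighted SD, in days

# The days are 1 apart, so the area under the counts is about sum(nc);
# multiplying the density by sum(nc) puts the curve on the scale of the counts
fit <- sum(nc) * dnorm(days, mean = mu, sd = sigma)

plot(mydates, nc, pch = 16, type = "o", col = "blue", ylim = c(1, 1400))
lines(mydates, fit, col = "red", lwd = 2)
```

The weights only describe where the counts sit in time; they do not turn the counts into independent draws from a Normal distribution.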

You could probably do as well by "eyeballing" where you want the
"normal" curve to sit, since there would be no theoretical support for
more refined curve-fitting efforts. You would also need to scale the
density values so they appear as something other than a flat line: a
Normal density never exceeds 1/(sd * sqrt(2*pi)), e.g. about 0.057 for
sd = 7, which is invisible on an axis running up to 1400.

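The `curve` function does take an expression in `x`, but it evaluates
that expression over its own grid (your `x` vector is never used), and
your expression puts the peak of the density at mean(nc), roughly 296.
That is far to the left of your current plotting range, which is set by
the integer values of those dates, i.e. values in the tens of thousands.
Use the `lines` function for better control.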
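A quick check of why the red curve never shows up, assuming the same `mydates` and `nc` as in the question (the dates are regenerated with `seq()` here only to keep the snippet short):

```r
mydates <- seq(as.Date("2020-03-15"), as.Date("2020-04-10"), by = "day")
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93, 301, 339,
        325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

# Dates are stored as days since 1970-01-01, so the plotted x range is
print(range(as.numeric(mydates)))  # 18336 18362

# The density in the question peaks at x = mean(nc), i.e. around 296
print(mean(nc))

# Evaluated over the plotted dates, that density is effectively zero
print(max(dnorm(as.numeric(mydates), mean(nc), sd(nc))))
```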


lines(x = as.numeric(mydates),
      # 3000 was an eyeball guess at a scaling factor that might work,
      # but a larger number was needed to make the curves commensurate
      y = 10000 * dnorm(x = as.numeric(mydates),                  # set a proper x scale
                        mean = as.numeric(mydates[which.max(nc)]), # use location of max
                        sd = 7))


You might need a smaller value for the "standard deviation" and a higher
scaling factor to improve the eyeball fit. You might like a value of
sd=4, but it would remain an unsupportable effort from a statistical
viewpoint.
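The trade-off between sd and the scaling factor follows from the peak height of a scaled Normal density, scale / (sd * sqrt(2*pi)). A small sketch, assuming the goal is to make the curve's peak match the largest count; `scale_for_peak` is a hypothetical helper, not part of any package. For comparison, the factor 10000 with sd = 7 peaks at about 570.

```r
nc <- c(1, 1, 2, 7, 3, 6, 6, 20, 17, 46, 67, 71, 56, 70, 85, 93, 301, 339,
        325, 226, 608, 546, 1069, 1264, 1340, 813, 608)

# Peak of scale * dnorm(x, mean, sd) is scale / (sd * sqrt(2 * pi)),
# so the scale that lifts the peak to a target height is:
scale_for_peak <- function(peak, sd) peak * sd * sqrt(2 * pi)

# Scales that would make the peak reach max(nc) = 1340 for sd = 7 and sd = 4
print(sapply(c(7, 4), function(s) scale_for_peak(max(nc), s)))  # about 23512 and 13436
```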



-- 

David

> ############################################################
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


