[R] Help sourcing datasets (.csv)

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Fri Jun 2 17:55:08 CEST 2023


Another suggestion:
Statistics does not care where the numbers come from. The values 1, 2, 3 have a mean of 2 whether they are bird weights, plant heights, or concrete tensile strengths. Your interpretation might change, but the mean is still 2.

Try synthetic data.
X<-rnorm(1000, mean=4, sd=2)
Y<-14+12*X
cor(X,Y)

That is too simple (the correlation is exactly 1), but it is a start. Add some noise:
Y<- rnorm(1000, mean=14, sd=2) + 12*X
cor(X,Y)
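
One way to check what the simulation produced is to fit a linear model and compare the estimates with the values used to generate the data. This is only a sketch: the seed and the object name fit are arbitrary choices added for illustration.

```r
set.seed(42)                                  # arbitrary seed, only for reproducibility
X <- rnorm(1000, mean = 4, sd = 2)
Y <- rnorm(1000, mean = 14, sd = 2) + 12 * X  # intercept 14, slope 12, error sd 2

fit <- lm(Y ~ X)
coef(fit)      # estimates should be close to 14 and 12
sigma(fit)     # residual standard error should be close to 2
cor(X, Y)      # high, but no longer exactly 1
```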

Look at the result in something like ggplot2:
library(ggplot2)
Dataf <- data.frame(X,Y)
ggplot(Dataf, aes(X, Y)) + geom_point() + stat_smooth(method="lm", se=FALSE)


This approach has a few advantages:
1) I know that X and Y are samples from the Gaussian (Normal) distribution.
2) I know that the data are homoscedastic.
3) I can change 1 and 2 in whatever way I want, which is useful if you want to understand how violations of model assumptions influence outcomes.
4) I can look closely at the influence of sample size when assumptions are met and when they are not.
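
Points 3 and 4 can be explored with a small simulation. The sketch below, tied to the two-sample t-test from the original question, counts how often the test rejects at the 5% level for different sample sizes. The helper name reject_rate, the shift of 0.5, and the number of repetitions are all hypothetical choices for illustration; replacing rnorm with a skewed generator such as rexp would be one way to see what a violated assumption does.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# Hypothetical helper: fraction of simulated two-sample t-tests with p < 0.05.
# 'n' is the per-group sample size, 'shift' the true difference in means.
reject_rate <- function(n, shift, reps = 2000) {
  p <- replicate(reps, t.test(rnorm(n, mean = 0), rnorm(n, mean = shift))$p.value)
  mean(p < 0.05)
}

reject_rate(n = 10,  shift = 0)     # no true difference: should be near 0.05
reject_rate(n = 10,  shift = 0.5)   # small sample: the difference is often missed
reject_rate(n = 100, shift = 0.5)   # larger sample: the difference is usually found
```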

Note that ANOVA and regression do not assume that the independent or dependent variables are normally distributed. The assumption of Normality is for the error term in the model. However, if both dependent and independent variables are normally distributed then it is likely that the error term will also be normally distributed.
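
A short sketch of that distinction, assuming a deliberately skewed predictor (an exponential distribution, my choice for illustration) combined with Normal errors: the response is far from Normal, yet the residuals of the fitted model behave like a Normal error term.

```r
set.seed(7)                                   # arbitrary seed, only for reproducibility
X <- rexp(1000, rate = 1)                     # strongly skewed predictor
Y <- 14 + 12 * X + rnorm(1000, sd = 2)        # Normal error term with sd 2

fit <- lm(Y ~ X)
shapiro.test(Y)$p.value            # very small: Y itself is clearly not Normal
shapiro.test(resid(fit))$p.value   # typically not small: residuals look Normal
```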

What should I get here?
Y<- rnorm(1000, mean=14, sd=2) + X*rnorm(1000, mean=12, sd=27)
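
One way to investigate: here the slope itself is random, so the expected value of Y given X is still 14 + 12*X, but the variance of Y given X is 2^2 + 27^2 * X^2 and grows with the size of X. In other words, the data are heteroscedastic. The sketch below (seed and object names are arbitrary additions) compares the residual spread for small and large values of |X|.

```r
set.seed(3)                                   # arbitrary seed, only for reproducibility
X <- rnorm(1000, mean = 4, sd = 2)
Y <- rnorm(1000, mean = 14, sd = 2) + X * rnorm(1000, mean = 12, sd = 27)

fit <- lm(Y ~ X)
coef(fit)                                     # centred on 14 and 12, but imprecise
cor(X, Y)                                     # much weaker than in the earlier examples

# Residual spread in the lower and upper halves of |X|
big <- abs(X) > median(abs(X))
tapply(resid(fit), big, sd)                   # the spread is larger where |X| is larger
```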



Tim
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Uwe Ligges
Sent: Friday, June 2, 2023 5:18 AM
To: james carrigan <james5431 using hotmail.com>; r-help using r-project.org
Subject: Re: [R] Help sourcing datasets (.csv)


See ?data
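
As a sketch of how that might look for the tests listed in the question below: the datasets package that ships with R contains data suited to each of them. The particular choices here (mtcars, sleep, UCBAdmissions) are illustrative assumptions, not datasets named in this thread.

```r
# List a few of the datasets shipped with R (data() alone shows all of them)
head(data(package = "datasets")$results[, c("Item", "Title")])

# Two-sample t-test: fuel economy of automatic vs manual cars in mtcars
two_sample <- t.test(mpg ~ am, data = mtcars)

# Paired t-test: the same 10 patients measured under two drugs in sleep
paired <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))

# Two-sample test of proportions: admission rates by gender in UCBAdmissions
adm <- apply(UCBAdmissions, c(1, 2), sum)     # collapse over departments
props <- prop.test(x = adm["Admitted", ], n = colSums(adm))

two_sample; paired; props
```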


On 28.05.2023 10:53, james carrigan wrote:
> Dear Sir or Madam
> I'm trying to compile a collection of datasets that require use of the following hypothesis tests.
> Are there datasets within the R library that I can get access to?
> Kind regards
> James Carrigan
>
> Hypothesis Testing
> t.test(X,Y)
> - performs a two sample t-test between X and Y
> t.test(X,Y,paired=TRUE)
> - performs a paired t-test between X and Y
> prop.test(x = c(a, b), n = c(n1, n2))
> - performs a 2-sample test for equality of proportions with
> continuity correction
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
