[R] Request for functions to calculate correlated factors influencing an outcome.

Lalitha Viswanathan lalitha.viswanathan79 at gmail.com
Sun May 3 19:46:49 CEST 2015


Hi
I am sorry, I saved the file after removing the dot after Disp (I was
going wrong with read.delim, which threw an error about !header, etc.).
The dot was not the culprit, but I continued to leave it out.
Let me paste the full code here.
x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt",
                header=TRUE, sep="\t")
x <- data.frame(x)
# Print the rows for each country in turn
for (i in unique(x$Country)) {
  print(i)
  y <- subset(x, x$Country == i)
  print(y)
}
# Keep only the numeric columns of interest
newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
cor(newx, method="pearson")
my.cor <-cor.test(newx$Weight, newx$Price, method="spearman")
my.cor <-cor.test(newx$Weight, newx$HP, method="spearman")
my.cor <-cor.test(newx$Disp, newx$HP, method="spearman")
Putting exact=NULL still doesn't remove the warning
my.cor <-cor.test(newx$Disp, newx$HP, method="kendall", exact=NULL)
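
For reference, a minimal sketch of how the warning behaves, on made-up data with ties (the vectors `disp` and `hp` are hypothetical stand-ins, not the values from the file). `exact = NULL` is the default of `cor.test`, so passing it changes nothing; `exact = FALSE` asks for the asymptotic p-value directly and should avoid the warning.

```r
# Synthetic data with tied values; not the fuel efficiency data set.
disp <- c(97, 97, 113, 113, 120, 133, 133, 151, 151, 180, 180, 305)
hp   <- c(63, 63, 90, 92, 92, 95, 110, 110, 125, 140, 140, 170)

# exact = NULL is the default, so the ties warning is still raised.
res_default <- withCallingHandlers(
  cor.test(disp, hp, method = "kendall", exact = NULL),
  warning = function(w) {
    message("warning raised: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# exact = FALSE requests the normal approximation, without the warning.
res_asymp <- cor.test(disp, hp, method = "kendall", exact = FALSE)

print(res_default$estimate)
print(res_asymp$p.value)
```

With ties present, both calls fall back to the same normal approximation, so the estimate and p-value should agree; only the warning differs.
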
I tried to find the correlation coefficient for various combinations of
variables, but am unable to interpret the results. (Results are pasted
below, in my earlier post.)
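
The NA row and column for Reliability in the Pearson matrix are what `cor` returns under its default `use = "everything"` when a column contains missing values. That Reliability has missing values is an assumption here, since the file itself is not shown. A small sketch on synthetic data (the data frame `toy` and its values are invented for illustration) shows `use = "pairwise.complete.obs"` computing each coefficient from the rows where both variables are present:

```r
# Synthetic data; Reliability has missing entries, as assumed for the real file.
toy <- data.frame(
  Mileage     = c(33, 30, 27, 25, 23, 21, 19, 18),
  Weight      = c(2560, 2345, 2885, 3110, 3320, 3480, 3850, 3960),
  Reliability = c(4, NA, 3, 5, NA, 2, 1, 3)
)

# Default use = "everything": any NA in a column propagates to its coefficients.
r_all <- cor(toy, method = "pearson")
print(r_all)

# Pairwise deletion: each pair uses only rows where both values are present.
r_pair <- cor(toy, method = "pearson", use = "pairwise.complete.obs")
print(round(r_pair, 3))
```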

Followed it up with a normality test
shapiro.test(newx$Disp)
shapiro.test(newx$HP)

Then decided to do a kruskal.test(newx)
with the result
Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
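
A note on that call, with a hedged sketch: when `kruskal.test` is given a data frame (which is a list), it treats each column as one group, so the result above compares the six columns with each other (hence df = 5) and is not a statement about mileage. The formula interface instead compares one response across the levels of a grouping factor. The data frame `toy` and its `Country` labels below are synthetic.

```r
# Synthetic data for illustration only.
toy <- data.frame(
  Mileage = c(33, 30, 27, 25, 23, 21, 19, 18, 32, 28, 24, 20),
  Weight  = c(2560, 2345, 2885, 3110, 3320, 3480, 3850, 3960,
              2400, 2900, 3300, 3700),
  Country = rep(c("Japan", "USA", "Germany"), each = 4)
)

# Data frame input: each numeric column is treated as a separate group.
kw_cols <- kruskal.test(toy[, c("Mileage", "Weight")])
print(kw_cols$parameter)   # degrees of freedom = number of columns - 1

# Formula input: does Mileage differ between countries?
kw_country <- kruskal.test(Mileage ~ Country, data = toy)
print(kw_country)
```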

My question: I am trying to find the factors influencing efficiency (in
this case, mileage).

What range of functions / examples should I be looking at to find a factor
or combination of factors influencing efficiency?
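
One commonly used starting point for this kind of question is a multiple linear regression via `lm`, with Mileage as the response and the other columns as predictors. The sketch below uses simulated data (the data frame `sim` and all its values are invented; only the column names follow the post), so the coefficients say nothing about the real cars:

```r
set.seed(1)  # reproducible simulated data, not the real file
n <- 60
sim <- data.frame(
  Weight = runif(n, 1800, 4000),
  HP     = runif(n, 60, 200)
)
# Simulated truth: mileage falls with weight; HP has no effect here.
sim$Mileage <- 50 - 0.008 * sim$Weight + rnorm(n, sd = 1.5)

# Fit Mileage on both candidate predictors.
fit <- lm(Mileage ~ Weight + HP, data = sim)
print(summary(fit)$coefficients)   # estimates, std. errors, t and p values

# Drop one term at a time and compare the resulting models by AIC.
print(drop1(fit))
```

In the real data Weight, Disp and HP are strongly correlated with each other, so individual coefficients from such a fit would probably need cautious reading.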

Any pointers will be helpful

Thanks
Lalitha

On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <
lalitha.viswanathan79 at gmail.com> wrote:

> Hi
> I have a dataset of the type attached.
> Here's my code thus far.
> dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
> newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight,
> Disp, HP));
> cor(newData, method="pearson");
> Results are
>                  Price Reliability    Mileage     Weight       Disp         HP
> Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
> Reliability         NA           1         NA         NA         NA         NA
> Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
> Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
> Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
> HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
>
> It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and
> Price are strongly correlated.
> To find the statistical significance, I am trying
> sample.correln<-cor.test(newData$Disp, newData$HP, method="kendall",
> exact=NULL)
> Kendall's rank correlation tau
>
> data:  newx$Disp and newx$HP
> z = 7.2192, p-value = 5.229e-13
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
>       tau
> 0.6563871
>
> If I try the same with
> sample.correln<-cor.test(newData$Disp, newData$HP, method="spearman",
> exact=NULL)
> I get Warning message:
> In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
>   Cannot compute exact p-value with ties
> > sample.correln
>
> Spearman's rank correlation rho
>
> data:  newx$Disp and newx$HP
> S = 5716.8, p-value < 2.2e-16
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
>       rho
> 0.8411566
>
> I am not sure how to interpret these values.
> Basically, I am trying to figure out which combination of factors
> influences efficiency.
>
> Thanks
> Lalitha
>
