[R] Request for functions to calculate correlated factors influencing an outcome.

Prashant Sethi theseth.prashant at gmail.com
Sun May 3 20:03:03 CEST 2015


Hi,

I'm not an expert in data analysis (still a beginner learning the tricks of
the trade), but since you're trying to determine how a dependent variable
(Mileage) relates to a number of explanatory variables, I believe you
should try a regression analysis. The function to use for that is lm().
You can use forward selection or backward elimination to build a model
that keeps only the most important factors.
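
A rough sketch of what I mean. I don't have your file, so this assumes the
mtcars data set that ships with R as a stand-in: mpg plays the role of
Mileage, and wt, disp and hp stand in for Weight, Disp and HP. step() does
backward elimination by AIC, which is only one of several possible
selection criteria.

```r
# Stand-in data (assumption): mtcars ships with R; mpg ~ Mileage.
cars <- mtcars[, c("mpg", "wt", "disp", "hp")]

# Full model: mileage explained by all candidate predictors.
full <- lm(mpg ~ wt + disp + hp, data = cars)
summary(full)      # coefficients, standard errors, R-squared

# Backward elimination by AIC, starting from the full model.
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)   # the predictors that survived
```

With your own data the formula would presumably look like
Mileage ~ Price + Weight + Disp + HP, with data = newx.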

Maybe you can give it a try.

Thanks and regards,
Prashant Sethi
On 3 May 2015 23:18, "Lalitha Viswanathan" <lalitha.viswanathan79 at gmail.com>
wrote:

> Hi,
> I am sorry, I saved the file with the dot after Disp removed, because I
> was going wrong on a read.delim call that threw an error about !header,
> etc. The dot was not the culprit, but I continued to leave it out.
> Let me paste the full code here.
> x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt",
>                 header=TRUE, sep="\t")
> for (i in unique(x$Country)) { print(i); y <- subset(x, Country == i);
> print(y); }
> newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp,
> HP))
> cor(newx, method="pearson")
> my.cor <- cor.test(newx$Weight, newx$Price, method="spearman")
> my.cor <- cor.test(newx$Weight, newx$HP, method="spearman")
> my.cor <- cor.test(newx$Disp, newx$HP, method="spearman")
> Putting exact=NULL still doesn't remove the warning:
> my.cor <- cor.test(newx$Disp, newx$HP, method="kendall", exact=NULL)
> I tried to find the correlation coefficient for various combinations of
> variables, but am unable to interpret the results. (Results are pasted
> below, in an earlier post.)
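
On the warning: exact=NULL is the default, so it changes nothing. As far as
I understand cor.test(), passing exact=FALSE asks for the approximate
p-value directly, so no exact computation is attempted and the ties warning
should not appear. On reading the result: the estimate (rho or tau) gives
the direction and strength of the monotonic association, and the p-value
tests whether it differs from 0. A small sketch, again assuming mtcars as a
stand-in for your data:

```r
# Stand-in data (assumption): mtcars; disp and hp both contain tied values.
# exact = FALSE requests the approximate p-value, so no exact test is tried.
res <- cor.test(mtcars$disp, mtcars$hp, method = "spearman", exact = FALSE)

res$estimate   # rho: sign gives direction, magnitude gives strength
res$p.value    # small values are evidence against "rho equals 0"
```

Read this way, the rho of 0.84 with p < 2.2e-16 in your earlier post says
that Disp and HP tend to rise together, and that this is very unlikely to
be a chance pattern.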
>
> Followed it up with a normality test
> shapiro.test(newx$Disp)
> shapiro.test(newx$HP)
>
> Then decided to do a kruskal.test(newx)
> with the result
> Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
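
A note on this result: given a data frame, kruskal.test() treats each
column as a separate group (hence df = 5 for six columns). So the test
above compares Price, Reliability, Mileage and the rest against each other,
which is probably not the question you have in mind. To compare mileage
across groups, the formula form with a grouping factor is the usual route.
A sketch, assuming mtcars as a stand-in, with cyl as a hypothetical
grouping variable (Country would play that role in your data):

```r
# Stand-in data (assumption): mtcars; cyl is the grouping factor here.
kw <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

kw$parameter   # degrees of freedom = number of groups - 1
kw$p.value     # tests whether mileage differs between the groups
```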
>
> The question is: I am trying to find the factors influencing efficiency
> (in this case, mileage).
>
> What range of functions / examples should I be looking at to find a
> factor or combination of factors influencing efficiency?
>
> Any pointers will be helpful
>
> Thanks
> Lalitha
>
> On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <
> lalitha.viswanathan79 at gmail.com> wrote:
>
> > Hi
> > I have a dataset of the type attached.
> > Here's my code thus far.
> > dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
> > newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight,
> > Disp, HP));
> > cor(newData, method="pearson");
> > Results are
> >                  Price Reliability    Mileage     Weight       Disp         HP
> > Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
> > Reliability         NA           1         NA         NA         NA         NA
> > Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
> > Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
> > Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
> > HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
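
The NA entries for Reliability most likely mean that column has missing
values: by default cor() returns NA for any pair that involves an NA. The
use argument controls this. A sketch, assuming mtcars as a stand-in with
two values deliberately set to NA:

```r
# Stand-in data (assumption): mtcars with two missing values put into hp.
cars <- mtcars[, c("mpg", "wt", "disp", "hp")]
cars$hp[c(3, 10)] <- NA

m_default  <- cor(cars)                                 # pairs with hp are NA
m_pairwise <- cor(cars, use = "pairwise.complete.obs")  # complete pairs only
m_pairwise
```

use = "complete.obs" is the stricter alternative: it drops every row that
has an NA in any column before computing the whole matrix.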
> >
> > It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, HP and
> > Price are strongly correlated.
> > To find the statistical significance, I am trying
> > sample.correln <- cor.test(newData$Disp, newData$HP,
> > method="kendall", exact=NULL)
> > Kendall's rank correlation tau
> >
> > data:  newx$Disp and newx$HP
> > z = 7.2192, p-value = 5.229e-13
> > alternative hypothesis: true tau is not equal to 0
> > sample estimates:
> >       tau
> > 0.6563871
> >
> > If I try the same with
> > sample.correln <- cor.test(newData$Disp, newData$HP, method="spearman",
> > exact=NULL)
> > I get
> > Warning message:
> > In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
> >   Cannot compute exact p-value with ties
> > > sample.correln
> >
> > Spearman's rank correlation rho
> >
> > data:  newx$Disp and newx$HP
> > S = 5716.8, p-value < 2.2e-16
> > alternative hypothesis: true rho is not equal to 0
> > sample estimates:
> >       rho
> > 0.8411566
> >
> > I am not sure how to interpret these values.
> > Basically, I am trying to figure out which combination of factors
> > influences efficiency.
> >
> > Thanks
> > Lalitha