[R] Request for functions to calculate correlated factors influencing an outcome.

Lalitha Viswanathan lalitha.viswanathan79 at gmail.com
Mon May 4 10:40:11 CEST 2015


Hi
I used the MASS library, following the examples at
http://www.statmethods.net/stats/regression.html

library(MASS)
fit <- lm(Mileage~Disp+HP+Weight+Reliability,data=newx)
step <- stepAIC(fit, direction="both")
step$anova # display results

It showed the variables most relevant to Mileage.
While that is a start, I am looking for a model that fits the entire data
set (including Mileage), not just the factors that influence Mileage.

What I am after is multi-model inference / selection.

I was reading about glmulti.
Are there any other packages I could look at for inferring models that best
fit the data?
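
As a rough illustration of what multi-model selection does, the sketch below
fits a linear model for every subset of a few candidate predictors and ranks
the fits by AIC, using only base R. It is a minimal sketch and not glmulti's
actual algorithm. The built-in mtcars data (mpg, disp, hp, wt) is an assumed
stand-in for the fuel-efficiency file, which is not available here. The leaps
(regsubsets) and MuMIn (dredge) packages automate searches of this kind.

```r
# Stand-in data: mtcars ships with base R; mpg plays the role of Mileage.
predictors <- c("disp", "hp", "wt")

# Enumerate every non-empty subset of the candidate predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its AIC.
fits <- lapply(subsets, function(vars) {
  lm(reformulate(vars, response = "mpg"), data = mtcars)
})
ranking <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  AIC   = vapply(fits, AIC, numeric(1))
)

# Lowest AIC first.
ranking <- ranking[order(ranking$AIC), ]
print(ranking)
```

The number of subsets grows as 2^p with p candidate predictors, so an
exhaustive loop like this is only practical for a handful of variables.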

To use nlm / nls, I need to supply a formula as one of the arguments, and I
am looking for functions that will help infer that formula from the data.
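
For context, nls() does not infer a functional form: the formula and starting
values have to be supplied, and candidate forms can then be compared after
fitting. (nlm() is a general-purpose minimizer that takes an objective
function rather than a formula.) The sketch below assumes, purely for
illustration, that mileage decays exponentially with weight, and compares
that form with a straight-line fit by AIC. mtcars again stands in for the
real data, and the parameter names a and b are hypothetical.

```r
# Stand-in data: mtcars from base R (mpg for Mileage, wt for Weight).
# Assumed functional form, chosen only for illustration: mpg = a * exp(b * wt).

# nls() needs starting values; take them from a log-linear fit.
init  <- lm(log(mpg) ~ wt, data = mtcars)
start <- list(a = exp(coef(init)[[1]]), b = coef(init)[[2]])

fit_nls <- nls(mpg ~ a * exp(b * wt), data = mtcars, start = start)
fit_lm  <- lm(mpg ~ wt, data = mtcars)

# Both models are fitted to the same response, so their AICs are comparable.
print(coef(fit_nls))
print(AIC(fit_lm, fit_nls))
```

The candidate forms still have to come from subject knowledge or from plots
of the data; AIC only ranks the forms that were actually tried.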

Thanks
lalitha

On Sun, May 3, 2015 at 11:33 PM, Prashant Sethi <theseth.prashant at gmail.com>
wrote:

> Hi,
>
> I'm not an expert in data analysis (a beginner still learning the tricks of
> the trade), but I believe that in your case, since you're trying to determine
> how a dependent variable relates to a number of factor variables, you should
> try a regression analysis. The function you'll use for that is lm(). You can
> use forward selection or backward elimination to build a model that includes
> only the most critical factors.
>
> Maybe you can give it a try.
>
> Thanks and regards,
> Prashant Sethi
> On 3 May 2015 23:18, "Lalitha Viswanathan" <
> lalitha.viswanathan79 at gmail.com> wrote:
>
>> Hi
>> I am sorry, I saved the file without the dot after Disp (I was going
>> wrong on a read.delim call, which threw an error about !header, etc.). The
>> dot was not the culprit, but I continued to leave it out.
>> Let me paste the full code here.
>> # read.table() already returns a data frame
>> x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt",
>>                 header = TRUE, sep = "\t")
>> # print the rows for each country
>> for (i in unique(x$Country)) { print(i); y <- subset(x, Country == i);
>> print(y) }
>> # keep only the numeric columns of interest
>> newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp,
>> HP))
>> cor(newx, method = "pearson")
>> my.cor <- cor.test(newx$Weight, newx$Price, method = "spearman")
>> my.cor <- cor.test(newx$Weight, newx$HP, method = "spearman")
>> my.cor <- cor.test(newx$Disp, newx$HP, method = "spearman")
>> Putting exact=NULL still doesn't remove the warning:
>> my.cor <- cor.test(newx$Disp, newx$HP, method = "kendall", exact = NULL)
>> I tried to find the correlation coefficient for various combinations of
>> variables, but am unable to interpret the results. (Results are pasted
>> below, in an earlier post.)
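
On the warning about ties: exact = NULL is the default and lets cor.test()
decide, which for small samples means attempting an exact test. Passing
exact = FALSE requests the asymptotic p-value instead, so the warning should
not appear. A minimal sketch, assuming the built-in mtcars data as a stand-in
(its disp and hp columns also contain tied values):

```r
# Stand-in data: mtcars from base R; disp and hp both contain tied values.
x <- mtcars$disp
y <- mtcars$hp

# exact = FALSE requests the asymptotic p-value, so no exact test is attempted.
sp <- cor.test(x, y, method = "spearman", exact = FALSE)
kd <- cor.test(x, y, method = "kendall",  exact = FALSE)

print(sp$estimate)  # Spearman's rho
print(kd$estimate)  # Kendall's tau
print(c(spearman = sp$p.value, kendall = kd$p.value))
```

Both coefficients measure monotone association on a scale from -1 to 1, and
a small p-value is evidence against zero association.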
>>
>> Followed it up with a normality test
>> shapiro.test(newx$Disp)
>> shapiro.test(newx$HP)
>>
>> Then decided to do a kruskal.test(newx)
>> with the result
>> Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
>>
>> My question: I am trying to find the factors influencing efficiency (in
>> this case, mileage).
>>
>> What range of functions / examples should I be looking at to find a factor
>> or combination of factors influencing efficiency?
>>
>> Any pointers will be helpful
>>
>> Thanks
>> Lalitha
>>
>> On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <
>> lalitha.viswanathan79 at gmail.com> wrote:
>>
>> > Hi
>> > I have a dataset of the type attached.
>> > Here's my code thus far.
>> > dataset <-data.frame(read.delim("data", sep="\t", header=TRUE));
>> > newData<-subset(dataset, select = c(Price, Reliability, Mileage, Weight,
>> > Disp, HP));
>> > cor(newData, method="pearson");
>> > Results are
>> >                  Price Reliability    Mileage     Weight       Disp         HP
>> > Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
>> > Reliability         NA           1         NA         NA         NA         NA
>> > Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
>> > Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
>> > Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
>> > HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
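
The NA entries for Reliability in the matrix above are what cor() returns
when a column contains missing values, which is probably the case here
(an assumption, since the data file is not shown). A minimal sketch of the
use argument, with mtcars as a stand-in and NAs injected by hand:

```r
# Stand-in data: mtcars from base R, with missing values injected into one
# column to mimic a column such as Reliability that contains NAs.
d <- mtcars[, c("mpg", "wt", "disp", "hp")]
d$hp[c(3, 7)] <- NA

# Default behaviour: any pair involving the incomplete column yields NA.
print(cor(d, method = "pearson"))

# Use only the rows that are complete for each pair of columns.
r <- cor(d, method = "pearson", use = "pairwise.complete.obs")
print(round(r, 3))
```

use = "complete.obs" is the stricter alternative: it drops every row that
has an NA in any column before computing the whole matrix.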
>> >
>> > It appears that Wt and Price, Wt and Disp, Wt and HP, Disp and HP, and
>> > HP and Price are strongly correlated.
>> > To find the statistical significance,
>> > I am trying  sample.correln<-cor.test(newData$Disp, newData$HP,
>> > method="kendall", exact=NULL)
>> > Kendall's rank correlation tau
>> >
>> > data:  newx$Disp and newx$HP
>> > z = 7.2192, p-value = 5.229e-13
>> > alternative hypothesis: true tau is not equal to 0
>> > sample estimates:
>> >       tau
>> > 0.6563871
>> >
>> > If I try the same with
>> > sample.correln<-cor.test(newData$Disp, newData$HP, method="spearman",
>> > exact=NULL)
>> > I get Warning message:
>> > In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
>> >   Cannot compute exact p-value with ties
>> > > sample.correln
>> >
>> > Spearman's rank correlation rho
>> >
>> > data:  newx$Disp and newx$HP
>> > S = 5716.8, p-value < 2.2e-16
>> > alternative hypothesis: true rho is not equal to 0
>> > sample estimates:
>> >       rho
>> > 0.8411566
>> >
>> > I am not sure how to interpret these values.
>> > Basically, I am trying to figure out which combination of factors
>> > influences efficiency.
>> >
>> > Thanks
>> > Lalitha
>> >
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
