[R] Automating R for Hypothesis Testing

Rui Barradas ruipbarradas at sapo.pt
Thu May 10 17:30:05 CEST 2012


Hello,

I'm glad it helped.

As for your second question, I don't know, but I'm not very comfortable with
the way you're doing things.
Why subtract the coefficients of model 1 from model 2?
And why the dummy? Why set model 1 to zero?

Isn't it better to use anova's F? After all, it's designed for it, for the
linear model...
And if you really want/need the dummy, wouldn't a nested anova do it? (F
statistic, once again.)

anova(model1, model2)

is simple and statistically speaking seems to me much better. (I specially
don't like the subtraction bit.)

Rui Barradas

meredith wrote
> 
> Rui-
>   Thanks this definitely helps, just one quick question. How would you
> code the values of chi-fm and chi-fms to change based on the degrees of
> freedom of each model H(i)?
> 
> Meredith
> 
> 
> Rui Barradas wrote
>> 
>> Hello,
>> 
>> Yes, it does help. Now we can see your data and what you're doing.
>> What follows is a suggestion on what you could do, not full solution.
>> (You forgot to say what X1 is, but I don't think it's important to
>> understand the suggestion.)
>> (If I'm wrong, say something.)
>> 
>> 
>> milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
>> stringsAsFactors=FALSE)
>> # list of data.frames, one per month
>> ls1 <- split(milwaukeephos, milwaukeephos$month)
>> 
>> #--------- if you want to keep the models, not needed if you don't.
>> #          (yoy probably don't)
>> modelH <- vector("list", 12)
>> modelHa <- vector("list", 12)
>> modelH2 <- vector("list", 12)
>> modelH2a <- vector("list", 12)
>> #--------- values to record, these are needed, create them beforehand.
>> chi_fm <- numeric(12)
>> chi_fms <- numeric(12)
>> #
>> seq_months <- c(1:12, 1) # wrap months around.
>> for(i in 1:12){
>> 	month_this <- seq_months[i]
>> 	month_next <- seq_months[i + 1]
>> 
>> 	lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
>> 	lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
>> 	modelH[[i]] <- lm(lload ~ lflow)
>> 	# If you don't want to keep the models, use modelH only
>> 	# ( without [[i]] )
>> 	# and do the same with X1
>> 
>> 	# rest of your code for first test goes here
>> 	chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)
>> 
>> 	# and the same for the second test
>> 	chi_fms[i] <- ...etc...
>> }
>> 
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> 
>> 
>> meredith wrote
>>> 
>>> dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
>>> milwaukeephos.csv 
>>> 
>>> # Feb-march
>>>> modelH_febmarch<-lm(llfeb_march~lffeb_march)
>>>>modelHa_febmarch<-lm(llfeb_march~X1feb_mar+lffeb_march)
>>>> anova(modelHa_febmarch)
>>>> coefficients(modelH_febmarch)
>>> (Intercept) lffeb_march 
>>>   -2.429890    1.172821 
>>>> coefficients(modelHa_febmarch)
>>> (Intercept)   X1feb_mar lffeb_march 
>>>  -2.8957776  -0.5272793   1.3016303 
>>>> bres_fm<-matrix(c(-2.429890,0,1.172821),nrow=3)
>>>> bunres_fm<-matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
>>>>bfm<-t(bunres_fm-bres_fm)
>>>> fmvect<-seq(1,1,length=34)
>>>> X1a_febmar<-seq(0,0,length=9) # dummy variable step 1
>>>> X1b_febmar<-seq(1,1,length=25) # dummy variable step 2
>>>> X1feb_mar<-c(X1a_febmar,X1b_febmar) #dummy variable creation
>>> # Test Stat Equation for Chisq
>>>> fmxx<-cbind(fmvect,X1feb_mar,lffeb_march)
>>>> tfmx<-t(fmxx)
>>>> xcom_fm<-(tfmx %*% fmxx)
>>>> xinv_fm<-ginv(xcom_fm)
>>>> var_fm<-xinv_fm*0.307
>>>> chi_fm<-bfm %*% var_fm %*% (bunres_fm-bres_fm)
>>>> chi_fm # chisq value for recording
>>> if less than CV move onto to slope modification
>>>> modelH2_febmarch<-lm(llfeb_march~X3feb_march)
>>>> modelH2a_febmarch<-lm(llfeb_march~X3feb_march+X4feb_march)
>>>> anova(modelH2a_febmarch)
>>>> coefficients(modelH2_febmarch) # get coefficients to make beta vectors
>>>> for test
>>> (Intercept) X3feb_march 
>>>    5.342130    1.172821 
>>>> coefficients(modelH2a_febmarch)
>>> (Intercept) X3feb_march X4feb_march 
>>>   5.2936263   1.0353752   0.2407557 
>>> # Test Stat
>>>> bsres_fm<-matrix(c(5.342130,1.172821,0),nrow=3)
>>>> bsunres_fm<-matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
>>>> bsfm<-t(bsunres_fm-bsres_fm)
>>>> #X matrix
>>>> fmxs<-cbind(fmvect,X3feb_march,X4feb_march)
>>>> tfmxs<-t(fmxs)
>>>> xcoms_fm<-(tfmxs %*% fmxs)
>>>> xinvs_fm<-ginv(xcoms_fm)
>>>> var_fms<-xinvs_fm*0.341
>>>> chi_fms<-bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
>>>> chi_fms
>>> # Record Chisq value
>>> 
>>> Does this help?
>>> Here lffeb_march is the combination of Feb and March log flows
>>> and llfeb_march is the combination of Feb and March log loads
>>> X3: lffeb_march-mean(feb_march)
>>> X4: X1*X3
>>> 
>>> Thanks
>>> 
>>> Rui Barradas wrote
>>>> 
>>>> Hello,
>>>> 
>>>> I'm not at all sure if I understand your problem. Does this describe
>>>> it?
>>>> 
>>>> 
>>>> test first model for months 1 and 2
>>>> if test statistic less than critical value{
>>>> 	test second model for months 1 and 2
>>>> 	print results of the first and second tests? just one of them?
>>>> }
>>>> move on to months 2 and 3
>>>> etc, until months 12 and 1
>>>> 
>>>> 
>>>> Please post example data using dput(dataset).
>>>> Just copy it's output and paste it in your post.
>>>> 
>>>> And example code, what you're already doing.
>>>> (Possibly simplified)
>>>> 
>>>> Rui Barradas
>>>> 
>>>> 
>>>> meredith wrote
>>>>> 
>>>>> R Users-
>>>>>   I have been trying to automate a manual code that I have developed
>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>> correspond to specified months:
>>>>> something to the effect
>>>>> i=name.csv
>>>>> N=length(i$month)
>>>>> iphos1=0
>>>>> iphos2=0
>>>>> isphos3=0
>>>>> for i=1,N
>>>>>  if month=1
>>>>>     iphos1=iphos+1
>>>>>     iphos1(iphos1)=i
>>>>> 
>>>>> an so on to call out the months into there own arrays (unless there is
>>>>> a way I can wrap it into the next automation)
>>>>> 
>>>>> Next: I would like to run a simple linear regression combining each of
>>>>> the months 1 by 1:
>>>>> for instance I want to run a regression on a combined model from
>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>> it rejects, then we proceed to months 2 and 3.  If we move on to the
>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>> want to print an accept or reject and move on to months 2 and 3, until
>>>>> finally comparing months 12-1 at the end.
>>>>> Is there a way to loop or automate this in R?
>>>>> 
>>>>> Thanks
>>>>> Meredith
>>>>> 
>>>> 
>>> 
>>> Rui Barradas wrote
>>>> 
>>>> Hello,
>>>> 
>>>> I'm not at all sure if I understand your problem. Does this describe
>>>> it?
>>>> 
>>>> 
>>>> test first model for months 1 and 2
>>>> if test statistic less than critical value{
>>>> 	test second model for months 1 and 2
>>>> 	print results of the first and second tests? just one of them?
>>>> }
>>>> move on to months 2 and 3
>>>> etc, until months 12 and 1
>>>> 
>>>> 
>>>> Please post example data using dput(dataset).
>>>> Just copy it's output and paste it in your post.
>>>> 
>>>> And example code, what you're already doing.
>>>> (Possibly simplified)
>>>> 
>>>> Rui Barradas
>>>> 
>>>> 
>>>> meredith wrote
>>>>> 
>>>>> R Users-
>>>>>   I have been trying to automate a manual code that I have developed
>>>>> for calling in a .csv file, isolating certain rows and columns that
>>>>> correspond to specified months:
>>>>> something to the effect
>>>>> i=name.csv
>>>>> N=length(i$month)
>>>>> iphos1=0
>>>>> iphos2=0
>>>>> isphos3=0
>>>>> for i=1,N
>>>>>  if month=1
>>>>>     iphos1=iphos+1
>>>>>     iphos1(iphos1)=i
>>>>> 
>>>>> an so on to call out the months into there own arrays (unless there is
>>>>> a way I can wrap it into the next automation)
>>>>> 
>>>>> Next: I would like to run a simple linear regression combining each of
>>>>> the months 1 by 1:
>>>>> for instance I want to run a regression on a combined model from
>>>>> months 1 and 2 and a dummy model for 1 and 2, compare them using a
>>>>> Chi-sq distribution, if Chi-sq is less than the Critical value, we
>>>>> accept and go on to test another set of models with both 1 and 2. If
>>>>> it rejects, then we proceed to months 2 and 3.  If we move on to the
>>>>> second set on months 1 and 2, and the critical value is accepted, I
>>>>> want to print an accept or reject and move on to months 2 and 3, until
>>>>> finally comparing months 12-1 at the end.
>>>>> Is there a way to loop or automate this in R?
>>>>> 
>>>>> Thanks
>>>>> Meredith
>>>>> 
>>>> 
>>> 
>> 
> 


--
View this message in context: http://r.789695.n4.nabble.com/Automating-R-for-Hypothesis-Testing-tp4618653p4623790.html
Sent from the R help mailing list archive at Nabble.com.



More information about the R-help mailing list