[R] Adjusting length of series

Sun Jul 1 03:39:57 CEST 2012

Hello,
Try this:

Dcr<-lapply(1:5,function(x) rnorm(10,15)) 
names(Dcr)<- c("Dcre1","Dcre2","Dcre3","Dcre4","Dcre5")
#Works
regCred<-lm(Dcr[[1]]~Dcr[[2]]+Dcr[[3]])
 summary(regCred)
#Works
 regCred2<-lm(Dcre1~Dcre2+Dcre3,data=Dcr)
 summary(regCred2)
# Does not work
regCred3<-lm(Dcr[[1]][1:5]~Dcr[[4]][1:5]+Dcre5,data=Dcr)
Error in model.frame.default(formula = Dcr[[1]][1:5] ~ Dcr[[4]][1:5] +  : 
  variable lengths differ (found for 'Dcre5')
#I guess this is what happened in your example: the terms in the formula did not all have the same length
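
A quick way to see the mismatch is to compare the lengths of the terms before calling lm(). The following is only a sketch: DcrDemo and lens are names made up for illustration, and the data are simulated.

```r
set.seed(42)
# Simulated stand-in for the list in this thread (hypothetical name DcrDemo)
DcrDemo <- lapply(1:5, function(x) rnorm(10, 15))
names(DcrDemo) <- paste0("Dcre", 1:5)

# Length of each term as it would enter the failing formula
lens <- c(y  = length(DcrDemo[[1]][1:5]),
          x4 = length(DcrDemo[[4]][1:5]),
          x5 = length(DcrDemo$Dcre5))   # not subscripted, so still full length
lens
```

Any term whose length differs from the others is the one lm() will complain about.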

#If you had used,

regCred3<-lm(Dcr[[1]][1:5]~Dcr[[4]][1:5]+Dcre5[1:5],data=Dcr)
 summary(regCred3)
#it works
#this also works

regCred4<-lm(Dcre1[1:5]~Dcre2[1:5]+Dcre3[6:10],data=Dcr)
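
Another option, sketched here under the assumption that the series should all be cut to the length of the shortest one (the min(sapply()) idea mentioned in the question), is to trim the whole list once so that the formula needs no subscripts. DcrDemo, DcrTrim and fitTrim are hypothetical names and the data are simulated.

```r
set.seed(1)
# Simulated series of unequal length (hypothetical stand-ins for the real data)
DcrDemo <- list(Dcre1 = rnorm(10, 15),
                Dcre2 = rnorm(8, 15),
                Dcre3 = rnorm(12, 15))

n <- min(sapply(DcrDemo, length))                      # length of the shortest series
DcrTrim <- lapply(DcrDemo, function(v) v[seq_len(n)])  # keep the first n values of each

# The names survive lapply(), so the formula can use them directly
fitTrim <- lm(Dcre1 ~ Dcre2 + Dcre3, data = DcrTrim)
names(coef(fitTrim))
```

Note that this keeps the first n observations of each series; if the series should be aligned at the end instead, tail(v, n) could be used in place of v[seq_len(n)].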

Or you could convert the list to a data frame:

 Dcr2<-data.frame(Dcre1=Dcr$Dcre1,Dcre2=Dcr$Dcre2,Dcre3=Dcr$Dcre3)
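
When all list elements have the same length, as.data.frame() converts the whole list in one step, and the subset argument of lm() can select the rows. This is a sketch with simulated data; DcrDemo, DcrDF, fitSub and fitRows are hypothetical names.

```r
set.seed(2012)
# Simulated list with equal-length elements (hypothetical name DcrDemo)
DcrDemo <- lapply(1:5, function(x) rnorm(10, 15))
names(DcrDemo) <- c("Dcre1", "Dcre2", "Dcre3", "Dcre4", "Dcre5")

DcrDF <- as.data.frame(DcrDemo)   # one column per list element

# Two equivalent ways of using only the first five rows
fitSub  <- lm(Dcre1 ~ Dcre2 + Dcre3, data = DcrDF, subset = 1:5)
fitRows <- lm(Dcre1 ~ Dcre2 + Dcre3, data = DcrDF[1:5, ])

all.equal(coef(fitSub), coef(fitRows))
names(coef(fitSub))
```

Either way the coefficient names stay free of subscripts such as [1:5].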

#testing whether list and dataframe converted results are same
#From dataframe 

regCred5<-lm(Dcre1~Dcre2+Dcre3,data=Dcr2[1:5,])
> summary(regCred5)

Call:
lm(formula = Dcre1 ~ Dcre2 + Dcre3, data = Dcr2[1:5, ])

Residuals:
       1        2        3        4        5 
-0.01262  0.09888  0.07133 -0.08494 -0.07265 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 16.53707    0.99604  16.603  0.00361 **
Dcre2       -0.27890    0.04185  -6.665  0.02178 * 
Dcre3        0.21874    0.04643   4.711  0.04222 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

#Same model using list

regCred6<-lm(Dcre1[1:5]~Dcre2[1:5]+Dcre3[1:5],data=Dcr)
> summary(regCred6)

Call:
lm(formula = Dcre1[1:5] ~ Dcre2[1:5] + Dcre3[1:5], data = Dcr)

Residuals:
       1        2        3        4        5 
-0.01262  0.09888  0.07133 -0.08494 -0.07265 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 16.53707    0.99604  16.603  0.00361 **
Dcre2[1:5]  -0.27890    0.04185  -6.665  0.02178 * 
Dcre3[1:5]   0.21874    0.04643   4.711  0.04222 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.1173 on 2 degrees of freedom
Multiple R-squared: 0.9739,    Adjusted R-squared: 0.9478 
F-statistic: 37.28 on 2 and 2 DF,  p-value: 0.02612 

The difference is in the names of the coefficients.  

names(coef(regCred6))
[1] "(Intercept)" "Dcre2[1:5]"  "Dcre3[1:5]" 
which you can change by,
names(regCred6$coef)<-names(regCred5$coef)
 regCred6$coef
(Intercept)       Dcre2       Dcre3 
 16.5370694  -0.2788988   0.2187360 

However, this does not change the names of the coefficients shown in the summary.  I tried several ways, but without success so far.  In that case, I think the easiest way is to assign each subset to a new variable and run the analysis on those.

e.g.

Dcre1new<-Dcr$Dcre1[1:5]
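
A possible reason the renaming does not reach the summary, offered as a guess rather than a confirmed diagnosis: reading with $coef partially matches the component "coefficients", but assigning through $coef does not partial-match, so it adds a separate component called "coef" and leaves "coefficients" untouched. Assuming that is the cause, renaming via the full component name should carry through to summary(). DcrDemo and fitList below are hypothetical names and the data are simulated.

```r
set.seed(123)
# Simulated list standing in for the one in this thread (hypothetical name DcrDemo)
DcrDemo <- lapply(1:3, function(x) rnorm(10, 15))
names(DcrDemo) <- c("Dcre1", "Dcre2", "Dcre3")

fitList <- lm(Dcre1[1:5] ~ Dcre2[1:5] + Dcre3[1:5], data = DcrDemo)

# Rename through the full component name, not the abbreviation $coef
names(fitList$coefficients) <- c("(Intercept)", "Dcre2", "Dcre3")

# Row names of the coefficient table produced by summary()
rownames(coef(summary(fitList)))
```

Editing a fitted object by hand is fragile, so assigning the subsets to new variables before fitting is probably still the cleaner route.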

Hope this was helpful.

A.K.

----- Original Message -----
From: "Lekgatlhamang, lexi Setlhare" <lexisetlhare at yahoo.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Saturday, June 30, 2012 6:04 PM
Subject: [R]  Adjusting length of series

Hi
I have a follow-up question relating to subsetting list items. After using the list and min(sapply()) method to adjust the length of the variables, I specify a dynamic regression equation using the variables in the list. My list looks like this:
Dcr<- list(Dcre1=DCred1,Dcre2=DCred2,Dcre3=DCred3,Dbobc1=DBoBC1,Dbobc2=DBoBC2,Dbobc3=DBoBC3,...)
By specifying the list items with names, I thought I could reference them (or subset the list) as, e.g., Dcr$Dcre1 to get DCred1, Dcr$Dbobc1 to get DBoBC1, etc., so that the explanatory variables of the equation can easily be associated with their original names. This way, I would avoid specifying the list as Dcr<-list(Dcr1, Dcr2, Dcr3, ..., Dcr15) and then subsetting it using Dcr[[1]][1:29], Dcr[[2]][1:29], ..., Dcr[[15]][1:29], because the list has many variables (15) and referencing the variables by number makes them lose their original names.
When I specify the list as Dcr<- list(Dcr1, Dcr2, ..., Dcr15), then the regression equation specified as:
# Regression
regCred<- lm(Dcr[[1]][1:29]~Dcr[[2]][1:29]+Dcr[[3]][1:29]+Dcr[[4]][1:29]+Dcr[[5]][1:29]+Dcr[[6]][1:29]+...)
runs without problems - the results are shown here below:
Call:
lm(formula = Dcr[[1]][1:29] ~ Dcr[[2]][1:29] + Dcr[[3]][1:29] + 
Dcr[[4]][1:29] + Dcr[[5]][1:29] + Dcr[[6]][1:29])
Residuals:
Min      1Q  Median      3Q     Max 
-86.293 -33.586  -9.969  40.147 117.965 
Coefficients:
Estimate Std. Error t value Pr(>|t|) 
(Intercept)    81.02064   13.28632   6.098 3.21e-06 ***
Dcr[[2]][1:29] -0.97407    0.11081  -8.791 8.20e-09 ***
Dcr[[3]][1:29] -0.27950    0.05899  -4.738 8.95e-05 ***
Dcr[[4]][1:29] -0.07961    0.04856  -1.639    0.115 
Dcr[[5]][1:29] -0.07180    0.05515  -1.302    0.206 
Dcr[[6]][1:29] -0.01562    0.02086  -0.749    0.462 

But when I specify the list with names as shown above, then the equation does not run - as shown by the following error message
> # Regression
> regCred<- lm(Dcr[[1]][1:29]~Dcr[[2]][1:29]+Dcr[[3]][1:29]+Dcr[[4]][1:29]+
+ Dcr[[5]][1:29]+Dcr$Dbobc3)
Error in model.frame.default(formula = Dcr[[1]][1:29] ~ Dcr[[2]][1:29] +  : 
variable lengths differ (found for 'Dcr$Dbobc3')
> Dcr[[5]][1:29]+Dcr$Dbobc3[1:29])
Error: unexpected ')' in "Dcr[[5]][1:29]+Dcr$Dbobc3[1:29])"

NB: In the equation with the error message, only the last term is specified by referencing its name (i.e., Dcr$Dbobc3[1:29]). Also note that the error occurs whether or not I append '[1:29]' to Dcr$Dbobc3.
How do I resolve this?
Thanks. Lexi

NB: I tried typing the above in the same email Petr used to reply to me, but the email could not be delivered due to size problems