[R] linear fit function with NA values

arun smartpink111 at yahoo.com
Sat Jul 27 22:57:38 CEST 2013


Hi,
I could not reproduce the error with the data you provided.
return<- read.table(text="
      ATI        AMU
-1  0.734    9.003
0    0.999    2.001
1    3.097    -1.003
2        NA        NA
3        NA    3.541
",sep="",header=TRUE)

median<- read.table(text="
      ATI        AMU
-1  3.224    -2.003
0    2.999    -1.301
1    1.3        -1.003
2    4.000    2.442
3      -10    4.511
",sep="",header=TRUE)

 lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i])}) 
[[1]]

Call:
lm(formula = return[, i] ~ median[, i])

Coefficients:
(Intercept)  median[, i]  
      4.696       -1.231  


[[2]]

Call:
lm(formula = return[, i] ~ median[, i])

Coefficients:
(Intercept)  median[, i]  
     3.3937      -0.1607  

lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i],na.action=na.omit)}) # gives the same result as above
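
If one of the column pairs has no complete cases at all, lm() stops with an error and the whole lapply() call aborts. Below is a minimal sketch of one way to guard against that. The names fit_pair, min_obs, ret and med are hypothetical (they are not from your data), and the all-NA AMU column is made up only to trigger the problem:

```r
# Hypothetical helper (not from the original thread): fit y ~ x only when
# at least `min_obs` rows have both values present; otherwise return NULL.
fit_pair <- function(y, x, min_obs = 2) {
  ok <- complete.cases(y, x)            # TRUE where neither y nor x is NA
  if (sum(ok) < min_obs) return(NULL)   # skip the column instead of failing
  lm(y ~ x, na.action = na.omit)        # na.omit drops the incomplete rows
}

# Illustrative data: ATI has three complete rows, AMU is entirely NA.
ret <- data.frame(ATI = c(0.734, 0.999, 3.097, NA, NA),
                  AMU = c(NA_real_, NA, NA, NA, NA))
med <- data.frame(ATI = c(3.224, 2.999, 1.3, 4.000, -10),
                  AMU = c(-2.003, -1.301, -1.003, 2.442, 4.511))

fits <- lapply(seq_len(ncol(ret)), function(i) fit_pair(ret[, i], med[, i]))
names(fits) <- names(ret)

# Columns that could not be fitted show up as NULL entries.
sapply(fits, is.null)
```

Returning NULL keeps the result list aligned with the columns, so the skipped ones are easy to find afterwards.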

 sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_0.6.2  reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8    tools_3.0.1

BTW, it is better to share the example dataset with dput() (see ?dput).
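
As a small sketch of what that looks like (the data frame ret below is made up for illustration, not your actual data): dput() prints R code that rebuilds the object, and that text can be pasted into an email.

```r
# Made-up example data with a few missing values.
ret <- data.frame(ATI = c(0.734, 0.999, NA),
                  AMU = c(9.003, NA, 3.541))

# Prints a structure(...) expression that can be pasted into a message.
dput(ret)

# A reader recreates the object by evaluating the pasted text; here the
# round trip is simulated by capturing and re-parsing the dput() output.
txt  <- paste(capture.output(dput(ret)), collapse = "")
ret2 <- eval(parse(text = txt))
all.equal(ret, ret2)
```

This way column types and NA values arrive exactly as they are in your session, which a pasted table cannot guarantee.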

A.K.



----- Original Message -----
From: iza.ch1 <iza.ch1 at op.pl>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Saturday, July 27, 2013 4:46 PM
Subject: Re: Re: [R] linear fit function with NA values

Hi

Thanks for your hints. I would like to describe my problem better and give an example of the data that I use.

I am conducting an event study and need to compute abnormal returns from daily stock prices. For each stock I have returns covering a period of 8 years. For some days the data are missing, for various reasons. In the Excel file these are just empty cells, but when I convert my data to 'zoo' they become NA. I get something like this:

return


       ATI        AMU
-1   0.734     9.003
0    0.999     2.001
1    3.097     -1.003
2        NA        NA
3        NA     3.541

median
      ATI        AMU
-1   3.224     -2.003
0    2.999     -1.301
1    1.3        -1.003
2    4.000     2.442
3       -10     4.511

I want to regress the first column of return on the first column of median, and the second column of return on the second column of median. When I do
OLS<-lapply(seq_len(ncol(return)),function(i) {lm(return[,i]~median[,i])})
I get an error message. I would like my function to omit the NAs: for example, for the ATI returns it should use only the values for days -1, 0 and 1 and regress them against the ATI values in median for the same days, i.e. only (3.224, 2.999, 1.3).

Is it possible to do it?

Thanks a lot 

On 2013-07-27 17:33:30, arun <smartpink111 at yahoo.com> wrote:
> 
> 
> Hi,
> set.seed(28)
> dat1<- as.data.frame(matrix(sample(c(NA,1:20),100,replace=TRUE),ncol=10))
> 
> set.seed(49)
> dat2<- as.data.frame(matrix(sample(c(NA,40:80),100,replace=TRUE),ncol=10))
>  lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i])}) # works because the default na.action (na.omit) drops rows containing NA
> Regarding the options:
> ?lm()
> na.action: a function which indicates what should happen when the data
>           contain ‘NA’s.  The default is set by the ‘na.action’ setting
>           of ‘options’, and is ‘na.fail’ if that is unset.  The
>           ‘factory-fresh’ default is ‘na.omit’.  Another possible value
>           is ‘NULL’, no action.  Value ‘na.exclude’ can be useful.
> 
>  lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.exclude)})
> #or
>  lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.omit)})
> 
> lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.fail)})
> #Error in na.fail.default(list(`dat2[, i]` = c(54L, 59L, 50L, 64L, 40L,  : 
>  # missing values in object
> 
> In your case, the error is different.  It could be something similar to the case below, where an entire column is NA:
> dat1[,1]<- NA
> 
> lapply(seq_len(ncol(dat1)),function(i) {lm(dat2[,i]~dat1[,i],na.action=na.omit)})
> #Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
>  # 0 (non-NA) cases # here it is different
> 
>  lapply(seq_len(ncol(dat1)),function(i) {try(lm(dat2[,i]~dat1[,i]))}) # try() lets lapply() continue past the failing column in the case above; it may not help in your case
> 
> You need to provide a reproducible example to understand the situation better.
> A.K.
> 
> ----- Original Message -----
> From: iza.ch1 <iza.ch1 at op.pl>
> To: r-help at r-project.org
> Cc: 
> Sent: Saturday, July 27, 2013 8:47 AM
> Subject: [R] linear fit function with NA values
> 
> Hi
> 
> Quick question. I am running a separate regression for each pair of columns of two data sets, so as a result I get several sets of coefficients. My problem is that the data I use for the regression contain NA values. How can I make lm() ignore the NAs? I use the following code for the regression: 
> OLS<-lapply(seq_len(ncol(es.w)),function(i) {lm(es.w[,i]~es.median[,i])})
> In response I get
> Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
>   all values NA
> 
> Thanks for the help :)
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.