[R] add an automatized linear regression in a function

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Thu May 3 15:45:59 CEST 2012


Dear R users,

At the moment I have a script and a function that calculate a correlation
matrix between all my data files. For each file, the function then picks the
best-correlated file and uses it to fill the missing values (NAs) in the file
being analysed: the data from the best-correlated file are copied
automatically into the gaps of the first file. If the best-correlated file
has no data at that position, the function takes the value from the
second-best-correlated file.
The problem is that, for now, it copies the raw data from the best-correlated
file.
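
As a minimal sketch of the selection step described above (not the actual
script): the toy df.list, the helper name rank.candidates and the use of
pairwise-complete correlations are all assumptions made for illustration.

```r
# Hypothetical toy data: three stations, each a data frame with a `data` column.
set.seed(1)
base <- rnorm(50)
df.list <- list(
  a = data.frame(data = base + rnorm(50, sd = 0.1)),
  b = data.frame(data = 2 * base + 1 + rnorm(50, sd = 0.1)),
  c = data.frame(data = rnorm(50))
)
df.list$a$data[c(5, 20)] <- NA   # gaps that would need filling

# Correlation matrix over all stations, ignoring NAs pair by pair.
values <- sapply(df.list, function(d) d$data)
mat <- cor(values, use = "pairwise.complete.obs")

# Rank the other stations for one target, best correlation first.
rank.candidates <- function(mat, station) {
  r <- mat[station, ]
  r[station] <- -Inf                 # never pick the station itself
  ord <- order(r, decreasing = TRUE)
  ord[ord != station]                # drop the station itself from the ranking
}

rank.candidates(mat, 1)              # indices of the other stations, best first
```

Here station "b" is a noisy linear function of the same signal as "a", so it
should rank ahead of the unrelated station "c".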

So I need to adapt these raw data to the file that is going to be filled. I
would therefore like to automate a linear regression between the two files
(once the best or second-best correlated file has been selected).
Instead of copying the raw data from the best-correlated file into the first
one, the function should fill the gaps with the values estimated from the
regression (in order to get more accurate filled data).
The idea is to run lm() between the two files, extract the coefficients of
the fitted straight line, compute the estimated values for the whole file
(NA positions included), and finally fill the gaps with these estimates. I
hope the problem is clear.
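
For two series taken in isolation, the idea could be sketched as follows.
This is only an illustration under assumptions: fill.by.regression and the
vectors x and y are hypothetical names, and both series are assumed to have
the same length and alignment.

```r
# Hypothetical helper: fill NAs in `target` with values predicted from `donor`.
fill.by.regression <- function(target, donor) {
  d <- data.frame(y = target, x = donor)
  fit <- lm(y ~ x, data = d)          # rows containing NA are dropped from the fit
  est <- predict(fit, newdata = d)    # one estimate per row; NA where donor is NA
  gaps <- is.na(target) & !is.na(est) # only fill gaps that have an estimate
  target[gaps] <- est[gaps]
  target
}

x <- c(1, 2, 3, 4, 5, 6)
y <- c(3, 5, NA, 9, 11, NA)           # observed values follow y = 2 * x + 1
fill.by.regression(y, x)              # gaps become 7 and 13 (up to rounding)
```

Passing the full data frame as newdata to predict() gives estimates for
every row, so the coefficients never need to be extracted by hand; coef(fit)
would return them if they are wanted.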
Here's the current function:

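process.all <- function(df.list, mat){
    # First pass: fill each station from its best-correlated station.
    # na.fill() and get.max.cor() are not shown in this post.
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

    # Second pass: fill any remaining NAs from the other stations,
    # tried in decreasing order of correlation.
    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf   # exclude self-correlation
            nas <- which(is.na(x$data))
            # drop the first-ranked station (already used by f) and the
            # last-ranked one (the station itself, set to -Inf above)
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(i in nas){
                for(y in ord){
                    if(!is.na(df.list[[y]]$data[i])){
                        x$data[i] <- df.list[[y]]$data[i]   # raw value copied here
                        break
                    }
                }
            }
        }
        x
    }

    n <- length(df.list)
    nms <- names(df.list)
    # corhiver2008capt1 is a global object, not the mat argument
    max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
}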
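
One possible way to bring the regression into the gap-filling loop is
sketched below. It is not a drop-in replacement for g(): fill.station is a
hypothetical name, ord is assumed to be the candidate order computed as in
the function above, and each data frame is assumed to have a `data` column of
the same length.

```r
# Hypothetical variant of g(): try each candidate station in order and fill
# the remaining gaps with regression estimates instead of raw values.
fill.station <- function(df.list, station, ord) {
  x <- df.list[[station]]
  obs <- x$data                            # original observations, used for every fit
  for (y in ord) {
    nas <- which(is.na(x$data))
    if (length(nas) == 0) break            # nothing left to fill
    d <- data.frame(target = obs, donor = df.list[[y]]$data)
    if (sum(complete.cases(d)) < 2) next   # too little overlap to fit a line
    fit <- lm(target ~ donor, data = d)
    est <- unname(predict(fit, newdata = d))
    ok <- nas[!is.na(est[nas])]            # gaps this candidate can estimate
    x$data[ok] <- est[ok]
  }
  x
}

df.list <- list(
  a = data.frame(data = c(1, NA, 3, NA, 5)),
  b = data.frame(data = c(10, 20, 30, NA, 50)),
  c = data.frame(data = c(2, 4, 6, 8, 10))
)
filled <- fill.station(df.list, station = 1, ord = c(2, 3))
filled$data
```

In this toy case the second value is estimated from station "b" and the
fourth from station "c", because "b" is itself missing at that position.
Each fit uses only the original observations, so estimates from one
candidate do not influence the regression on the next.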

I succeeded in doing this for a small data.frame I created, but I don't know
how to do it in this particular case.
Thanks a lot for your help!




