[R] select part of files from a list.files

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Thu May 24 12:51:56 CEST 2012


Hi again Joshua.

I tried your function, and I think it's what I need. It works well on the
small example from my first post, but I'm having trouble adapting it to my
data. Here is another fake example using my real script and the kind of data
I have (you can just copy and paste it to try):

ST1 <- data.frame(sensor1=rnorm(10), sensor2=c(rep(NA,5),rnorm(5)),
                  sensor3=c(1,NA,NA,4:10), sensor4=rep(NA,10), date_time=date())
write.table(ST1, "ST1_2012.csv", sep=";", quote=FALSE, row.names=TRUE)
ST2 <- data.frame(sensor1=c(rep(NA,5),6:10), sensor2=rnorm(10),
                  sensor3=rep(NA,10), sensor4=c(1,NA,NA,4:10), date_time=date())
write.table(ST2, "ST2_2012.csv", sep=";", quote=FALSE, row.names=TRUE)
ST3 <- data.frame(sensor1=c(1,NA,NA,4:10), sensor2=rep(NA,10),
                  sensor3=rnorm(10), sensor4=c(rep(NA,5),6:10), date_time=date())
write.table(ST3, "ST3_2012.csv", sep=";", quote=FALSE, row.names=TRUE)
ST4 <- data.frame(sensor1=rep(NA,10), sensor2=c(1,NA,NA,4:10),
                  sensor3=c(rep(NA,5),6:10), sensor4=rnorm(10), date_time=date())
write.table(ST4, "ST4_2012.csv", sep=";", quote=FALSE, row.names=TRUE)

filenames <- list.files(pattern="_2012\\.csv$")

    Sensors <- paste("sensor", 1:4,sep="")

    Stations <-substr(filenames,1,3)

    nsensors <- length(Sensors)
    nstations <- length(Stations)

    nobs <- nrow(read.table(filenames[1], header=TRUE,sep=";"))

    yr2008 <- array(NA,dim=c(nobs, nsensors, nstations))

    for(i in seq_len(nstations)){
        tmp <- read.table(filenames[i], header=TRUE, sep=";")
        yr2008[ , , i] <- as.matrix(tmp[, Sensors])
    }

    dimnames(yr2008) <- list(seq.int(nobs), Sensors, Stations)
    
    cor1_5 <- lapply(Sensors, function(s)
        cor(yr2008[1:5, s, ], use="pairwise.complete.obs"))
    names(cor1_5) <- Sensors
    cor1_5
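
The correlation matrices above do not show how many observations each coefficient is based on. As a rough sketch only, the number of rows where both stations have a value can be counted like this. The toy matrix `x` and the names `present` and `n_pairs` are made up for illustration; `x` stands in for one slice such as yr2008[1:5, "sensor1", ].

```r
# Toy data: rows are observations, columns are stations (hypothetical values)
x <- cbind(ST1 = c(1, 2, 3, 4, 5),
           ST2 = c(NA, NA, 3, 1, 2),
           ST3 = c(1, NA, NA, 4, 5),
           ST4 = rep(NA_real_, 5))

# TRUE where a value is present
present <- !is.na(x)

# Entry [i, j] is the number of rows where both station i and station j
# have a value; multiplying by 1 turns TRUE/FALSE into 1/0 first
n_pairs <- crossprod(present * 1)
n_pairs
```

The diagonal gives the number of non-NA values per station, and a station with only NAs gets a row and column of zeros.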

At the moment, it computes correlations between the same sensor in each file
(only for part of my data, rows 1 to 5), regardless of how many values are NA.
I want it to do the same, but using the check from your function:
if (sum(!is.na(data[rows, ])) >= minpresent){
    data
  } else {NULL}
} 

I want it to give me the same correlation matrices for each sensor between
my files, but I want it to calculate a correlation coefficient only if I
have at least 3 numeric values (out of 5 in the example), rather than for
any number of numeric values (just 1 or 2, for example). If there are fewer
than 3 numeric values (1 or 2), the correlation in the matrix should be NA.
And if the sensor data contains only NAs, it should do nothing with it (keep
it and go on to the next sensor).
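
To make the rule concrete, here is a minimal sketch of one way it could look. The helper `cor_min` and the toy matrix `x` are hypothetical and are not taken from Joshua's function. It also assumes that "at least 3 numeric values" means at least 3 rows where both stations have a value, and that an all-NA station simply stays in the matrix as NA.

```r
# Hypothetical helper: pairwise correlations between the columns of x,
# set to NA wherever fewer than `minpresent` rows have both values present
cor_min <- function(x, minpresent = 3) {
  present <- !is.na(x)
  n_pairs <- crossprod(present * 1)   # complete pairs for each column pair
  r <- cor(x, use = "pairwise.complete.obs")
  r[n_pairs < minpresent] <- NA       # mask coefficients with too few pairs
  r
}

# Toy data: rows are observations, columns are stations (hypothetical values)
x <- cbind(ST1 = c(1, 2, 3, 4, 5),
           ST2 = c(NA, NA, 3, 1, 2),
           ST3 = c(1, NA, NA, 4, 5),
           ST4 = rep(NA_real_, 5))

res <- cor_min(x, minpresent = 3)
res
```

Here ST2 and ST3 share only 2 rows, so their coefficient is NA, and the all-NA station ST4 gives NA everywhere without stopping the calculation. Under the same assumptions it could replace the plain cor() call inside the lapply() over Sensors, passing yr2008[1:5, s, ] as x.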

I tried to combine your function with mine, but it doesn't work. I hope
this is clear. Thanks for your help!






