[R] ignore NA column in a DF (for calculation) without removing them

jeff6868 geoffrey_klein at etu.u-bourgogne.fr
Thu May 31 10:36:23 CEST 2012


Dear users,

I currently have a function that looks for the best correlation for
each file in my correlation matrix. I'm working on a list of files (from
list.files). Here's the function:

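get.max.cor <- function(station, mat){
    # Mask the diagonal so a station is never matched with itself
    mat[row(mat) == col(mat)] <- -Inf
    # Column index(es) of the highest correlation in this station's row
    which(mat[station, ] == max(mat[station, ], na.rm=TRUE))
}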

If I have a correlation matrix like this (no NA-value):

cor1 <- read.table(text="
ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 0.8646358 0.8104837 0.8899451 0.7486417
ST209 0.8646358 1.0000000 0.9335584 0.8392696 0.8676857
ST210 0.8104837 0.9335584 1.0000000 0.8304132 0.9141465
ST211 0.8899451 0.8392696 0.8304132 1.0000000 0.8064669
ST212 0.7486417 0.8676857 0.9141465 0.8064669 1.0000000
", header=TRUE)

It works perfectly. If I have a correlation matrix with some NAs (but not
only NAs) like this:

cor2 <- read.table(text="
ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 NA 0.9666491 0.9573701 0.9233598
ST209 NA 1.0000000 0.9744054 0.9577192 0.9346706
ST210 0.9666491 0.9744054 1.0000000 0.9460145 0.9582683
ST211 0.9573701 0.9577192 0.9460145 1.0000000 NA
ST212 0.9233598 0.9346706 0.9582683 NA 1.0000000
", header=TRUE)

It still works thanks to na.rm=TRUE. But when one file has no data, and so
has only NAs in its column, like this:

cor3 <- read.table(text="
ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 NA 0.8104837 0.8899451 0.7486417
ST209 NA NA NA NA NA
ST210 0.8104837 NA 1.0000000 0.8304132 0.9141465
ST211 0.8899451 NA 0.8304132 1.0000000 0.8064669
ST212 0.7486417 NA 0.9141465 0.8064669 1.0000000
", header=TRUE)

Of course it doesn't work, because there is no non-NA value and therefore
no maximum correlation for this file.
That's why I get this error: 0 (non-na) cases.
I tried removing the NA columns, but since I'm working from list.files, the
number of files in the list and in the matrix would no longer be the same.
I searched the web but only found topics about removing NA columns.
In my case, I would like to ignore these NA columns without removing them.
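
As a minimal sketch of what "ignoring without removing" could look like, the all-NA columns can be flagged with a logical vector while the matrix keeps its full size. The names m and empty are hypothetical, and the matrix is the cor3 example from above.

```r
# cor3 as defined in the post
cor3 <- read.table(text="
ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 NA 0.8104837 0.8899451 0.7486417
ST209 NA NA NA NA NA
ST210 0.8104837 NA 1.0000000 0.8304132 0.9141465
ST211 0.8899451 NA 0.8304132 1.0000000 0.8064669
ST212 0.7486417 NA 0.9141465 0.8064669 1.0000000
", header=TRUE)

m <- as.matrix(cor3)

# TRUE for every column that holds no usable value at all
# (hypothetical helper vector, not part of the original code)
empty <- colSums(!is.na(m)) == 0

# Stations to skip later; the matrix itself is left untouched
names(which(empty))
dim(m)
```

Because nothing is dropped, the matrix still has one row and one column per file in the list.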

I would like to tell R: when you look for the highest correlation for each
file in the correlation matrix, if you find a file with no correlation
coefficient (a column containing only NAs), don't do anything with it, keep
it as it is and go on to the next file (next column or row).
I also tried adding else {NA} or else {NULL} to avoid this problem, but it
still doesn't work.
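
The behaviour described above could be sketched roughly as follows. This is only one possible reading of the request: the function name get.max.cor.safe and the choice of NA as the "nothing found" value are assumptions, and the diagonal is masked with NA instead of -Inf so that a station with no data does not match itself.

```r
# Hypothetical variant of get.max.cor(); name and NA return value are
# assumptions, not part of the original function.
get.max.cor.safe <- function(station, mat) {
    mat <- as.matrix(mat)
    diag(mat) <- NA                          # never match a station with itself
    r <- mat[station, ]
    if (all(is.na(r))) return(NA_integer_)   # no usable correlation: skip it
    which(r == max(r, na.rm = TRUE))         # which() ignores the remaining NAs
}

# cor3 as defined in the post
cor3 <- read.table(text="
ST208     ST209     ST210     ST211     ST212
ST208 1.0000000 NA 0.8104837 0.8899451 0.7486417
ST209 NA NA NA NA NA
ST210 0.8104837 NA 1.0000000 0.8304132 0.9141465
ST211 0.8899451 NA 0.8304132 1.0000000 0.8064669
ST212 0.7486417 NA 0.9141465 0.8064669 1.0000000
", header=TRUE)

# One result per station, so the list stays aligned with the file list
best <- lapply(rownames(cor3), get.max.cor.safe, mat = cor3)
names(best) <- rownames(cor3)
best
```

The result has one entry per station, with NA for the station that has no data, so its length still matches the number of files.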

Does somebody have an idea how to solve this problem?
Thank you very much.

Best regards
Geoffrey
