[R] NA values trimming

nyk nick at nyk.ch
Mon Jul 6 00:12:56 CEST 2009


Thanks for your reply! This is what I was looking for!
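I'm using:
# fraction of NAs in each row (a row has ncol(data_matrix) entries)
nas1 <- apply(data_matrix,1,function(x)sum(is.na(x))/ncol(data_matrix))
# fraction of NAs in each column (a column has nrow(data_matrix) entries)
nas2 <- apply(data_matrix,2,function(x)sum(is.na(x))/nrow(data_matrix))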
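
The per-row and per-column NA fractions can also be computed without apply,
since the mean of a logical vector is the fraction of TRUE values. A small
sketch, where the toy matrix m is made up for illustration and merely stands
in for data_matrix:

```r
# Toy matrix standing in for data_matrix (values are invented for illustration)
m <- matrix(c( 1, NA,  3,
              NA, NA,  6,
               7,  8,  9,
              10, NA, 12), nrow = 4, byrow = TRUE)

# is.na(m) is a logical matrix; averaging it gives the share of NAs
row_na_frac <- rowMeans(is.na(m))  # one value per row, each in [0, 1]
col_na_frac <- colMeans(is.na(m))  # one value per column, each in [0, 1]
```

Here each row fraction is relative to the number of columns, and each column
fraction is relative to the number of rows.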

The "significantly more" criterion isn't really helpful now that I look at
the data.
I'd better write a function that removes the row or column with the highest
fraction of NAs, and repeat it as many times as it takes to get useful
data. For example, I want to draw heatmaps and dendrograms, but the data has
too many NA values, so I get "Error in hclustfun(distfun(x)) :  NA/NaN/Inf
in foreign function call (arg 11)".
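
A minimal sketch of such a trimming function might look like the following.
The name trim_na, the max_frac parameter, and the tie-breaking rule (drop a
row when the worst row and worst column are equally bad) are assumptions made
for illustration, not an established method. With the default max_frac = 0 it
keeps going until no NAs remain, which is stricter than clustering may
actually need.

```r
# Hypothetical helper: repeatedly drop the row or column with the highest
# NA fraction until no row or column exceeds max_frac.
trim_na <- function(m, max_frac = 0) {
  while (nrow(m) > 0 && ncol(m) > 0) {
    row_frac <- rowMeans(is.na(m))
    col_frac <- colMeans(is.na(m))
    # Stop once every row and column is within the allowed NA fraction
    if (max(row_frac, col_frac) <= max_frac) break
    if (max(row_frac) >= max(col_frac)) {
      m <- m[-which.max(row_frac), , drop = FALSE]   # drop the worst row
    } else {
      m <- m[, -which.max(col_frac), drop = FALSE]   # drop the worst column
    }
  }
  m
}

# Toy data, invented for illustration
m <- matrix(c( 1, NA,  3,
              NA, NA,  6,
               7,  8,  9,
              10, NA, 12), nrow = 4, byrow = TRUE)
trimmed <- trim_na(m)
```

drop = FALSE keeps the result a matrix even when only one row or column is
left. Raising max_frac keeps more data at the cost of leaving some NAs in.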




David Winsemius wrote:
> 
> 
> On Jul 4, 2009, at 9:22 PM, nyk wrote:
> 
>>
>> I have a data matrix containing quite a lot of missing values (NA). I know
>> how to remove all columns or rows containing NA values, but is there a
>> standard method for removing not all NA-containing rows/columns, but only
>> those which have significantly more NAs than others?
> 
> You have not defined what you mean by "significantly more than the
> others", so perhaps all you want to know is how to count the NAs in a
> vector:
> 
>  > x=c(1,2,3,NA, 5,6,NA)
>  > sum(is.na(x))
> [1] 2
>>
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
