[R] problem replacing NAs in a dataset (10% remain after replacement attempt)

Julie Shoemaker jshoemak at fas.harvard.edu
Thu Jul 19 20:05:41 CEST 2012


Hi all,
I'm attempting to gap-fill a dataset, replacing the missing values with 
each month's day or night median value.

The problem is that my code replaces some, but not all, of the NAs, and I 
cannot figure out how this is possible.  When I look at the individual 
lines where NAs remain, they should have been captured by my code as far 
as I can tell.  Here is an example:

The dataset, called hourly.data, is 4464 x 14.
I've already replaced all NaN values with NA.

#filPFD is a column of ambient light levels; it has no NA values, and all values are real and either 0 or >0
#month is a column with values between 7 and 12, depending on the month the data were collected
#fillCH4 is a column containing the CH4 flux data that I am trying to gap-fill
#night_median and day_median are 1x6 vectors with the median flux values for each month

temp<-hourly.data[hourly.data$month==7,]
darkmonth<-(temp$filPFD==0)
daymonth<-(temp$filPFD>0)
temp[is.na(temp[darkmonth,"fillCH4"]),"fillCH4"]<-night_median[1]
temp[is.na(temp[daymonth,"fillCH4"]),"fillCH4"]<-day_median[1]
hourly.data[hourly.data$month==7,"fillCH4"]<-temp$fillCH4
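A possible explanation, offered as a sketch rather than a confirmed diagnosis: `is.na(temp[darkmonth,"fillCH4"])` returns one logical value per *dark* row, not one per row of `temp`. When a shorter logical vector is used as a row index, R recycles it over all rows, so its TRUE positions no longer line up with the rows that are actually missing. That would both leave some NAs untouched and overwrite some real values. The toy data, median values, and the names `orig` and `fixed` below are hypothetical, chosen only to make the effect visible.

```r
# Hypothetical toy data standing in for one month of hourly.data
temp <- data.frame(
  filPFD  = c(0, 0, 50, 80, 0, 120),
  fillCH4 = c(1.2, NA, 2.5, NA, NA, 3.1)
)
night_median <- 0.9  # hypothetical night median for this month
day_median   <- 2.8  # hypothetical day median for this month

darkmonth <- temp$filPFD == 0
daymonth  <- temp$filPFD > 0

# This index has one element per dark row (3 here), not one per row of
# temp (6 here), so R recycles it across all 6 rows when subsetting.
print(is.na(temp[darkmonth, "fillCH4"]))

# Original pattern: the recycled indices are misaligned with the data.
orig <- temp
orig[is.na(orig[darkmonth, "fillCH4"]), "fillCH4"] <- night_median
orig[is.na(orig[daymonth, "fillCH4"]), "fillCH4"] <- day_median
print(orig$fillCH4)   # values: 1.2 2.8 0.9 NA 2.8 0.9 (row 4 still NA)

# Corrected pattern: both conditions are evaluated on every row, so the
# logical index has the same length as nrow(temp).
fixed <- temp
fixed[darkmonth & is.na(fixed$fillCH4), "fillCH4"] <- night_median
fixed[daymonth & is.na(fixed$fillCH4), "fillCH4"] <- day_median
print(fixed$fillCH4)  # values: 1.2 0.9 2.5 2.8 0.9 3.1
```

In the original pattern, rows 3 and 6 lose their measured values and row 4 keeps its NA; combining the day/night condition and `is.na()` on the full column avoids that.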


This code replaces most of the NAs, but roughly 10% remain.  All of the 
cases I have isolated have a value of 7 in the "month" column and real 
values in the "filPFD" column.
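For filling all six months at once, one alternative (a sketch, not the code used above) skips the per-month subset entirely. It assumes that months 7 to 12 correspond to positions 1 to 6 of night_median and day_median, as the use of `night_median[1]` for month 7 suggests, so `month - 6` can index the median vectors directly. The toy data, the median values, and the helper names `miss`, `dark`, `pos`, `night_rows`, and `day_rows` are hypothetical.

```r
# Hypothetical toy data covering three of the six months
hourly.data <- data.frame(
  month   = c(7, 7, 8, 8, 12, 12),
  filPFD  = c(0, 40, 0, 65, 0, 10),
  fillCH4 = c(NA, NA, 1.5, NA, NA, 2.2)
)
# Hypothetical medians for months 7..12 (positions 1..6)
night_median <- c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
day_median   <- c(2.5, 2.6, 2.7, 2.8, 2.9, 3.0)

miss <- is.na(hourly.data$fillCH4)  # one element per row
dark <- hourly.data$filPFD == 0     # one element per row; filPFD has no NAs
pos  <- hourly.data$month - 6       # assumes months 7..12 map to 1..6

# Every logical vector here has length nrow(hourly.data), so each
# index stays aligned with the rows it describes. Since filPFD is
# either 0 or > 0, !dark selects the daytime rows.
night_rows <- miss & dark
day_rows   <- miss & !dark
hourly.data$fillCH4[night_rows] <- night_median[pos[night_rows]]
hourly.data$fillCH4[day_rows]   <- day_median[pos[day_rows]]

print(hourly.data$fillCH4)  # values: 0.5 2.5 1.5 2.6 1.0 2.2
```

Because each missing value picks its median through its own month's position, there is no loop over months and no need to copy a subset back into hourly.data.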

Any thoughts?  Am I missing something obvious?  Is there any way these 
values could display as NA but belong to some other class, such that 
is.na() does not pick them up?
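On the last question, a small check may help. In R, `is.na()` returns TRUE for both NA and NaN, while `is.nan()` flags only NaN. A value can still look like a missing value without being one: if a column was read in as character, the string "NA" prints much like NA, yet `is.na()` returns FALSE for it. The vectors `x` and `y` below are hypothetical.

```r
x <- c(1.5, NA, NaN)
print(is.na(x))    # NA and NaN are both treated as missing
print(is.nan(x))   # only the NaN is flagged

y <- c("1.5", "NA")  # a character vector holding the string "NA"
print(is.na(y))      # the string "NA" is not a missing value
print(class(y))      # checking the column class reveals this case
```

Running `class()` on the fillCH4 column is a quick way to rule this case in or out.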

Best,
Julie

__________________________________
Julie Shoemaker, PhD
Postdoctoral Research Associate
Harvard University
phone: (617) 384-7237
email: jshoemak at fas.harvard.edu


