[R] Need a faster function to replace missing data
dieter.menne at menne-biomed.de
Fri May 22 15:01:05 CEST 2009
Tim Clark <mudiver1200 <at> yahoo.com> writes:
> I need some help in coming up with a function that will take two data sets,
> determine if a value is missing in one, find a value in the second that was
> taken at about the same time, and substitute the second value in for where
> the first should have been.
This is the type of job I would do with a database, not R (alone). The
main advantage is that you have to do the cleanup job only once and can
retrieve the data in a rather well-documented way later (it's possible
with R, I know).
- Put the 5-minute data into one table. I would add two new columns giving the
  delta to the next value for easier linear interpolation, but that's
  secondary. Make sure to index the table.
- Put the 1-second data into another table, adding time values rounded to
  5 seconds, and giving these an index.
- From R/ODBC or with RSQLite, make a join of all rows in Table 1
  that have NULL values in the coordinates. If you do not want to
  do a linear interpolation, you could even do this within the database
  and SQL alone.
- Compute the linear interpolation, and write the data back into
  the database. If you want to be careful, you might mark the interpolated
  values in a separate field as "computed".
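
The steps above could look roughly like the following. This is only a minimal sketch, written in Python with the standard sqlite3 module rather than R/RSQLite so that it runs without extra packages. The table names (coarse, fine), the column names (t in seconds, x, y, computed), the sample values and the 60-second matching window are all assumptions made for illustration. Instead of a rounded-time column, the sketch joins on a time range, which is a substitution on my part.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical schema: t is the time in seconds; INTEGER PRIMARY KEY is indexed.
con.execute("CREATE TABLE coarse (t INTEGER PRIMARY KEY, x REAL, y REAL, "
            "computed INTEGER DEFAULT 0)")   # the 5-minute data
con.execute("CREATE TABLE fine (t INTEGER PRIMARY KEY, x REAL, y REAL)")  # 1-second data

# Made-up sample data: the coordinates at t=300 and t=900 are missing.
con.executemany("INSERT INTO coarse (t, x, y) VALUES (?, ?, ?)",
                [(0, 0.0, 0.0), (300, None, None), (600, 6.0, 12.0), (900, None, None)])
con.executemany("INSERT INTO fine (t, x, y) VALUES (?, ?, ?)",
                [(298, 2.98, 5.96), (303, 3.03, 6.06), (897, 8.97, 17.94)])

# Join every coarse row with NULL coordinates to the fine rows close in time.
GAP_JOIN = """
    SELECT c.t, f.t, f.x, f.y
    FROM coarse AS c
    JOIN fine AS f ON f.t BETWEEN c.t - :w AND c.t + :w
    WHERE (c.x IS NULL OR c.y IS NULL)
      AND f.x IS NOT NULL AND f.y IS NOT NULL
    ORDER BY c.t, f.t
"""

def fill_gaps(con, window=60):
    """Interpolate missing coarse coordinates; return the number of rows filled."""
    candidates = {}
    for ct, ft, fx, fy in con.execute(GAP_JOIN, {"w": window}):
        candidates.setdefault(ct, []).append((ft, fx, fy))
    filled = 0
    for ct, fixes in candidates.items():
        before = [f for f in fixes if f[0] <= ct]   # sorted by time, so [-1] is latest
        after = [f for f in fixes if f[0] >= ct]    # and [0] is earliest
        if not before or not after:
            continue  # no bracketing pair yet: leave the value NULL
        (t0, x0, y0), (t1, x1, y1) = before[-1], after[0]
        w = 0.0 if t1 == t0 else (ct - t0) / (t1 - t0)
        # Write back and mark the row as computed.
        con.execute("UPDATE coarse SET x = ?, y = ?, computed = 1 WHERE t = ?",
                    (x0 + w * (x1 - x0), y0 + w * (y1 - y0), ct))
        filled += 1
    con.commit()
    return filled

first_run = fill_gaps(con)    # fills t=300; t=900 has no fix after it yet

# Later, new 1-second data arrive and the same procedure is run again.
con.execute("INSERT INTO fine (t, x, y) VALUES (902, 9.02, 18.04)")
second_run = fill_gaps(con)   # now t=900 can be interpolated as well
```

Because only rows with NULL coordinates are selected, a second run leaves earlier results untouched, and the computed column keeps measured and interpolated values apart.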
When new data come in at a later time, you can run the procedure again.