Type: | Package |
Title: | Spatio-Temporal Functional Imputation Tool |
Version: | 0.99.9 |
Date: | 2022-10-17 |
Description: | A general spatiotemporal satellite image imputation method based on sparse functional data analytic techniques. The imputation method applies and extends the Functional Principal Analysis by Conditional Estimation (PACE). The underlying idea for the proposed procedure is to impute a missing pixel by borrowing information from temporally and spatially contiguous pixels based on the best linear unbiased prediction. |
BugReports: | https://github.com/mingsnu/stfit/issues |
License: | GPL-3 |
LazyData: | true |
Depends: | R (≥ 3.5.0) |
Imports: | Rcpp, Matrix, doParallel, foreach, abind, fda, raster, rasterVis, RColorBrewer |
LinkingTo: | Rcpp |
Suggests: | testthat, dplyr |
RoxygenNote: | 7.1.0 |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Packaged: | 2022-10-17 23:25:55 UTC; Weicheng |
Author: | Weicheng Zhu [aut, cre] |
Maintainer: | Weicheng Zhu <mingsnu@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-10-18 12:20:04 UTC |
stfit: Spatial-Temporal Functional Imputation Tool
Description
The stfit package provides functions to impute missing values for a sequence of observed images for the same location using functional data analysis technique
Absolute relative error
Description
Absolute relative error
Usage
ARE(y, ypred)
Arguments
y |
vector |
ypred |
vector |
Value
numeric number. A measure of difference between y and ypred.
Normalized Mean Square Estimation
Description
Normalized Mean Square Estimation
Usage
NMSE(y, ypred)
Arguments
y |
vector |
ypred |
vector |
Value
numeric number. A measure of difference between y and ypred.
Root Mean Square Estimation
Description
Root Mean Square Estimation
Usage
RMSE(y, ypred)
Arguments
y |
vector |
ypred |
vecotr |
Value
numeric number. A measure of difference between y and ypred.
Epanicnicov kernel function
Description
Epanicnicov kernel function
Usage
epan(x)
Arguments
x |
numeric vector |
Value
vector
Get image mask
Description
Get image mask
Usage
getMask(object, tol = 0.95)
Arguments
object |
A numeric matrix. Each row is an row stacked image. |
tol |
If the percentage of missing values for a pixel over time is greater than this value, this pixel is treated as a mask value. |
Get missing layer index
Description
Get missing layer index
Usage
getMissingLayers(rst.list)
Arguments
rst.list |
a RasterStack or RasterBrick object or a list of them |
Value
index of the missing layers
Landsat data example
Description
A dataset containing observation values of a 31x31 pixcels landsat image observed between year 1982 and 2015.
Usage
landsat106
landsat2
Format
A data frame with 990 rows and 963 columns:
year year
doy day of the year
pixeli pixel value for the i-th pixel of the image
An object of class tbl_df
(inherits from tbl
, data.frame
) with 990 rows and 963 columns.
Data visualization for landsat data
Description
Data visualization for landsat data
Usage
landsatVis(
mat,
img.nrow = 31,
byrow = FALSE,
colthm = rasterTheme(panel.background = list(col = "black"), region = brewer.pal(9,
"YlOrRd")),
...
)
Arguments
mat |
A matrix, each row corresponds to a vectorized image pixel values. |
img.nrow |
number of rows of the image |
byrow |
logical value indicating whether the pixcel values are stored by row or by column. Default to FALSE |
colthm |
Color theme for the plot, passing to the |
... |
All other options passed to |
Examples
landsatVis(landsat106[landsat106$year == 2015, -c(1:2)],
names.attr = as.character(landsat106$doy[landsat106$year == 2015]))
Local constant covariance estimation
Description
Local constant covariance estimation
Usage
lc_cov_1d(ids, time, resid, W, t1, t2)
Arguments
ids |
a vector indicating subject/group ids |
time |
integer vector of observed time points, the minimum time unit is 1 |
resid |
vector of residual values used for covariance calculation |
W |
weight vector, it contains both kernel and bandwidth information in general local polynomial estimation setting up |
t1 |
time point 1 |
t2 |
time point 2 |
Local constant covariance estimation
Description
Local constant covariance estimation
Usage
lc_cov_1d_est(ids, time, resid, W, tt)
Arguments
ids |
a vector indicating subject/group ids |
time |
integer vector of observed time points, the minimum time unit is 1 |
resid |
vector of residual values used for covariance calculation |
W |
weight vector, it contains both kernel and bandwidth information in general local polynomial estimation setting up |
tt |
time vector |
Local linear regression
Description
Local linear regression
Usage
llreg(x, y, x.eval = x, minimum.num.obs = 4, h = 60, Kern = epan)
Arguments
x |
independent variable |
y |
response variable |
x.eval |
dnew data to predict on |
minimum.num.obs |
minimum number of observations needed to run the regression |
h |
bandwidth |
Kern |
Kernel |
Value
predicted values at 'x.eval'
Local Polynomial Regression
Description
Local Polynomial Regression
Usage
lpreg(x, y, x.eval, minimum.num.obs = 4, span = 0.3, ...)
Arguments
x |
independent variable |
y |
response variable |
x.eval |
vector to predict on |
minimum.num.obs |
minimum number of observations needed to run the regression |
span |
see 'loess' function |
... |
other parameters passed to 'loess' function |
Value
predicted values at 'x.eval'
STFIT Mean Estimation
Description
The function is used for pixel-wise mean estimation.
Usage
meanEst(
doy,
mat,
doyeval = seq(min(doy), max(doy)),
msk = rep(FALSE, ncol(mat)),
outlier.tol = 0.5,
minimum.num.obs = 4,
cluster = NULL,
redo = TRUE,
clipRange = c(-Inf, Inf),
clipMethod = c("truncate", "nnr"),
img.nrow = NULL,
img.ncol = NULL
)
Arguments
doy |
vector of day of year (DOY) index |
mat |
data matrix. Each row contains a row stacked image pixel values. |
doyeval |
a vector of DOY on which to get the mean imputation |
msk |
an optional logistic vector. TRUE represent the corresponding pixel is always missing. |
outlier.tol |
the tolerance value in defining an image as outlier. The percent of outlier pixels in an image exceed this value is regarded as outlier image which will not be used in temporal mean estimation. |
minimum.num.obs |
minimum number of observations needed for mean estimation. Too few observations may lead to big estimation error. |
cluster |
an optional vector defining clusters of pixels. If NULL, mean estimation is conducted on each pixel, otherwise all pixels from the same cluster are combined for mean estimation. |
redo |
whether to recalculate the mean estimation if there is an outlier (only redo once). |
clipRange |
vector of length 2, specifying the minimum and maximum values of the prediction value |
clipMethod |
"nnr" or "truncate". "nnr" uses average of nearest neighbor pixels to impute; "truncate use the clipRange value to truncate. |
img.nrow |
number of rows for an image, only used when 'clipMethod' is "nnr" |
img.ncol |
number of columns for an image, only used when 'clipMethod' is "nnr" |
Details
There are several predefined methods for mean estimation: smooth_spline
,
llreg
, lpreg
and spreg
. User can use opt$get()
to check
the current registered method and use opt$set()
function to set the method.
For exmaple, one can run opt$set(smooth_spline)
first and then run the
meanEst
function to use smoothing spline regression for mean eatimation.
User can also customize the methods for mean estimation. For example, mean estimation
through fourier basis expansion:
.X = fda::eval.basis(1:365, fda::create.fourier.basis(rangeval=c(0,365), nbasis=11)) customfun <- function(x, y, x.eval=1:365, minimum.num.obs = 10){ nonna.idx = !is.na(y) if(sum(nonna.idx) < minimum.num.obs) return(rep(NA, 365)) ## lmfit = lm.fit(.X[unlist(lapply(x, function(x) which(x == x.eval))),], y[nonna.idx]) lmfit = lm.fit(.X[x[nonna.idx],], y[nonna.idx]) return(.X[x.eval,] } stfit::opts_stfit$set(temporal_mean_est = customfun)
Value
a list containing the following entries:
doyeval: same as input
doyeval
meanmat: estimated mean matrix, with number of rows equals length of
doyeval
and number of columns equalncol(mat)
idx: a list of image indexes
idx.allmissing: completely missing image indexes,
idx.partialmissing: partially observed image indexes,
idx.fullyobserved: fully observed image indexes,
idx.outlier: outlier image indexes.
outlier: a list of image outliers information
outidx: index of the outlier image
outpct: percentage of outlier pixels corresponding to
outidx
,outlst: a list of the same length as
outidx
, with each list the missing pixel index.
Options for stfit
Description
Options for stfit
Usage
opts_stfit
Format
An object of class list
of length 3.
Image Outlier Detection
Description
Image Outlier Detection
Usage
outlier(mat)
Arguments
mat |
data matrix. Each row is a row stacked image. |
Value
a list containing the following entries:
outidx: index of the outlier image
outpct: percentage of outlier pixels corresponding to
outidx
,outlst: a list of the same length as
outidx
, with each list the missing pixel index.
Examples
dfB = landsat106[landsat106$year >= 2000,]
matB = as.matrix(dfB[,-c(1:2)])
outlier(matB)
Missing value percentages
Description
Missing value percentages
Usage
pctMissing(x, mc.cores)
Arguments
x |
A |
mc.cores |
Numer of cores to use |
Value
A vector of percent of missing values for each layer
Remove outlier
Description
An outlier is defined as points outside the whiskers of the boxplot over the time domain (DOY).
Usage
rmOutlier(rst)
Arguments
rst |
a *Raster object |
Value
a *Raster object
STFIT Spatial Effect Estimation
Description
STFIT Spatial Effect Estimation
Usage
seffEst(
rmat,
img.nrow,
img.ncol,
h.cov = 2,
h.sigma2 = 2,
weight.cov = NULL,
weight.sigma2 = NULL,
nnr,
method = c("lc", "emp"),
partial.only = TRUE,
pve = 0.99,
msk = NULL,
msk.tol = 0.95,
var.est = FALSE
)
Arguments
rmat |
residual matrix |
img.nrow |
image row dimension |
img.ncol |
image column dimension |
h.cov |
bandwidth for spatial covariance estimation; ignored if |
h.sigma2 |
bandwidth for sigma2 estimation |
weight.cov |
weight matrix for spatial covariance estimation |
weight.sigma2 |
weight vector for spatial variance estimation |
nnr |
maximum number of nearest neighbor pixels to use for spatial covariance estimation |
method |
"lc" for local constant covariance estimation and "emp" for empirical covariance estimation |
partial.only |
calculate the spatical effect for partially observed images only, default is TRUE |
pve |
percent of variance explained of the selected eigen values. Default is 0.99. |
msk |
an optional logistic vector. TRUE represent the corresponding pixel is always missing. |
msk.tol |
if 'msk' is not given, the program will determine the mask using |
var.est |
Whether to estimate the variance of the temporal effect. Default is FALSE. |
Value
List of length 3 with entries:
seff_mat: estimated spatial effect matrix of the same shape as
rmat
.seff_var_mat: estimated spatial effect variance matrix of the same shape as
rmat
.idx: a list of two entries:
idx.allmissing: index of the completely missing images.
idx.imputed: index of the partially observed images, where spatial effects are estimated.
Smoothing spline regression
Description
Smoothing spline regression
Usage
smooth_spline(x, y, x.eval = x, minimum.num.obs = 4, ...)
Arguments
x |
independent variable |
y |
response variable |
x.eval |
vector to predict on |
minimum.num.obs |
minimum number of observations needed to run the regression |
... |
other parameters to be passed to |
Value
predicted values at 'x.eval'
spline regression
Description
spline regression
Usage
spreg(
x,
y,
x.eval,
minimum.num.obs = 4,
basis = c("fourier", "bspline"),
rangeval = c(min(x.eval) - 1, max(x.eval)),
nbasis = 11,
...
)
Arguments
x |
independent variable |
y |
response variable |
x.eval |
vector to predict on |
minimum.num.obs |
minimum number of observations needed to run the regression |
basis |
what basis to use, "fourier" and "bspline" are available |
rangeval |
see |
nbasis |
see |
... |
arguments passed to |
Value
predicted values at 'x.eval'
STFIT for Landsat data
Description
This function is used for Landsat data imputation, which includes five steps: mean estimation, outlier detection, temporal effect estimation, spatial effect estimation and imputation. In real application, one can use this as a template to create a five steps imputation procedure depending on the real data structure.
Usage
stfit_landsat(
year,
doy,
mat,
img.nrow,
img.ncol,
doyeval = 1:365,
h.tcov = 100,
h.tsigma2 = 300,
h.scov = 2,
h.ssigma2 = 2,
nnr = 10,
outlier.action = c("keep", "remove"),
outlier.tol = 0.2,
intermediate.save = TRUE,
intermediate.dir = "./intermediate_output/",
use.intermediate.result = TRUE,
teff = TRUE,
seff = TRUE,
doy.break = NULL,
cycle = FALSE,
t.grid = NULL,
t.grid.num = 50,
clipRange = c(0, 1800),
clipMethod = "nnr",
var.est = FALSE
)
Arguments
year |
vecotr of year |
doy |
vecotr of DOY (day of the year) |
mat |
a numeric matrix. Each row contains a row stacked image pixel values. |
img.nrow |
number of rows of the image |
img.ncol |
number of columns of the image |
doyeval |
a vector of DOY on which to get the mean and temporal imputation |
h.tcov |
bandwidth for temporal covariance estimation |
h.tsigma2 |
bandwith for temporal variance estimation |
h.scov |
bandwidth for spatial covariance estimation |
h.ssigma2 |
bandwidth for spatial variance estimation |
nnr |
maximum number of nearest neighbor pixels to use for spatial covariance estimation |
outlier.action |
"keep" to keep outliers; "remove" to replace outliers with imputed values |
outlier.tol |
The threshold to use to define outlier image. Default is 0.2, i.e. images with more than 20% outlier pixels are treated as outlier image. |
intermediate.save |
TRUE or FASLE; whether to save the intermediate results including mean, temporal effect and spacial effect imputation resutls. The intermediate results can be useful to avoid duplicating the computation for some imputation steps. |
intermediate.dir |
directory where to save the intermediate results |
use.intermediate.result |
whether to use the intermediate results in the 'intermediate.dir' folder. Default is TRUE. |
teff |
TRUE or FALSE, wheter to calculate the temporal effect. Default is TRUE. |
seff |
TRUE or FALSE, wheter to calculate the spatial effect. Default is TRUE. |
doy.break |
a vector of break points for |
cycle |
TRUE or FALSE. When |
t.grid |
a vector of grid points on which to calculate the temporal covariance function |
t.grid.num |
number of grid points to use for temporal covariance estimation.
Ignored if |
clipRange |
passed to |
clipMethod |
passed to |
var.est |
Whether to estimate the variance of the temporal and spatial effects. Default is FALSE. |
Value
List of length 4 with entries:
imat: imputed matrix of
mat
smat: standard error matrix of the same size as
mat
idx: a list of image indexes
idx.allmissing: completely missing image indexes,
idx.partialmissing: partially observed image indexes,
idx.fullyobserved: fully observed image indexes,
idx.outlier: outlier image indexes.
outlier: a list of image outliers information
outidx: image index with outlier pixels,
outpct: percentage of outlier pixels corresponding to
outidx
,outlst: a list of the same length as
outidx
, with each list the missing pixel index.
Examples
library(doParallel)
library(raster)
library(rasterVis)
library(RColorBrewer)
dfB = landsat106[landsat106$year >= 2000,]
matB = as.matrix(dfB[,-c(1:2)])
year = dfB$year
doy = dfB$doy
if(require(doParallel))
registerDoParallel(1)
res <- stfit_landsat(year, doy, matB, 31, 31, nnr=30,
use.intermediate.result = FALSE, intermediate.save = FALSE, var.est = TRUE)
## visualize the imputed results
idx = c(res$idx$idx.allmissing[150], res$idx$idx.partialmissing[c(30, 60, 90)])
rst_list = list()
for(i in 1:length(idx)){
rst_list[(i-1)*3+1] = raster(matrix(matB[idx[i],], 31))
rst_list[(i-1)*3+2] = raster(matrix(res$imat[idx[i],], 31))
rst_list[(i-1)*3+3] = raster(matrix(res$sdmat[idx[i],], 31))
}
s = stack(rst_list)
levelplot(s, index.cond=list(c(seq(1, 12, 3), seq(2, 12, 3), seq(3, 12, 3))),
par.setting = rasterTheme(panel.background=list(col="black"),
region = brewer.pal(9, 'YlOrRd')),
names.attr = c(rbind(paste0("Original ", idx),
paste0("Imputed ", idx),
paste0("Std. Error ", idx))),
layout = c(4,3))
STFIT Temporal Effect Estimation
Description
STFIT Temporal Effect Estimation
Usage
teffEst(
ids,
doy,
rmat,
doyeval = seq(min(doy), max(doy)),
h.cov = 100,
h.sigma2 = 300,
weight.cov = NULL,
weight.sigma2 = NULL,
pve = 0.99,
t.grid = NULL,
t.grid.num = 50,
var.est = FALSE
)
Arguments
ids |
ids for 'group', for data with repeated measurement over years, year is ids; for pixels belong to certain clusters, cluster is ids. |
doy |
vecotr of DOY (day of the year) |
rmat |
residual matrix with rows corresponding to |
doyeval |
a vector of DOY on which to get the temporal imputation |
h.cov |
bandwidth for temporal covariance estimation; ignored if |
h.sigma2 |
bandwidth for temporal variance estimation |
weight.cov |
weight vector for temporal covariance estimation |
weight.sigma2 |
weight vector for temporal variance estimation |
pve |
percentage of variance explained; used for number of eigen values selection. Default is 0.99. |
t.grid |
a vector of grid points on which to calculate the temporal covariance function |
t.grid.num |
number of grid points to use for temporal covariance estimation. Ignored if |
var.est |
Whether to estimate the variance of the temporal effect. Default is FALSE. |
Value
List of length 2 with entries:
teff_array: 3-d array with first dimention 'ids', second dimention 'doy' and third dimention pixel index.
teff_var_array: same structure as
teff_array
ifvar.est
is TRUE, otherwise NULL.
Weight matrix calculation
Description
Weight matrix calculation
Usage
weightMatrix(h)
Arguments
h |
'bandwith' |
Value
a weighting matrix
Weight vector calculation
Description
Weight vector calculation
Usage
weightVector(h)
Arguments
h |
bandwidth, should be positive numbers |
Value
a vector