Version: | 2.2-9 |
Date: | 2025-04-13 |
Title: | General Smoothing Splines |
Author: | Chong Gu [aut, cre] |
Maintainer: | Chong Gu <chong@purdue.edu> |
Depends: | R (≥ 3.0.0), stats |
Description: | A comprehensive package for structural multivariate function estimation using smoothing splines. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Packaged: | 2025-04-13 16:45:23 UTC; chong |
NeedsCompilation: | yes |
Repository: | CRAN |
Date/Publication: | 2025-04-13 18:00:02 UTC |
Colorectal Cancer Mortality Rate in Indiana Counties
Description
County-wise death counts of colorectal cancer patients in Indiana during years 2000 through 2004.
Usage
data(ColoCan)
Format
A data frame containing 184 observations on the following variables.
event | Death counts. |
pop | Population from Census 2000. |
sex | Gender of population. |
wrt | Proportion of Whites. |
brt | Proportion of Blacks. |
ort | Proportion of other minorities. |
lat | Latitude. |
lon | Longitude. |
geog | Geographic location, derived from lat and lon. |
scrn | Colorectal cancer screening rate. |
name | County name. |
Details
geog was generated from lat and lon using the code given in the Examples section.
Source
Dr. Tonglin Zhang.
References
Zhang, T. and Lin, G. (2009), Cluster detection based on spatial associations and iterated residuals in generalized linear mixed models. Biometrics, 65, 353–360.
Examples
## Converting latitude and longitude to x-y coordinates
## The 49th county is Marion, where Indianapolis is located.
## Not run: ltln2xy <- function(latlon,latlon0) {
lat <- latlon[,1]*pi/180; lon <- latlon[,2]*pi/180
lt0 <- latlon0[1]*pi/180; ln0 <- latlon0[2]*pi/180
x <- cos(lt0)*sin(lon-ln0); y <- sin(lat-lt0)
cbind(x,y)
}
data(ColoCan)
latlon <- as.matrix(ColoCan[,c("lat","lon")])
ltln2xy(latlon,latlon[49,])
## Clean up
rm(ltln2xy,ColoCan,latlon)
## End(Not run)
Diabetic Retinopathy
Description
Time to blindness of 197 diabetic retinopathy patients who received a laser treatment in one eye.
Usage
data(DiaRet)
Format
A data frame containing 197 observations on the following variables.
id | Patient ID. |
time1 | Follow-up time of left eye. |
time2 | Follow-up time of right eye. |
status1 | Censoring indicator of left eye. |
status2 | Censoring indicator of right eye. |
trt1 | Treatment indicator of left eye. |
trt2 | Treatment indicator of right eye. |
type | Type of diabetes. |
age | Age of patient at diagnosis. |
time.t | Follow-up time of treated eye. |
time.u | Follow-up time of untreated eye. |
status.t | Censoring indicator of treated eye. |
status.u | Censoring indicator of untreated eye. |
Source
This is reformatted from the data frame diabetes
in the R
package timereg
by Thomas H. Scheike.
References
Huster, W.J., Brookmeyer, R., and Self, S.G. (1989), Modelling paired survival data with covariates. Biometrics, 45, 145–156.
Water Acidity in Lakes
Description
Data extracted from the Eastern Lake Survey of 1984 conducted by the United States Environmental Protection Agency, concerning 112 lakes in the Blue Ridge.
Usage
data(LakeAcidity)
Format
A data frame containing 112 observations on the following variables.
ph | Surface pH. |
cal | Calcium concentration. |
lat | Latitude. |
lon | Longitude. |
geog | Geographic location, derived from lat and lon. |
Details
geog was generated from lat and lon using the code given in the Examples section.
Source
Douglas, A. and Delampady, M. (1990), Eastern Lake Survey – Phase I: Documentation for the Data Base and the Derived Data sets. Tech Report 160 (SIMS), Dept. Statistics, University of British Columbia.
References
Gu, C. and Wahba, G. (1993), Semiparametric analysis of variance with tensor product thin plate splines. Journal of the Royal Statistical Society Ser. B, 55, 353–368.
Examples
## Converting latitude and longitude to x-y coordinates
## Not run: ltln2xy <- function(latlon,latlon0) {
lat <- latlon[,1]*pi/180; lon <- latlon[,2]*pi/180
lt0 <- latlon0[1]*pi/180; ln0 <- latlon0[2]*pi/180
x <- cos(lt0)*sin(lon-ln0); y <- sin(lat-lt0)
cbind(x,y)
}
data(LakeAcidity)
latlon <- as.matrix(LakeAcidity[,c("lat","lon")])
m.lat <- (min(latlon[,1])+max(latlon[,1]))/2
m.lon <- (min(latlon[,2])+max(latlon[,2]))/2
ltln2xy(latlon,c(m.lat,m.lon))
## Clean up
rm(ltln2xy,LakeAcidity,latlon,m.lat,m.lon)
## End(Not run)
Air Pollution and Road Traffic
Description
A subset of 500 hourly observations collected by the Norwegian Public Roads Administration at Alnabru in Oslo, Norway, between October 2001 and August 2003.
Usage
data(NO2)
Format
A data frame containing 500 observations on the following variables.
no2 | Concentration of NO2, on log scale. |
cars | Traffic volume of the hour, on log scale. |
temp | Temperature 2 meters above ground, in Celsius. |
wind | Wind speed, in meters/second. |
temp2 | Temperature difference between 25 and 2 meters above ground, in Celsius. |
wind2 | Wind direction, in degrees between 0 and 360. |
Source
Statlib Datasets Archive at http://lib.stat.cmu.edu/datasets, contributed by Magne Aldrin.
Protein Expression in Human Immune System Cells
Description
Data concerning protein expression levels in human immune system cells under stimulations.
Usage
data(Sachs)
Format
A data frame containing 7466 cells, with flow cytometry measurements
of 11 phosphorylated proteins and phospholipids, on the log10
scale of the original.
praf | Raf phosphorylated at S259. |
pmek | Mek1/mek2 phosphorylated at S217/S221. |
plcg | Phosphorylation of phospholipase C-γ on Y783. |
pip2 | Phosphatidylinositol 4,5-bisphosphate. |
pip3 | Phosphatidylinositol 3,4,5-trisphosphate. |
p44.42 | Erk1/erk2 phosphorylated at T202/Y204. |
pakts473 | AKT phosphorylated at S473. |
pka | Phosphorylation of protein kinase A substrates on 3 sites. |
pkc | Phosphorylation of protein kinase C substrates on S660. |
p38 | p38 phosphorylated at T180/Y182. |
pjnk | Jnk phosphorylated at T183/Y185. |
Source
Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D. A., and Nolan, G. P. (2005), Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308 (5732), 523–529.
AIDS Incubation
Description
A data set collected by the Centers for Disease Control and Prevention concerning AIDS patients who were infected with HIV through blood transfusion.
Usage
data(aids)
Format
A data frame containing 295 observations on the following variables.
incu | Time from HIV infection to AIDS diagnosis. |
infe | Time from HIV infection to end of data collection (July 1986). |
age | Age at time of blood transfusion. |
Source
Wang, M.-C. (1989), A semiparametric model for randomly truncated data. Journal of the American Statistical Association, 84, 742–748.
Treatment of Bacteriuria
Description
Bacteriuria patients were randomly assigned to two treatment groups. Weekly binary indicator of bacteriuria was recorded for every patient over 4 to 16 weeks. A total of 72 patients were represented in the data, with 36 each in the two treatment groups.
Usage
data(bacteriuria)
Format
A data frame containing 820 observations on the following variables.
id | Identification of patients, a factor. |
trt | Treatments 1 or 2, a factor. |
time | Weeks after randomization. |
infect | Binary indicator of bacteriuria (bacteria in urine). |
Source
Joe, H. (1997), Multivariate Models and Dependence Concepts. London: Chapman and Hall.
References
Gu, C. and Ma, P. (2005), Generalized nonparametric mixed-effect models: computation and smoothing parameter selection. Journal of Computational and Graphical Statistics, 14, 485–504.
Buffalo Annual Snowfall
Description
Annual snowfall accumulations in Buffalo, NY from 1910 to 1973.
Usage
data(buffalo)
Format
A vector of 63 numerical values.
Source
Scott, D. W. (1985), Average shifted histograms: Effective nonparametric density estimators in several dimensions. The Annals of Statistics, 13, 1024–1040.
Evaluating Conditional PDF, CDF, and Quantiles of Smoothing Spline Conditional Density Estimates
Description
Evaluate conditional pdf, cdf, and quantiles of f(y1|x,y2) for smoothing spline conditional density estimates f(y|x).
Usage
cdsscden(object, y, x, cond, int=NULL)
cpsscden(object, q, x, cond)
cqsscden(object, p, x, cond)
Arguments
object |
Object of class "sscden". |
x |
Data frame of x values on which conditional density f(y1|x,y2) is to be evaluated. |
y |
Data frame or vector of y1 points on which conditional density f(y1|x,y2) is to be evaluated. |
cond |
One row data frame of conditioning variables y2. |
q |
Vector of points on which cdf is to be evaluated. |
p |
Vector of probabilities for which quantiles are to be calculated. |
int |
Vector of normalizing constants. |
Details
The arguments x
and y
are of the same form as the
argument newdata
in predict.lm
, but y
in
cdsscden
can take a vector for 1-D y1.
cpsscden
and cqsscden
naturally only work for 1-D y1.
Value
cdsscden
returns a list object with the following
elements.
pdf |
Matrix or vector of conditional pdf f(y1|x,y2), with each column corresponding to a distinct x value. |
int |
Vector of normalizing constants. |
cpsscden
and cqsscden
return a matrix or vector of
conditional cdf or quantiles of f(y1|x,y2).
Note
If variables other than factors or numerical vectors are involved in y1, the normalizing constants cannot be computed.
See Also
Fitting function sscden
and dsscden
.
Evaluating 1-D Conditional PDF, CDF, and Quantiles of Copula Density Estimates
Description
Evaluate conditional pdf, cdf, and quantiles of copula density estimates.
Usage
cdsscopu(object, x, cond, pos=1, int=NULL)
cpsscopu(object, q, cond, pos=1)
cqsscopu(object, p, cond, pos=1)
Arguments
object |
Object of class "sscopu". |
x |
Vector of points on which conditional pdf is to be evaluated. |
cond |
Value of conditioning variables. |
pos |
Position of variable of interest. |
int |
Normalizing constant. |
q |
Vector of points on which conditional cdf is to be evaluated. |
p |
Vector of probabilities for which conditional quantiles are to be calculated. |
Value
A vector of conditional pdf, cdf, or quantiles.
See Also
Fitting functions sscopu
and sscopu2
,
and dsscopu
.
Evaluating Conditional PDF, CDF, and Quantiles of Smoothing Spline Density Estimates
Description
Evaluate conditional pdf, cdf, and quantiles for smoothing spline density estimates.
Usage
cdssden(object, x, cond, int=NULL)
cpssden(object, q, cond)
cqssden(object, p, cond)
Arguments
object |
Object of class "ssden". |
x |
Data frame or vector of points on which conditional density is to be evaluated. |
cond |
One row data frame of conditioning variables. |
int |
Normalizing constant. |
q |
Vector of points on which conditional cdf is to be evaluated. |
p |
Vector of probabilities for which conditional quantiles are to be calculated. |
Details
The argument x
in cdssden
is of the same form as the
argument newdata
in predict.lm
, but can take a
vector for 1-D conditional densities.
cpssden
and cqssden
naturally only work for 1-D
conditional densities of a numerical variable.
Value
cdssden
returns a list object with the following elements.
pdf |
Vector of conditional pdf. |
int |
Normalizing constant. |
cpssden
and cqssden
return a vector of conditional cdf
or quantiles.
Note
If variables other than factors or numerical vectors are involved in x, the normalizing constant cannot be computed.
See Also
Fitting function ssden
and dssden
.
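A minimal sketch of the evaluation workflow (the aids data, the grid, and the conditioning value below are illustrative choices, not requirements of the functions):
data(aids)
fit <- ssden(~incu+age, data=aids)
## conditional density of incu given age=40, on a grid over the observed range
cdssden(fit, seq(min(aids$incu), max(aids$incu), length=31), cond=data.frame(age=40))
## conditional quartiles of incu given age=40
cqssden(fit, c(.25,.5,.75), cond=data.frame(age=40))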
Average Temperatures During December 1980 Through February 1981
Description
Average temperatures at 690 weather stations during December 1980 through February 1981.
Usage
data(clim)
Format
A data frame containing 690 observations on the following variables.
temp | Average temperature, in Celsius. |
geog | Geographic location (latitude,longitude), in degrees, as a matrix. |
Source
This is reformulated from the data frame climate
in the R
package assist
by Yuedong Wang and Chunlei Ke.
Numerical Engine for ssden, sshzd, and sshzd1
Description
Perform numerical calculations for the ssden
and
sshzd
suites.
Usage
sspdsty(s, r, q, cnt, qd.s, qd.r, qd.wt, prec, maxiter, alpha, bias)
mspdsty(s, r, id.basis, cnt, qd.s, qd.r, qd.wt, prec, maxiter, alpha,
bias, skip.iter)
sspdsty1(s, r, q, cnt, int, prec, maxiter, alpha)
mspdsty1(s, r, id.basis, cnt, int, prec, maxiter, alpha)
mspcdsty(s, r, id.basis, cnt, qd.s, qd.r, xx.wt, qd.wt, prec, maxiter, alpha, skip.iter)
mspcdsty1(s, r, id.basis, cnt, int.s, int.r, prec, maxiter, alpha, skip.iter)
msphzd(s, r, id.wk, Nobs, cnt, qd.s, qd.r, qd.wt, random, prec, maxiter, alpha, skip.iter)
msphzd1(s, r, id.wk, Nobs, cnt, int.s, int.r, rho, random, prec, maxiter, alpha,
skip.iter)
sspcox(s, r, q, cnt, qd.s, qd.r, qd.wt, prec, maxiter, alpha, random, bias)
mspcox(s, r, id.basis, cnt, qd.s, qd.r, qd.wt, prec, maxiter, alpha, random, bias,
skip.iter)
mspllrm(s, r, id.basis, cnt, qd.s, qd.r, xx.wt, qd.wt, random, prec, maxiter, alpha,
skip.iter)
Arguments
s |
Unpenalized terms evaluated at data points. |
r |
Basis of penalized terms evaluated at data points. |
q |
Penalty matrix. |
id.basis |
Index of observations to be used as "knots." |
id.wk |
Index of observations to be used as "knots." |
Nobs |
Total number of lifetime observations. |
cnt |
Bin-counts for histogram data. |
qd.s |
Unpenalized terms evaluated at quadrature nodes. |
qd.r |
Basis of penalized terms evaluated at quadrature nodes. |
qd.wt |
Quadrature weights. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
bias |
List of arrays incorporating possible sampling bias. |
skip.iter |
Flag indicating whether to use initial values of theta and skip theta iteration. |
int |
Integrals of basis terms. |
int.s |
Integrals of unpenalized terms. |
int.r |
Integrals of basis of penalized terms. |
rho |
rho function value on failure times. |
xx.wt |
Weights at unique x. |
random |
Input for parametric random effects in nonparametric mixed-effect models. |
Details
sspdsty
is used by ssden
to compute
cross-validated density estimate with a single smoothing
parameter. mspdsty
is used by ssden
to compute
cross-validated density estimate with multiple smoothing
parameters.
msphzd
is used by sshzd
to compute
cross-validated hazard estimate with single or multiple smoothing
parameters.
References
Du, P. and Gu, C. (2006), Penalized likelihood hazard estimation: efficient approximation and Bayesian confidence intervals. Statistics and Probability Letters, 76, 244–254.
Du, P. and Gu, C. (2009), Penalized Pseudo-Likelihood Hazard Estimation: A Fast Alternative to Penalized Likelihood. Journal of Statistical Planning and Inference, 139, 891–899.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation. Statistica Sinica, 13, 811–826.
Evaluating PDF, CDF, and Quantiles of Smoothing Spline Conditional Density Estimates
Description
Evaluate pdf, cdf, and quantiles for smoothing spline conditional density estimates.
Usage
dsscden(object, y, x)
psscden(object, q, x)
qsscden(object, p, x)
d.sscden(object, x, y)
d.sscden1(object, x, y, scale=TRUE)
Arguments
object |
Object of class "sscden". |
x |
Data frame of x values on which conditional density f(y|x) is to be evaluated. |
y |
Data frame or vector of points on which conditional density f(y|x) is to be evaluated. |
q |
Vector of points on which cdf is to be evaluated. |
p |
Vector of probabilities for which quantiles are to be calculated. |
scale |
Flag indicating whether to use approximate scaling without quadrature. |
Details
The arguments x
and y
are of the same form as the
argument newdata
in predict.lm
, but y
in
dsscden
can take a vector for 1-D responses.
psscden
and qsscden
naturally only work for 1-D
responses.
Value
A matrix or vector of pdf, cdf, or quantiles of f(y|x), with each column corresponding to a distinct x value.
See Also
Fitting function sscden
and cdsscden
.
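A minimal sketch (the NO2 data, the formula, and the evaluation points are illustrative; the fit follows the sscden usage documented in this manual):
data(NO2)
fit <- sscden(~no2*cars, ~no2, data=NO2)
## conditional density f(no2|cars) at the median traffic volume
xx <- data.frame(cars=median(NO2$cars))
dsscden(fit, seq(min(NO2$no2), max(NO2$no2), length=51), xx)
## conditional median of no2 at the same x
qsscden(fit, .5, xx)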
Evaluating Copula Density Estimates
Description
Evaluate copula density estimates.
Usage
dsscopu(object, x, copu=TRUE)
Arguments
object |
Object of class "sscopu". |
x |
Vector or matrix of point(s) on which copula density is to be evaluated. |
copu |
Flag indicating whether to apply copularization. |
Value
A vector of copula density values.
See Also
Fitting functions sscopu
and sscopu2
.
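A minimal sketch (the simulated data are purely illustrative; the observations are mapped to the unit square by ranks before fitting, matching the unit-cube scale on which copula densities are defined):
set.seed(5732)
z <- cbind(rnorm(100), rnorm(100)); z[,2] <- z[,1] + z[,2]
x <- apply(z, 2, rank)/(nrow(z)+1)  # empirical copularization to (0,1)^2
fit <- sscopu(x)
## copula density at the center of the unit square
dsscopu(fit, c(.5,.5))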
Evaluating PDF, CDF, and Quantiles of Smoothing Spline Density Estimates
Description
Evaluate pdf, cdf, and quantiles for smoothing spline density estimates.
Usage
dssden(object, x)
pssden(object, q)
qssden(object, p)
d.ssden(object, x)
d.ssden1(object, x)
Arguments
object |
Object of class "ssden". |
x |
Data frame or vector of points on which density is to be evaluated. |
q |
Vector of points on which cdf is to be evaluated. |
p |
Vector of probabilities for which quantiles are to be calculated. |
Details
The argument x
in dssden
is of the same form as the
argument newdata
in predict.lm
, but can take a
vector for 1-D densities.
pssden
and qssden
naturally only work for 1-D
densities.
Value
A vector of pdf, cdf, or quantiles.
See Also
Fitting function ssden
and cdssden
.
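A minimal sketch using the buffalo snowfall data documented above (the domain below is an illustrative choice):
data(buffalo)
fit <- ssden(~buffalo, domain=data.frame(buffalo=c(0,150)))
## pdf on a grid, cdf at 100 inches, and the quartiles
dssden(fit, seq(0, 150, length=101))
pssden(fit, 100)
qssden(fit, c(.25,.5,.75))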
Embryonic Stem Cell from Mouse
Description
Data concerning mouse embryonic stem cell gene expression and transcription factor association strength.
Usage
data(esc)
Format
A data frame containing 1027 genes with the following variables.
y1 | Gene expression after 4 days. |
y2 | Gene expression after 8 days. |
y3 | Gene expression after 14 days. |
klf4 | Score of TFAS with KLF4. |
nanog | Score of TFAS with NANOG. |
oct4 | Score of TFAS with OCT4. |
sox2 | Score of TFAS with SOX2. |
clusterID | Cluster identification. |
References
Cai, J., Xie, D., Fan, Z., Chipperfield, H., Marden, J., Wong, W. H., and Zhong, S. (2010), Modeling co-expression across species for complex traits: insights to the difference of human and mouse embryonic stem cells. PLoS Computational Biology, 6, e1000707.
Ouyang, Z., Zhou, Q., and Wong, W. H. (2009), ChIP-seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proceedings of the National Academy of Sciences of USA, 106, 21521–21526.
Eyesight Fixation in Eyetracking Experiments
Description
Eyesight fixation during eyetracking experiments in linguistic studies.
Usage
data(eyetrack)
Format
A data frame containing 13891 observations on the following variables.
time | Time, in ms. |
color | Binary indicator, 1 if eyesight fixed on target or color competitor, a factor. |
object | Binary indicator, 1 if eyesight fixed on target or object competitor, a factor. |
id | Identification of homogeneous sessions, a factor. |
cnt | Multiplicity count. |
Source
Dr. Anouschka Foltz.
References
Gu, C. and Ma, P. (2011), Nonparametric regression with cross-classified responses. Manuscript.
Utility Functions for Error Families
Description
Utility functions for fitting Smoothing Spline ANOVA models with non-Gaussian responses.
Usage
mkdata.binomial(y, eta, wt, offset)
dev.resid.binomial(y, eta, wt)
dev.null.binomial(y, wt, offset)
cv.binomial(y, eta, wt, hat, alpha)
y0.binomial(y, eta0, wt)
proj0.binomial(y0, eta, offset)
kl.binomial(eta0, eta1, wt)
cfit.binomial(y, wt, offset)
mkdata.poisson(y, eta, wt, offset)
dev.resid.poisson(y, eta, wt)
dev.null.poisson(y, wt, offset)
cv.poisson(y, eta, wt, hat, alpha, sr, q)
y0.poisson(eta0)
proj0.poisson(y0, eta, wt, offset)
kl.poisson(eta0, eta1, wt)
cfit.poisson(y, wt, offset)
mkdata.Gamma(y, eta, wt, offset)
dev.resid.Gamma(y, eta, wt)
dev.null.Gamma(y, wt, offset)
cv.Gamma(y, eta, wt, hat, rss, alpha)
y0.Gamma(eta0)
proj0.Gamma(y0, eta, wt, offset)
kl.Gamma(eta0, eta1, wt)
cfit.Gamma(y, wt, offset)
mkdata.inverse.gaussian(y, eta, wt, offset)
dev.resid.inverse.gaussian(y, eta, wt)
dev.null.inverse.gaussian(y, wt, offset)
cv.inverse.gaussian(y, eta, wt, hat, rss, alpha)
y0.inverse.gaussian(eta0)
proj0.inverse.gaussian(y0, eta, wt, offset)
kl.inverse.gaussian(eta0, eta1, wt)
cfit.inverse.gaussian(y, wt, offset)
mkdata.nbinomial(y, eta, wt, offset, nu)
dev.resid.nbinomial(y, eta, wt)
dev.null.nbinomial(y, wt, offset)
cv.nbinomial(y, eta, wt, hat, alpha)
y0.nbinomial(y,eta0,nu)
proj0.nbinomial(y0, eta, wt, offset)
kl.nbinomial(eta0, eta1, wt, nu)
cfit.nbinomial(y, wt, offset, nu)
mkdata.polr(y, eta, wt, offset, nu)
dev.resid.polr(y, eta, wt, nu)
dev.null.polr(y, wt, offset)
cv.polr(y, eta, wt, hat, nu, alpha)
y0.polr(eta0)
proj0.polr(y0, eta, wt, offset, nu)
kl.polr(eta0, eta1, wt)
cfit.polr(y, wt, offset)
mkdata.weibull(y, eta, wt, offset, nu)
dev.resid.weibull(y, eta, wt, nu)
dev.null.weibull(y, wt, offset, nu)
cv.weibull(y, eta, wt, hat, nu, alpha)
y0.weibull(y, eta0, nu)
proj0.weibull(y0, eta, wt, offset, nu)
kl.weibull(eta0, eta1, wt, nu, int)
cfit.weibull(y, wt, offset, nu)
mkdata.lognorm(y, eta, wt, offset, nu)
dev.resid.lognorm(y, eta, wt, nu)
dev0.resid.lognorm(y, eta, wt, nu)
dev.null.lognorm(y, wt, offset, nu)
cv.lognorm(y, eta, wt, hat, nu, alpha)
y0.lognorm(y, eta0, nu)
proj0.lognorm(y0, eta, wt, offset, nu)
kl.lognorm(eta0, eta1, wt, nu, y0)
cfit.lognorm(y, wt, offset, nu)
mkdata.loglogis(y, eta, wt, offset, nu)
dev.resid.loglogis(y, eta, wt, nu)
dev0.resid.loglogis(y, eta, wt, nu)
dev.null.loglogis(y, wt, offset, nu)
cv.loglogis(y, eta, wt, hat, nu, alpha)
y0.loglogis(y, eta0, nu)
proj0.loglogis(y0, eta, wt, offset, nu)
kl.loglogis(eta0, eta1, wt, nu, y0)
cfit.loglogis(y, wt, offset, nu)
Arguments
y |
Model response. |
eta |
Fitted values on link scale. |
wt |
Model weights. |
offset |
Model offset. |
nu |
Size for nbinomial; inverse scale parameter for log lifetime families. |
Note
gssanova0
uses mkdata.x
, dev.resid.x
,
and dev.null.x
. gssanova
uses the above plus
dev0.resid.x
and cv.x
.
y0.x
, proj0.x
, kl.x
, and cfit.x
are used
by project.gssanova
.
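A minimal sketch of the calling pattern (the inputs are illustrative; these utilities are normally invoked internally by gssanova and gssanova0 rather than by the user):
set.seed(1)
y <- rbinom(10, 1, .5)
eta <- rep(0, 10); wt <- rep(1, 10); off <- rep(0, 10)
## pseudo data for one penalized weighted least squares step
mkdata.binomial(y, eta, wt, off)
## deviance residuals at the same working fit
dev.resid.binomial(y, eta, wt)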
Fitted Values and Residuals from Smoothing Spline ANOVA Fits
Description
Methods for extracting fitted values and residuals from smoothing spline ANOVA fits.
Usage
## S3 method for class 'ssanova'
fitted(object, ...)
## S3 method for class 'ssanova'
residuals(object, ...)
## S3 method for class 'gssanova'
fitted(object, ...)
## S3 method for class 'gssanova'
residuals(object, type="working", ...)
Arguments
object |
Object of class "ssanova" or "gssanova". |
type |
Type of residuals desired, with two alternatives "working" (default) or "deviance". |
... |
Ignored. |
Details
The fitted values for "gssanova" objects are on the link scale, as are the "working" residuals.
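A minimal sketch with a simulated Gaussian fit (purely illustrative):
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
fit <- ssanova(y~x)
## residuals against fitted values, both on the response scale
plot(fitted(fit), residuals(fit))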
Gastric Cancer Data
Description
Survival of gastric cancer patients under chemotherapy and chemotherapy-radiotherapy combination.
Usage
data(gastric)
Format
A data frame containing 90 observations on the following variables.
futime | Follow-up time, in days. |
status | Censoring status. |
trt | Factor indicating the treatments: 1 -- chemotherapy, 2 -- combination. |
Source
Moreau, T., O'Quigley, J., and Mesbah, M. (1985), A global goodness-of-fit statistic for the proportional hazards model. Applied Statistics, 34, 212-218.
Generating Gauss-Legendre Quadrature
Description
Generate Gauss-Legendre quadratures using the FORTRAN routine
gaussq.f
found on NETLIB.
Usage
gauss.quad(size, interval)
Arguments
size |
Size of quadrature. |
interval |
Interval to be covered. |
Value
gauss.quad
returns a list object with the following elements.
pt |
Quadrature nodes. |
wt |
Quadrature weights. |
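A minimal sketch: a 20-point rule on (0,1) integrates x^2 to 1/3 up to rounding.
qq <- gauss.quad(20, c(0,1))
sum(qq$wt * qq$pt^2)  # approximately 1/3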
Fitting Smoothing Spline ANOVA Models with Non-Gaussian Responses
Description
Fit smoothing spline ANOVA models in non-Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
and glm
.
Usage
gssanova(formula, family, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, alpha=NULL, nu=NULL,
id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL,
skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
family |
Description of the error distribution. Supported
are exponential families |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
alpha |
Tuning parameter defining cross-validation; larger
values yield smoother fits. Defaults are |
nu |
Inverse scale parameter in accelerated life model families. Ignored for exponential families. |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed for reproducible random selection of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
Only one link is implemented for each family
. It is the
logit link for "binomial"
, and the log link for
"poisson"
, and "Gamma"
. For "nbinomial"
, the
working parameter is the logit of the probability p
; see
NegBinomial
. For "weibull"
, "lognorm"
,
and "loglogis"
, it is the location parameter for the log
lifetime.
The selection of smoothing parameters is through direct
cross-validation. The cross-validation score used for
family="poisson"
is taken from density estimation as in Gu
and Wang (2003), and those used for other families are derived
following the lines of Gu and Xiang (2001).
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Value
gssanova
returns a list object of class
c("gssanova","ssanova")
.
The method summary.gssanova
can be used to obtain
summaries of the fits. The method predict.ssanova
can
be used to evaluate the fits at arbitrary points along with standard
errors, on the link scale. The method
project.gssanova
can be used to calculate the
Kullback-Leibler projection for model selection. The methods
residuals.gssanova
and fitted.gssanova
extract the respective traits from the fits.
Responses
For family="binomial"
, the response can be specified either
as two columns of counts or as a column of sample proportions plus a
column of total counts entered through the argument weights
,
as in glm
.
For family="nbinomial"
, the response may be specified as two
columns with the second being the known sizes, or simply as a single
column with the common unknown size to be estimated through the
maximum likelihood.
For family="weibull"
, "lognorm"
, or "loglogis"
,
the response consists of three columns, with the first giving the
follow-up time, the second the censoring status, and the third the
left-truncation time. For data with no truncation, the third column
can be omitted.
For family="polr"
, the response should be an ordered factor.
Note
For simpler models and moderate sample sizes, the exact solution of
gssanova0
can be faster.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
In gss versions earlier than 1.0, gssanova
was under
the name gssanova1
.
References
Gu, C. and Xiang, D. (2001), Cross validating non Gaussian data: generalized approximate cross validation revisited. Journal of Computational and Graphical Statistics, 10, 581–591.
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation. Statistica Sinica, 13, 811–826.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Fit a cubic smoothing spline logistic regression model
test <- function(x)
{.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
logit.fit <- gssanova(cbind(y,3-y)~x,family="binomial")
## The same fit
logit.fit1 <- gssanova(y/3~x,"binomial",weights=rep(3,101),
id.basis=logit.fit$id.basis)
## Obtain estimates and standard errors on a grid
est <- predict(logit.fit,data.frame(x=x),se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y/3,ylab="p")
lines(x,p,col=1)
lines(x,1-1/(1+exp(est$fit)),col=2)
lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3)
lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3)
## Fit a mixed-effect logistic model
data(bacteriuria)
bact.fit <- gssanova(infect~trt+time,family="binomial",data=bacteriuria,
id.basis=(1:820)[bacteriuria$id%in%c(3,38)],random=~1|id)
## Predict fixed effects
predict(bact.fit,data.frame(time=2:16,trt=as.factor(rep(1,15))),se=TRUE)
## Estimated random effects
bact.fit$b
## Clean up
## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est,bacteriuria,bact.fit)
dev.off()
## End(Not run)
Fitting Smoothing Spline ANOVA Models with Non-Gaussian Responses
Description
Fit smoothing spline ANOVA models in non-Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
and glm
.
Usage
gssanova0(formula, family, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method=NULL,
varht=1, nu=NULL, prec=1e-7, maxiter=30)
gssanova1(formula, family, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method=NULL,
varht=1, alpha=1.4, nu=NULL, id.basis=NULL, nbasis=NULL,
seed=NULL, random=NULL, skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
family |
Description of the error distribution. Supported
are exponential families |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
method |
Score used to drive the performance-oriented
iteration. Supported are |
varht |
Dispersion parameter needed for |
nu |
Inverse scale parameter in accelerated life model families. Ignored for exponential families. |
prec |
Precision requirement for the iterations. |
maxiter |
Maximum number of iterations allowed for performance-oriented iteration, and for inner-loop multiple smoothing parameter selection when applicable. |
alpha |
Tuning parameter modifying GCV or Mallows' CL. |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed for reproducible random selection of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
Only one link is implemented for each family
. It is the
logit link for "binomial"
, and the log link for
"poisson"
, "Gamma"
, and "inverse.gaussian"
.
For "nbinomial"
, the working parameter is the logit of the
probability p
; see NegBinomial
. For
"weibull"
, "lognorm"
, and "loglogis"
, it is the
location parameter for the log lifetime.
The models are fitted by penalized likelihood method through the
performance-oriented iteration as described in the reference. For
family="binomial"
, "poisson"
, "nbinomial"
,
"weibull"
, "lognorm"
, and "loglogis"
, the score
driving the performance-oriented iteration defaults to
method="u"
with varht=1
. For family="Gamma"
and "inverse.gaussian"
, the default is method="v"
.
gssanova0
uses the algorithm of ssanova0
for
the iterated penalized least squares problems, whereas
gssanova1
uses the algorithm of ssanova
.
In gssanova1
, a subset of the observations are selected as
"knots." Unless specified via id.basis
or nbasis
, the
number of "knots" q
is determined by max(30,10n^{2/9})
,
which is appropriate for the default cubic splines for numerical
vectors.
Value
gssanova0
returns a list object of class
c("gssanova0","ssanova0","gssanova")
.
gssanova1
returns a list object of class
c("gssanova","ssanova")
.
The method summary.gssanova0
or
summary.gssanova
can be used to obtain summaries of
the fits. The method predict.ssanova0
or
predict.ssanova
can be used to evaluate the fits at
arbitrary points along with standard errors, on the link scale. The
methods residuals.gssanova
and
fitted.gssanova
extract the respective traits from the
fits.
Responses
For family="binomial"
, the response can be specified either
as two columns of counts or as a column of sample proportions plus a
column of total counts entered through the argument weights
,
as in glm
.
For family="nbinomial"
, the response may be specified as two
columns with the second being the known sizes, or simply as a single
column with the common unknown size to be estimated through the
maximum likelihood.
For family="weibull"
, "lognorm"
, or "loglogis"
,
the response consists of three columns, with the first giving the
follow-up time, the second the censoring status, and the third the
left-truncation time. For data with no truncation, the third column
can be omitted.
For family="polr"
, the response should be an ordered factor.
Note
The direct cross-validation of gssanova
can be more
effective, and more stable for complex models.
For large sample sizes, the approximate solutions of
gssanova1
and gssanova
can be faster than
gssanova0
.
The results from gssanova1
may vary from run to run. For
consistency, specify id.basis
or set seed
.
The method project
is not implemented for
gssanova0
, nor is the mixed-effect model support through
mkran
.
In gss versions earlier than 1.0, gssanova0
was under
the name gssanova
.
References
Gu, C. (1992), Cross-validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, 169-179.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Fit a cubic smoothing spline logistic regression model
test <- function(x)
{.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
logit.fit <- gssanova0(cbind(y,3-y)~x,family="binomial")
## The same fit
logit.fit1 <- gssanova0(y/3~x,"binomial",weights=rep(3,101))
## Obtain estimates and standard errors on a grid
est <- predict(logit.fit,data.frame(x=x),se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y/3,ylab="p")
lines(x,p,col=1)
lines(x,1-1/(1+exp(est$fit)),col=2)
lines(x,1-1/(1+exp(est$fit+1.96*est$se)),col=3)
lines(x,1-1/(1+exp(est$fit-1.96*est$se)),col=3)
## Clean up
## Not run: rm(test,x,p,y,logit.fit,logit.fit1,est)
dev.off()
## End(Not run)
Evaluating Smoothing Spline Hazard Estimates
Description
Evaluate smoothing spline hazard estimates by sshzd
.
Usage
hzdrate.sshzd(object, x, se=FALSE, include=c(object$terms$labels,object$lab.p))
hzdcurve.sshzd(object, time, covariates=NULL, se=FALSE)
survexp.sshzd(object, time, covariates=NULL, start=0)
Arguments
object |
Object of class "sshzd". |
x |
Data frame or vector of points on which hazard is to be evaluated. |
se |
Flag indicating if standard errors are required. |
include |
List of model terms to be included in the evaluation. |
time |
Vector of time points. |
covariates |
Vector of covariate values. |
start |
Optional starting times of the intervals. |
Value
For se=FALSE
, hzdrate.sshzd
returns a vector of hazard
evaluations, and hzdcurve.sshzd
returns a vector or columns of
hazard curve(s) evaluated on time
points at the
covariates
values. For se=TRUE
, hzdrate.sshzd
and hzdcurve.sshzd
return a list consisting of the following
elements.
fit |
Vector or columns of hazard. |
se.fit |
Vector or columns of standard errors for log hazard. |
survexp.sshzd
returns a vector or columns of expected
survivals based on the cumulative hazards over (start
,
time
) at the covariates
values, which in fact are the
(conditional) survival probabilities S(time)/S(start)
.
Note
For left-truncated data, start
must be at or after the
earliest truncation point.
See Also
Fitting function sshzd
.
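A minimal sketch using the gastric data documented above (assuming Surv() is available, e.g. from the survival package, to form the usual time-status response):
library(survival)
data(gastric)
fit <- sshzd(Surv(futime,status)~futime*trt, data=gastric)
## hazard curves over the first two years for the two treatments
hzdcurve.sshzd(fit, seq(10, 730, length=25), data.frame(trt=as.factor(c(1,2))))
## expected survival at one year under treatment 1
survexp.sshzd(fit, 365, data.frame(trt=as.factor(1)))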
Evaluating 2-D Smoothing Spline Hazard Estimates
Description
Evaluate 2-D smoothing spline hazard estimates by sshzd2d
.
Usage
hzdrate.sshzd2d(object, time, covariates=NULL)
survexp.sshzd2d(object, time, covariates=NULL, job=3)
Arguments
object |
Object of class "sshzd2d". |
time |
Matrix or vector of time points on which hazard or survival function is to be evaluated. |
covariates |
Data frame of covariate values. |
job |
Flag indicating which survival function to evaluate. |
Value
A vector of hazard or survival values.
Note
For job=1,2
, survexp.sshzd2d
returns marginal survival
S1(t)
or S2(t)
. For job=3
,
survexp.sshzd2d
returns the 2-D survival S(t1,t2)
.
For hzdrate.sshzd2d
and survexp.sshzd2d
with
job=3
, time
should be a matrix of two columns. For
survexp.sshzd2d
with job=1,2
, time
should be a
vector.
When covariates
is present, its length should be either 1 or
that of time
.
See Also
Fitting function sshzd2d
.
Generating Covariance for Correlated Data
Description
Generate entries of covariance functions for correlated data.
Usage
mkcov.arma(p, q, n)
mkcov.long(id)
mkcov.known(w)
Arguments
p |
Order of AR terms. |
q |
Order of MA terms. |
n |
Dimension of covariance matrix. |
id |
Factor of subject ID. |
w |
Covariance matrix; only the upper triangular part is used. |
Details
mkcov.arma generates covariance functions for an ARMA(p,q) model.
mkcov.long
generates covariance functions for longitudinal
data.
mkcov.known
allows one to use a known covariance matrix in
ssanova9
.
Value
A list of three elements.
fun |
Covariance matrix to be evaluated through
|
env |
Constants in covariance function. |
init |
Initial values for correlation parameters. |
Note
One may pass list(fun=...,env=...,init=...)
directly to the
argument cov
in calls to ssanova9
, or make use
of the mkcov.x
functions through
cov=list("arma",c(p,q))
, cov=list("long",id)
, or
cov=list("known",w)
.
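A minimal sketch following the note above (the simulated data and the identity covariance are purely illustrative):
set.seed(9)
x <- runif(100); y <- sin(2*pi*x) + .3*rnorm(x)
w <- diag(100)  # a "known" (here trivial) covariance
fit <- ssanova9(y~x, cov=list("known",w))
## equivalently, pass the value of mkcov.known directly
fit2 <- ssanova9(y~x, cov=mkcov.known(w))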
Crafting Building Blocks for Polynomial Splines
Description
Craft numerical functions to be used by mkterm
to
assemble model terms.
Usage
mkrk.cubic(range)
mkphi.cubic(range)
mkrk.trig(range)
mkphi.trig(range)
mkrk.cubic.per(range)
mkrk.linear(range)
mkrk.linear.per(range)
Arguments
range |
Numerical vector whose minimum and maximum specify the range on which the function to be crafted is defined. |
Details
mkrk.cubic
, mkphi.cubic
, and mkrk.linear
implement the polynomial spline construction in Gu (2002,
Sec. 2.3.3) for m=2,1
.
mkrk.cubic.per
and mkrk.linear.per
implement the
periodic polynomial spline construction in Gu (2002, Sec. 4.2.1) for
m=2,1
.
Value
A list of two elements.
fun |
Function definition. |
env |
Portable local constants derived from the argument. |
Note
mkrk.x
create a bivariate function
fun(x,y,env,outer=FALSE)
, where x
, y
are real
arguments and local constants can be passed in through env
.
mkphi.cubic
creates a univariate function
fun(x,nu,env)
.
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
See Also
mkterm
, mkfun.tp
, and
mkrk.nominal
.
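A minimal sketch: craft the cubic spline building blocks on [0,1] and evaluate them.
rk <- mkrk.cubic(c(0,1))
phi <- mkphi.cubic(c(0,1))
## reproducing kernel at a pair of points, and as a 3x3 matrix via outer=TRUE
rk$fun(.3, .7, rk$env)
rk$fun(c(.2,.5,.8), c(.2,.5,.8), rk$env, outer=TRUE)
## the non-constant null space basis function (nu=1) at x=.5
phi$fun(.5, 1, phi$env)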
Crafting Building Blocks for Thin-Plate and Spherical Splines
Description
Craft numerical functions to be used by mkterm
to
assemble model terms.
Usage
mkrk.tp(dm, order, mesh, weight)
mkphi.tp(dm, order, mesh, weight)
mkrk.tp.p(dm, order)
mkphi.tp.p(dm, order)
mkrk.sphere(order)
Arguments
dm |
Dimension of the variable |
order |
Order of the differential operator |
mesh |
Normalizing mesh. |
weight |
Normalizing weights. |
Details
mkrk.tp
, mkphi.tp
, mkrk.tp.p
, and
mkphi.tp.p
implement the construction in Gu (2002,
Sec. 4.4). Thin-plate splines are defined for 2m>d
.
mkrk.tp.p
generates the pseudo kernel, and mkphi.tp.p
generates the (m+d-1)!/d!/(m-1)!
lower order polynomials with
total order less than m
.
mkphi.tp
generates normalized lower order polynomials
orthonormal w.r.t. a norm specified by mesh
and
weight
, and mkrk.tp
conditions the pseudo kernel to
generate the reproducing kernel orthogonal to the lower order
polynomials w.r.t. the norm.
mkrk.sphere
implements the reproducing kernel construction of
Wahba (1981) for m=2,3,4
.
Value
A list of two elements.
fun |
Function definition. |
env |
Portable local constants derived from the arguments. |
Note
mkrk.tp
and mkrk.sphere
create a bivariate function
fun(x,y,env,outer=FALSE)
, where x
, y
are real
arguments and local constants can be passed in through env
.
mkphi.tp
creates a collection of univariate functions
fun(x,nu,env)
, where x
is the argument and nu
is the index.
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Wahba, G. (1981), Spline interpolation and smoothing on the sphere. SIAM Journal on Scientific and Statistical Computing, 2, 5–16.
See Also
mkterm
, mkfun.poly
, and
mkrk.nominal
.
Generating Integrals of Basis Terms
Description
Generate integrals of basis terms for use in the ssden1 suite.
Usage
mkint(mf, type, id.basis, quad, term, rho, rho.int)
mkint2(mf, type, id.basis, quad, term)
Arguments
mf |
Model frame of the model formula. |
type |
List specifying the type of spline for each variable. |
id.basis |
Index of observations to be used as "knots." |
quad |
Quadratures on marginal domains, weighted by rho. |
term |
Model terms generated by |
rho |
Marginal log(rho) on quadrature points. |
rho.int |
Marginal integrals of log(rho). |
Details
mkint calculates the first moments of basis functions with respect to the independent joint density rho; mkint2 calculates the second moments for use in project.ssden1.
Value
mkint
returns a list of five elements.
s |
First moments of phi's. |
r |
First moments of rk's. |
s.rho |
Cross moments of phi*log(rho). |
r.rho |
Cross moments of rk*log(rho). |
var.type |
Types for variables. |
mkint2
returns a list of three elements.
ss |
Second moments of phi's. |
sr |
Cross moments of phi's and rk's. |
rr |
Second moments of rk. |
Generating Random Effects in Mixed-Effect Models
Description
Generate entries representing random effects in mixed-effect models.
Usage
mkran(formula, data)
mkran1(ran1, ran2)
Arguments
formula |
Symbolic description of the random effects. |
data |
Data frame containing the variables in the model. |
ran1 | Random effects in the form of the value of mkran or mkran1. |
ran2 | Random effects in the form of the value of mkran or mkran1. |
Details
mkran
generates random effect terms from simple grouping
variables, for use in nonparametric mixed-effect models as described
in Gu and Ma (2005a, b). The syntax of the formula resembles that
of similar utilities for linear and nonlinear mixed-effect models,
as described in Pinheiro and Bates (2000).
Currently, mkran
takes only two kinds of basic formulas,
~1|grp2
or ~grp1|grp2
. Both grp1
and
grp2
should be factors, and for the second formula, the
levels of grp2
should be nested under those of grp1
.
The Z matrix is determined by grp2
. When observations are
ordered according to the levels of grp2
, the Z matrix is
block diagonal of 1 vectors.
The Sigma matrix is diagonal. For ~1|grp2
, it has one tuning
parameter. For ~grp1|grp2
, the number of parameters equals
the number of levels of grp1
, with each parameter shared by
the grp2
levels nested under the same grp1
level.
mkran1
adds together two independent random effects, and can
be used recursively to add more than two terms. The arguments are
of the form of the value of mkran
or mkran1
, which may
or may not be created by mkran
or mkran1
.
Multiple terms of random effects can also be specified via the likes
of mkran(~1|grp1+1|grp2,data)
, which is equivalent to
mkran1(mkran(~1|grp1,data),mkran(~1|grp2,data))
.
Value
A list of three elements.
z |
Z matrix. |
sigma |
Sigma matrix to be evaluated through
|
init |
Initial parameter values. |
Note
One may pass a formula or a list to the argument random
in
calls to ssanova
orgssanova
to fit
nonparametric mixed-effect models. A formula will be converted to a
list using mkran
. A list should be of the same form as the
value of mkran
.
References
Gu, C. and Ma, P. (2005), Optimal smoothing in nonparametric mixed-effect models. The Annals of Statistics, 33, 1357–1379.
Gu, C. and Ma, P. (2005), Generalized nonparametric mixed-effect models: computation and smoothing parameter selection. Journal of Computational and Graphical Statistics, 14, 485–504.
Pinheiro and Bates (2000), Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.
Examples
## Toy data
test <- data.frame(grp=as.factor(rep(1:2,c(2,3))))
## First formula
ran.test <- mkran(~1|grp,test)
ran.test$z
ran.test$sigma$fun(2,ran.test$sigma$env) # diag(10^(-2),2)
## Second formula
ran.test <- mkran(~grp|grp,test)
ran.test$z
ran.test$sigma$fun(c(1,2),ran.test$sigma$env) # diag(10^(-1),10^(-2))
## Clean up
## Not run: rm(test,ran.test)
Crafting Building Blocks for Discrete Splines
Description
Craft numerical functions to be used by mkterm
to assemble
model terms involving factors.
Usage
mkrk.nominal(levels)
mkrk.ordinal(levels)
Arguments
levels |
Levels of the factor. |
Details
For a nominal factor with levels 1,2,...,k, the level means f(i) will be shrunk towards each other through a penalty proportional to
(f(1)-f(.))^2+...+(f(k)-f(.))^2,
where f(.)=(f(1)+...+f(k))/k.
For an ordinal factor with levels 1<2<...<k, the level means f(i) will be shrunk towards each other through a penalty proportional to
(f(1)-f(2))^2+...+(f(k-1)-f(k))^2.
Value
A list of two elements.
fun |
Function definition. |
env |
Portable local constants derived from the arguments. |
Note
mkrk.x
create a bivariate function
fun(x,y,env,outer=FALSE)
, where x
, y
are real
arguments and local constants can be passed in through env
.
See Also
mkterm
, mkrk.cubic
, and
mkrk.tp
.
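A minimal sketch: the nominal-factor kernel on three levels, evaluated as a 3x3 matrix via outer=TRUE.
rk <- mkrk.nominal(c("a","b","c"))
x <- factor(c("a","b","c"))
rk$fun(x, x, rk$env, outer=TRUE)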
Assembling Model Terms for Smoothing Spline ANOVA Models
Description
Assemble numerical functions for calculating model terms in a smoothing spline ANOVA model.
Usage
mkterm(mf, type)
Arguments
mf |
Model frame of the model formula. |
type |
List specifying the type of spline for each variable. |
Details
For a factor x
, type$x
is ignored;
mkrk.ordinal
is used if is.ordered(x)==TRUE
and
mkrk.nominal
is used otherwise. Factors with 3 or
more levels are penalized.
For a numerical vector x
, type$x
is of the form
type.x
for type.x
="cubic"
, "linear"
,
or of the form list(type.x, range)
for
type.x
="per"
, "cubic.per"
, "linear.per"
,
"cubic"
, "linear"
; "per"
is short for
"cubic.per"
. See mkfun.poly
for the functions
used. For type.x
missing, the default is "cubic"
.
For range
missing with type.x
="cubic"
,
"linear"
, the default is
c(min(x),max(x))+c(-1,1)*(max(x)-min(x))*.05.
For a numerical matrix x
, type$x
is of the form
type.x
or list(type.x, order)
for
type.x
="tp"
, "sphere"
, or of the form
list("tp",list(order=order,mesh=mesh,weight=weight))
. See
mkfun.tp
for the functions used. For type.x
missing, the default is "tp"
. For order
missing, the default is 2
. For mesh
and
weight
missing with type.x
="tp"
and
order
given, the defaults are mesh
=x
and
weight
=1
.
For a numerical vector or numerical matrix x
,
one may also use type$x
of the form
list("custom",list(nphi=nphi,mkphi=mkphi,mkrk=mkrk,env=env))
;
nphi
is the null space dimension excluding the
constant, and mkphi
is ignored if nphi
=0. See
examples below. This feature allows the use of other marginal
constructions; one may modify mkphi.cubic
or
mkphi.tp.p
for mkphi
and modify
mkrk.cubic
or mkrk.sphere
for
mkrk
.
Value
A list object with an element labels
containing the labels
of all model terms. For each of the model terms, there is an
element holding the numerical functions for calculating the
unpenalized and penalized parts within the term.
Background
Tensor-product splines are constructed based on the model formula and the marginal reproducing kernels, as described in Gu (2002, Sec. 2.4). The marginal variables can be factors, numerical vectors, and numerical matrices, as specified in the details section.
One-way ANOVA decompositions are built in the supported marginal
constructions, in which one has the constant, a "nonparametric
contrast," and possibly also a "parametric contrast." To the
"nonparametric contrast" there corresponds a reproducing kernal
rk
, and to a "parametric contrast" there corresponds a set
of null space basis phi
. The reproducing kernels and null
space basis on the product domain can be constructed from the
marginal rk
and phi
in a systematic manner.
The marginal one-way ANOVA structures induce a multi-way ANOVA
structure on the product domain, with model terms consisting of
unpenalized "parametric contrasts" and/or penalized "nonparametric
contrasts." One only needs to construct rk
's and
phi
's associated with the model terms implied by the model
formula.
Note
For a numerical vector x
in ssden
,
the default range
is domain$x
.
For a numerical matrix x
with
type.x
="sphere"
, it is assumed that
dim(x)[2]==2
, x[,1] is the latitude in degrees between -90 and 90, and x[,2] is the longitude in degrees between -180 and 180.
For backward compatibility, one may set type="cubic"
,
"linear"
, or "tp"
, but then the default parameters cannot be overridden; the type is simply duplicated for each variable.
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Examples
## cubic marginals
x1 <- rnorm(100); x2 <- rnorm(100); y <- 3+5*sin(x1-2*x2)+rnorm(x1)
fit <- ssanova0(y~x1*x2)
## the same fit
fit1 <- ssanova0(y~x1*x2,type=list(x1="cubic"))
## the same fit one more time
par <- list(nphi=1,mkphi=mkphi.cubic,mkrk=mkrk.cubic,
env=c(min(x2),max(x2))+c(-1,1)*(max(x2)-min(x2))*.05)
fit2 <- ssanova0(y~x1*x2,type=list(x2=list("custom",par)))
## Clean up
## Not run: rm(x1,x2,y,fit,fit1,par,fit2)
## cubic and thin-plate marginals
x1 <- rnorm(100); x2 <- matrix(rnorm(200),100,2)
y <- 3+5*sin(x1-2*x2[,1]*x2[,2])+rnorm(x1)
fit <- ssanova0(y~x1*x2)
## the same fit
fit1 <- ssanova0(y~x1*x2,type=list(x2="tp"))
## the same fit one more time
mkphi.tp1 <- function(x) mkphi.tp(x$dm,x$ord,x$mesh,x$wt)
mkrk.tp1 <- function(x) mkrk.tp(x$dm,x$ord,x$mesh,x$wt)
env <- list(dm=2,ord=2,mesh=x2,wt=1)
par <- list(nphi=2,mkphi=mkphi.tp1,mkrk=mkrk.tp1,env=env)
fit2 <- ssanova0(y~x1*x2,type=list(x2=list("custom",par)))
## Clean up
## Not run: rm(x1,x2,y,fit,fit1,mkphi.tp1,mkrk.tp1,env,par,fit2)
Assembling Model Terms for Copula Density Estimation
Description
Assemble numerical functions for calculating model terms in copula density estimation.
Usage
mkterm.copu(dm, order, symmetry, exclude)
Arguments
dm |
Dimension of the domain. |
order |
Highest order of interactions allowed in log density. |
symmetry |
Flag indicating whether to enforce symmetry, or invariance under coordinate permutation. |
exclude |
Pair(s) of marginals whose interactions to be excluded in log density. |
Author(s)
Chong Gu, chong@stat.purdue.edu
Minimizing Univariate Functions on Finite Intervals
Description
Minimize univariate functions on finite intervals using 3-point quadratic fit, with golden-section safe-guard.
Usage
nlm0(fun, range, prec=1e-7)
Arguments
fun |
Function to be minimized. |
range |
Interval on which the function to be minimized. |
prec |
Desired precision of the solution. |
Value
nlm0
returns a list object with the following elements.
estimate |
Minimizer. |
minimum |
Minimum. |
evaluations |
Number of function evaluations. |
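A minimal sketch: minimize a simple quadratic on (0,1).
nlm0(function(x) (x-.3)^2 + 1, c(0,1))
## estimate close to .3, minimum close to 1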
NOx in Engine Exhaust
Description
Data from an experiment in which a single-cylinder engine was run with ethanol to see how the NOx concentration in the exhaust depended on the compression ratio and the equivalence ratio.
Usage
data(nox)
Format
A data frame containing 88 observations on the following variables.
nox | NOx concentration in exhaust. |
comp | Compression ratio. |
equi | Equivalence ratio. |
Source
Brinkman, N. D. (1981), Ethanol fuel – a single-cylinder engine study of efficiency and exhaust emissions. SAE Transactions, 90, 1410–1424.
References
Cleveland, W. S. and Devlin, S. J. (1988), Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596–610.
Breiman, L. (1991), The pi method for estimating multivariate functions from noisy data. Technometrics, 33, 125–160.
Ozone Concentration in Los Angeles Basin
Description
Daily measurements of ozone concentration and eight meteorological quantities in the Los Angeles basin for 330 days of 1976.
Usage
data(ozone)
Format
A data frame containing 330 observations on the following variables.
upo3 | Upland ozone concentration, in ppm. |
vdht | Vandenberg 500 millibar height, in meters. |
wdsp | Wind speed, in miles per hour. |
hmdt | Humidity. |
sbtp | Sandburg Air Base temperature, in Celsius. |
ibht | Inversion base height, in feet. |
dgpg | Daggett pressure gradient, in mmHg. |
ibtp | Inversion base temperature, in Fahrenheit. |
vsty | Visibility, in miles. |
day | Calendar day, between 1 and 366. |
Source
Unknown.
References
Breiman, L. and Friedman, J. H. (1985), Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580–598.
Hastie, T. and Tibshirani, R. (1990), Generalized Additive Models. Chapman and Hall.
Thickness of US Lincoln Pennies
Description
Thickness of US Lincoln pennies minted during years 1945 through 1989.
Usage
data(penny)
Format
A data frame containing 90 observations on the following variables.
year | Year minted. |
mil | Thickness in mils. |
Source
Scott, D. W. (1992), Multivariate Density Estimation: Theory, Practice and Visualization. New York: Wiley.
References
Gu, C. (1995), Smoothing spline density estimation: Conditional distribution, Statistica Sinica, 5, 709–726.
Scott, D. W. (1992), Multivariate Density Estimation: Theory, Practice and Visualization. New York: Wiley.
Predicting from Smoothing Spline ANOVA Fits
Description
Evaluate terms in a smoothing spline ANOVA fit at arbitrary points. Standard errors of the terms can be requested for use in constructing Bayesian confidence intervals.
Usage
## S3 method for class 'ssanova'
predict(object, newdata, se.fit=FALSE,
include=c(object$terms$labels,object$lab.p), ...)
## S3 method for class 'ssanova0'
predict(object, newdata, se.fit=FALSE,
include=c(object$terms$labels,object$lab.p), ...)
## S3 method for class 'ssanova'
predict1(object, contr=c(1,-1), newdata, se.fit=TRUE,
include=c(object$terms$labels,object$lab.p), ...)
Arguments
object |
Object of class inheriting from "ssanova". |
newdata |
Data frame or model frame in which to predict. |
se.fit |
Flag indicating if standard errors are required. |
include |
List of model terms to be included in the
prediction. The |
contr |
Contrast coefficients. |
... |
Ignored. |
Value
For se.fit=FALSE
, predict.ssanova
returns a vector of
the evaluated fit.
For se.fit=TRUE
, predict.ssanova
returns a list
consisting of the following elements.
fit |
Vector of evaluated fit. |
se.fit |
Vector of standard errors. |
Note
For mixed-effect models through ssanova
or
gssanova
, the Z matrix is set to 0 if not supplied.
To supply the Z matrix, add an element random=I(...)
in
newdata
, where the as-is function I(...)
preserves the
integrity of the Z matrix in data frame.
predict1.ssanova
takes a list of data frames in
newdata
representing x1, x2, etc. By default, it calculates
f(x1)-f(x2) along with standard errors. While pairwise contrast is
the targeted application, all linear combinations can be computed.
For "gssanova"
objects, the results are on the link scale.
See also predict9.gssanova
.
References
Gu, C. (1992), Penalized likelihood regression: a Bayesian analysis. Statistica Sinica, 2, 255–264.
Gu, C. and Wahba, G. (1993), Smoothing spline ANOVA with component-wise Bayesian "confidence intervals." Journal of Computational and Graphical Statistics, 2, 97–117.
Kim, Y.-J. and Gu, C. (2004), Smoothing spline Gaussian regression: more scalable computation via efficient approximation. Journal of the Royal Statistical Society, Ser. B, 66, 337–356.
See Also
Fitting functions ssanova
, ssanova0
,
gssanova
, gssanova0
and
methods summary.ssanova
,
summary.gssanova
, summary.gssanova0
,
project.ssanova
, fitted.ssanova
.
Examples
## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
## Not run:
## Fit a model with cubic and thin-plate marginals, where geog is 2-D
data(LakeAcidity)
fit <- ssanova(ph~log(cal)*geog,,LakeAcidity)
## Obtain estimates and standard errors on a grid
new <- data.frame(cal=1,geog=I(matrix(0,1,2)))
new <- model.frame(~log(cal)+geog,new)
predict(fit,new,se=TRUE)
## Evaluate the geog main effect
predict(fit,new,se=TRUE,inc="geog")
## Evaluate the sum of the geog main effect and the interaction
predict(fit,new,se=TRUE,inc=c("geog","log(cal):geog"))
## Evaluate the geog main effect on a grid
grid <- seq(-.04,.04,len=21)
new <- model.frame(~geog,list(geog=cbind(rep(grid,21),rep(grid,rep(21,21)))))
est <- predict(fit,new,se=TRUE,inc="geog")
## Plot the fit and standard error
par(pty="s")
contour(grid,grid,matrix(est$fit,21,21),col=1)
contour(grid,grid,matrix(est$se,21,21),add=TRUE,col=2)
## Clean up
rm(LakeAcidity,fit,new,grid,est)
dev.off()
## End(Not run)
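As a sketch of the pairwise-contrast interface of predict1.ssanova described in the Note above (illustrative only, not part of the documented examples; it refits the LakeAcidity model used above and picks two arbitrary geographic locations):
## Not run:
data(LakeAcidity)
fit <- ssanova(ph~log(cal)*geog,,LakeAcidity)
## two hypothetical locations for the 2-D geog variable
x1 <- model.frame(~geog,list(geog=matrix(0,1,2)))
x2 <- model.frame(~geog,list(geog=matrix(.02,1,2)))
## estimate f(x1)-f(x2) for the geog main effect, with standard errors
predict1(fit,contr=c(1,-1),newdata=list(x1,x2),include="geog")
## Clean up
rm(LakeAcidity,fit,x1,x2)
## End(Not run)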
Evaluating Smoothing Spline ANOVA Estimate of Relative Risk
Description
Evaluate terms in a smoothing spline ANOVA estimate of relative risk at arbitrary points. Standard errors of the terms can be requested for use in constructing Bayesian confidence intervals.
Usage
## S3 method for class 'sscox'
predict(object, newdata, se.fit=FALSE,
include=c(object$terms$labels,object$lab.p), ...)
Arguments
object |
Object of class "sscox". |
newdata |
Data frame or model frame in which to predict. |
se.fit |
Flag indicating if standard errors are required. |
include |
List of model terms to be included in the prediction. |
... |
Ignored. |
Value
For se.fit=FALSE
, predict.sscox
returns a vector of
the evaluated relative risk.
For se.fit=TRUE
, predict.sscox
returns a list
consisting of the following elements.
fit |
Vector of evaluated relative risk. |
se.fit |
Vector of standard errors for log relative risk. |
Note
For mixed-effect models through sscox
, the Z matrix is
set to 0 if not supplied. To supply the Z matrix, add an element
random=I(...)
in newdata
, where the as-is function
I(...)
preserves the integrity of the Z matrix in the data frame.
See Also
Fitting functions sscox
and method
project.sscox
.
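A minimal sketch of evaluating an estimated relative risk (illustrative only; it assumes the stan data and a fit like the one in the sscox example later in this manual):
## Not run:
data(stan)
fit.rr <- sscox(Surv(futime,status)~age,data=stan)
## relative risk at two ages; se.fit is for the log relative risk
predict(fit.rr,data.frame(age=c(35,50)),se.fit=TRUE)
## Clean up
rm(stan,fit.rr)
## End(Not run)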
Evaluating Log-Linear Regression Model Fits
Description
Evaluate conditional density in a log-linear regression model fit at arbitrary x, or contrast of log conditional density possibly with standard errors for constructing Bayesian confidence intervals.
Usage
## S3 method for class 'ssllrm'
predict(object, x, y=object$qd.pt, odds=NULL, se.odds=FALSE, ...)
Arguments
object |
Object of class "ssllrm". |
x |
Data frame of x values. |
y |
Data frame of y values; y-variables must be factors. |
odds |
Optional coefficients of contrast. |
se.odds |
Flag indicating if standard errors are required.
Ignored when odds=NULL. |
... |
Ignored. |
Value
For odds=NULL
, predict.ssllrm
returns a vector/matrix
of the estimated f(y|x)
.
When odds
is given, it should match y
in length and
the coefficients must add to zero; predict.ssllrm
then
returns a vector of estimated "odds ratios" if se.odds=FALSE
or a list consisting of the following elements if
se.odds=TRUE
.
fit |
Vector of evaluated fit. |
se.fit |
Vector of standard errors. |
See Also
Fitting function ssllrm
.
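A minimal sketch of the interface (the simulated binary response below is a made-up illustration, not a documented example):
## Not run:
x <- runif(200)
y <- as.factor(rbinom(200,1,1/(1+exp(-6*(x-.5)))))
fit <- ssllrm(~y*x,~y,data=data.frame(x=x,y=y))
## estimated f(y|x) at two x values
predict(fit,x=data.frame(x=c(.2,.8)))
## log odds contrast with standard errors; coefficients add to zero
predict(fit,x=data.frame(x=c(.2,.8)),odds=c(1,-1),se.odds=TRUE)
## Clean up
rm(x,y,fit)
## End(Not run)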
Predicting from Smoothing Spline ANOVA Fits with Non-Gaussian Responses
Description
Evaluate smoothing spline ANOVA fits with non-Gaussian responses at arbitrary points, with results on the response scale.
Usage
## S3 method for class 'gssanova'
predict9(object, newdata, ci=FALSE, level=.95, nu=NULL, ...)
Arguments
object |
Object of class inheriting from "gssanova". |
newdata |
Data frame or model frame in which to predict. |
ci |
Flag indicating if Bayesian confidence intervals are required.
Ignored for |
level |
Confidence level. Ignored when ci=FALSE. |
nu |
Sizes for |
... |
Ignored. |
Value
For ci=FALSE
, predict9.gssanova
returns a vector of the
evaluated fit.
For ci=TRUE
, predict9.gssanova
returns a list of three
elements.
fit |
Vector of evaluated fit on response scale. |
lcl |
Vector of lower confidence limit on response scale. |
ucl |
Vector of upper confidence limit on response scale. |
For family="polr"
, predict9.gssanova
returns a matrix of
probabilities with each row adding up to 1.
Note
For mixed-effect models through gssanova
or
gssanova1
, the Z matrix is set to 0 if not supplied.
To supply the Z matrix, add an element random=I(...)
in
newdata
, where the as-is function I(...)
preserves the
integrity of the Z matrix in the data frame.
Unlike on the link scale, partial sums of terms are not meaningful on the response scale, so all model terms are included in the evaluation.
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
See Also
Fitting functions gssanova
, gssanova1
and
methods predict.ssanova
, summary.gssanova
,
project.gssanova
, fitted.gssanova
.
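A minimal sketch of response-scale prediction with Bayesian confidence intervals (simulated binomial data; illustrative only, not a documented example):
## Not run:
x <- runif(200)
y <- rbinom(200,1,1/(1+exp(-3*sin(2*pi*x))))
fit <- gssanova(y~x,family="binomial")
new <- data.frame(x=seq(0,1,len=51))
## fitted probabilities with 95% confidence limits on the response scale
est <- predict9(fit,new,ci=TRUE,level=.95)
plot(new$x,est$fit,type="l")
lines(new$x,est$lcl,lty=2); lines(new$x,est$ucl,lty=2)
## Clean up
rm(x,y,fit,new,est)
## End(Not run)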
Print Functions for Smoothing Spline ANOVA Models
Description
Print functions for Smoothing Spline ANOVA models.
Usage
## S3 method for class 'ssanova'
print(x, ...)
## S3 method for class 'ssanova0'
print(x, ...)
## S3 method for class 'gssanova'
print(x, ...)
## S3 method for class 'ssden'
print(x, ...)
## S3 method for class 'sscden'
print(x, ...)
## S3 method for class 'sshzd'
print(x, ...)
## S3 method for class 'sscox'
print(x, ...)
## S3 method for class 'ssllrm'
print(x, ...)
## S3 method for class 'summary.ssanova'
print(x, digits=6, ...)
## S3 method for class 'summary.gssanova'
print(x, digits=6, ...)
## S3 method for class 'summary.gssanova0'
print(x, digits=6, ...)
Arguments
x |
Object of one of the classes listed in the usage above. |
digits |
Number of significant digits to be printed in values. |
... |
Ignored. |
See Also
ssanova
, ssanova0
,
gssanova
, gssanova0
,
ssden
, ssllrm
, sshzd
,
summary.ssanova
, summary.gssanova
,
summary.gssanova0
.
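These methods are invoked automatically when an object is auto-printed; a direct call is only needed to control formatting. A small sketch, reusing the cubic spline fit from the ssanova examples:
## Not run:
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
fit <- ssanova(y~x)
print(fit)                      ## same as typing fit at the prompt
print(summary(fit),digits=4)    ## fewer significant digits
## Clean up
rm(x,y,fit)
## End(Not run)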
Projecting Smoothing Spline ANOVA Fits for Model Diagnostics
Description
Calculate Kullback-Leibler projection of smoothing spline ANOVA fits for model diagnostics.
Usage
project(object, ...)
## S3 method for class 'ssanova'
project(object, include, ...)
## S3 method for class 'ssanova9'
project(object, include, ...)
## S3 method for class 'gssanova'
project(object, include, ...)
## S3 method for class 'ssden'
project(object, include, mesh=FALSE, ...)
## S3 method for class 'ssden1'
project(object, include, drop1=FALSE, ...)
## S3 method for class 'sscden'
project(object, include, ...)
## S3 method for class 'sscden1'
project(object, include, ...)
## S3 method for class 'sshzd'
project(object, include, mesh=FALSE, ...)
## S3 method for class 'sscox'
project(object, include, ...)
## S3 method for class 'sshzd1'
project(object, include, ...)
## S3 method for class 'ssllrm'
project(object, include, ...)
Arguments
object |
Object of one of the classes listed in the usage above. |
... |
Additional arguments. Ignored in |
include |
List of model terms to be included in the reduced
model space. The |
mesh |
Flag indicating whether to return evaluations of the projection. |
drop1 |
If TRUE, calculate |
Details
The entropy KL(fit0,null) can be decomposed as the sum of KL(fit0,fit1) and KL(fit1,null), where fit0 is the fit to be projected, fit1 is the projection in the reduced model space, and null is the constant fit. The ratio KL(fit0,fit1)/KL(fit0,null) serves as a diagnostic of the feasibility of the reduced model.
For regression fits, a smoothness safeguard is used to prevent interpolation, so KL(fit0,fit1)+KL(fit1,null) may not match KL(fit0,null) exactly.
For mixed-effect models from ssanova
and gssanova
,
the estimated random effects are treated as offset.
Value
The functions return a list consisting of the following elements.
ratio |
KL(fit0,fit1)/KL(fit0,null); the smaller the value, the more feasible the reduced model is. |
kl |
KL(fit0,fit1). |
For regression fits, the list also contains the following element.
check |
KL(fit0,fit1)/KL(fit0,null)+KL(fit1,null)/KL(fit0,null); a value closer to 1 is preferred. |
For density and hazard fits, the list may contain the following optional element.
mesh |
The evaluations of the projection. |
Note
project.ssden1
, project.sscden1
, and
project.sshzd1
calculate square-error projections.
References
Gu, C. (2004), Model diagnostics for smoothing spline ANOVA models. The Canadian Journal of Statistics, 32, 347–358.
See Also
Fitting functions ssanova
, gssanova
,
ssden
, sshzd
, and sshzd1
.
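As a sketch of the diagnostic use described above (illustrative only; it reuses the nox data appearing in the ssanova examples elsewhere in this manual):
## Not run:
data(nox)
fit <- ssanova(log10(nox)~comp*equi,data=nox)
## project the fit onto the additive (no-interaction) model space
chk <- project(fit,include=c("comp","equi"))
chk$ratio   ## small values suggest the additive model is feasible
chk$check   ## for regression fits, values near 1 are preferred
## Clean up
rm(nox,fit,chk)
## End(Not run)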
Numerical Engine for ssanova and gssanova
Description
Perform numerical calculations for the ssanova
and
gssanova
suites.
Usage
sspreg1(s, r, q, y, wt, method, alpha, varht, random)
mspreg1(s, r, id.basis, y, wt, method, alpha, varht, random, skip.iter)
ngreg1(family, s, r, id.basis, y, wt, offset, method, varht, alpha, nu, random, skip.iter)
sspreg91(s, r, q, y, cov, method, alpha, varht)
mspreg91(s, r, id.basis, y, cov, method, alpha, varht, skip.iter)
sspngreg(family, s, r, q, y, wt, offset, alpha, nu, random)
mspngreg(family, s, r, id.basis, y, wt, offset, alpha, nu, random, skip.iter)
ngreg(dc, family, sr, q, y, wt, offset, nu, alpha)
regaux(s, r, q, nlambda, fit)
ngreg.proj(dc, family, sr, q, y0, wt, offset, nu)
Arguments
family |
Description of the error distribution. Supported
are exponential families |
s |
Unpenalized terms evaluated at data points. |
r |
Basis of penalized terms evaluated at data points. |
q |
Penalty matrix. |
id.basis |
Index of observations to be used as "knots." |
y |
Response vector. |
wt |
Model weights. |
cov |
Input for covariance function for correlated data. |
offset |
Model offset. |
method |
|
alpha |
Parameter modifying GCV or Mallows' CL scores for smoothing parameter selection. |
nu |
Optional argument for future support of nbinomial, weibull, lognorm, and loglogis families. |
varht |
External variance estimate needed for |
random |
Input for parametric random effects in nonparametric mixed-effect models. |
skip.iter |
Flag indicating whether to use initial values of theta and skip theta iteration. |
nlambda |
Smoothing parameter in effect. |
fit |
Fitted model. |
dc |
Coefficients of fits. |
sr |
|
y0 |
Components of the fit to be projected. |
Details
sspreg1
is used by ssanova
to compute
regression estimates with a single smoothing parameter.
mspreg1
is used by ssanova
to compute
regression estimates with multiple smoothing parameters.
sspngreg
is used by gssanova
to compute
non-Gaussian regression estimates with a single smoothing
parameter. mspngreg
is used by gssanova
to
compute non-Gaussian regression estimates with multiple smoothing
parameters. ngreg
is used by sspngreg
and
mspngreg
to perform Newton iteration with fixed smoothing
parameters and to calculate cross-validation scores on return.
regaux
is used by sspreg1
, mspreg1
,
sspngreg
, and mspngreg
to obtain auxiliary information
needed by predict.ssanova
for standard error calculation.
ngreg.proj
is used by project.gssanova
to
calculate the Kullback-Leibler projection for non-Gaussian
regression.
References
Gu, C. (1992), Cross validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, 169–179.
Kim, Y.-J. and Gu, C. (2004), Smoothing spline Gaussian regression: more scalable computation via efficient approximation. Journal of the Royal Statistical Society, Ser. B, 66, 337–356.
Interface to RKPACK
Description
Call RKPACK routines for numerical calculations concerning the
ssanova0
and gssanova0
suites.
Usage
sspreg0(s, q, y, method="v", varht=1)
mspreg0(s, q, y, method="v", varht=1, prec=1e-7, maxiter=30)
sspregpoi(family, s, q, y, wt, offset, method="u", varht=1, nu, prec=1e-7, maxiter=30)
mspregpoi(family, s, q, y, wt, offset, method="u", varht=1, nu, prec=1e-7, maxiter=30)
getcrdr(obj, r)
getsms(obj)
Arguments
s |
Design matrix of unpenalized terms. |
q |
Penalty matrices of penalized terms. |
y |
Model response. |
method |
Method for smoothing parameter selection. |
varht |
Assumed dispersion parameter, needed only for
|
prec |
Precision requirement for iterations. |
maxiter |
Maximum number of iterations allowed. |
family |
Error family. |
wt |
Model weights. |
offset |
Model offset. |
obj |
Object returned from a call to |
nu |
Optional argument for nbinomial, weibull, lognorm, and loglogis families. |
r |
Inputs for standard error calculation. |
Details
sspreg0
is used by ssanova0
to fit Gaussian
models with a single smoothing parameter. mspreg0
is used to
fit Gaussian models with multiple smoothing parameters.
sspregpoi
is used by gssanova0
to fit non-Gaussian models with a single smoothing parameter. mspregpoi
is used to fit non-Gaussian models with multiple smoothing
parameters.
getcrdr
and getsms
are used by
predict.ssanova0
to calculate standard errors of the
fitted terms.
References
Gu, C. (1989), RKPACK and its applications: Fitting smoothing spline models. In ASA Proceedings of Statistical Computing Section, pp. 42–51.
Gu, C. (1992), Cross validating non Gaussian data. Journal of Computational and Graphical Statistics, 1, 169–179.
Generating Smolyak Cubature
Description
Generate delayed Smolyak cubatures using C routines modified from
smolyak.c
found in Knut Petras' SMOLPACK.
Usage
smolyak.quad(d, k)
smolyak.size(d, k)
Arguments
d |
Dimension of unit cube. |
k |
Depth of algorithm. |
Value
smolyak.quad
returns a list object with the following
elements.
pt |
Quadrature nodes in rows of matrix. |
wt |
Quadrature weights. |
smolyak.size
returns an integer.
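A small sketch of generating and using a cubature (illustrative; the test integrand is arbitrary, and depth 14 is borrowed from the default quadrature depth quoted for 3-D domains elsewhere in this manual):
## Not run:
## delayed Smolyak cubature on the 3-D unit cube at depth 14
quad <- smolyak.quad(3,14)
smolyak.size(3,14)              ## number of nodes, equals nrow(quad$pt)
## approximate the integral of an arbitrary test function over the cube
sum(quad$wt*apply(quad$pt,1,function(x) prod(sin(pi*x))))
## Clean up
rm(quad)
## End(Not run)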
Fitting Smoothing Spline ANOVA Models
Description
Fit smoothing spline ANOVA models in Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
.
Usage
ssanova(formula, type=NULL, data=list(), weights, subset, offset,
na.action=na.omit, partial=NULL, method="v", alpha=1.4,
varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL,
skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
method |
Method for smoothing parameter selection. Supported
are |
alpha |
Parameter modifying GCV or Mallows' CL; larger absolute
values yield smoother fits; negative value invokes a stable and
more accurate GCV/CL evaluation algorithm but may take two to
five times as long. Ignored when |
varht |
External variance estimate needed for
|
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed to be used for the random generation of "knots".
Ignored when |
random |
Input for parametric random effects in nonparametric
mixed-effect models. See |
skip.iter |
Flag indicating whether to use initial values of theta and skip theta iteration. See notes on skipping theta iteration. |
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Using q
"knots," ssanova
calculates an approximate
solution to the penalized least squares problem using algorithms of
the order O(nq^{2})
, which for q<<n
scale better than
the O(n^{3})
algorithms of ssanova0
. For the
exact solution, one may set q=n
in ssanova
, but
ssanova0
would be much faster.
Value
ssanova
returns a list object of class "ssanova"
.
The method summary.ssanova
can be used to obtain
summaries of the fits. The method predict.ssanova
can
be used to evaluate the fits at arbitrary points along with standard
errors. The method project.ssanova
can be used to
calculate the Kullback-Leibler projection for model selection. The
methods residuals.ssanova
and
fitted.ssanova
extract the respective traits
from the fits.
Skipping Theta Iteration
For the selection of multiple smoothing parameters,
nlm
is used to minimize the selection criterion such
as the GCV score. When the number of smoothing parameters is large,
the process can be time-consuming due to the great amount of
function evaluations involved.
The starting values for the nlm
iteration are obtained using
Algorithm 3.2 in Gu and Wahba (1991). These starting values usually
yield good estimates on their own, leaving the subsequent quasi-Newton
iteration to pick up the "last 10%" of performance at a cost many
times that of the initial evaluation. Thus, it is often a good idea to
skip the iteration by specifying skip.iter=TRUE
, especially
in high-dimensions and/or with multi-way interactions.
skip.iter=TRUE
could be made the default in future releases.
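For instance (a sketch using the nox data from the examples below), skipping the theta iteration only changes the call:
## Not run:
data(nox)
## multi-term fit with theta iteration skipped
nox.fit.skip <- ssanova(log10(nox)~comp*equi,data=nox,skip.iter=TRUE)
## Clean up
rm(nox,nox.fit.skip)
## End(Not run)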
Note
To use GCV and Mallows' CL unmodified, set alpha=1
.
For simpler models and moderate sample sizes, the exact solution of
ssanova0
can be faster.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
In gss versions earlier than 1.0, ssanova
was under
the name ssanova1
.
References
Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.
Gu, C. and Wahba, G. (1991), Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM Journal on Scientific and Statistical Computing, 12, 383–398.
Kim, Y.-J. and Gu, C. (2004), Smoothing spline Gaussian regression: more scalable computation via efficient approximation. Journal of the Royal Statistical Society, Ser. B, 66, 337–356.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova(y~x)
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)
## Clean up
## Not run: rm(x,y,cubic.fit,new,est)
dev.off()
## End(Not run)
## Fit a tensor product cubic spline
data(nox)
nox.fit <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova(log10(nox)~comp*equi,data=nox)
## Clean up
## Not run: rm(nox,nox.fit,nox.fit.n,nox.fit.o)
Fitting Smoothing Spline ANOVA Models
Description
Fit smoothing spline ANOVA models in Gaussian regression. The
symbolic model specification via formula
follows the same
rules as in lm
.
Usage
ssanova0(formula, type=NULL, data=list(), weights, subset,
offset, na.action=na.omit, partial=NULL, method="v",
varht=1, prec=1e-7, maxiter=30)
Arguments
formula |
Symbolic description of the model to be fit. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
method |
Method for smoothing parameter selection. Supported
are |
varht |
External variance estimate needed for
|
prec |
Precision requirement in the iteration for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. |
maxiter |
Maximum number of iterations allowed for multiple smoothing parameter selection. Ignored when only one smoothing parameter is involved. |
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
ssanova0
and the affiliated methods provide a front end to
RKPACK, a collection of RATFOR routines for nonparametric regression
via the penalized least squares. The algorithms implemented in
RKPACK are of the order O(n^{3})
.
Value
ssanova0
returns a list object of class
c("ssanova0","ssanova")
.
The method summary.ssanova0
can be used to obtain
summaries of the fits. The method predict.ssanova0
can be used to evaluate the fits at arbitrary points along with
standard errors. The methods residuals.ssanova
and
fitted.ssanova
extract the respective traits from the
fits.
Note
For complex models and large sample sizes, the approximate solution
of ssanova
can be faster.
The method project
is not implemented for
ssanova0
, nor is the mixed-effect model support through
mkran
.
In gss versions earlier than 1.0, ssanova0
was under
the name ssanova
.
References
Wahba, G. (1990), Spline Models for Observational Data. Philadelphia: SIAM.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Fit a cubic spline
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
cubic.fit <- ssanova0(y~x,method="m")
## Obtain estimates and standard errors on a grid
new <- data.frame(x=seq(min(x),max(x),len=50))
est <- predict(cubic.fit,new,se=TRUE)
## Plot the fit and the Bayesian confidence intervals
plot(x,y,col=1); lines(new$x,est$fit,col=2)
lines(new$x,est$fit+1.96*est$se,col=3)
lines(new$x,est$fit-1.96*est$se,col=3)
## Clean up
## Not run: rm(x,y,cubic.fit,new,est)
dev.off()
## End(Not run)
## Fit a tensor product cubic spline
data(nox)
nox.fit <- ssanova0(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and nominal marginals
nox$comp<-as.factor(nox$comp)
nox.fit.n <- ssanova0(log10(nox)~comp*equi,data=nox)
## Fit a spline with cubic and ordinal marginals
nox$comp<-as.ordered(nox$comp)
nox.fit.o <- ssanova0(log10(nox)~comp*equi,data=nox)
## Clean up
## Not run: rm(nox,nox.fit,nox.fit.n,nox.fit.o)
Fitting Smoothing Spline ANOVA Models with Correlated Data
Description
Fit smoothing spline ANOVA models with correlated Gaussian data.
The symbolic model specification via formula
follows the same
rules as in lm
.
Usage
ssanova9(formula, type=NULL, data=list(), subset, offset,
na.action=na.omit, partial=NULL, method="v", alpha=1.4,
varht=1, id.basis=NULL, nbasis=NULL, seed=NULL, cov,
skip.iter=FALSE)
para.arma(fit)
Arguments
formula |
Symbolic description of the model to be fit. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
method |
Method for smoothing parameter selection. Supported
are |
alpha |
Parameter modifying V or U; larger absolute values
yield smoother fits. Ignored when |
varht |
External variance estimate needed for
|
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed to be used for the random generation of "knots".
Ignored when |
cov |
Input for covariance functions. See |
skip.iter |
Flag indicating whether to use initial values of theta and skip theta iteration. See notes on skipping theta iteration. |
fit |
|
Details
The model specification via formula
is intuitive. For
example, y~x1*x2
yields a model of the form
y = C + f_{1}(x1) + f_{2}(x2) + f_{12}(x1,x2) + e
with the terms denoted by "1"
, "x1"
, "x2"
, and
"x1:x2"
.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Using q
"knots," ssanova
calculates an approximate
solution to the penalized least squares problem using algorithms of
the order O(nq^{2})
, which for q<<n
scale better than
the O(n^{3})
algorithms of ssanova0
. For the
exact solution, one may set q=n
in ssanova
, but
ssanova0
would be much faster.
Value
ssanova9
returns a list object of class
c("ssanova9","ssanova")
.
The method summary.ssanova9
can be used to obtain
summaries of the fits. The method predict.ssanova
can
be used to evaluate the fits at arbitrary points along with standard
errors. The method project.ssanova9
can be used to
calculate the Kullback-Leibler projection for model selection. The
methods residuals.ssanova
and
fitted.ssanova
extract the respective traits from the
fits.
para.arma
returns the fitted ARMA coefficients for
cov=list("arma",c(p,q))
in the call to ssanova9
.
Skipping Theta Iteration
For the selection of multiple smoothing parameters,
nlm
is used to minimize the selection criterion such
as the GCV score. When the number of smoothing parameters is large,
the process can be time-consuming due to the great amount of
function evaluations involved.
The starting values for the nlm
iteration are obtained using
Algorithm 3.2 in Gu and Wahba (1991). These starting values usually
yield good estimates on their own, leaving the subsequent quasi-Newton
iteration to pick up the "last 10%" of performance at a cost many
times that of the initial evaluation. Thus, it is often a good idea to
skip the iteration by specifying skip.iter=TRUE
, especially
in high-dimensions and/or with multi-way interactions.
skip.iter=TRUE
could be made the default in future releases.
Note
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
References
Han, C. and Gu, C. (2008), Optimal smoothing with correlated data, Sankhya, 70-A, 38–72.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + rnorm(x)
## independent fit
fit <- ssanova9(y~x,cov=list("known",diag(1,100)))
## AR(1) fit
fit <- ssanova9(y~x,cov=list("arma",c(1,0)))
para.arma(fit)
## MA(1) fit
e <- rnorm(101); e <- e[-1]-.5*e[-101]
x <- runif(100); y <- 5 + 3*sin(2*pi*x) + e
fit <- ssanova9(y~x,cov=list("arma",c(0,1)))
para.arma(fit)
## Clean up
## Not run: rm(x,y,e,fit)
Estimating Conditional Probability Density Using Smoothing Splines
Description
Estimate conditional probability densities using smoothing spline
ANOVA models. The symbolic model specification via formula
follows the same rules as in lm
.
Usage
sscden(formula, response, type=NULL, data=list(), weights, subset,
na.action=na.omit, alpha=1.4, id.basis=NULL, nbasis=NULL,
seed=NULL, ydomain=as.list(NULL), yquad=NULL, prec=1e-7,
maxiter=30, skip.iter=FALSE)
sscden1(formula, response, type=NULL, data=list(), weights, subset,
na.action=na.omit, alpha=1.4, id.basis=NULL, nbasis=NULL,
seed=NULL, rho=list("xy"), ydomain=as.list(NULL), yquad=NULL,
prec=1e-7, maxiter=30, skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
response |
Formula listing response variables. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of counts for duplicated data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
alpha |
Parameter defining cross-validation scores for smoothing parameter selection. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
ydomain |
Data frame specifying marginal support of conditional density. |
yquad |
Quadrature for calculating integral on Y domain. Mandatory if response variables other than factors or numerical vectors are involved. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
rho |
rho function needed for sscden1. |
Details
The model is specified via formula
and response
, where
response
lists the response variables. For example,
sscden(~y*x,~y)
prescribes a model of the form
log f(y|x) = g_{y}(y) + g_{xy}(x,y) + C(x)
with the terms denoted by "y"
, "y:x"
; the term(s) not
involving response(s) are removed and the constant C(x)
is
determined by the fact that a conditional density integrates to one
on the y
axis. sscden1
does keep terms not involving
response(s) during estimation, although those terms cancel out when
one evaluates the estimated conditional density.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Value
sscden
returns a list object of class "sscden"
.
sscden1
returns a list object of class
c("sscden1","sscden")
.
dsscden
and cdsscden
can be used to
evaluate the estimated conditional density f(y|x)
and
f(y1|x,y2)
; psscden
, qsscden
,
cpsscden
, and cqsscden
can be used to
evaluate conditional cdf and quantiles. The methods
project.sscden
or project.sscden1
can
be used to calculate the Kullback-Leibler or square-error
projections for model selection.
Note
Default quadrature on the Y domain will be constructed for numerical
vectors on a hyper cube, then outer product with factor levels will
be taken if factors are involved. The sides of the hyper cube are
specified by ydomain
; for ydomain$y
missing, the default
is c(min(y),max(y))+c(-1,1)*(max(y)-min(y))*.05
.
On a 1-D interval, the quadrature is the 200-point Gauss-Legendre
formula returned from gauss.quad
. For multiple
numerical vectors, delayed Smolyak cubatures from
smolyak.quad
are used on cubes with the marginals
properly transformed; see Gu and Wang (2003) for the marginal
transformations.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
For reasonable execution time in high dimensions, set
skip.iter=TRUE
.
References
Gu, C. (1995), Smoothing spline density estimation: Conditional distribution. Statistica Sinica, 5, 709–726.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
data(penny); set.seed(5732)
fit <- sscden1(~year*mil,~mil,data=penny,
ydomain=data.frame(mil=c(49,61)))
yy <- 1944+(0:92)/2
quan <- qsscden(fit,c(.05,.25,.5,.75,.95),
data.frame(year=yy))
plot(penny$year+.1*rnorm(90),penny$mil,ylim=c(49,61))
for (i in 1:5) lines(yy,quan[i,])
## Clean up
## Not run: rm(penny,yy,quan)
Composition Estimation
Description
Estimate composition using multinomial counts.
Usage
sscomp(x,wt=rep(1,length(x)),alpha=1.4)
sscomp2(x,alpha=1.4)
Arguments
x |
Numerical vector or matrix of multinomial counts. |
wt |
Numerical vector of integration weights. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
Details
sscomp
takes a vector x
to estimate composition using
density estimation on a nominal discrete domain; zero counts must be
included in x
to specify the domain. wt
mimicking the
shape of the unknown density could improve performance.
sscomp2
takes a matrix x
, collapses columns to
estimate a density using sscomp
, then using that as wt
in further sscomp
calls to estimate composition for each
column.
Value
sscomp
returns a column of estimated probabilities.
sscomp2
returns a matrix of estimated probabilities, matching
the input x
in dimensions.
References
Gu, C. (2020), Composition estimation via shrinkage. Manuscript.
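A minimal sketch with simulated multinomial counts (illustrative only; the true composition p below is made up):
## Not run:
set.seed(5732)
p <- c(.35,.25,.15,.1,.06,.04,.03,.02,0,0)   ## hypothetical composition, zeros included
x <- as.vector(rmultinom(1,200,p))           ## observed counts over the 10 categories
cbind(p,sscomp(x))
## column-wise estimates for a matrix of counts
xx <- rmultinom(5,200,p)
sscomp2(xx)
## Clean up
rm(p,x,xx)
## End(Not run)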
Estimating Copula Density Using Smoothing Splines
Description
Estimate copula densities using tensor-product cubic splines.
Usage
sscopu(x, symmetry=FALSE, alpha=1.4, order=NULL, exclude=NULL,
weights=NULL, id.basis=NULL, nbasis=NULL, seed=NULL,
qdsz.depth=NULL, prec=1e-7, maxiter=30, skip.iter=dim(x)[2]!=2)
sscopu2(x, censoring=NULL, truncation=NULL, symmetry=FALSE, alpha=1.4,
weights=NULL, id.basis=NULL, nbasis=NULL, seed=NULL, prec=1e-7,
maxiter=30)
Arguments
x |
Matrix of observations on unit cubes. |
symmetry |
Flag indicating whether to enforce symmetry, or invariance under coordinate permutation. |
order |
Highest order of interaction terms in log density.
When |
exclude |
Pair(s) of marginals whose interactions to be excluded in log density. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of bin-counts for histogram data. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
qdsz.depth |
Depth to be used in smolyak.quad. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
censoring |
Optional censoring indicator. |
truncation |
Optional truncation points. |
Details
sscopu
is essentially ssden
applied to
observations on unit cubes. Instead of variables in data frames,
the data are entered as a numerical matrix, and model complexity is
globally controlled by the highest order of interactions allowed in log density.
sscopu2
further restricts the domain to the unit square, but
allows for possible censoring and truncation. With
censoring==0,1,2,3
, a data point (x1,x2)
represents
exact observation, [0,x1]x{x2}
, {x1}x[0,x2]
, or
[0,x1]x[0,x2]
. With truncation
point (t1,t2)
,
the sample is taken from [0,t1]x[0,t2]
instead of the unit
square.
With symmetry=TRUE
, one may enforce the interchangeability
of coordinates so that f(x1,x2)=f(x2,x1)
, say.
When (1,2)
is a row in exclude
, interaction terms
involving coordinates 1
and 2
are excluded.
Value
sscopu
and sscopu2
return a list object of class
"sscopu"
. dsscopu
can be used to evaluate the
estimated copula density. A "copularization" process is applied to
the estimated density by default so the resulting marginal densities
are guaranteed to be uniform.
cdsscopu
, cpsscopu
, and
cqsscopu
can be used to evaluate 1-D conditional pdf,
cdf, and quantiles.
Note
For reasonable execution time in higher dimensions, set
skip.iter=TRUE
in calls to sscopu
.
When "Newton iteration diverges"
in sscopu
, try to use
a larger qdsz.depth
; the default values for dimensions 2, 3,
4, 5, 6+ are 24, 14, 12, 11, 10. To be sure a larger
qdsz.depth
indeed makes difference, verify the cubature size
using smolyak.size
.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
Author(s)
Chong Gu, chong@stat.purdue.edu
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2015), Hazard estimation with bivariate survival data and copula density estimation. Journal of Computational and Graphical Statistics, 24, 1053-1073.
Examples
## simulate 2-D data
x <- matrix(runif(200),100,2)
## fit copula density
fit <- sscopu(x)
## "same fit"
fit2 <- sscopu2(x,id=fit$id)
## symmetric fit
fit.s <- sscopu(x,sym=TRUE,id=fit$id)
## Not run:
## Kendall's tau and Spearman's rho
summary(fit); summary(fit2); summary(fit.s)
## clean up
rm(x,fit,fit2,fit.s)
## End(Not run)
Estimating Relative Risk Using Smoothing Splines
Description
Estimate relative risk using smoothing spline ANOVA models. The
symbolic model specification via formula
follows the same
rules as in lm
, but with the response of a special
form.
Usage
sscox(formula, type=NULL, data=list(), weights=NULL, subset,
na.action=na.omit, partial=NULL, alpha=1.4, id.basis=NULL,
nbasis=NULL, seed=NULL, random=NULL, prec=1e-7, maxiter=30,
skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit, where
the response is of the form |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of counts for duplicated data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
random |
Input for parametric random effects (frailty) in
nonparametric mixed-effect models. See |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
A proportional hazard model is assumed, and the relative risk is
estimated via penalized partial likelihood. The model specification
via formula
is for the log relative risk. For example,
Surv(t,d)~u*v
prescribes a model of the form
log f(u,v) = g_{u}(u) + g_{v}(v) + g_{u,v}(u,v)
with the terms denoted by "u"
, "v"
, and "u:v"
;
relative risk is defined only up to a multiplicative constant, so
the constant term is not included in the model.
sscox
takes standard right-censored lifetime data, with
possible left-truncation and covariates; in
Surv(futime,status,start=0)~...
, futime
is the
follow-up time, status
is the censoring indicator, and
start
is the optional left-truncation time.
Parallel to those in a ssanova
object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism designed for density estimation under biased sampling,
with a fudge factor alpha
; alpha=1
is "unbiased" for
the minimization of Kullback-Leibler loss but may yield severe
undersmoothing, whereas larger alpha
yields smoother
estimates.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Value
sscox
returns a list object of class "sscox"
.
The method predict.sscox
can be used to evaluate the
fits at arbitrary points along with standard errors. The method
project.sscox
can be used to calculate the
Kullback-Leibler projection for model selection.
Note
The function Surv(futime,status,start=0)
is defined and
parsed inside sscox
, not quite the same as the one in the
survival
package. The estimation is invariant of monotone
transformations of time.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
References
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Relative Risk
data(stan)
fit.rr <- sscox(Surv(futime,status)~age,data=stan)
est.rr <- predict(fit.rr,data.frame(age=c(35,40)),se=TRUE)
## Base Hazard
risk <- predict(fit.rr,stan)
fit.bh <- sshzd(Surv(futime,status)~futime,data=stan,offset=log(risk))
tt <- seq(0,max(stan$futime),length=51)
est.bh <- hzdcurve.sshzd(fit.bh,tt,se=TRUE)
## Clean up
## Not run: rm(stan,fit.rr,est.rr,risk,fit.bh,tt,est.bh)
Estimating Probability Density Using Smoothing Splines
Description
Estimate probability densities using smoothing spline ANOVA models.
The symbolic model specification via formula
follows the same
rules as in lm
, but with the response missing.
Usage
ssden(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL,
domain=as.list(NULL), quad=NULL, qdsz.depth=NULL, bias=NULL,
prec=1e-7, maxiter=30, skip.iter=FALSE)
ssden1(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL,
domain=as.list(NULL), quad=NULL, prec=1e-7, maxiter=30)
Arguments
formula |
Symbolic description of the model to be fit. |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of bin-counts for histogram data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
domain |
Data frame specifying marginal support of density. |
quad |
Quadrature for calculating integral. Mandatory if variables other than factors or numerical vectors are involved. |
qdsz.depth |
Depth to be used in smolyak.quad. |
bias |
Input for sampling bias. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
The model specification via formula
is for the log density.
For example, ~x1*x2
prescribes a model of the form
log f(x1,x2) = g_{1}(x1) + g_{2}(x2) + g_{12}(x1,x2) + C
with the terms denoted by "x1"
, "x2"
, and
"x1:x2"
; the constant is determined by the fact that a
density integrates to one.
The selective term elimination may characterize (conditional)
independence structures between variables. For example,
~x1*x2+x1*x3
yields the conditional independence of x2 and x3
given x1.
Parallel to those in a ssanova
object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism described in the references, with a parameter
alpha
; alpha=1
is "unbiased" for the minimization of
Kullback-Leibler loss but may yield severe undersmoothing, whereas
larger alpha
yields smoother estimates.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Value
ssden
returns a list object of class "ssden"
.
ssden1
returns a list object of class
c("ssden1","ssden")
.
dssden
and cdssden
can be used to
evaluate the estimated joint density and conditional density;
pssden
, qssden
, cpssden
,
and cqssden
can be used to evaluate (conditional) cdf
and quantiles.
The method project.ssden
can be used to calculate the
Kullback-Leibler projection of "ssden"
objects for model
selection; project.ssden1
can be used to calculate the
square error projection of "ssden1"
objects.
Note
In ssden
, default quadrature will be constructed for
numerical vectors on a hyper cube, then outer product with factor
levels will be taken if factors are involved. The sides of the
hyper cube are specified by domain
; for domain$x
missing, the default is
c(min(x),max(x))+c(-1,1)*(max(x)-min(x))*.05
. In 1-D, the
quadrature is the 200-point Gauss-Legendre formula returned from
gauss.quad
. In multi-D, delayed Smolyak cubatures
from smolyak.quad
are used on cubes with the marginals
properly transformed; see Gu and Wang (2003) for the marginal
transformations.
For reasonable execution time in higher dimensions, set
skip.iter=TRUE
in call to ssden
.
If you get an error message from ssden
stating "Newton
iteration diverges"
, try to use a larger qdsz.depth
which
will execute slower, or switch to ssden1
. The default values
of qdsz.depth
for dimensions 4, 5, 6+ are 12, 11, 10.
ssden1
does not involve multi-D quadrature but does not
perform as well as ssden
. It can be used in very high
dimensions where ssden
is infeasible.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
Author(s)
Chong Gu, chong@stat.purdue.edu
References
Gu, C. and Wang, J. (2003), Penalized likelihood density estimation: Direct cross-validation and scalable approximation. Statistica Sinica, 13, 811–826.
Gu, C., Jeon, Y., and Lin, Y. (2013), Nonparametric density estimation in high dimensions. Statistica Sinica, 23, 1131–1153.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## 1-D estimate: Buffalo snowfall
data(buffalo)
buff.fit <- ssden(~buffalo,domain=data.frame(buffalo=c(0,150)))
plot(xx<-seq(0,150,len=101),dssden(buff.fit,xx),type="l")
plot(xx,pssden(buff.fit,xx),type="l")
plot(qq<-seq(0,1,len=51),qssden(buff.fit,qq),type="l")
## Clean up
## Not run: rm(buffalo,buff.fit,xx,qq)
dev.off()
## End(Not run)
## 2-D with triangular domain: AIDS incubation
data(aids)
## rectangular quadrature
quad.pt <- expand.grid(incu=((1:40)-.5)/40*100,infe=((1:40)-.5)/40*100)
quad.pt <- quad.pt[quad.pt$incu<=quad.pt$infe,]
quad.wt <- rep(1,nrow(quad.pt))
quad.wt[quad.pt$incu==quad.pt$infe] <- .5
quad.wt <- quad.wt/sum(quad.wt)*5e3
## additive model (pre-truncation independence)
aids.fit <- ssden(~incu+infe,data=aids,subset=age>=60,
domain=data.frame(incu=c(0,100),infe=c(0,100)),
quad=list(pt=quad.pt,wt=quad.wt))
## conditional (marginal) density of infe
jk <- cdssden(aids.fit,xx<-seq(0,100,len=51),data.frame(incu=50))
plot(xx,jk$pdf,type="l")
## conditional (marginal) quantiles of infe (TIME-CONSUMING)
## Not run:
cqssden(aids.fit,c(.05,.25,.5,.75,.95),data.frame(incu=50))
## End(Not run)
## Clean up
## Not run: rm(aids,quad.pt,quad.wt,aids.fit,jk,xx)
dev.off()
## End(Not run)
## One factor plus one vector
data(gastric)
gastric$trt
fit <- ssden(~futime*trt,data=gastric)
## conditional density
cdssden(fit,c("1","2"),cond=data.frame(futime=150))
## conditional quantiles
cqssden(fit,c(.05,.25,.5,.75,.95),data.frame(trt=as.factor("1")))
## Clean up
## Not run: rm(gastric,fit)
## Sampling bias
## (X,T) is truncated to T<X<1 for T~U(0,1), so X is length-biased
rbias <- function(n) {
t <- runif(n)
x <- rnorm(n,.5,.15)
ok <- (x>t)&(x<1)
while(m<-sum(!ok)) {
t[!ok] <- runif(m)
x[!ok] <- rnorm(m,.5,.15)
ok <- (x>t)&(x<1)
}
cbind(x,t)
}
xt <- rbias(100)
x <- xt[,1]; t <- xt[,2]
## length-biased
bias1 <- list(t=1,wt=1,fun=function(t,x){x[,]})
fit1 <- ssden(~x,domain=list(x=c(0,1)),bias=bias1)
plot(xx<-seq(0,1,len=101),dssden(fit1,xx),type="l")
## truncated
bias2 <- list(t=t,wt=rep(1/100,100),fun=function(t,x){x[,]>t})
fit2 <- ssden(~x,domain=list(x=c(0,1)),bias=bias2)
plot(xx,dssden(fit2,xx),type="l")
## Clean up
## Not run: rm(rbias,xt,x,t,bias1,fit1,bias2,fit2)
Estimating Hazard Function Using Smoothing Splines
Description
Estimate hazard function using smoothing spline ANOVA models. The
symbolic model specification via formula
follows the same
rules as in lm
, but with the response of a special
form.
Usage
sshzd(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
subset, offset, na.action=na.omit, partial=NULL, id.basis=NULL,
nbasis=NULL, seed=NULL, random=NULL, prec=1e-7, maxiter=30,
skip.iter=FALSE)
sshzd1(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
subset, na.action=na.omit, rho="marginal", partial=NULL,
id.basis=NULL, nbasis=NULL, seed=NULL, random=NULL, prec=1e-7,
maxiter=30, skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit, where
the response is of the form |
type |
List specifying the type of spline for each variable.
See |
data |
Optional data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation score for smoothing parameter selection. |
weights |
Optional vector of counts for duplicated data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
offset |
Optional offset term with known parameter 1. |
na.action |
Function which indicates what should happen when the data contain NAs. |
partial |
Optional symbolic description of parametric terms in partial spline models. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
random |
Input for parametric random effects (frailty) in
nonparametric mixed-effect models. See |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
rho |
Choice of rho function for sshzd1: |
Details
The model specification via formula
is for the log hazard.
For example, Surv(t,d)~t*u
prescribes a model of the form
log f(t,u) = C + g_{t}(t) + g_{u}(u) + g_{t,u}(t,u)
with the terms denoted by "1"
, "t"
, "u"
, and
"t:u"
. Replacing t*u
by t+u
in the
formula
, one gets a proportional hazard model with
g_{t,u}=0
.
sshzd
takes standard right-censored lifetime data, with
possible left-truncation and covariates; in
Surv(futime,status,start=0)~...
, futime
is the
follow-up time, status
is the censoring indicator, and
start
is the optional left-truncation time. The main effect
of futime
must appear in the model terms specified via
...
.
Parallel to those in a ssanova
object, the model terms
are sums of unpenalized and penalized terms. Attached to every
penalized term there is a smoothing parameter, and the model
complexity is largely determined by the number of smoothing
parameters.
The selection of smoothing parameters is through a cross-validation
mechanism described in Gu (2002, Sec. 7.2), with a parameter
alpha
; alpha=1
is "unbiased" for the minimization of
Kullback-Leibler loss but may yield severe undersmoothing, whereas
larger alpha
yields smoother estimates.
A subset of the observations are selected as "knots." Unless
specified via id.basis
or nbasis
, the number of
"knots" q
is determined by max(30,10n^{2/9})
, which is
appropriate for the default cubic splines for numerical vectors.
Value
sshzd
returns a list object of class "sshzd"
.
sshzd1
returns a list object of class
c("sshzd1","sshzd")
.
hzdrate.sshzd
can be used to evaluate the estimated
hazard function. hzdcurve.sshzd
can be used to
evaluate hazard curves with fixed covariates.
survexp.sshzd
can be used to calculated estimated
expected survival.
The method project.sshzd
can be used to calculate the
Kullback-Leibler projection of "sshzd"
objects for model
selection; project.sshzd1
can be used to calculate the
square error projection of "sshzd1"
objects.
Note
The function Surv(futime,status,start=0)
is defined and
parsed inside sshzd
, not quite the same as the one in the
survival
package.
Integration on the time axis is done by the 200-point Gauss-Legendre
formula on c(min(start),max(futime))
, returned from
gauss.quad
.
sshzd1
can be up to 50 times faster than sshzd
, at the
cost of performance degradation.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
References
Du, P. and Gu, C. (2006), Penalized likelihood hazard estimation: efficient approximation and Bayesian confidence intervals. Statistics and Probability Letters, 76, 244–254.
Du, P. and Gu, C. (2009), Penalized Pseudo-Likelihood Hazard Estimation: A Fast Alternative to Penalized Likelihood. Journal of Statistical Planning and Inference, 139, 891–899.
Du, P. and Ma, S. (2010), Frailty Model with Spline Estimated Nonparametric Hazard Function, Statistica Sinica, 20, 561–580.
Gu, C. (2013), Smoothing Spline ANOVA Models (2nd Ed). New York: Springer-Verlag.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Model with interaction
data(gastric)
gastric.fit <- sshzd(Surv(futime,status)~futime*trt,data=gastric)
## exp(-Lambda(600)), exp(-(Lambda(1200)-Lambda(600))), and exp(-Lambda(1200))
survexp.sshzd(gastric.fit,c(600,1200,1200),data.frame(trt=as.factor(1)),c(0,600,0))
## Clean up
## Not run: rm(gastric,gastric.fit)
dev.off()
## End(Not run)
## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
## Proportional hazard model
## Not run:
data(stan)
stan.fit <- sshzd(Surv(futime,status)~futime+age,data=stan)
## Evaluate fitted hazard
hzdrate.sshzd(stan.fit,data.frame(futime=c(10,20),age=c(20,30)))
## Plot lambda(t,age=20)
tt <- seq(0,60,length=101)
hh <- hzdcurve.sshzd(stan.fit,tt,data.frame(age=20))
plot(tt,hh,type="l")
## Clean up
rm(stan,stan.fit,tt,hh)
dev.off()
## End(Not run)
Estimating 2-D Hazard Function Using Smoothing Splines
Description
Estimate 2-D hazard function using smoothing spline ANOVA models.
Usage
sshzd2d(formula1, formula2, symmetry=FALSE, data, alpha=1.4,
weights=NULL, subset=NULL, id.basis=NULL, nbasis=NULL, seed=NULL,
prec=1e-7, maxiter=30, skip.iter=FALSE)
sshzd2d1(formula1, formula2, symmetry=FALSE, data, alpha=1.4,
weights=NULL, subset=NULL, rho="marginal",
id.basis=NULL, nbasis=NULL, seed=NULL, prec=1e-7, maxiter=30,
skip.iter=FALSE)
Arguments
formula1 |
Description of the hazard model to be fit on the first axis. |
formula2 |
Description of the hazard model to be fit on the second axis. |
symmetry |
Flag indicating whether to enforce symmetry of the two axes. |
data |
Data frame containing the variables in the model. |
alpha |
Parameter defining cross-validation scores for smoothing parameter selection. |
weights |
Optional vector of counts for duplicated data. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
id.basis |
Index of observations to be used as "knots." |
nbasis |
Number of "knots" to be used. Ignored when
|
seed |
Seed to be used for the random generation of "knots."
Ignored when |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of theta and skip theta iteration in marginal hazard estimation. |
rho |
Choice of rho function for sshzd2d1: |
Details
The 2-D survival function is expressed as S(t1,t2) = C(S1(t1),S2(t2)), where S1(t1) and S2(t2) are the marginal survival functions and C(u1,u2) is a 2-D copula.
The marginal survival functions are estimated via the marginal hazards as in sshzd, and the copula is estimated nonparametrically by calling sscopu2.
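As a toy illustration of this construction (not package code), the independence copula C(u1,u2)=u1*u2 reduces the expression to the product of the margins:
S1 <- function(t) exp(-t/50)      ## hypothetical exponential marginal survivals
S2 <- function(t) exp(-t/80)
C.indep <- function(u1,u2) u1*u2  ## independence copula, for illustration only
C.indep(S1(30),S2(40))            ## equals S1(30)*S2(40)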
When symmetry=TRUE, a common marginal survival function S1(t)=S2(t) is estimated, and a symmetric copula is estimated such that C(u1,u2)=C(u2,u1).
Covariates can be incorporated in the marginal hazard models as in sshzd, including parametric terms via partial and frailty terms via random.
Arguments formula1 and formula2 are typically model formulas of the same form as the argument formula in sshzd, but when partial or random are needed, formula1 and formula2 should be lists with model formulas as the first elements and partial/random as named elements; when necessary, variable configurations (that are done via argument type in sshzd) should also be entered as named elements of the lists formula1/formula2.
When symmetry=TRUE, parallel model formulas must be consistent with each other, such as
formula1=list(Surv(t1,d1)~t1*u1,partial=~z1,random=~1|id1)
formula2=list(Surv(t2,d2)~t2*u2,partial=~z2,random=~1|id2)
where the pairs t1-t2 and d1-d2 are different elements in data, the pairs u1-u2 and z1-z2 may or may not be different elements in data, and the factors id1 and id2 are typically the same but should at least have the same levels.
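A hedged sketch of such a call follows; the data frame mydata and the variables t1, t2, d1, d2, u1, u2, z1, z2, id1, and id2 are hypothetical placeholders, not objects from the package.
## Not run:
fit <- sshzd2d(list(Surv(t1,d1)~t1*u1,partial=~z1,random=~1|id1),
               list(Surv(t2,d2)~t2*u2,partial=~z2,random=~1|id2),
               data=mydata,symmetry=TRUE,seed=5732)
## End(Not run)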
Value
sshzd2d
and sshzd2d1
return a list object of class
"sshzd2d"
.
hzdrate.sshzd2d
can be used to evaluate the estimated
2-D hazard function. survexp.sshzd2d
can be used to
calculate estimated survival functions.
Note
sshzd2d1
executes faster than sshzd2d
, but often at
the cost of performance degradation.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
Author(s)
Chong Gu, chong@stat.purdue.edu
References
Gu, C. (2015), Hazard estimation with bivariate survival data and copula density estimation. Journal of Computational and Graphical Statistics, 24, 1053-1073.
Examples
## THE FOLLOWING EXAMPLE IS TIME-CONSUMING
## Not run:
data(DiaRet)
## Common proportional hazard model on the margins
fit <- sshzd2d(Surv(time1,status1)~time1+trt1*type,
Surv(time2,status2)~time2+trt2*type,
data=DiaRet,symmetry=TRUE)
## Evaluate fitted survival and hazard functions
time <- cbind(c(50,70),c(70,70))
cova <- data.frame(trt1=as.factor(c(1,1)),trt2=as.factor(c(1,0)),
type=as.factor(c("juvenile","adult")))
survexp.sshzd2d(fit,time,cov=cova)
hzdrate.sshzd2d(fit,time,cov=cova)
## Association between margins: Kendall's tau and Spearman's rho
summary(fit$copu)
## Clean up
rm(DiaRet,fit,time,cova)
dev.off()
## End(Not run)
Fitting Smoothing Spline Log-Linear Regression Models
Description
Fit smoothing spline log-linear regression models. The symbolic
model specification via formula
follows the same rules as in
lm
.
Usage
ssllrm(formula, response, type=NULL, data=list(), weights, subset,
na.action=na.omit, alpha=1, id.basis=NULL, nbasis=NULL,
seed=NULL, random=NULL, prec=1e-7, maxiter=30, skip.iter=FALSE)
Arguments
formula |
Symbolic description of the model to be fit. |
response |
Formula listing response variables. |
type |
List specifying the type of spline for each variable. See mkterm for details. |
data |
Optional data frame containing the variables in the model. |
weights |
Optional vector of weights to be used in the fitting process. |
subset |
Optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
Function which indicates what should happen when the data contain NAs. |
alpha |
Parameter modifying GCV or Mallows' CL; larger absolute
values yield smoother fits; negative value invokes a stable and
more accurate GCV/CL evaluation algorithm but may take two to
five times as long. Ignored when |
id.basis |
Index designating selected "knots". |
nbasis |
Number of "knots" to be selected. Ignored when
|
seed |
Seed to be used for the random generation of "knots". Ignored when id.basis is specified. |
random |
Input for parametric random effects in nonparametric mixed-effect models. See mkran for details. |
prec |
Precision requirement for internal iterations. |
maxiter |
Maximum number of iterations allowed for internal iterations. |
skip.iter |
Flag indicating whether to use initial values of
theta and skip theta iteration. See |
Details
The model is specified via formula and response, where response lists the response variables. For example, ssllrm(~y1*y2*x,~y1+y2) prescribes a model of the form
log f(y1,y2|x) = g_{1}(y1) + g_{2}(y2) + g_{12}(y1,y2)
+ g_{x1}(x,y1) + g_{x2}(x,y2) + g_{x12}(x,y1,y2) + C(x)
with the terms denoted by "y1", "y2", "y1:y2", "y1:x", "y2:x", and "y1:y2:x"; the term(s) not involving the response(s) are removed, and the constant C(x) is determined by the fact that a conditional density integrates (adds) to one on the y axis.
The model terms are sums of unpenalized and penalized terms. Attached to every penalized term there is a smoothing parameter, and the model complexity is largely determined by the number of smoothing parameters.
A subset of the observations is selected as "knots." Unless specified via id.basis or nbasis, the number of "knots" q is determined by max(30,10n^{2/9}), which is appropriate for the default cubic splines for numerical vectors.
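For instance, the default rule gives roughly the following basis sizes (an illustration of the formula above, not code from the package):
n <- c(100,500,2000)
pmax(30,10*n^(2/9))  ## about 30, 40, and 54 knots, respectively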
Value
ssllrm
returns a list object of class "ssllrm"
.
The method predict.ssllrm
can be used to evaluate
f(y|x)
at arbitrary x, or contrasts of log{f(y|x)}
such as the odds ratio along with standard errors. The method
project.ssllrm
can be used to calculate the
Kullback-Leibler projection for model selection.
Note
The responses, or y-variables, must be factors, and there must be at least one numerical x. For response, there is no difference between ~y1+y2 and ~y1*y2.
The results may vary from run to run. For consistency, specify
id.basis
or set seed
.
References
Gu, C. and Ma, P. (2011), Nonparametric regression with cross-classified responses. The Canadian Journal of Statistics, 39, 591–609.
Gu, C. (2014), Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.
Examples
## Simulate data
test <- function(x)
{.3*(1e6*(x^11*(1-x)^6)+1e4*(x^3*(1-x)^10))-2}
x <- (0:100)/100
p <- 1-1/(1+exp(test(x)))
y <- rbinom(x,3,p)
y1 <- as.ordered(y)
y2 <- as.factor(rbinom(x,1,p))
## Fit model
fit <- ssllrm(~y1*y2*x,~y1+y2)
## Evaluate f(y|x)
est <- predict(fit,data.frame(x=x),
data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4))))
## f(y|x) at all y values (fit$qd.pt)
est <- predict(fit,data.frame(x=x))
## Evaluate contrast of log f(y|x)
est <- predict(fit,data.frame(x=x),odds=c(-1,.5,.5,0),
data.frame(y1=as.factor(0:3),y2=as.factor(rep(0,4))),se=TRUE)
## Odds ratio log{f(0,0|x)/f(3,0|x)}
est <- predict(fit,data.frame(x=x),odds=c(1,-1),
data.frame(y1=as.factor(c(0,3)),y2=as.factor(c(0,1))),se=TRUE)
## KL projection
kl <- project(fit,include=c("y2:x","y1:y2","y1:x","y2:x"))
## Clean up
## Not run: rm(test,x,p,y,y1,y2,fit,est,kl)
dev.off()
## End(Not run)
Stanford Heart Transplant Data
Description
Survival of patients from the Stanford heart transplant program.
Usage
data(stan)
Format
A data frame containing 184 observations on the following variables.
time | Follow-up time after transplant, in days. |
status | Censoring status. |
age | Age at transplant. |
futime | Square root of time. |
Source
Miller, R. G. and Halpern, J. (1982), Regression with censored data. Biometrika, 69, 521–531.
Assessing Smoothing Spline ANOVA Fits with Non-Gaussian Responses
Description
Calculate various summaries of smoothing spline ANOVA fits with non-Gaussian responses.
Usage
## S3 method for class 'gssanova'
summary(object, diagnostics=FALSE, ...)
Arguments
object |
Object of class "gssanova". |
diagnostics |
Flag indicating if diagnostics are required. |
... |
Ignored. |
Details
Similar to the iterated weighted least squares fitting of glm, a penalized likelihood regression fit can be calculated through iterated penalized weighted least squares.
The diagnostics are based on the "pseudo" Gaussian response model behind the weighted least squares problem at convergence.
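A minimal usage sketch (a hedged illustration, not an example from the package), assuming a binary-response fit via gssanova on the wesdr data documented later in this manual:
## Not run:
data(wesdr)
fit <- gssanova(ret~dur+gly+bmi,family="binomial",data=wesdr)
summary(fit,diagnostics=TRUE)$kappa  ## concurvity diagnostics per model term
## End(Not run)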
Value
summary.gssanova
returns a list object of class
"summary.gssanova"
consisting of the following elements.
The entries pi
, kappa
, cosines
, and
roughness
are only calculated if diagnostics=TRUE
.
call |
Fitting call. |
family |
Error distribution. |
alpha |
Parameter used to define cross-validation in model fitting. |
fitted |
Fitted values on the link scale. |
dispersion |
Assumed or estimated dispersion parameter. |
residuals |
Working residuals on the link scale. |
rss |
Residual sum of squares. |
dev.resid |
Deviance residuals. |
deviance |
Deviance of the fit. |
dev.null |
Deviance of the null model. |
penalty |
Roughness penalty associated with the fit. |
pi |
"Percentage decomposition" of "explained variance" into model terms. |
kappa |
Concurvity diagnostics for model terms. Virtually the square roots of variance inflation factors of a retrospective linear model. |
cosines |
Cosine diagnostics for practical significance of model terms. |
roughness |
Percentage decomposition of the roughness penalty into model terms. |
References
Gu, C. (1992), Diagnostics for nonparametric regression models with additive terms. Journal of the American Statistical Association, 87, 1051–1058.
See Also
Fitting function gssanova
and methods
predict.ssanova
, project.gssanova
,
fitted.gssanova
.
Assessing Smoothing Spline ANOVA Fits with Non-Gaussian Responses
Description
Calculate various summaries of smoothing spline ANOVA fits with non-Gaussian responses.
Usage
## S3 method for class 'gssanova0'
summary(object, diagnostics=FALSE, ...)
Arguments
object |
Object of class "gssanova0". |
diagnostics |
Flag indicating if diagnostics are required. |
... |
Ignored. |
Details
Similar to the iterated weighted least squares fitting of glm, a penalized likelihood regression fit can be calculated through iterated penalized weighted least squares.
The diagnostics are based on the "pseudo" Gaussian response model behind the weighted least squares problem at convergence.
Value
summary.gssanova0
returns a list object of class
"summary.gssanova0"
consisting of the following elements.
The entries pi
, kappa
, cosines
, and
roughness
are only calculated if diagnostics=TRUE
.
call |
Fitting call. |
family |
Error distribution. |
method |
Method for smoothing parameter selection. |
dispersion |
Assumed or estimated dispersion parameter. |
iter |
Number of performance-oriented iterations performed. |
fitted |
Fitted values on the link scale. |
residuals |
Working residuals on the link scale. |
rss |
Residual sum of squares. |
dev.resid |
Deviance residuals. |
deviance |
Deviance of the fit. |
dev.null |
Deviance of the null model. |
alpha |
Estimated size for |
penalty |
Roughness penalty associated with the fit. |
pi |
"Percentage decomposition" of "explained variance" into model terms. |
kappa |
Concurvity diagnostics for model terms. Virtually the square roots of variance inflation factors of a retrospective linear model. |
cosines |
Cosine diagnostics for practical significance of model terms. |
roughness |
Percentage decomposition of the roughness penalty into model terms. |
References
Gu, C. (1992), Diagnostics for nonparametric regression models with additive terms. Journal of the American Statistical Association, 87, 1051–1058.
See Also
Fitting function gssanova0
and methods
predict.ssanova0
, fitted.gssanova
.
Assessing Smoothing Spline ANOVA Fits
Description
Calculate various summaries of smoothing spline ANOVA fits.
Usage
## S3 method for class 'ssanova'
summary(object, diagnostics=FALSE, ...)
## S3 method for class 'ssanova0'
summary(object, diagnostics=FALSE, ...)
## S3 method for class 'ssanova9'
summary(object, diagnostics=FALSE, ...)
Arguments
object |
Object of class "ssanova", "ssanova0", or "ssanova9". |
diagnostics |
Flag indicating if diagnostics are required. |
... |
Ignored. |
Value
summary.ssanova
returns a list object of class
"summary.ssanova"
consisting of the following elements.
The entries pi
, kappa
, cosines
, and
roughness
are only calculated if diagnostics=TRUE
; see
the reference below for details concerning the diagnostics.
call |
Fitting call. |
method |
Method for smoothing parameter selection. |
fitted |
Fitted values. |
residuals |
Residuals. |
sigma |
Assumed or estimated error standard deviation. |
r.squared |
Fraction of "explained variance" by the fitted model. |
rss |
Residual sum of squares. |
penalty |
Roughness penalty associated with the fit. |
pi |
"Percentage decomposition" of "explained variance" into model terms. |
kappa |
Concurvity diagnostics for model terms. Virtually the square roots of variance inflation factors of a retrospective linear model. |
cosines |
Cosine diagnostics for practical significance of model terms. |
roughness |
Percentage decomposition of the roughness penalty into model terms. |
References
Gu, C. (1992), Diagnostics for nonparametric regression models with additive terms. Journal of the American Statistical Association, 87, 1051–1058.
See Also
Fitting functions ssanova
, ssanova0
and
methods predict.ssanova
,
project.ssanova
, fitted.ssanova
.
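A minimal usage sketch on simulated data (illustration only; the simulation setup is arbitrary):
## Not run:
set.seed(5732)
x1 <- runif(100); x2 <- runif(100)
y <- x1 + sin(2*pi*x2) + rnorm(100,sd=.2)
fit <- ssanova(y~x1+x2)
summary(fit,diagnostics=TRUE)$pi  ## "explained variance" split by model term
## End(Not run)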
Calculating Kendall's Tau and Spearman's Rho for 2-D Copula Density Estimates
Description
Calculate Kendall's tau and Spearman's rho for 2-D copula density estimates.
Usage
## S3 method for class 'sscopu'
summary(object, ...)
Arguments
object |
Object of class "sscopu". |
... |
Ignored. |
Value
A list containing Kendall's tau and Spearman's rho.
See Also
Fitting functions sscopu
and sscopu2
.
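A minimal usage sketch (illustration only; independent uniforms stand in for copula pseudo-observations, so both measures should be near zero):
## Not run:
u <- cbind(runif(100),runif(100))
cop.fit <- sscopu(u)
summary(cop.fit)  ## list with Kendall's tau and Spearman's rho
## End(Not run)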
Progression of Diabetic Retinopathy
Description
Data derived from the Wisconsin Epidemiological Study of Diabetic Retinopathy.
Usage
data(wesdr)
Format
A data frame containing 669 observations on the following variables.
dur | Duration of diabetes at baseline, in years. |
gly | Percent of glycosylated hemoglobin at baseline. |
bmi | Body mass index at baseline. |
ret | Binary indicator of retinopathy progression at first follow-up. |
Source
Wang, Y. (1997), GRKPACK: Fitting smoothing spline ANOVA models for exponential families. Communications in Statistics – Simulation and Computation, 26, 765–782.
References
Klein, R., Klein, B. E. K., Moss, S. E., Davis, M. D., and DeMets, D. L. (1988), Glycosylated hemoglobin predicts the incidence and progression of diabetic retinopathy. Journal of the American Medical Association, 260, 2864–2871.
Klein, R., Klein, B. E. K., Moss, S. E., Davis, M. D., and DeMets, D. L. (1989), The Wisconsin Epidemiologic Study of Diabetic Retinopathy. X. Four-year incidence and progression of diabetic retinopathy when age at diagnosis is 30 or more years. Archives of Ophthalmology, 107, 244–249.
Wahba, G., Wang, Y., Gu, C., Klein, R., and Klein, B. E. K. (1995), Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. The Annals of Statistics, 23, 1865–1895.
Stages of Diabetic Retinopathy
Description
Data derived from the Wisconsin Epidemiological Study of Diabetic Retinopathy.
Usage
data(wesdr1)
Format
A data frame containing 2049 observations on the following variables.
age | Age of patient. |
dur | Duration of diabetes, in years. |
gly | Percent of glycosylated hemoglobin. |
upro | Ordinal urine protein level. |
insl | Binary indicator of insulin usage. |
ret1 | Ordinal retinopathy stage, right eye. |
ret2 | Ordinal retinopathy stage, left eye. |