[R] Problems using quantile regression (rq) to model GLD random variables in R

James Shaw shawjw at gmail.com
Fri Sep 9 17:51:32 CEST 2011


Everyone:

I am working on a simulation study of the efficiency of regression
estimators when applied to a specific form of highly skewed data.  The
outcome variable (y) is simulated from a generalized lambda distribution
(GLD) to reflect the characteristics (mean, variance, skewness, kurtosis)
of an observed variable.  The regressor of interest (x) is simply a binary
indicator of group membership.  The relevant R code is as follows:

library(gld)  # provides rgl() and gl.check.lambda()

x <- rep(c(0, 1), each = 200)  # binary group indicator

## Group 0 outcome, RS (Ramberg-Schmeiser) parameterization
params <- c(2.684864244, .0144182, 26.01913, 711.0554)
y0 <- rgl(200, params, param = "rs")

## Group 1 outcome
params <- c(-0.113741088, .0523072, 15.98282, 426.4456)
y1 <- rgl(200, params, param = "rs")

y <- c(y0, y1)

I have verified that the GLD parameters in each case are valid using
gl.check.lambda() (in the gld package).
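
For reference, the check looks something like this (a sketch; as I read
the gld documentation, vect = TRUE tells gl.check.lambda() that all four
parameters are supplied as a single vector):

library(gld)

params0 <- c(2.684864244, .0144182, 26.01913, 711.0554)
params1 <- c(-0.113741088, .0523072, 15.98282, 426.4456)

## Both calls should return TRUE for valid parameter sets
gl.check.lambda(params0, param = "rs", vect = TRUE)
gl.check.lambda(params1, param = "rs", vect = TRUE)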

While I experienced no difficulty when using OLS to fit models to y, the
quantile regression estimator and robust (e.g., M) regression estimators
yielded minute (or missing) variance estimates and essentially infinite t
statistics for the coefficient for x.  The problem appears to be related
to the number of duplicate observations in my simulated data.  As I
understand it, the GLD is defined through its quantile function, so
variates are generated by transforming U(0,1) draws.  In the RS
parameterization the quantile function is
Q(u) = lambda1 + (u^lambda3 - (1-u)^lambda4)/lambda2, and with lambda3
and lambda4 as large as the values above, both power terms underflow to
zero over much of (0,1), so many draws round to exactly lambda1, which is
(to within rounding) the true median.  This lack of variation around the
median appears to be causing problems for the quantile regression
estimator (as implemented in rq) and the robust regression estimator.
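
The pile-up is easy to see by applying the RS quantile transform by hand
(a sketch; qgl.rs, u, and y.chk are just illustrative names):

## RS quantile function: Q(u) = l1 + (u^l3 - (1 - u)^l4) / l2
qgl.rs <- function(u, l) l[1] + (u^l[3] - (1 - u)^l[4]) / l[2]

u <- runif(1e5)
y.chk <- qgl.rs(u, c(2.684864244, .0144182, 26.01913, 711.0554))

## With lambda3 and lambda4 this large, both power terms underflow
## over much of (0,1), so many draws are numerically equal to lambda1:
mean(y.chk == 2.684864244)  # share of exact duplicates at lambda1
sum(duplicated(y.chk))      # exact ties overall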

I am unaware of a viable alternative to the GLD that can be readily
implemented in R.  In the absence of an alternative distribution, I am
wondering whether jittering could be used as a practical (and hopefully
valid) solution to my dilemma: add a small residual drawn from U(-.5, .5)
to each GLD observation and model the composite variable as a function of
x.  Over repeated simulations this should preserve the mean and median of
y, and the added variance (1/12 for U(-.5, .5)) should be negligible
relative to the scale of the data.  When using this procedure, I obtain
reasonable variance estimates and results that make intuitive sense
(i.e., efficiency of the M estimator >= quantile regression > OLS).  A
similar jittering procedure is applied to the modeling of quantiles of
count data by Machado and Santos Silva ("Quantiles for Counts," JASA
2005; 100: 1226-1237).
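
In code, the procedure amounts to the following (a sketch; yj, fitq, and
fitm are placeholder names):

library(quantreg)
library(MASS)

## Independent U(-.5, .5) noise breaks the exact ties at the median
yj <- y + runif(length(y), min = -0.5, max = 0.5)

## Median regression and M regression on the jittered outcome
fitq <- summary(rq(yj ~ x, tau = .5, ci = FALSE), se = "ker")
fitm <- rlm(yj ~ x)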

I would appreciate others' thoughts regarding the validity of the
proposed jittering procedure or suggestions for alternative approaches
I could use to deal with my problem.  Many thanks!

Regards,

Jim


P.S.:  Although I do not think it has any bearing on my problem, here
is the quantile regression code I am using:

fit1 <- summary(rq(y ~ x, tau = .5, ci = FALSE), se = "ker")
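
As an aside, summary.rq() also accepts se = "boot", so a bootstrap-based
cross-check of the kernel standard errors would look like this (untested
sketch):

fit2 <- summary(rq(y ~ x, tau = .5, ci = FALSE), se = "boot")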


