[R] Labour Statistics

Max mnevill at exitcheck.net
Wed Oct 15 18:21:51 CEST 2008


Thank you for the advice. I'll do what I can with it.

Ruben wrote:
If I understand your problem correctly, you have found that the magnitude of
deviations from the mean/median/mode in the volume of your requests for
background checks in month m predicts a multivariate response that
represents the macroeconomic situation in month m+1.
First, regarding your original question, a statistician judging your
product would like to see a measure of predictive success. If you have a
model relating your predictor (the deviation in volume of requests)
to your response (several variables representing the macroeconomic
status), then you could run the model for many months (say from Jan 2000
to Sep 2008), predict the macroeconomic status with the model, and
compare it with the actual macroeconomic status observed. This would be
summarized in measures of predictive success, such as the predictive mean
squared error.
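
To make the predictive-success idea concrete, here is a minimal sketch in Python/NumPy (used only for illustration; the same steps are straightforward in R). It assumes a single predictor series `x` (the monthly deviation in file volume) and a single univariate response `y` (for example, the unemployment rate). It fits the lagged linear model y[t+1] = a + b*x[t] by ordinary least squares on an expanding window and collects the one-step-ahead forecast errors. The function name `one_step_pmse`, the 24-month minimum training window, and the naive "next month equals this month" benchmark are hypothetical choices, not part of the advice above. Comparing against some benchmark is what shows whether the file volume adds information.

```python
import numpy as np


def one_step_pmse(x, y, min_train=24):
    """Expanding-window evaluation of the lagged model y[t+1] = a + b * x[t].

    Hypothetical helper. x: predictor series (e.g. deviation in monthly file
    volume); y: univariate response (e.g. unemployment rate). Returns
    (model_pmse, naive_pmse); the naive benchmark predicts y[t+1] = y[t].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if len(x) != n:
        raise ValueError("x and y must have the same length")
    if n - 1 <= min_train:
        raise ValueError("series too short for the chosen min_train")
    model_err, naive_err = [], []
    # At forecast origin t only data up to month t is used: training pairs
    # are (x[s], y[s + 1]) for s = 0 .. t-1, then y[t + 1] is predicted from x[t].
    for t in range(min_train, n - 1):
        X_train = np.column_stack([np.ones(t), x[:t]])
        y_train = y[1:t + 1]
        coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        pred = coef[0] + coef[1] * x[t]
        model_err.append(y[t + 1] - pred)
        naive_err.append(y[t + 1] - y[t])
    return float(np.mean(np.square(model_err))), float(np.mean(np.square(naive_err)))


if __name__ == "__main__":
    # Synthetic data only: 105 months (Jan 2000 to Sep 2008) in which y
    # follows x with a one-month lag plus noise.
    rng = np.random.default_rng(0)
    x = rng.normal(size=105)
    y = np.zeros(105)
    y[1:] = 0.8 * x[:-1] + rng.normal(scale=0.3, size=104)
    model_pmse, naive_pmse = one_step_pmse(x, y)
    print(f"model PMSE: {model_pmse:.3f}  naive PMSE: {naive_pmse:.3f}")
```

A model whose out-of-sample PMSE stays clearly below the benchmark over many months is the kind of evidence a prospective buyer can check for themselves. Because each forecast uses only data available at its origin, the comparison is not flattered by in-sample fitting.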

Second, regarding which method to use to fit the relation between your
predictor and the multivariate response, you have a number of options.
One simple alternative, which would reduce your problem to a simple
univariate modeling problem, would be to research the economic literature
to define an index of macroeconomic status that collapses your
multivariate response into a univariate response. Additionally, if the
variables in the multivariate response are strongly correlated, you can
define your own index by applying principal component analysis to the
multivariate response and then using the first principal component as a
univariate response. After that, many options are again available, such
as forecasting methods or regular time series analysis. A more complex
but probably more precise approach would be to model the multivariate
response as such. This depends on the nature of the variables in the
multivariate response. If they can be considered multinomial counts,
then you have a very good solution in multinomial logistic regression,
using the function multinom in package nnet.
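
If the macroeconomic variables are strongly correlated, the principal-component index described above can be sketched as follows (again Python/NumPy for illustration; in R, prcomp() with scale. = TRUE performs the same standardization and decomposition). It assumes the response is arranged as a months x variables matrix `Y` with no constant columns. The function name `first_pc_index` and the sign convention are hypothetical choices. The share of variance explained by the first component indicates whether a single index is a reasonable summary: if the share is small, collapsing to one variable discards much of the response.

```python
import numpy as np


def first_pc_index(Y):
    """Collapse a (months x variables) response matrix into one index:
    the scores on the first principal component of the standardized columns.

    Hypothetical helper. Returns (index, share_of_variance_explained, loadings).
    """
    Y = np.asarray(Y, dtype=float)
    sd = Y.std(axis=0, ddof=1)
    if np.any(sd == 0):
        raise ValueError("constant columns cannot be standardized")
    # Standardize so that variables in different units (rates, dollars,
    # counts) contribute on an equal footing.
    Z = (Y - Y.mean(axis=0)) / sd
    # SVD of the standardized data: the first row of Vt is the first
    # principal direction; squared singular values are proportional to
    # the component variances.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal component is arbitrary; flip it so the
    # largest-magnitude loading is positive (assumed convention).
    if loadings[np.argmax(np.abs(loadings))] < 0:
        loadings = -loadings
    index = Z @ loadings
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    return index, explained, loadings


if __name__ == "__main__":
    # Synthetic example: three series driven by one common factor plus noise.
    rng = np.random.default_rng(1)
    factor = rng.normal(size=105)
    Y = np.column_stack([
        factor + rng.normal(scale=0.3, size=105),
        2.0 * factor + rng.normal(scale=0.5, size=105),
        -factor + rng.normal(scale=0.3, size=105),
    ])
    index, explained, loadings = first_pc_index(Y)
    print(f"first PC explains {explained:.1%} of the standardized variance")
    print("loadings:", np.round(loadings, 2))
```

The resulting `index` is an ordinary univariate series, so it can be used directly as the response in a lagged model of the kind described in the first point.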
Maybe this can get you started.

> Gad Abraham explained:
>> Max wrote:
>>> Hi everyone,
>>> This is not so much of an R question as a statistics question. I currently 
>>> work for the largest pre employment screening company in Canada. Upper 
>>> management has noticed that noticed that usually a month or so before any 
>>> big kind of economic shock happens, that our incoming files (requests for 
>>> a background check) jump up or down.
>>> As the company statistician, they've asked me to see if the relationship 
>>> is strong enough to put together a product that can be sold to any kind of 
>>> firm or organization (brokerages or any kind of investing firm, federal 
>>> ministry of finance, statistics canada (like the bureau of stats in the 
>>> USA), universities etc)
>>> In Canada on the 10th of every month, statistics canada releases labour 
>>> statistics for the previous month. The way CFO sees it, *ideally* on the 
>>> (1st to 10th, something like that) every month, the firm I work for could 
>>> be releasing data for the rest of the month.
>>> What I'm trying to figure out is this: if you were in the position of 
>>> evaluating the final product for purchase, what kind of information would 
>>> make the product credible/viable? Summary statistics? Variance-covariance 
>>> matrices? Graphs of the data? Cross-correlation matrices for time series 
>>> analysis?
>>> It's frustrating because I can see a noticeable relationship between our 
>>> file volume and the unemployment rate in particular, but I'm not sure 
>>> how to frame it so that another statistician/modeler would want the data.
>> Why not start with some simple plots of the relationships between your 
>> variables? Once you have a feel for the problem, you can look into 
>> modelling it more formally using a suitable regression model.
> Gad, the issue I have is that I technically have one predictor for multiple 
> responses. The data are not clean enough for simple univariate models. 
> Unfortunately, my knowledge of multivariate response models is poor, and how 
> to set up the problem in R as a multivariate regression is a total mystery to 
> me. (Multivariate analysis was the one course I wasn't able to take in my 
> undergrad math/stats degree.)
> The other issue is that if I view the problem as a time series problem, it's 
> a multiple time series analysis, which I don't have any books on.
> The more I look at the data and the problem, the more I feel like I'm way 
> in over my head.
