[R] {car} outlierTest looses p/q values

Phil chobophil at gmail.com
Tue Oct 21 09:34:07 CEST 2014


Hi all,

as John pointed out, there is a way to create settings where the 
studentized residuals are undefined. However, after cross-checking it 
seems that the residuals are getting calculated without any error. The 
problem comes up when I use outlierTest to assign a p,q value,respectively.

Below is the code and some printouts:


non.zero.orga.vector <- 
c(1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,1.00,13.98) #targervector

print(meta.Matrix[,3])
  x1         x28          x2      x3      x4         x51     x52     x5 
     x6      x7     x82
      4.6     35.8      4.1      5.4      5.2     27.9 48.2      4.5    
5.7    4.2    100.5

print(meta.Matrix[,10])
      x1                      x28               x2     x3
0.2990466     1.0497156     0.2805028 0.3517993

       x4          x51                x52                  x5
0.3543678 1.0178604 0.4933182 0.2810865

     x6                   x7              x82
0.4349550 0.3269192 3.0889747

glModel <- glm(non.zero.orga.vector ~ meta.Matrix[,3]+meta.Matrix[,10]) 
#create glm for testing

print(glModel$residuals) #check if residuals are calculated for every 
entry in the target vector
   x1                   x28                x2                x3         
     x4              x51           x52
  0.6600696 -2.5816334  0.7438969  0.4129873  0.3976498 -2.5350861 
0.3128240
        x5            x6                    x7           x82
  0.7465766  0.0102020  0.5181337  1.3143796



  test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf) #test 
the glm for outlier, cutoff=Inf/n.max=Inf == report everything

print(test.Res$bonf.p) #check if q-value exists
      x28              x51              x52           x5     x2       
         x1               x7           x3
0.1915995 0.2240759 2.1637438 6.0211575 6.0306110 6.4618792 7.1932613 
7.7595619
       x4               x6
7.8394477 9.9437848



As you see the glm calculates residuals for x82 (which is in fact 1.314) 
but the outlierTest does not assign a p/q value to it. Does anyone know why?

Thanks in advance,

Phil

On 10/17/2014 08:54 PM, John Fox wrote:
> Dear Phil,
>
> Yes, that's a bit clearer. One can invent data configurations where certain studentized residuals are undefined. For example, try the following:
>
> y <- c(0, 0, 0, 0, 0, 1)
> x <- 1:6
> xx <- (1:6 - 3.5)^2
> rstudent(lm(y ~ x))
> rstudent(lm(y ~ xx))
> plot(x, y)
> plot(xx, y)
>
> The plots should clarify what's going on.
>
> I'm copying to r-help since the discussion began there.
>
> I hope this helps,
>   John
>
> On Fri, 17 Oct 2014 18:35:59 +0200
>   Philip Stevens <chobophil at gmail.com> wrote:
>> Dear John,
>>
>> Thank you for your fast reply. Unfortunately I am out of office right now
>> but I will get back to you on Monday asap with a toy example and some code.
>> Meanwhile let me try to explain further.
>>
>> Basically not the glm but the outlierTest afterwards fails. And it only
>> fails if all values, used to set up the glm, are exactly 1 except for one
>> value which can be arbitrary large. I construct the glm from a vector of
>> doubles (the target values) and a vector of other numeric values(metadata).
>> (So this should be fit <- glm(targetVector ~ metadata) ) And want to check
>> if one of those doubles(in the target vector) is an outlier using
>> outlierTest(fit). The residuals are calculated by the glm but the
>> outlierTest does not report a p nor a q value for the one value in the
>> target vector which is not 1. And I can't figure out why...
>>
>> I hope this makes it a bit clearer.
>> Anyways I will come back to you on Monday.
>>
>> Best,
>>
>> Phil
>> Am 17.10.2014 17:43 schrieb "John Fox" <jfox at mcmaster.ca>:
>>
>>> Dear Phil,
>>>
>>> After reading your posting several times, I still don't understand what
>>> you did. As usual, having a reproducible example illustrating the error
>>> would be a great help. I do have a guess about the source of the error:
>>> glm() failed in some way for the problematic case.
>>>
>>> Best,
>>>   John
>>>
>>> ------------------------------------------------
>>> John Fox, Professor
>>> McMaster University
>>> Hamilton, Ontario, Canada
>>> http://socserv.mcmaster.ca/jfox/
>>>
>>>
>>> On Fri, 17 Oct 2014 12:53:30 +0200
>>>   Phil <chobophil at gmail.com> wrote:
>>>> Hi guys,
>>>>
>>>> I came across a strange phenomena and can't figure out why it happens by
>>> myself so here we go.
>>>> I got a dataframe which consists of double numbers which I want to
>>> check, row-wise if there are outliers in the rows.
>>>> So I iterate over the rows and create a glm using the numbers of that
>>> particular row. Which might look like this:
>>>> case1)
>>>>           x1        x2        x3        x4        x5        x6 x7
>>>> x8        x9        x10        x11
>>>>       0.00     3.91     0.00     0.00     0.00   68.03   40.39 0.00
>>>> 0.00      0.00       4.11
>>>>
>>>> or like this:
>>>> case2)
>>>>           x1        x2        x3        x4        x5        x6 x7
>>>> x8        x9        x10        x11
>>>>        1.00     1.00    1.00     1.00     1.00     1.00     1.00 1.00
>>>> 1.00     1.00      5.34
>>>>
>>>> or any other combination of double numbers...
>>>>
>>>> however, using a glm like this:
>>>>
>>>> glModel <- glm(vector ~ some_other_meta_data_which_is_double_numbers)
>>>>
>>>> and testing it with:
>>>>
>>>>    test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf)
>>>>
>>>> I always get a result consisting of the desired p and q values but not
>>> if the vector I use looks like case2. There is no error message and the
>>> computation does not stop either.
>>>> However, all p and q values are produced except for the last value x11.
>>>>
>>>> Any idea why this particular value gets dropped from the output of the
>>> outlierTest Method in the car package.
>>>> Here is the sessioninfo:
>>>>
>>>>    sessionInfo()
>>>> R version 3.1.1 (2014-07-10)
>>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>>    [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>>>    [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>>>    [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>>>    [7] LC_PAPER=en_US.utf8       LC_NAME=C
>>>>    [9] LC_ADDRESS=C              LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods base
>>>>
>>>> other attached packages:
>>>> [1] ggplot2_1.0.0      car_2.0-21         RColorBrewer_1.0-5 iNEXT_1.0
>>>> [5] vegan_2.0-10       lattice_0.20-29    permute_0.8-3
>>>>
>>>> loaded via a namespace (and not attached):
>>>>    [1] colorspace_1.2-4 compiler_3.1.1   digest_0.6.4 grid_3.1.1
>>>>    [5] gtable_0.1.2     labeling_0.3     MASS_7.3-33 munsell_0.4.2
>>>>    [9] nnet_7.3-8       plyr_1.8.1       proto_0.3-10 Rcpp_0.11.2
>>>> [13] reshape2_1.4     scales_0.2.4     stringr_0.6.2    tools_3.1.1
>>>>
>>>> Any help is highly appreciated.
>>>>
>>>> Thanks
>>>>
>>>> Phil
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>>
> ------------------------------------------------
> John Fox, Professor
> McMaster University
> Hamilton, Ontario, Canada
> http://socserv.mcmaster.ca/jfox/
> 	
> 	
>



More information about the R-help mailing list