[R] prop.trend.test

Fri Sep 8 14:47:43 CEST 2023

Say I have one machine that produces 15 million widgets per day. Every day a few widgets are defective. Is the proportion increasing?
The data analyst needs to know what time span is of interest. I assume that there is some day-to-day variability. Is today's defect rate greater or less than yesterday's rate is not a trend. However, there might also be other trends. 4th shift might have a higher rate than 2nd shift, or within a shift the rate may change as workers tire or get breaks.

I might try the PooledInfRate package. It calculates a proportion defective and a 95% confidence interval. Do the intervals overlap?
I might  try being selective. If the rate changes over the week, then the weekly cycle may obscure an underlying trend. For the last year, has the rate increased using only data from Monday?
The question gets more complicated if there are multiple production lines and locations. High worker turnover may also be an issue.

Tim

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
Sent: Friday, September 8, 2023 6:53 AM
To: peter dalgaard <pdalgd using gmail.com>; Thomas Subia <tgs77m using yahoo.com>
Cc: R. Mailing List <r-help using r-project.org>
Subject: Re: [R] prop.trend.test

[External Email]

Às 10:06 de 08/09/2023, peter dalgaard escreveu:
> Yes, this was written a bit bone-headed (as I am allowed to say...)
>
> If you look at the code, you will see inside:
>
>      a <- anova(lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
>          weights = w))
>
> and the lm() inside should give you the direction via the sign of the regression coefficient on "score".
>
> So, at least for now, you could just doctor a copy of the code for
> your own purposes, as in
>
>   fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
>          weights = w)
>   a <- anova(fit)
>
> and arrange to return coef(fit)["score"] at the end. Something like
> structure(... estimate=c(lpm.slope=coef(fit)["score"]) ....)
>
> (I expect that you might also extract the t-statistic from
> coef(summary(fit)) and find that it is the signed square root of the
> Chi-square, but I won't have time to test that just now.)
>
> -pd
>
>> On 8 Sep 2023, at 07:22 , Thomas Subia via R-help <r-help using r-project.org> wrote:
>>
>> Colleagues,
>>
>> Thanks all for the responses.
>>
>> I am monitoring the daily total number of defects per sample unit.
>> I need to know whether this daily defect proportion is trending upward (a bad thing for a manufacturing process).
>>
>> My first thought was to use either a u or a u' control chart for this.
>> As far as I know, u or u' charts are poor to detect drifts.
>>
>> This is why I chose to use prop.trend.test to detect trends in proportions.
>>
>> While prop.trend.test can confirm the existence of a trend, as far as
>> I know, it is left to the user to determine what direction that trend is.
>>
>> One way to illustrate trending is of course to plot the data and use
>> geom_smooth and method lm For the non-statisticians in my group, I've found that using this method along with the p-value of prop.trend.test, makes it easier for the users to determine the existence of trending and its direction.
>>
>> If there are any other ways to do this, please let me know.
>>
>> Thomas Subia
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thursday, September 7, 2023 at 10:31:27 AM PDT, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>
>>
>>
>>
>>
>> Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu:
>>>
>>> Colleagues
>>>
>>>     Consider
>>> smokers  <- c( 83, 90, 129, 70 )
>>> patients <- c( 86, 93, 136, 82 )
>>>
>>>     prop.trend.test(smokers, patients)
>>>
>>>     Output:
>>>
>>>         Chi-squared Test for Trend inProportions
>>>
>>>     data:  smokers out of patients ,
>>>
>>> using scores: 1 2 3 4
>>>
>>> X-squared = 8.2249, df = 1, p-value = 0.004132
>>>
>>>     # trend test for proportions indicates proportions aretrending.
>>>
>>>     How does one identify the direction of trending?
>>>     # prop.test indicates that the proportions are unequal but doeslittle to indicate trend direction.
>>> All the best,
>>> Thomas Subia
>>>
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://st/
>>> at.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl
>>> .edu%7Cc652dfff973549a790db08dbb059e5da%7C0d4da0f84a314d76ace60a6233
>>> 1e1b84%7C0%7C0%7C638297672399811965%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
>>> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C
>>> %7C%7C&sdata=RLmEonNbUSLwfvHjWylq75aNjrfQ5nNlRocDb98HQa4%3D&reserved
>>> =0 PLEASE do read the posting guide
>>> http://www/
>>> .r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%
>>> 7Cc652dfff973549a790db08dbb059e5da%7C0d4da0f84a314d76ace60a62331e1b8
>>> 4%7C0%7C0%7C638297672399811965%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
>>> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7
>>> C&sdata=9jMsAjDR2Zusf2Fp0idn9baEaa6euE7Hw6xngd21BPU%3D&reserved=0
>>> and provide commented, minimal, self-contained, reproducible code.
>> Hello,
>>
>> By visual inspection it seems that there is a decreasing trend.
>> Note that the sample estimates of prop.test and smokers/patients are equal.
>>
>>
>> smokers  <- c( 83, 90, 129, 70 )
>> patients <- c( 86, 93, 136, 82 )
>>
>> prop.test(smokers, patients)$estimate
>> #>    prop 1    prop 2    prop 3    prop 4
>> #> 0.9651163 0.9677419 0.9485294 0.8536585
>>
>> smokers/patients
>>
>> #> [1] 0.9651163 0.9677419 0.9485294 0.8536585
>>
>> plot(smokers/patients, type = "b")
>>
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://sta/
>> t.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.e
>> du%7Cc652dfff973549a790db08dbb059e5da%7C0d4da0f84a314d76ace60a62331e1
>> b84%7C0%7C0%7C638297672399811965%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w
>> LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7
>> C&sdata=RLmEonNbUSLwfvHjWylq75aNjrfQ5nNlRocDb98HQa4%3D&reserved=0
>> PLEASE do read the posting guide
>> http://www/.
>> r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C
>> c652dfff973549a790db08dbb059e5da%7C0d4da0f84a314d76ace60a62331e1b84%7
>> C0%7C0%7C638297672399811965%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM
>> DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sda
>> ta=9jMsAjDR2Zusf2Fp0idn9baEaa6euE7Hw6xngd21BPU%3D&reserved=0
>> and provide commented, minimal, self-contained, reproducible code.
>
Hello,

Actually, the t-statistic is not the signed square root of the X-squared test statistic. I have edited the function, assigned the lm fit and returned it as is. (print.htest won't print this new list member so the output is not cluttered with irrelevant noise.)

smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

edit(prop.trend.test, file = "ptt.R")
source("ptt.R")

# stats::prop.trend.test edited to include the results # of the lm fit and saved under a new name ptt <- function (x, n, score = seq_along(x)) {
   method <- "Chi-squared Test for Trend in Proportions"
   dname <- paste(deparse1(substitute(x)), "out of", deparse1(substitute(n)),
                  ",\n using scores:", paste(score, collapse = " "))
   x <- as.vector(x)
   n <- as.vector(n)
   p <- sum(x)/sum(n)
   w <- n/p/(1 - p)
   a <- anova(fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
                        weights = w))
   chisq <- c(`X-squared` = a["score", "Sum Sq"])
   structure(list(statistic = chisq, parameter = c(df = 1),
                  p.value = pchisq(as.numeric(chisq), 1, lower.tail = FALSE),
                  method = method, data.name = dname
                  , lmfit = fit
                  ), class = "htest")
}

ht <- ptt(smokers, patients)
ht$statistic |> sqrt()
#> X-squared
#>  2.867913
ht$lmfit |> summary() |> coef()
#>                Estimate Std. Error   t value   Pr(>|t|)
#> (Intercept)  1.02187141 0.04732740 21.591539 0.00213815
#> score       -0.03341563 0.01723384 -1.938954 0.19207036

Hope this helps,

Rui Barradas

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.