Control Charts, as defined by Walter A. Shewhart and popularized by W. Edwards Deming, were originally designed for quality characteristics that follow a normal distribution.

In some contexts, the assumption that the quality characteristic is normally distributed is incorrect. This is particularly true for a characteristic such as resolution time of tickets, whose probability is better represented by an exponential distribution.

This article is an attempt to adapt Control Charts to exponential distributions.

# Normal Distribution vs. Exponential Distribution

The following charts demonstrate the differences between normal and exponential distributions.

- Charts at left show the evolution of a given quality characteristic. The X-Axis can be either a sample ID or some time-related value.
- Upper left chart simulates a normal distribution where Mean = 10 and Standard Deviation = 5.
- Bottom left chart simulates an exponential distribution where Lambda = 1/10.
- Charts at right evaluate the frequency of given ranges of the quality characteristic. For example, nearly 30 samples of the normal distribution have a quality value between 8 and 10.
- Upper right chart shows the characteristic bell curve of a normal distribution.
- Bottom right chart is typical of an exponential distribution, with a rapidly decreasing frequency as the quality characteristic increases.
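The two simulations described above can be approximated with a short script. This is only a sketch using Python's standard library: the sample count, the seed, and the bin width of 2 are assumptions, since the article does not state how the charts were generated.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

N_SAMPLES = 200  # assumed sample count; the article does not state one

# Normal distribution with Mean = 10 and Standard Deviation = 5
normal_samples = [random.gauss(10, 5) for _ in range(N_SAMPLES)]

# Exponential distribution with Lambda = 1/10 (so the mean is 1/Lambda = 10)
exponential_samples = [random.expovariate(1 / 10) for _ in range(N_SAMPLES)]


def histogram(samples, bin_width=2):
    """Count samples per bin of width `bin_width`; keys are bin lower edges."""
    counts = {}
    for value in samples:
        lower_edge = int(value // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


print("normal mean:", round(statistics.mean(normal_samples), 2))
print("exponential mean:", round(statistics.mean(exponential_samples), 2))
print("normal histogram:", histogram(normal_samples))
print("exponential histogram:", histogram(exponential_samples))
```

The normal histogram should peak around the mean, while the exponential histogram should have its largest counts in the lowest bins and then decay.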

**Key Properties**

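Property | Normal Distribution | Exponential Distribution |
---|---|---|
Probability Density Function | $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\lambda e^{-\lambda x}$ for $x \ge 0$ |
Cumulative Distribution Function | $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$ | $1 - e^{-\lambda x}$ for $x \ge 0$ |
Mean | μ | 1/λ |
Median | μ | ln(2)/λ |
Variance | σ² | 1/λ² |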
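As a quick numerical check of these properties, the sketch below evaluates them for the two distributions simulated earlier (Mean = 10, Standard Deviation = 5, and Lambda = 1/10). The helper names `exponential_pdf` and `exponential_cdf` are illustrative, not taken from the article.

```python
import math
from statistics import NormalDist


def exponential_pdf(x, lam):
    """Probability density of an exponential distribution with rate `lam`."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exponential_cdf(x, lam):
    """Probability that an exponential variable with rate `lam` is <= x."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


lam = 1 / 10
normal = NormalDist(mu=10, sigma=5)

properties = {
    "normal": {
        "mean": normal.mean,
        "median": normal.median,
        "variance": normal.variance,
    },
    "exponential": {
        "mean": 1 / lam,
        "median": math.log(2) / lam,
        "variance": 1 / lam ** 2,
    },
}

for name, values in properties.items():
    print(name, values)

# By definition, the CDF evaluated at the median is 0.5 for both distributions.
print(normal.cdf(normal.median), exponential_cdf(math.log(2) / lam, lam))
```

Note that the exponential median, ln(2)/λ, is lower than its mean, 1/λ, which reflects the long right tail of the distribution.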

# Control Limits for Exponential Distributions

For normal distributions, control limits used to detect special-cause variations are often based on the famous six-sigma spread around the mean, that is, the interval from μ – 3σ to μ + 3σ.

This is because nearly 99.7% of values lie within this spread, as shown hereafter:

**What 99.7% Coverage Means for Exponential Distribution**

We have just seen that, for a normal distribution, 99.7% of the values of x (the quality characteristic) lie between μ – 3σ and μ + 3σ.

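How does this translate to an exponential distribution? Since x cannot be negative, we simply have to find the value of x below which 99.7% of values fall (99.73% before rounding), using the cumulative distribution function:

$$1 - e^{-\lambda x} = 0.9973 \quad\Longrightarrow\quad x = -\frac{\ln(1 - 0.9973)}{\lambda} \approx \frac{5.9}{\lambda}$$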

This result is easier to see when plotting the Cumulative Distribution Function of an exponential distribution:

**Summary of Control Limits Based On The 99.7% Rule**

Limit | Normal Distribution | Exponential Distribution |
---|---|---|
LCL (Lower Control Limit) | μ – 3σ | 0 |
UCL (Upper Control Limit) | μ + 3σ | 5.9/λ |

**Generalization of the Three-Sigma Rule**

Range (Normal Distribution) | Population in Range | Range (Exponential Distribution) |
---|---|---|
μ ± 1σ | 68% | [0; 1.1/λ] |
μ ± 2σ | 95% | [0; 3/λ] |
μ ± 3σ | 99.7% | [0; 5.9/λ] |
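The factors in the right-hand column can be recomputed by inverting the exponential Cumulative Distribution Function. The sketch below takes the exact coverage of μ ± kσ from Python's `statistics.NormalDist` and derives the matching factor c such that the range is [0; c/λ]. The function names are illustrative.

```python
import math
from statistics import NormalDist


def normal_coverage(k):
    """Fraction of a normal population within mean +/- k standard deviations."""
    standard = NormalDist()
    return standard.cdf(k) - standard.cdf(-k)


def exponential_upper_bound_factor(coverage):
    """Return c such that P(X <= c / lambda) = coverage for an exponential X."""
    return -math.log(1 - coverage)


for k in (1, 2, 3):
    coverage = normal_coverage(k)
    factor = exponential_upper_bound_factor(coverage)
    print(f"{k} sigma: coverage = {coverage:.4f}, range = [0; {factor:.2f}/lambda]")
```

With unrounded coverages, the factors come out at about 1.15, 3.09 and 5.91. The value 3/λ in the table corresponds to the rounded coverage of 95%.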

# Tutorial: Statistical Process Control Based on Resolution Time of Tickets

**Context:** Customers submit demands through JIRA tickets to the IT team.

**Goal #1:** The team wants to compute several key characteristics of its ticket resolution process, based on historical data.

**Goal #2:** The team wants to analyze variations of resolution time, separate common-cause variations from special-cause variations, and try to reduce variations in order to improve predictability and ultimately gain trust from customers.

## Step 1 – Get Data from JIRA

Browse JIRA and get the latest 50 tickets your team has resolved.

**Example of JQL**:

`project = GTS AND (resolution = Fixed OR resolution = Done) ORDER BY resolved DESC`

Export your selection with all fields to Microsoft Excel. Format the worksheet to obtain something like this:

Add a new column “Resolution Time”, which is simply the difference between “Resolved” and “Created”.
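If you prefer to compute this column outside Excel, the same subtraction can be sketched in Python. The ticket keys, the timestamps, and the date format below are hypothetical; adjust them to match your own export.

```python
from datetime import datetime

# Hypothetical rows; real keys and timestamps come from your own JIRA export.
tickets = [
    {"key": "EX-1", "created": "2015-03-02 09:00", "resolved": "2015-03-03 21:00"},
    {"key": "EX-2", "created": "2015-03-04 10:30", "resolved": "2015-03-04 16:30"},
    {"key": "EX-3", "created": "2015-03-05 08:00", "resolved": "2015-03-12 08:00"},
]

DATE_FORMAT = "%Y-%m-%d %H:%M"  # assumed format; adapt to your export settings


def resolution_time_days(ticket):
    """Return Resolved minus Created, expressed in days."""
    created = datetime.strptime(ticket["created"], DATE_FORMAT)
    resolved = datetime.strptime(ticket["resolved"], DATE_FORMAT)
    return (resolved - created).total_seconds() / 86400


for ticket in tickets:
    ticket["resolution_time"] = resolution_time_days(ticket)
    print(ticket["key"], round(ticket["resolution_time"], 2), "days")
```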

## Step 2 – Check if Hypothesis of Exponential Distribution is Correct

On a new worksheet, create a table computing the probability that the resolution time is less than or equal to `x`, for about a hundred values of `x` (so that you capture frequencies for short resolution times).

You may use a linear scale, such as 1, 2, 3, 4 days, and so on, or an exponential one. The advantage of using an exponential scale is to capture the numerous small resolution times and therefore to increase the precision of the chart hereafter.

To compute an exponential scale, you can use the following formula:

In our example, we have:

- **max(resolution time):** 55 days
- **max(index):** 100 (i.e. how many points we want to have on the chart)

Therefore, the formula simplifies to:

Using the formula above, compute an exponential scale in the spreadsheet.
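One possible way to build such a scale is sketched below. It assumes the form x = max(resolution time)^(index / max(index)), which is only a plausible choice and not necessarily the article's own formula: it produces closely spaced points for short resolution times and ends exactly at the maximum.

```python
def exponential_scale(max_value, max_index):
    """Return max_index points growing geometrically up to max_value.

    Assumed form: x_i = max_value ** (i / max_index), for i = 1..max_index.
    """
    return [max_value ** (i / max_index) for i in range(1, max_index + 1)]


# Values from the example: max(resolution time) = 55 days, max(index) = 100
scale = exponential_scale(55, 100)

print("first points:", [round(x, 3) for x in scale[:5]])
print("last points:", [round(x, 3) for x in scale[-3:]])
```

With this form, consecutive points differ by a constant ratio, so more than half of the hundred points fall below 10 days.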

Add a column that computes the probability.

Plot the computed probability on a chart and check that the curve can be approximated by the Cumulative Distribution Function of an exponential distribution.
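The visual check can be complemented by a numerical one. The sketch below compares the empirical probability with the fitted exponential Cumulative Distribution Function on a linear grid and reports the largest gap between the two. The resolution times are simulated stand-ins, not the article's data, and the script does not decide what gap is acceptable.

```python
import math
import random

random.seed(7)

# Simulated stand-in for 50 exported resolution times, in days.
resolution_times = [random.expovariate(1 / 2.62) for _ in range(50)]


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def exponential_cdf(x, lam):
    """Theoretical probability that an exponential variable is <= x."""
    return 1 - math.exp(-lam * x)


lam = len(resolution_times) / sum(resolution_times)  # 1 / average
max_time = max(resolution_times)
grid = [max_time * i / 100 for i in range(1, 101)]  # linear scale, 100 points

largest_gap = max(
    abs(empirical_cdf(resolution_times, x) - exponential_cdf(x, lam)) for x in grid
)
print(f"estimated lambda = {lam:.3f} per day")
print(f"largest gap between empirical and fitted CDF = {largest_gap:.3f}")
```

A small gap supports the exponential hypothesis; a large one suggests that another distribution may describe the data better.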

## Step 3 – Compute Control Limits

Return to the previous sheet containing the original data. Compute the average resolution time.

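We can estimate λ as the inverse of the average resolution time, which is 2.62 days in our example:

$$\hat{\lambda} = \frac{1}{\text{average resolution time}} = \frac{1}{2.62} \approx 0.38 \text{ tickets/day}$$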

Now that we have found the average resolution time, we can define the control limits as:

- **LCL:** 0 (resolution time cannot be negative)
- **UCL:** 5.9 × 2.62 = **15.5 days**
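The same computation can be sketched as a small function. The name `control_limits` and the short sample list are illustrative; the factor 5.9 and the average of 2.62 days come from the article.

```python
def control_limits(resolution_times, factor=5.9):
    """Estimate lambda from the average and derive exponential control limits."""
    mean_time = sum(resolution_times) / len(resolution_times)
    lam = 1 / mean_time
    return {"mean": mean_time, "lambda": lam, "lcl": 0.0, "ucl": factor / lam}


# Hypothetical resolution times in days, only to show the call.
print(control_limits([0.5, 1.0, 2.0, 4.5]))

# A single value of 2.62 days reproduces the article's average, hence its UCL.
article_limits = control_limits([2.62])
print(f"UCL = {article_limits['ucl']:.1f} days")
```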

## Step 4 – Draw Control Chart

Create a chart that includes the UCL computed previously. In this tutorial, tickets are ordered by resolution date.
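For a quick look without a charting tool, a text version of the control chart can be sketched as follows. The ticket keys and resolution times are hypothetical; only the UCL of 15.5 days comes from the previous step.

```python
# Hypothetical (ticket key, resolution time in days), ordered by resolution date.
points = [
    ("EX-1", 0.8),
    ("EX-2", 3.1),
    ("EX-3", 21.0),
    ("EX-4", 1.4),
    ("EX-5", 6.7),
]

UCL = 15.5  # upper control limit from Step 3, in days


def text_control_chart(points, ucl, width=40):
    """Return one line per ticket: a bar scaled to `ucl`, flagged when above it."""
    lines = []
    for key, days in points:
        bar_length = min(width, round(width * days / ucl))
        flag = "  <-- above UCL" if days > ucl else ""
        lines.append(f"{key:>6} | {'#' * bar_length:<{width}} | {days:5.1f}{flag}")
    return lines


def above_ucl(points, ucl):
    """Return the keys of tickets whose resolution time exceeds the UCL."""
    return [key for key, days in points if days > ucl]


print("\n".join(text_control_chart(points, UCL)))
print("above UCL:", above_ucl(points, UCL))
```

A bar that fills the full width marks a ticket at or above the UCL, which is the kind of point to examine in the next step.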

## Step 5 – Reflect and Adjust

Assuming that we can approximate the probability of resolution time by an exponential distribution with λ = 1/2.62 = 0.38 tickets/day, ticket GTS-1104 had only a 0.3% probability of exceeding UCL = 15.5 days. Therefore, we can consider the resolution time of this ticket to be “out-of-process”.
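The 0.3% figure follows from the exponential tail probability P(X > x) = e^(−λx). A minimal check, with an illustrative function name:

```python
import math


def probability_above(x, lam):
    """P(X > x) for an exponential variable with rate `lam`."""
    return math.exp(-lam * x)


lam = 1 / 2.62  # estimated from the average resolution time of 2.62 days
ucl = 15.5      # upper control limit in days

tail = probability_above(ucl, lam)
print(f"P(resolution time > {ucl} days) = {tail:.4f}")
```

The result is about 0.0027, which rounds to the 0.3% quoted above.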

Perform root-cause analysis to find the special causes behind such tickets. Can you prevent these special causes? What is the probability that these causes will appear again in the future? Can you devise countermeasures? If yes, use a PDCA process to implement them properly.

I believe you overlooked the data-mining adjustment. The p-value of 0.003 for the ticket #1104 would be statistically significant if you only had that one item. But you “mined” it from a set of 50 items, so the probability that one of them would take that long is:

P_adj = 1 – (1-P)^N

In this case, the adjusted p-value is about 0.14. If you had 1000 items to mine from, your chance of encountering such an item would increase from 14% to about 95%.

A coin-toss sequence provides a simple model explaining this phenomenon. If you ask someone to toss a coin 10 times, the probability of heads (or tails) 10 times in a row is 1/1024, indicating an assignable cause (to their skill or to a coin defect). But if you ask 10000 people to do the same, the chance that someone will do it is nearly 100%, with obviously no skill involved.
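The adjustment formula and the coin-toss figures above can be checked with a few lines. This is a sketch that assumes the N items are independent, as the formula itself does; the function name is illustrative.

```python
def adjusted_p_value(p, n):
    """Probability that at least one of n independent items has p-value <= p."""
    return 1 - (1 - p) ** n


print(f"50 tickets:      {adjusted_p_value(0.003, 50):.3f}")
print(f"1000 tickets:    {adjusted_p_value(0.003, 1000):.3f}")
print(f"10000 coin runs: {adjusted_p_value(1 / 1024, 10000):.5f}")
```

The first two values come out at about 0.14 and 0.95, and the coin-toss value is above 0.9999.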

Thank you for your insights! I will try to understand how p-value is used in the context of normal distribution-based statistical process control and see if I can improve my proposal about exponential distributions.

p-value is simply the items’ probability to be more deviant than the item you’re looking at. For example, if they’re normally distributed and the item you’re looking at is at one-sigma, its p-value is 0.16; if it’s at two-sigma, its p-value is 0.023, etc. Similar probabilities can be calculated for the exponential and other distributions. The adjustment part – where I adjusted it for the size of the set the item was selected from – is independent of the distribution type.
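The one-sigma and two-sigma figures quoted here are upper-tail probabilities of the standard normal distribution. A small sketch using Python's `statistics.NormalDist` (the function name is illustrative):

```python
from statistics import NormalDist


def normal_upper_tail(k):
    """Probability that a normal value lies more than k sigmas above the mean."""
    return 1 - NormalDist().cdf(k)


for k in (1, 2, 3):
    print(f"{k} sigma: p-value = {normal_upper_tail(k):.4f}")
```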

But regardless of all this math, when the time comes to have a retrospective, the item #1104 looks like one of the things to talk about :-)