Control Charts for Exponential Distributions

Control charts, as defined by Walter A. Shewhart and popularized by W. Edwards Deming, were originally designed for quality characteristics that follow a normal probability distribution.

In some contexts, the assumption that the quality characteristic is normally distributed is incorrect. This is particularly true for a characteristic such as ticket resolution time, whose distribution is better represented by an exponential distribution.

This article is an attempt to adapt Control Charts to exponential distributions.

Normal Distribution vs. Exponential Distribution

The following charts demonstrate the differences between normal and exponential distributions.

  • The charts on the left show the evolution of a given quality characteristic. The X-axis can be either a sample ID or a time-related value.
    • The upper left chart simulates a normal distribution with mean = 10 and standard deviation = 5.
    • The lower left chart simulates an exponential distribution with λ = 1/10.
  • The charts on the right show how often the quality characteristic falls within given ranges. For example, nearly 30 samples of the normal distribution have a value between 8 and 10.
    • The upper right chart shows the characteristic bell curve of a normal distribution.
    • The lower right chart is typical of an exponential distribution, with frequency decreasing rapidly as the quality characteristic increases.

Illustration of differences between normal and exponential distributions based on simulated data.
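For readers who want to reproduce data similar to these charts, here is a minimal sketch in Python using only the standard library. The sample count, the random seed, and the bin width of 2 are assumptions made for illustration; the article does not state the values used for its own charts.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable (an arbitrary choice)

N_SAMPLES = 200  # assumed sample count; the article does not state one

# Normal distribution with mean 10 and standard deviation 5
normal_samples = [random.gauss(10, 5) for _ in range(N_SAMPLES)]

# Exponential distribution with rate lambda = 1/10 (so the mean is 10)
exponential_samples = [random.expovariate(1 / 10) for _ in range(N_SAMPLES)]


def histogram(samples, bin_width=2):
    """Count samples per bin; each key is the lower edge of a bin."""
    counts = {}
    for x in samples:
        lower_edge = int(x // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


print("normal mean:", round(statistics.mean(normal_samples), 2))
print("exponential mean:", round(statistics.mean(exponential_samples), 2))
print("normal histogram:", histogram(normal_samples))
print("exponential histogram:", histogram(exponential_samples))
```

The two histograms should show the shapes described above: a roughly symmetric hump for the normal samples, and counts that fall off quickly for the exponential samples.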

Key Properties

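Property | Normal Distribution | Exponential Distribution
Probability Density Function | f(x) = 1/(σ√(2π)) · e^(-(x - μ)²/(2σ²)) | f(x) = λ·e^(-λx) for x ≥ 0
Cumulative Distribution Function | F(x) = ½·[1 + erf((x - μ)/(σ√2))] | F(x) = 1 - e^(-λx) for x ≥ 0
Mean | μ | 1/λ
Median | μ | ln(2)/λ
Variance | σ² | 1/λ²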

Control Limits for Exponential Distributions

For normal distributions, the control limits used to detect special-cause variation are usually set three standard deviations on either side of the mean, a spread that is six sigma wide.

This is because about 99.7% of values lie within this spread, as the following figure shows:

Probability Density Function of a normal distribution.

What 99.7% Coverage Means for Exponential Distribution

We have just seen that, for a normal distribution, 99.7% of the values of x (the quality characteristic) lie between μ – 3σ and μ + 3σ.

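How does this translate to an exponential distribution? We simply have to solve the following equation for x, using the more precise three-sigma coverage of 99.73%:

P(X ≤ x) = 1 - e^(-λx) = 0.9973, which gives x = -ln(0.0027)/λ ≈ 5.9/λ

In other words, 99.7% of values for an exponential distribution lie between x = 0 and x = 5.9/λ.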

This result is easier to see by plotting the Cumulative Distribution Function of an exponential distribution:

Cumulative Distribution Function of an exponential distribution. P(X <= x) = 99.7% for x = 5.9/λ.

Summary of Control Limits Based On The 99.7% Rule

Limit | Normal Distribution | Exponential Distribution
LCL (Lower Control Limit) | μ – 3σ | 0
UCL (Upper Control Limit) | μ + 3σ | 5.9/λ

Generalization of the Three-Sigma Rule

Range (Normal Distribution) | Population in Range | Range (Exponential Distribution)
μ ± 1σ | 68% | [0; 1.1/λ]
μ ± 2σ | 95% | [0; 3/λ]
μ ± 3σ | 99.7% | [0; 5.9/λ]

Three-Sigma Rule applied to normal distribution.

Three-Sigma Rule adapted to exponential distribution.
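The factors in the table above can be checked with a short Python sketch. It assumes the upper limit is defined by P(X ≤ c/λ) = coverage, so that c = -ln(1 - coverage), and it takes the coverage of μ ± kσ from the error function. The function names are mine, chosen for illustration.

```python
import math


def normal_coverage(k):
    """Fraction of a normal population within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2))


def exponential_upper_limit_factor(coverage):
    """Return c such that P(X <= c/lambda) = coverage for an exponential X.

    Solves 1 - exp(-c) = coverage, which gives c = -ln(1 - coverage).
    """
    return -math.log(1 - coverage)


for k in (1, 2, 3):
    p = normal_coverage(k)
    c = exponential_upper_limit_factor(p)
    print(f"{k} sigma: coverage = {p:.4f}, upper limit = {c:.2f}/lambda")
```

Note that with the exact two-sigma coverage (about 95.45%) the factor comes out slightly above 3, while the 3/λ shown in the table corresponds to the rounded value of 95%.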

Tutorial: Statistical Process Control Based on Resolution Time of Tickets

Context: Customers submit demands through JIRA tickets to the IT team.

Goal #1: The team wants to compute several key characteristics of its ticket resolution process, based on historical data.

Goal #2: The team wants to analyze variations of resolution time, separate common-cause variations from special-cause variations, and try to reduce variations in order to improve predictability and ultimately gain trust from customers.

Step 1 – Get Data from JIRA

Browse JIRA and get the latest 50 tickets your team has resolved.

Example of JQL:
project = GTS AND (resolution = Fixed OR resolution = Done) ORDER BY resolved DESC

Export your selection with all fields to Microsoft Excel. Format the worksheet to obtain something like this:

Fields of interest are mainly "Created" and "Resolved". "Key" is informative only.

Add a new column “Resolution Time”, which is simply the difference between “Resolved” and “Created”.

When you subtract one date from another, Excel automatically returns the duration in days.
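If you prefer to work outside Excel, the same column can be computed with a few lines of Python. This is only a sketch: the ticket keys, the timestamps, and the date format below are made up for illustration, and the format string has to be adapted to your own export.

```python
from datetime import datetime

# Hypothetical rows; keys and timestamps are made up for illustration.
tickets = [
    {"key": "DEMO-1", "created": "2014-03-03 09:15", "resolved": "2014-03-04 17:45"},
    {"key": "DEMO-2", "created": "2014-03-05 10:00", "resolved": "2014-03-05 16:00"},
]

DATE_FORMAT = "%Y-%m-%d %H:%M"  # assumed; adjust to the format of your export


def resolution_time_days(created, resolved, fmt=DATE_FORMAT):
    """Return the elapsed time between two timestamps as fractional days."""
    delta = datetime.strptime(resolved, fmt) - datetime.strptime(created, fmt)
    return delta.total_seconds() / 86400  # 86400 seconds per day


for t in tickets:
    t["resolution_time"] = resolution_time_days(t["created"], t["resolved"])
    print(t["key"], round(t["resolution_time"], 2))
```

As in Excel, the result is a fractional number of days, so a ticket resolved in six hours has a resolution time of 0.25.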

Step 2 – Check if Hypothesis of Exponential Distribution is Correct

On a new worksheet, create a table computing the probability that the resolution time is less than or equal to x, for about a hundred values of x (so that you capture frequencies for short resolution times).

You may use a linear scale, such as 1, 2, 3, 4 days, and so on, or an exponential one. The advantage of using an exponential scale is to capture the numerous small resolution times and therefore to increase the precision of the chart hereafter.

To compute an exponential scale, you can use the following formula:

"Index" is the index of the point on the chart, for example 0, 1, 2, ... 100.

In our example, we have:

  • max(resolution time): 55 days
  • max(index): 100 (i.e., how many points we want to have on the chart)

Therefore, the formula simplifies to:

This formula produces a simple exponential scale in Excel. Note that f(0) = 0 and f(100) = 55.
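As an illustration, here is one formula that satisfies f(0) = 0 and f(100) = 55, written in Python. The exact form, f(index) = (max + 1)^(index / max_index) - 1, is an assumption chosen because it meets those two conditions; other exponential scales would work as well.

```python
def exponential_scale(index, max_value=55.0, max_index=100):
    """Map index 0..max_index onto 0..max_value with exponentially growing steps.

    Assumed form: f(index) = (max_value + 1) ** (index / max_index) - 1.
    """
    return (max_value + 1) ** (index / max_index) - 1


scale = [exponential_scale(i) for i in range(101)]

# First, middle, and last points of the scale
print(scale[0], round(scale[50], 2), round(scale[100], 2))
```

With this form, half of the points fall below about 6.5 days, which is where most resolution times are expected to be.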

Using the formula above, compute an exponential scale in the spreadsheet.

An exponential scale helps to capture the lowest resolution times.

Add a column that computes, for each value of x, the fraction of tickets whose resolution time is less than or equal to x.

This table approximates the Cumulative Distribution Function directly from the real data.

Plot the computed probability on a chart and check that the curve can be approximated by the Cumulative Distribution Function of an exponential distribution.
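The same check can be sketched in Python by comparing the fraction of tickets resolved within x days against 1 - e^(-λx), with λ estimated as the inverse of the average. The resolution times below are hypothetical values for illustration, not the data used in this tutorial.

```python
import math

# Hypothetical resolution times in days, for illustration only.
resolution_times = [0.2, 0.5, 0.7, 1.0, 1.3, 1.8, 2.5, 3.1, 4.6, 9.0]


def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


def exponential_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam."""
    return 1 - math.exp(-lam * x)


lam = len(resolution_times) / sum(resolution_times)  # 1 / sample mean

# Columns: x, empirical probability, fitted exponential probability
for x in (0.5, 1, 2, 4, 8):
    print(x, round(empirical_cdf(resolution_times, x), 2),
          round(exponential_cdf(x, lam), 2))
```

If the two columns stay close across the range of x, the exponential hypothesis looks reasonable; large, systematic gaps suggest another distribution would fit better.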

This chart provides a lot of information. For example, we can see that the probability a ticket is resolved within three days is 80%.

Step 3 – Compute Control Limits

Return to the previous sheet containing the original data. Compute the average resolution time.

You may also use the standard AVERAGE() function of Excel.

We can estimate λ by computing:

λ = 1 / (average resolution time) = 1/2.62 ≈ 0.38 per day

This uses a property of exponential distributions: the maximum likelihood estimate of the rate parameter λ is the inverse of the sample average.

Now that we have found the average resolution time, we can define the control limits as:

  • LCL: 0 (resolution time cannot be negative)
  • UCL: 5.9/λ = 5.9 × 2.62 ≈ 15.5 days
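The computation of both limits can be sketched in Python as follows. The function name and the three-value sample are hypothetical; the sample was chosen only so that its average is 2.62 days, like the average in this tutorial.

```python
import math
import statistics


def exponential_control_limits(resolution_times, coverage=0.9973):
    """Estimate lambda as 1/mean and return (lcl, ucl) for the given coverage."""
    mean = statistics.mean(resolution_times)
    lam = 1 / mean  # maximum likelihood estimate of the rate
    ucl = -math.log(1 - coverage) / lam  # about 5.9 / lambda for 99.73%
    return 0.0, ucl


# Hypothetical sample whose average is 2.62 days.
sample = [0.62, 2.62, 4.62]
lcl, ucl = exponential_control_limits(sample)
print(lcl, round(ucl, 1))
```

Passing a different coverage, such as 0.95, gives a tighter limit if you want to investigate more tickets.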

Step 4 – Draw Control Chart

Create a chart that includes the UCL computed previously. In this tutorial, tickets are ordered by resolution date.

Annotate your chart to show which tickets are above UCL.
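Finding the tickets to annotate is a simple filter. Here is a minimal Python sketch; the ticket keys and resolution times are hypothetical, and the UCL is the value computed in Step 3.

```python
UCL_DAYS = 15.5  # upper control limit from Step 3

# Hypothetical (key, resolution time in days) pairs, ordered by resolution date.
tickets = [("DEMO-1", 1.2), ("DEMO-2", 0.4), ("DEMO-3", 21.0), ("DEMO-4", 3.3)]


def out_of_control(tickets, ucl):
    """Return the tickets whose resolution time exceeds the upper control limit."""
    return [(key, days) for key, days in tickets if days > ucl]


# Text version of the control chart: one line per ticket, with a marker above UCL
for key, days in tickets:
    marker = "  <-- above UCL" if days > UCL_DAYS else ""
    print(f"{key:8s} {days:6.1f}{marker}")

print(out_of_control(tickets, UCL_DAYS))
```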

Step 5 – Reflect and Adjust

Assuming that the resolution time follows an exponential distribution with λ = 1/2.62 = 0.38 tickets/day, ticket GTS-1104 had only a 0.3% probability of exceeding UCL = 15.5 days. Therefore, we can consider the resolution time of this ticket to be “out-of-process”.

Perform root-cause analysis to find the special causes for these tickets. Can you prevent these special causes? How likely are they to recur? Can you devise countermeasures? If yes, use a PDCA process to implement them properly.


3 Responses to Control Charts for Exponential Distributions

  1. azheglov says:

    I believe you overlooked the data-mining adjustment. The p-value of 0.003 for the ticket #1104 would be statistically significant if you only had that one item. But you “mined” it from a set of 50 items, so the probability that one of them would take that long is:

    P_adj = 1 – (1-P)^N

    In this case, the adjusted p-value is about 0.14. If you had 1000 items to mine from, your chance of encountering such item would increase from 14% to about 95%.

    A coin-toss sequence provides a simple model explaining this phenomenon. If you ask someone to toss a coin 10 times, the probability of heads (or tails) 10 times in a row is 1/1024, indicating an assignable cause (to their skill or to a coin defect). But if you ask 10000 people to do the same, the chance that someone will do it is nearly 100%, with obviously no skill involved.

    • Thank you for your insights! I will try to understand how p-value is used in the context of normal distribution-based statistical process control and see if I can improve my proposal about exponential distributions.

      • azheglov says:

        p-value is simply the items’ probability to be more deviant than the item you’re looking at. For example, if they’re normally distributed and the item you’re looking at is at one-sigma, its p-value is 0.16; if it’s at two-sigma, its p-value is 0.023, etc. Similar probabilities can be calculated for the exponential and other distributions. The adjustment part – where I adjusted it for the size of the set the item was selected from – is independent of the distribution type.

        But regardless of all this math, when the time comes to have a retrospective, the item #1104 looks like one of the things to talk about :-)
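The adjustment discussed in the comments above, P_adj = 1 - (1 - P)^N, can be checked numerically with a short Python sketch. It assumes the N items are independent, which is an assumption of the formula itself.

```python
def adjusted_p_value(p, n):
    """Probability that at least one of n independent items is as extreme as p."""
    return 1 - (1 - p) ** n


print(round(adjusted_p_value(0.003, 50), 2))    # set of 50 tickets
print(round(adjusted_p_value(0.003, 1000), 2))  # set of 1000 tickets
```

These two calls correspond to the 50-item and 1000-item cases quoted in the first comment.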
