Control Charts with qicharts for R

Types of control charts

The control limits, also called sigma limits, are usually placed at \(\pm3\) standard deviations from the centre line. The standard deviation is the estimated standard deviation of the common cause variation in the process of interest, which depends on the theoretical distribution of data. It is a beginner’s mistake to simply calculate the standard deviation of all the data points, which would include both the common and special cause variation.

Since the calculations of control limits depend on the type of data many types of control charts have been developed for specific purposes.

The qicharts package employs a handful of the classic Shewhart charts for measure and count data plus a couple of rare events charts. Together these charts cover the majority of control chart needs of healthcare quality improvement and control.

The formulas for calculation of control limits can be found in Montgomery 2009 and Provost 2011.

C chart for count of defects

To demonstrate the use of C, U and P charts for count data we will create a data frame mimicking the weekly number of hospital acquired pressure ulcers at a hospital that, on average, has 300 patients with an average length of stay of four days.

# Setup parameters
m.beds       <- 300
m.stay       <- 4
m.days       <- m.beds * 7
m.discharges <- m.days / m.stay
p.pu         <- 0.08

# Simulate data
discharges  <- rpois(24, lambda = m.discharges)
patientdays <- round(rnorm(24, mean = m.days, sd = 100))
n.pu        <- rpois(24, lambda = m.discharges * p.pu * 1.5)
n.pat.pu    <- rbinom(24, size = discharges, prob = p.pu)
week        <- seq(as.Date('2014-1-1'),
                    length.out = 24, 
                    by         = 'week') 

# Combine data into a data frame
d <- data.frame(week, discharges, patientdays,n.pu, n.pat.pu)
d

##          week discharges patientdays n.pu n.pat.pu
## 1  2014-01-01        554        2271   58       47
## 2  2014-01-08        529        2172   68       39
## 3  2014-01-15        542        2148   70       31
## 4  2014-01-22        538        1943   71       29
## 5  2014-01-29        502        2132   66       37
## 6  2014-02-05        511        2117   63       42
## 7  2014-02-12        541        2010   69       49
## 8  2014-02-19        527        2108   61       47
## 9  2014-02-26        523        2116   69       37
## 10 2014-03-05        515        2154   67       51
## 11 2014-03-12        530        2170   67       41
## 12 2014-03-19        532        2132   62       42
## 13 2014-03-26        528        2211   79       53
## 14 2014-04-02        521        2177   58       47
## 15 2014-04-09        552        2215   67       51
## 16 2014-04-16        508        2226   64       35
## 17 2014-04-23        492        2170   69       35
## 18 2014-04-30        516        2143   58       28
## 19 2014-05-07        470        2008   67       41
## 20 2014-05-14        538        2038   61       51
## 21 2014-05-21        527        2013   73       51
## 22 2014-05-28        546        1936   60       41
## 23 2014-06-04        518        1967   67       43
## 24 2014-06-11        518        2011   62       28

Each of the data frame’s 24 rows contains information for one week on the number of discharges, patient days, pressure ulcers, and number of discharged patients with one or more pressure ulcers. On average, 8% of discharged patients have 1.5 hospital acquired pressure ulcers.

Traditionally, the term “defect” has been used to name whatever it is one is counting with control charts. There is a subtle but important distinction between counting defects, e.g. number of pressure ulcers, and counting defectives, e.g. number of patient with one or more pressure ulcers.

Defects are expected to reflect the poisson distribution, while defectives reflect the binomial distribution.

The correct control chart on the number of pressure ulcers is the C chart, which is based on the poisson distribution.

qic(n.pu,
    x     = week,
    data  = d,
    chart = 'c',
    main  = 'Hospital acquired pressure ulcers (C chart)',
    ylab  = 'Count',
    xlab  = 'Week')

Figure 3: C chart displaying the number of defects

Figure 3 shows that the average weekly number of hospital acquired pressure ulcers is 66 and that anything between 41 and 90 would be within the expected range.

U chart for rate of defects

The U chart is different from the C chart in that it accounts for variation in the area of opportunity, e.g. the number of patients or the number of patient days, over time or between units one wishes to compare. If there are many more patients in the hospital in the winter than in the summer, the C chart may falsely detect special cause variation in the raw number of pressure ulcers.

The U chart plots the rate of defects. A rate differs from a proportion in that the numerator and the denominator need not be of the same kind and that the numerator may exceed the denominator. For example, the rate of pressure ulcers may be expressed as the number of pressure ulcers per 1000 patient days.

qic(n.pu, 
    n        = patientdays,
    x        = week,
    data     = d,
    chart    = 'u',
    multiply = 1000,
    main     = 'Hospital acquired pressure ulcers (U chart)',
    ylab     = 'Count per 1000 patient days',
    xlab     = 'Week')

Figure 4: U chart displaying the rate of defects

Figure 4 displays the number of pressure ulcers per 1000 patient days. Note that the control limits vary slightly. This happens when the numerator (area of opportunity) differs between subgroups. The larger the numerator, the narrower the control limits.

P chart for proportion of defective units

The P chart is probably the most common control chart in healthcare. It is used to plot the proportion (or percent) of defective units, e.g. the proportion of patients with one or more pressure ulcers. As mentioned, defectives are modelled by the binomial distribution.

qic(n.pat.pu,
    n        = discharges,
    x        = week,
    data     = d,
    chart    = 'p',
    multiply = 100,
    main     = 'Hospital acquired pressure ulcers (P chart)',
    ylab     = 'Percent patients',
    xlab     = 'Week')

Figure 5: P chart displaying the percent of defectives

In theory, the P chart is less sensitive to special cause variation than the U chart because it discards information by dichotomising inspection units (patients) in defectives and non-defectives ignoring the fact that a unit may have more than one defect (pressure ulcers).

On the other hand, the P chart often communicates better. For most people, not to mention the press, the percent of harmed patients is easier to grasp than the the rate of pressure ulcers expressed in counts per 1000 patient days.

Luckily, one does not have to choose between C, U and P charts. One can do them all at the same time.

G chart for units produced between defective units

When defects or defectives are rare and the subgroups are small, C, U, and P charts become useless as most subgroups will have no defects. If, for example, 8% of discharged patients have a hospitals acquired pressure ulcer and the average weekly number of discharges in a small department is 10, we would, on average, expect to have less than one pressure ulcer per week.

Instead we could plot the number of discharges between each discharge of a patient with one or more pressure ulcers. The number of units between defectives is modelled by the geometric distribution.

# Create vector of random values from a geometric distribution
d <- c(NA, rgeom(23, 0.08))
d

##  [1] NA 30 16  1  3  8  5  1 17  5  8 12  0  3 11 11 13  0  1  7  5  3  5  2

# Plot G chart
qic(d,
    chart = 'g',
    main  = 'Patients between pressure ulcers (G chart)',
    ylab  = 'Count',
    xlab  = 'Discharge no.')

Figure 6: G chart displaying the number of units produced between defectives

Figure 6 displays a G chart mimicking 24 discharged patient with pressure ulcers. The indicator is the number of discharges between each of these. Note that the first patient with pressure ulcer is missing from the chart since, we do not know how many discharges there had been since the previous patient with pressure ulcer.

The centre line of the G chart is the theoretical median of the distribution (\(mean \times 0.693\)). This is because the geometric distribution is highly skewed, thus the median is a better representation of the process centre to be used with the runs analysis.

Also note that the G chart rarely has a lower control limit.

An alternative to the G chart is the T chart for time between defects, which we will come back to later.

I and MR charts for individual measurements

In healthcare, which, you may have guessed, is my domain, most quality data are count data. However, from time to time I stumble across measure data, often in the form of physiological parameters or waiting times.

Figure 7 is an I chart of birth weights from 24 babies.

# Vector of birth weights from 24 babies
y <- round(rnorm(24, mean = 3400, sd = 400))
y

##  [1] 2634 2464 3593 3871 2883 3646 3502 3691 3930 3459 3161 4277 4331 3854 3324
## [16] 3590 3182 3822 3572 2672 3123 2627 3884 3288

# Plot I chart of individual birth weights
qic(y,
    chart = 'i',
    main  = 'Birth weight (I chart)',
    ylab  = 'Grams',
    xlab  = 'Baby no.')

Figure 7: I chart for individual measurements

I charts are often accompanied by moving range (MR) charts, which show the absolute difference between neighbouring data points.

# Plot moving ranges
qic(y,
    chart = 'mr',
    main  = 'Pairwise differences in birth weights (MR chart)',
    ylab  = 'Grams',
    xlab  = 'Baby no.')

Figure 8: Moving range chart

The purpose of the MR chart is to identify sudden changes in the (estimated) within subgroup variation. If any data point in the MR is above the upper control limit, one should interpret the I chart very cautiously.

Xbar and S charts for average measurements

If there is more than one measurement in each subgroup the Xbar and S charts will display the average and the within subgroup standard deviation respectively.

# Vector of 24 subgroup sizes (average = 12)
sizes <- rpois(24, 12)

# Vector of dates identifying subgroups
date <- seq(as.Date('2015-1-1'), length.out = 24, by = 'day')
date <- rep(date, sizes)

# Vector of birth weights
y <- round(rnorm(sum(sizes), 3400, 400))

# Data frame of birth weights and dates
d <- data.frame(y, date)
head(d, 24)

##       y       date
## 1  3589 2015-01-01
## 2  3176 2015-01-01
## 3  3897 2015-01-01
## 4  3394 2015-01-01
## 5  3083 2015-01-01
## 6  3239 2015-01-01
## 7  2641 2015-01-01
## 8  3789 2015-01-01
## 9  3194 2015-01-02
## 10 3406 2015-01-02
## 11 3296 2015-01-02
## 12 4009 2015-01-02
## 13 2811 2015-01-02
## 14 3393 2015-01-02
## 15 3410 2015-01-02
## 16 3399 2015-01-02
## 17 3226 2015-01-02
## 18 3238 2015-01-02
## 19 3462 2015-01-02
## 20 3011 2015-01-02
## 21 4019 2015-01-02
## 22 3252 2015-01-02
## 23 4185 2015-01-02
## 24 3156 2015-01-03

# Plot Xbar chart of average birth weights by date of birth
qic(y, 
    x     = date, 
    data  = d,
    chart = 'xbar',
    main  = 'Average birth weight (Xbar chart)',
    ylab  = 'Grams',
    xlab  = 'Date')

Figure 9: Xbar chart of average measurements

# Plot S chart of within subgroup standard deviation
qic(y, 
    x = date, 
    data = d,
    chart = 's',
    main = 'Standard deviation of birth weight (S chart)',
    ylab = 'Grams',
    xlab = 'Date')

Figure 10: S chart of within subgroup standard deviations

T chart for time between events

Like the G chart, the T chart is a rare event chart. But instead of displaying the number of cases between events (defectives) it displays the time between events. And since time is a continuous variable it belongs with the other charts for measure data.

# Pick 24 random dates and sort them
dates  <- seq(as.Date('2015-1-1'), as.Date('2015-12-31'), by = 'day')
events <- sort(sample(dates, 24))
events

##  [1] "2015-01-15" "2015-02-23" "2015-03-28" "2015-04-22" "2015-05-15"
##  [6] "2015-05-25" "2015-05-26" "2015-06-16" "2015-07-07" "2015-07-08"
## [11] "2015-07-17" "2015-08-03" "2015-08-09" "2015-08-10" "2015-08-18"
## [16] "2015-09-03" "2015-09-30" "2015-10-11" "2015-10-27" "2015-11-04"
## [21] "2015-11-12" "2015-11-21" "2015-12-04" "2015-12-09"

# Vector of time (days) between events
d <- c(NA, diff(events))
d

##  [1] NA 39 33 25 23 10  1 21 21  1  9 17  6  1  8 16 27 11 16  8  8  9 13  5

# Plot T chart of days between events
qic(d,
    chart = 't',
    main  = 'Days between pressure ulcers (T chart)',
    ylab  = 'Days',
    xlab  = 'Pressure ulcer no.')

Figure 11: T chart displaying time between events

Standardised control charts

If one does not like the wavy control lines in U, P, Xbar and S charts, one can do a standardised chart, which turns the indicator into a Z score by subtracting the mean from each value and dividing by the standard deviation. The standardised chart has fixed control limits at \(\pm3\) and a centre line at 0.

# Rebuild data frame from figure 5
d <- data.frame(n.pat.pu, discharges, week)

# Plot standardised P chart
qic(n.pat.pu, 
    n            = discharges,
    x            = week, 
    data         = d,
    chart        = 'p',
    standardised = TRUE,
    main         = 'Patients with hospital acquired pressure ulcers (Standardised P chart)',
    ylab         = 'Standard deviations',
    xlab         = 'Week')

Figure 12: Standardised P chart

The standardised chart shows the same information as its not-standardised peer, but the straight control lines may appear less confusing. On the other hand, one looses the original units of data, which may make the chart harder to interpret.

Prime control charts

Sometimes, with very large subgroups, the control limits of U and P charts seem much too narrow leaving almost all data points outside of common cause variation. This may be an artefact caused by the fact that the “true” common cause variation in data is greater than that predicted by the poisson or binomial distribution. This is called overdispersion. In theory, overdispersion will often be present in real life data but only detectable with large subgroups where point estimates become very precise.

Laney proposed a solution to this problem that incorporates the between subgroup variation (Laney 2002).

# Plot prime P chart
qic(n.pat.pu, discharges, week, d,
    chart    = 'p',
    multiply = 100,
    main     = 'Prime P chart of patients with pressure ulcer',
    ylab     = 'Percent',
    xlab     = 'Week',
    prime    = TRUE)

Figure 13: Prime P chart

Figure 13 is a prime P chart of the same data as in figure 5. The control limits are slightly wider.

In an interview Laney says that there is no reason not always to use prime charts. However, Provost and Murray (Provost 2011) suggest to use prime charts only for very large subgroups (N > 2000) when all other explanations for special cause variation have been examined.

Control Charts with qicharts for R

Jacob Anhoej

2025-04-03

Introduction

Control chart basics