plot_distributions(dist = 'norm',
xvals = c(-1, 0, 0.5),
xmin = -4,
xmax = 4)
The quantile-quantile plot (Q-Q plot) is a graphical method for comparing two distributions. This is done by comparing the quantiles¹ of the two distributions. Q-Q plots are often used to determine whether the distribution of a sample approximates the normal distribution, though they can be used to compare against any type of distribution. To illustrate how the Q-Q plot is constructed, let’s begin with three populations with a uniform, skewed, and normal distribution.
Let’s start with a small sample with n = 6.
[1] 6.605251 2.147276 6.244270 6.036563 6.184984 4.255935
If we wish to determine whether our sample approximates the normal distribution, we randomly select values (using n = 6) from the normal distribution using the rnorm function.
[1] -0.1160746 -0.8244852 -0.4644825 0.2672684 0.6679687 -0.6274922
The Q-Q plot plots the theoretical quantiles (using the rnorm function in this example) against the sample quantiles. We take the two vectors, sort them, then combine them such that the smallest value from the sample is paired with the smallest value from the theoretical distribution (i.e. rnorm here), all the way to the largest values.
samp1 samp_norm
[1,] 2.147276 -0.8244852
[2,] 4.255935 -0.6274922
[3,] 6.036563 -0.4644825
[4,] 6.184984 -0.1160746
[5,] 6.244270 0.2672684
[6,] 6.605251 0.6679687
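The pairing step can be sketched in base R as follows. The values are copied from the printed output above, and the names samp1 and samp_norm are taken from the column headers; the exact code used to produce the table is not shown in the text, so treat this as one plausible way to get there.

```r
# Values copied from the printed samples above.
samp1 <- c(6.605251, 2.147276, 6.244270, 6.036563, 6.184984, 4.255935)
samp_norm <- c(-0.1160746, -0.8244852, -0.4644825,
               0.2672684, 0.6679687, -0.6274922)

# Sort each vector, then bind them column-wise so that the i-th smallest
# sample value sits next to the i-th smallest comparison value.
paired <- cbind(samp1 = sort(samp1), samp_norm = sort(samp_norm))
paired
```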
The following scatter plot depicts the foundation of the Q-Q plot. If the two distributions are the same then we would expect all the points to fall on a straight line.
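A minimal sketch of that scatter plot in base R, reusing the six values printed earlier. Placing the theoretical values on the x-axis is an assumption based on the usual Q-Q plot convention, and the axis labels are illustrative.

```r
# Values copied from the printed samples above.
samp1 <- c(6.605251, 2.147276, 6.244270, 6.036563, 6.184984, 4.255935)
samp_norm <- c(-0.1160746, -0.8244852, -0.4644825,
               0.2672684, 0.6679687, -0.6274922)

theoretical <- sort(samp_norm)  # comparison values, smallest to largest
observed <- sort(samp1)         # sample values, smallest to largest

# Each point is one pair of matched order statistics.
plot(theoretical, observed,
     xlab = "Theoretical quantiles",
     ylab = "Sample quantiles")
```

Base R's qqnorm function draws a similar plot, but it uses fixed theoretical quantiles, qnorm(ppoints(n)), instead of random draws from rnorm.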
It is desirable to draw a straight line for reference; however, determining the slope and intercept requires some work. If both the sample and comparison distributions are expressed as standard scores (i.e. mean = 0, standard deviation = 1), then we can draw the unit line (i.e. \(y = x\)).
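As a rough sketch of the standard-score approach, we can convert both vectors to z-scores with scale and then add the unit line with abline. The values are the six printed earlier; the variable names z_samp and z_norm are only illustrative.

```r
# Values copied from the printed samples above.
samp1 <- c(6.605251, 2.147276, 6.244270, 6.036563, 6.184984, 4.255935)
samp_norm <- c(-0.1160746, -0.8244852, -0.4644825,
               0.2672684, 0.6679687, -0.6274922)

# Convert each sorted vector to standard scores (mean 0, sd 1).
# scale() returns a one-column matrix, so flatten it with as.vector().
z_samp <- as.vector(scale(sort(samp1)))
z_norm <- as.vector(scale(sort(samp_norm)))

plot(z_norm, z_samp,
     xlab = "Theoretical quantiles (z-scores)",
     ylab = "Sample quantiles (z-scores)")
abline(a = 0, b = 1)  # unit line: intercept 0, slope 1
```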
However, if we wish to retain the scaling of the original sample, we need an alternative strategy to determine the slope and intercept of the line. One common approach is to take the paired quantiles at the 25th and 75th percentiles and then calculate the equation of the line that passes through those two points.
25% 75%
4.701092 6.229449
25% 75%
-0.5867398 0.1714326
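The two outputs above are the 25th and 75th percentiles of the sample and of the comparison values, respectively. One way to turn them into a reference line is sketched below; the names q_samp, q_norm, slope, and intercept are illustrative, and quantile is used with its default settings.

```r
# Values copied from the printed samples above.
samp1 <- c(6.605251, 2.147276, 6.244270, 6.036563, 6.184984, 4.255935)
samp_norm <- c(-0.1160746, -0.8244852, -0.4644825,
               0.2672684, 0.6679687, -0.6274922)

probs <- c(0.25, 0.75)
q_samp <- quantile(samp1, probs)      # y-coordinates of the two anchor points
q_norm <- quantile(samp_norm, probs)  # x-coordinates of the two anchor points

# Line through (q_norm[1], q_samp[1]) and (q_norm[2], q_samp[2]).
slope <- unname(diff(q_samp) / diff(q_norm))
intercept <- unname(q_samp[1] - slope * q_norm[1])

plot(sort(samp_norm), sort(samp1),
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(a = intercept, b = slope)
```

Base R's qqline function uses the same quartile-based idea by default (probs = c(0.25, 0.75)), with theoretical quantiles taken from qnorm.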
An alternative to the straight reference line is to plot a Loess regression line. If the two distributions are the same then the Loess regression line should be a straight line. With very small samples as demonstrated below, however, the Loess estimation is not very good.
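A minimal sketch of the Loess variant using the loess function from base R's stats package. Because the fit is unstable with only six points, this sketch uses a hypothetical sample of n = 50; the seed, mean, and standard deviation are assumptions chosen only for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 50

# Hypothetical sample and comparison draws (parameters are assumptions).
samp <- sort(rnorm(n, mean = 5, sd = 2))
theo <- sort(rnorm(n))

# Fit a Loess curve to the paired, sorted values (default span and degree).
fit <- loess(samp ~ theo)
fitted_vals <- predict(fit)  # fitted values at the observed theo values

plot(theo, samp,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
lines(theo, fitted_vals, col = "blue")
```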
Let’s draw random samples of n = 50 from the three populations defined above.
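The population definitions are not repeated here, so the following is only a sketch that draws directly from three distributions of the same shapes. The chi-squared choice for the skewed case follows the description later in this section; the seed and every distribution parameter (min, max, df, mean, sd) are assumptions.

```r
set.seed(2112)  # arbitrary seed, for reproducibility only
n <- 50

samp_unif <- runif(n, min = 0, max = 1)    # uniform (assumed range)
samp_skew <- rchisq(n, df = 2)             # right-skewed (assumed df)
samp_norm50 <- rnorm(n, mean = 0, sd = 1)  # normal (assumed mean and sd)

# Quick check of the three samples.
sapply(list(uniform = samp_unif, skewed = samp_skew, normal = samp_norm50),
       summary)
```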
The following three figures are Q-Q plots comparing the sample distributions to the normal distribution using the gg_qq_plot function in the VisualStats package. This is a slight variation on the traditional Q-Q plot that also plots the marginal distributions.
Although the most common use of Q-Q plots is to determine whether a sample distribution approximates the normal distribution, technically the Q-Q plot allows for the comparison between any two distributions. For example, the skewed distribution example created above was randomly selected from a chi-squared distribution. The theoretical_dist parameter of the gg_qq_plot function allows you to change the comparison distribution.
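To illustrate the same idea without relying on the exact gg_qq_plot signature, here is a sketch that uses base R's qqplot function instead. It compares a hypothetical chi-squared sample against chi-squared quantiles; the seed, the sample, and the degrees of freedom (chisq_df) are all assumptions.

```r
set.seed(2112)  # arbitrary seed, for reproducibility only
n <- 50
chisq_df <- 2   # assumed degrees of freedom

# Hypothetical skewed sample drawn from a chi-squared distribution.
samp_skew <- rchisq(n, df = chisq_df)

# Theoretical chi-squared quantiles at n evenly spread probabilities.
theo_chisq <- qchisq(ppoints(n), df = chisq_df)

# qqplot() sorts both vectors and plots them against each other.
qq <- qqplot(theo_chisq, samp_skew,
             xlab = "Theoretical quantiles (chi-squared)",
             ylab = "Sample quantiles")
abline(a = 0, b = 1)  # both axes share a scale, so the unit line applies
```

Because the sample and the comparison distribution match here, the points should fall close to the unit line, apart from sampling noise in the upper tail.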
¹ Quantiles are simply cut points from a distribution such that the intervals have equal probabilities.