Post 1 Linear regression analysis
Post 2 Hypothesis testing of temperature trends
Post 3 Confidence intervals around temperature trend lines
Post 4 Statistical power of temperature trends
Post 5 Piecewise linear regression applied to temperature trends
The posts are gathered in this pdf document.
Start of post 3, Confidence intervals around temperature trend lines:
Figure 3.1 shows the monthly temperatures in the last 30 years as blue dots. The solid red line shows the temperature trend in these 30 years.
Figure 3.1: Monthly temperatures from January 1984 to December 2013 with trend line
The 95% confidence interval around an estimated value has a 95% likelihood of covering the true value. The upper endpoint of the confidence interval has a 97.5% likelihood of exceeding the true value, and the lower endpoint has a 97.5% likelihood of being less than it.
The red regression line in Figure 3.1 may be regarded as a model. It may be used in two different ways. One way is to estimate the most likely temperature at a given time. The red dotted lines show the 95% confidence interval around this estimate. Another way is to predict a measurement at a given time. The blue dotted lines show the 95% confidence interval around this prediction, often called a prediction interval. It is wider than the confidence interval for the estimate because it also includes the uncertainty of the measurement that is being predicted.
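The difference between the two intervals can be illustrated with a minimal sketch. It is not the program behind Figure 3.1: the data are synthetic, the trend of 0.017 °C per year and the noise level of 0.1 °C are assumed values, the function names are hypothetical, autocorrelation is ignored, and the normal quantile is used as a large-sample stand-in for the t quantile.

```python
import math
import random
from statistics import NormalDist, fmean

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns a, b, s_e, x_mean, s_xx."""
    n = len(x)
    x_mean, y_mean = fmean(x), fmean(y)
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(rss / (n - 2))  # standard error of the regression
    return a, b, s_e, x_mean, s_xx

def half_widths(x0, n, s_e, x_mean, s_xx, crit):
    """Half-widths at x0 for the trend-line value and for a new measurement."""
    leverage = 1.0 / n + (x0 - x_mean) ** 2 / s_xx
    estimate = crit * s_e * math.sqrt(leverage)
    prediction = crit * s_e * math.sqrt(1.0 + leverage)
    return estimate, prediction

# Synthetic stand-in for 30 years of monthly anomalies (not real data).
rng = random.Random(0)
x = [1984 + m / 12 for m in range(360)]
y = [0.017 * (xi - 1984) + rng.gauss(0.0, 0.1) for xi in x]

a, b, s_e, x_mean, s_xx = fit_line(x, y)
crit = NormalDist().inv_cdf(0.975)  # assumption: normal quantile instead of t quantile
est_hw, pred_hw = half_widths(2000.0, len(x), s_e, x_mean, s_xx, crit)
print(f"estimate: +/-{est_hw:.3f}  prediction: +/-{pred_hw:.3f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which represents the scatter of the single measurement being predicted.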
Many trend calculators on the internet calculate trend lines based on monthly temperatures. The SkS trend calculator is one of these. It estimates exactly the same values as the programs behind Figure 3.1 do. The confidence interval plotted by the SkS trend calculator is the same as the one plotted with the dotted red lines in Figure 3.1.
The temperatures after the turn of the millennium, when analyzed separately, have not increased as much as they did in the decades before. We will now check whether there really has been a change in the long-term temperature trend.
The trend line in Figure 3.1 is a very simple model of the global surface temperatures. The model can be used for crude predictions of the temperature in the years to come. Instead of predicting future temperatures, which we do not know, we can do a thought experiment. Imagine that we are at the beginning of 1998 and that we want to predict the temperatures in the 16 years ahead. The black dots in Figure 3.2 are the monthly temperatures in the preceding 30 years, and the black line is the trend calculated with linear regression analysis for these 30 years. The red line is an extrapolation of the trend line, and it may be used to predict the temperatures in the years ahead, which in the thought experiment are between 1998 and 2013. The red dotted lines show the 95% confidence interval for these predictions.
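The thought experiment can be sketched in code. This is an illustration under stated assumptions, not the program that generated Figure 3.2: the temperatures are simulated with an unchanged trend of 0.017 °C per year and independent noise of 0.1 °C (both assumed values), the names are hypothetical, and the normal quantile stands in for the t quantile. Real monthly temperatures are autocorrelated, which widens the interval.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(1)
TREND, NOISE = 0.017, 0.1  # assumed: degrees C per year, degrees C

def month_times(start_year, years):
    """Decimal-year time stamps for each month in the period."""
    return [start_year + m / 12 for m in range(12 * years)]

def simulate(times):
    """Synthetic anomalies: a fixed linear trend plus independent noise."""
    return [TREND * (t - 1968) + rng.gauss(0.0, NOISE) for t in times]

# Fit the trend on the 30 years before 1998.
x_fit = month_times(1968, 30)
y_fit = simulate(x_fit)
n = len(x_fit)
x_mean, y_mean = fmean(x_fit), fmean(y_fit)
s_xx = sum((x - x_mean) ** 2 for x in x_fit)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_fit, y_fit)) / s_xx
intercept = y_mean - slope * x_mean
rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(x_fit, y_fit))
s_e = math.sqrt(rss / (n - 2))
crit = NormalDist().inv_cdf(0.975)

def prediction_half_width(x0):
    """Half-width of the interval for a single new measurement at time x0."""
    return crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / s_xx)

# "Future" months 1998 to 2013, drawn from the same unchanged trend.
x_new = month_times(1998, 16)
y_new = simulate(x_new)
inside = sum(
    abs(y - (intercept + slope * x)) <= prediction_half_width(x)
    for x, y in zip(x_new, y_new)
)
coverage = inside / len(x_new)
print(f"{coverage:.1%} of the held-out months fall inside the extrapolated interval")
```

Because the simulated trend does not change, roughly 95% of the held-out months should fall inside the interval. The interval also widens slowly as the extrapolation moves further from the middle of the fitting period.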
Figure 3.2: Temperatures in the last 16 years compared to an extension of the trend in the preceding 30 years
Tamino did a similar analysis in January 2014. It inspired me to write programs to do this analysis and to generate Figure 3.2.
The uncertainty of an estimate is often expressed as a 95% confidence interval. We use the general expression p × 100% for the confidence level, so p is 0.95 for 95% confidence intervals.
The term $t_{(1+p)/2}$ is used in the next equations. For 95% confidence intervals it is $t_{0.975}$, which is the 97.5% quantile of the t-distribution.
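To give a feeling for the size of this quantile, here is a small sketch that computes it with the Python standard library only. It integrates the density of the t-distribution numerically and finds the quantile by bisection. The function names are hypothetical, and the degrees of freedom in the example are assumptions (358 corresponds to 360 monthly values minus two fitted parameters, before any correction for autocorrelation).

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=1000):
    """CDF for t >= 0: 0.5 plus a Simpson's-rule integral of the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3

def t_quantile(q, df):
    """Quantile for 0.5 < q < 1, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < q:  # widen the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.95
for df in (10, 30, 358):
    print(df, round(t_quantile((1 + p) / 2, df), 3))
```

The quantile shrinks towards the normal value of about 1.96 as the degrees of freedom grow, so the choice of distribution matters most when there are few independent measurements.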
The standard error of the regression, $SE_{regression}$, is an estimate of $\sigma_E$; see (2.1) in post 2. $S_{XX}$ provides a measure of both the number of measurements and the length of the interval over which the trend is calculated; see (2.2). Both $\sigma_E$ and $S_{XX}$ are used in the next equations.
The confidence interval of the slope is
The confidence interval of a $y_i$ estimate is
(3.2) is used to calculate the 95% confidence interval shown with the dotted red lines in Figure 3.1.
The confidence interval of a $y_i$ measurement is
(3.3) is used to calculate the 95% confidence interval shown with the dotted blue lines in Figure 3.1. Statistically we expect 95% of the measurements to be within this confidence interval.
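For reference, the standard textbook forms of these three intervals for ordinary least squares, before any correction for autocorrelation, can be sketched as follows. The notation is an assumption made here and may differ from (2.1) and (2.2): $\hat{b}$ is the estimated slope, $\hat{y}_i$ the trend-line value at time $x_i$, $\bar{x}$ the mean of the time values, $n$ the number of measurements, $\hat{\sigma}_E$ the standard error of the regression, $S_{XX} = \sum_j (x_j - \bar{x})^2$, and the t quantile has $n - 2$ degrees of freedom.

```latex
\begin{align}
  \hat{b} &\pm t_{(1+p)/2}\,\frac{\hat{\sigma}_E}{\sqrt{S_{XX}}}
    && \text{slope} \tag{3.1} \\
  \hat{y}_i &\pm t_{(1+p)/2}\,\hat{\sigma}_E
    \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{XX}}}
    && \text{estimate} \tag{3.2} \\
  \hat{y}_i &\pm t_{(1+p)/2}\,\hat{\sigma}_E
    \sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{XX}}}
    && \text{measurement} \tag{3.3}
\end{align}
```

The extra 1 under the square root in the last line is what makes the interval for a measurement wider than the interval for an estimate.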
We compensate for autocorrelation in the monthly temperatures in (3.1) to (3.3) in the same way as we did when we estimated the 1-sigma uncertainty of the slope in (2.5).
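Equation (2.5) belongs to post 2 and is not repeated here. As a rough illustration of the idea only, the sketch below uses a simpler AR(1) noise model, which is an assumption and not necessarily the model behind (2.5): the number of independent measurements is reduced by a factor ν = (1 + ρ)/(1 − ρ), where ρ is the lag-1 autocorrelation of the residuals, and the intervals are widened by √ν. The residuals and function names are made up for the example.

```python
import math
from statistics import fmean

def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    mean = fmean(residuals)
    dev = [r - mean for r in residuals]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def ar1_inflation(rho):
    """Factor nu by which AR(1) noise reduces the number of independent measurements."""
    return (1 + rho) / (1 - rho)

# Made-up residuals from a trend fit, chosen to drift slowly (not real data).
residuals = [0.10, 0.12, 0.05, -0.02, -0.08, -0.05,
             0.01, 0.07, 0.09, 0.02, -0.06, -0.10]

rho = lag1_autocorrelation(residuals)
nu = ar1_inflation(rho)
print(f"rho = {rho:.2f}, nu = {nu:.2f}, intervals widened by a factor {math.sqrt(nu):.2f}")
```

With positively correlated residuals ν is larger than 1, so an interval calculated as if the months were independent would be too narrow.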
The confidence intervals in (3.1) to (3.3) become narrower when the number of independent measurements increases. See more details in the explanation of Figure 4.4 in post 4.
References for the mathematics
Hans von Storch, Francis W. Zwiers. 2001. Statistical Analysis in Climate Research is our main reference. Chapter 5.4 'Interval estimators' explains confidence intervals in general, and chapter 8.3 'Fitting and Diagnosing Simple Regression Models' applies this to trend analysis. The formulas in 8.3.10 are used for the confidence interval of estimations, and the formulas in chapter 8.3.11 are used for predictions. The formulas are modified to compensate for the autocorrelation in the monthly temperatures, as recommended in the Methods appendix in Foster and Rahmstorf (2011).
Derek S. Young. May 2014. PSU course STAT 501 Regression Methods explains confidence intervals well in Part I chapter 3.1 'Hypothesis testing and Confidence Intervals'.