Wednesday, 18 June 2014

Confidence intervals around temperature trend lines

This is the third blog post in a series of five that analyse trends in the global surface temperatures. The posts put emphasis on the mathematics and the statistics used in the analyses. The posts are numbered 1 to 5. They should be read consecutively.

Post 1    Linear regression analysis
Post 2    Hypothesis testing of temperature trends
Post 3    Confidence intervals around temperature trend lines
Post 4    Statistical power of temperature trends
Post 5    Piecewise linear regression applied to temperature trends

The posts are gathered in this pdf document.

Start of post 3, Confidence intervals around temperature trend lines:

Figure 3.1 shows the monthly temperatures in the last 30 years as blue dots. The solid red line shows the temperature trend in these 30 years.
Figure 3.1: Monthly temperatures from January 1984 to December 2013 with trend line
The red line is a “best fit” to the blue dots. The slope and the intercept with the vertical Y axis are estimated with linear regression analysis. The slope is given by its value and its uncertainty, both in units of °C/year. The uncertainty is determined both by the length of the interval over which the trend is calculated and by the noise in the temperatures: long intervals give low uncertainty, and more noise gives higher uncertainty. The uncertainty is usually specified by its 1-sigma value σ. See more details in post 1.

The 95% confidence interval around an estimated value has a 95% likelihood of covering the true value. The upper endpoint of the confidence interval has a 97.5% likelihood of exceeding the true value, and the lower endpoint has a 97.5% likelihood of being less than it.

The red regression line in Figure 3.1 may be regarded as a model. It may be used in two different ways. One way is to estimate the most likely temperature at a given time. The red dotted lines show the 95% confidence interval around this estimate. Another way is to predict a measurement at a given time. The blue dotted lines show the 95% confidence interval around this prediction. It is wider than the confidence interval for the estimate because it also includes the uncertainty of the measurement that is being predicted.
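The difference between the two bands can be illustrated with a minimal Python sketch. It is not the program behind Figure 3.1. The data are synthetic (the trend of 0.017 °C/year and the noise level of 0.1 °C are made-up values), the function names are hypothetical, the normal quantile is used as a large-sample stand-in for the t quantile, and no compensation for autocorrelation is applied.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns a, b, sigma_e, s_xx."""
    n = len(x)
    x_mean, y_mean = fmean(x), fmean(y)
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression, with n - 2 degrees of freedom.
    sigma_e = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return a, b, sigma_e, s_xx


def band_half_widths(x0, x, sigma_e, s_xx, p=0.95):
    """Half-widths of the two bands at time x0: (estimate, prediction)."""
    n = len(x)
    x_mean = fmean(x)
    # Assumption: normal quantile as a large-n stand-in for the t quantile.
    q = NormalDist().inv_cdf((1 + p) / 2)
    leverage = 1 / n + (x0 - x_mean) ** 2 / s_xx
    estimate = q * sigma_e * math.sqrt(leverage)
    # The extra 1 is the noise of the single measurement being predicted.
    prediction = q * sigma_e * math.sqrt(1 + leverage)
    return estimate, prediction


# Synthetic monthly data: 30 years with a made-up trend and noise level.
rng = random.Random(0)
x = [1984 + m / 12 for m in range(360)]
y = [0.017 * (xi - 1984) + rng.gauss(0, 0.1) for xi in x]

a, b, sigma_e, s_xx = fit_line(x, y)
est_mid, pred_mid = band_half_widths(fmean(x), x, sigma_e, s_xx)
est_end, pred_end = band_half_widths(x[-1], x, sigma_e, s_xx)
print(f"slope: {b:.4f} degC/year")
print(f"estimate band:   +/-{est_mid:.3f} (middle), +/-{est_end:.3f} (end)")
print(f"prediction band: +/-{pred_mid:.3f} (middle), +/-{pred_end:.3f} (end)")
```

In this sketch the band for the estimate is narrowest in the middle of the interval and widens towards the ends, while the band for the prediction is dominated by the noise of a single measurement and is therefore much wider everywhere.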


Many trend calculators on the internet calculate trend lines based on monthly temperatures. The SkS trend calculator is one of these. It estimates exactly the same values as the programs behind Figure 3.1 do. The confidence interval plotted by the SkS trend calculator is the same as the one plotted with the dotted red lines in Figure 3.1.

The temperatures after the turn of the millennium, when analyzed separately, have not increased as much as they did in the decades before. We will now check if there really has been a change in the long term temperature trend.

The trend line in Figure 3.1 is a very simple model of the global surface temperatures. The model can be used for crude predictions of the temperature in the years to come. Instead of predicting future temperatures, which we do not know, we can do a thought experiment. Imagine that we are at the beginning of 1998 and that we want to predict the temperatures in the 16 years ahead. The black dots in Figure 3.2 are the monthly temperatures in the preceding 30 years, and the black line is the trend calculated with linear regression analysis for these 30 years. The red line is an extrapolation of the trend line, and it may be used to predict the temperatures in the years ahead, which in the thought experiment are between 1998 and 2013. The red dotted lines show the 95% confidence interval for these predictions.

Figure 3.2: Temperatures in the last 16 years compared to an extension of the trend in the preceding 30 years
The green dots are the monthly temperatures measured in the 16 years from January 1998. They have no influence on either the trend line or the confidence interval. We see that 9 of the monthly temperatures are warmer than the upper limit of the confidence interval and that none of them are colder than the lower limit. If the simple model were correct, we would expect approximately 5 of the 192 monthly temperatures (2.5%) to fall outside the confidence interval on each side. We also see that most of the monthly temperatures are warmer than predicted (132 are warmer and only 60 are colder). This simple analysis shows that the temperatures after 1998 have increased at least as much as they did in the 30 preceding years.
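The expected count of about 5 on each side follows from simple arithmetic, which the small Python sketch below spells out. The function names are hypothetical, and the three observations in the last line are made-up numbers used only to show how points above and below a band can be counted.

```python
def expected_outside_per_side(n_points, p=0.95):
    """Expected number of points beyond one limit of a p*100% interval."""
    return n_points * (1 - p) / 2


def count_outside(observed, predicted, half_width):
    """Count observations above the upper limit and below the lower limit."""
    above = sum(o > f + h for o, f, h in zip(observed, predicted, half_width))
    below = sum(o < f - h for o, f, h in zip(observed, predicted, half_width))
    return above, below


n_months = 16 * 12  # January 1998 to December 2013
print(n_months, round(expected_outside_per_side(n_months), 1))

# Made-up example: three observations against a flat prediction of 0.5 +/- 0.3.
print(count_outside([0.9, 0.5, 0.1], [0.5, 0.5, 0.5], [0.3, 0.3, 0.3]))
```

With 192 months and 2.5% in each tail, the expected count is 4.8 per side, which the text rounds to 5.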

Tamino did a similar analysis in January 2014. It inspired me to write programs to do this analysis and to generate Figure 3.2.

Mathematics

The uncertainty of an estimate is often expressed as a 95% confidence interval. We use the general expression p × 100% for the confidence interval. p is 0.95 for 95% confidence intervals.

The term t_{(1+p)/2} is used in the next equations. For 95% confidence intervals it is t_{0.975}, which is the 97.5% quantile of the t-distribution.
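The quantile level (1+p)/2 arises because a two-sided interval leaves (1−p)/2 in each tail. The sketch below shows this with the Python standard library. The function names are hypothetical, and the normal quantile is used here only as a large-sample approximation of the t quantile; with a few hundred monthly values the two are close, but for short series the t quantile is noticeably larger.

```python
from statistics import NormalDist


def quantile_level(p):
    """Quantile level (1 + p) / 2 for a two-sided p*100% interval."""
    return (1 + p) / 2


def approx_t_quantile(p):
    """Normal-quantile stand-in for t_{(1+p)/2}; reasonable only for large n."""
    return NormalDist().inv_cdf(quantile_level(p))


print(quantile_level(0.95))
print(round(approx_t_quantile(0.95), 2))
```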

The standard error of the regression SE_regression is an estimate of σ_E, see (2.1) in post 2. S_XX provides a measure of both the number of measurements and of the length of the interval over which the trend is calculated, see (2.2). Both σ_E and S_XX are used in the next equations.

The confidence interval of the slope is
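In its textbook form, before any compensation for autocorrelation, (3.1) can be sketched as below. The symbol for the estimated slope is notation assumed here, and S_XX is taken to be the usual sum of squared deviations of the times from their mean.

```latex
% Textbook form only; the compensation for autocorrelation is not shown.
% \hat{b} (estimated slope) is assumed notation.
\hat{b} \;\pm\; t_{(1+p)/2} \, \frac{\sigma_E}{\sqrt{S_{XX}}},
\qquad S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```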

The confidence interval of a y_i estimate is
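In its textbook form, before any compensation for autocorrelation, (3.2) can be sketched as below. Here n is the number of measurements, x_i is the time of the estimate, and the symbols for the fitted value and the mean time are notation assumed here.

```latex
% Textbook form only; the compensation for autocorrelation is not shown.
% \hat{y}_i (fitted value at x_i) and \bar{x} (mean time) are assumed notation.
\hat{y}_i \;\pm\; t_{(1+p)/2} \, \sigma_E
\sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{XX}}}
```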

(3.2) is used to calculate the 95% confidence interval shown with the dotted red lines in Figure 3.1.

The confidence interval of a y_i measurement is
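In its textbook form, before any compensation for autocorrelation, (3.3) can be sketched as below. It differs from the interval for an estimate only by the extra 1 under the square root, which accounts for the noise of the single measurement being predicted. The symbols for the fitted value, the mean time and the number of measurements n are notation assumed here.

```latex
% Textbook form only; the compensation for autocorrelation is not shown.
% \hat{y}_i, \bar{x} and n are assumed notation.
\hat{y}_i \;\pm\; t_{(1+p)/2} \, \sigma_E
\sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{XX}}}
```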

(3.3) is used to calculate the 95% confidence interval shown with the dotted blue lines in Figure 3.1. Statistically we expect 95% of the measurements to be within this confidence interval.

We compensate for autocorrelation in the monthly temperatures in (3.1) to (3.3) in the same way as we did when we estimated the 1-sigma uncertainty of the slope in (2.5).

The confidence intervals in (3.1) to (3.3) become narrower when the number of independent measurements increases. See more details in the explanation to Figure 4.4 in post 4.
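The narrowing can be illustrated for the slope with a short Python sketch. It treats every monthly value as independent, which the real temperatures are not, so it shows only the general tendency. The function name is hypothetical, the noise level of 0.1 °C is a made-up value, and the normal quantile stands in for the t quantile.

```python
import math
from statistics import NormalDist, fmean


def slope_half_width(years, sigma_e, p=0.95):
    """Half-width of the slope interval for monthly data spanning `years`."""
    x = [m / 12 for m in range(12 * years)]  # time in years
    x_mean = fmean(x)
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    # Assumption: normal quantile as a large-n stand-in for the t quantile.
    q = NormalDist().inv_cdf((1 + p) / 2)
    return q * sigma_e / math.sqrt(s_xx)


for years in (10, 16, 30):
    print(years, round(slope_half_width(years, sigma_e=0.1), 4))
```

Because S_XX grows with both the number of measurements and the length of the interval, the half-width of the slope interval shrinks quickly as the interval gets longer.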

References for the mathematics

Hans von Storch, Francis W. Zwiers. 2001. Statistical Analysis in Climate Research is our main reference. Chapter 5.4 'Interval estimators' explains confidence intervals in general, and chapter 8.3 'Fitting and Diagnosing Simple Regression Models' applies this to trend analysis. The formulas in 8.3.10 are used for the confidence interval of estimations, and the formulas in chapter 8.3.11 are used for predictions. The formulas are modified to compensate for the autocorrelation in the monthly temperatures, as recommended in the Methods appendix in Foster and Rahmstorf (2011).

Derek S. Young. May 2014. PSU course STAT 501 Regression Methods explains confidence intervals well in Part I chapter 3.1 'Hypothesis testing and Confidence Intervals'.



