This is the first blog post in a series of six that deals with mathematics for calculation of correlation and trend in data with outliers. The posts are numbered 1 to 6. They should be read consecutively. This first post is just an introduction.
Post 1 Introduction to Statistical analysis of data with outliers
Post 2 Calculate correlation when outliers in the data.
Post 3 Calculate trend when outliers in the data.
Post 4 Correlation and trend when an outlier is added. Example.
Post 5 Compare Kendall-Theil and OLS trends. Simulations.
Post 6 Detect serial correlation when outliers. Simulations.
Start of post 1 Introduction to Statistical analysis of data with outliers
Five blog posts in June 2014 deal with the mathematics that is most commonly used when analysing global temperature series. That mathematics is not well suited when there are large outliers in the data. The first blog post in that series gives an overview of those five posts.
Ordinary least squares (OLS) error mathematics is the most commonly used method to calculate trends. It is based on data values, and it therefore performs poorly when there are large outliers in the data. Global temperatures do not have large outliers, due both to the inertia in the global climate system and to the thorough processing before the temperature data is released. Other climate data, such as precipitation, snow depth and skiing conditions at specific locations, have large outliers, and the OLS mathematics is not suitable for those data.
The calculation of the Pearson correlation coefficient is also based on data values. This is the most commonly used method to calculate correlation between variables. It too performs poorly when there are large outliers in the data.
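As a rough illustration of why value-based mathematics struggles with outliers, here is a minimal Python sketch. It uses the textbook formulas for the OLS slope and the Pearson coefficient. The data are made up for this example (a perfect line with one value replaced by a large outlier), and the function names are my own choices, not part of any library.

```python
from math import sqrt
from statistics import mean


def ols_slope(x, y):
    """Slope of the ordinary least squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from the data values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic data (an assumption for illustration only): y = x for x = 0..9.
x = list(range(10))
y_clean = list(range(10))

# Same series with one large, hypothetical outlier in the middle.
y_outlier = y_clean.copy()
y_outlier[4] = 40

slope_clean = ols_slope(x, y_clean)
slope_outlier = ols_slope(x, y_outlier)
r_clean = pearson_r(x, y_clean)
r_outlier = pearson_r(x, y_outlier)

print(f"OLS slope: clean {slope_clean:.2f}, with outlier {slope_outlier:.2f}")
print(f"Pearson r: clean {r_clean:.2f}, with outlier {r_outlier:.2f}")
```

With this made-up series, a single outlier among ten points pulls the OLS slope from 1 down to about 0.78 and the Pearson coefficient from 1 down to about 0.20, even though the other nine points still lie exactly on the line.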
Mathematics based on data ranks performs better than mathematics based on data values when analysing data with large outliers. In this series of blog posts I will describe the rank mathematics which I use to calculate the Kendall tau-b correlation coefficient and the Kendall-Theil robust trend line. For comparison I also briefly describe the Pearson and the OLS mathematics.
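To give a feeling for the rank-based approach, here is a minimal Python sketch. It assumes the commonly used definitions: Kendall tau-b computed from counts of concordant, discordant and tied pairs, and the Kendall-Theil slope taken as the median of all pairwise slopes. The details may differ from the formulations in the later posts. The data are made up for this example, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def kendall_tau_b(x, y):
    """Kendall tau-b from pair counts; only the ordering of values matters."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue  # tied in both variables: enters neither count
        elif dx == 0:
            ties_x += 1  # tied in x only
        elif dy == 0:
            ties_y += 1  # tied in y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


def theil_slope(x, y):
    """Median of the slopes between all pairs of points with distinct x."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    return median(slopes)


# Synthetic data (an assumption for illustration only): y = x for x = 0..9.
x = list(range(10))
y_clean = list(range(10))

# Same series with one large, hypothetical outlier in the middle.
y_outlier = y_clean.copy()
y_outlier[4] = 40

tau_clean = kendall_tau_b(x, y_clean)
tau_outlier = kendall_tau_b(x, y_outlier)
trend_clean = theil_slope(x, y_clean)
trend_outlier = theil_slope(x, y_outlier)

print(f"Kendall tau-b: clean {tau_clean:.2f}, with outlier {tau_outlier:.2f}")
print(f"Kendall-Theil slope: clean {trend_clean:.2f}, with outlier {trend_outlier:.2f}")
```

With this made-up series the outlier changes the order in only 5 of the 45 pairs, so tau-b falls from 1 to about 0.78, and the median of the pairwise slopes stays at 1. The size of the outlier does not matter here, only its rank.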
As will be seen, the mathematics that is used to calculate the Kendall tau-b correlation coefficient and the Kendall-Theil robust trend line is rather simple and easy to explain. But the mathematics that is used to quantify their uncertainties, namely p-values and confidence intervals, is more complicated.