An outlier is a data point that differs significantly from the other data in a sample. The term is used in statistical studies, and an outlier can indicate an anomaly in the data being studied or an error in measurement. Knowing how to deal with outliers is important for understanding the data properly, and leads to more accurate conclusions from the study. A fairly simple procedure lets you identify outliers in a given set of values.
Steps
Step 1. Learn to recognize potential outliers
Before calculating whether a given value is an outlier, it helps to review the data set and pick out the potential outliers. For example, consider a set of data representing the temperature of 12 different objects in the same room. If 11 of the objects have a temperature close to 21 degrees Celsius, but the twelfth object (possibly an oven) has a temperature of 150 degrees Celsius, even a quick look suggests that the oven measurement is a potential outlier.
Step 2. Arrange the numerical values in ascending order
Continuing with the previous example, consider the following set of numbers representing the temperatures of some objects: {21, 20, 23, 20, 20, 19, 20, 22, 21, 150, 21, 19}. This set should be ordered as follows: {19, 19, 20, 20, 20, 20, 21, 21, 21, 22, 23, 150}.
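The ordering step can be sketched in Python as follows. This is only an illustration; the variable names `temperatures` and `ordered` are chosen here and are not part of the original procedure.

```python
# Temperatures from the example, in the order they were recorded.
temperatures = [21, 20, 23, 20, 20, 19, 20, 22, 21, 150, 21, 19]

# sorted() returns a new list in ascending order and leaves the input unchanged.
ordered = sorted(temperatures)

print(ordered)
# [19, 19, 20, 20, 20, 20, 21, 21, 21, 22, 23, 150]
```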
Step 3. Calculate the median of the dataset
The median is the number above which half of the data lies, and below which the other half lies. If the set contains an even number of values, average the two middle values. In the example above, the two middle values are 20 and 21, so the median is (20 + 21) / 2, i.e. 20.5.
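As a minimal sketch, the median of the example can be computed by hand and checked against Python's standard `statistics.median`. The names `middle_pair` and `manual_median` are illustrative, and the manual calculation assumes an even number of values, as in the example.

```python
from statistics import median

# The example data, already in ascending order.
ordered = [19, 19, 20, 20, 20, 20, 21, 21, 21, 22, 23, 150]

n = len(ordered)  # 12 values, so there are two middle values

# With an even count, the middle values sit at positions n/2 - 1 and n/2 (0-based).
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
manual_median = sum(middle_pair) / 2

print(middle_pair)      # (20, 21)
print(manual_median)    # 20.5
print(median(ordered))  # 20.5
```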
Step 4. Calculate the first quartile
This value, called Q1, is the number below which 25 percent of the data lies. In the example above, it is again necessary to average two numbers, in this case 20 and 20. Their average is (20 + 20) / 2, i.e. 20.
Step 5. Calculate the third quartile
This value, called Q3, is the number above which 25 percent of the data lies. Continuing with the same example, averaging the two values 21 and 22 yields a Q3 value of 21.5.
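The two quartiles of the example can be reproduced with a short Python sketch. It assumes the "median of each half" convention that the example appears to follow: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half. The function name `quartiles_by_halves` is hypothetical. Library routines such as `statistics.quantiles` use interpolation conventions that can give slightly different values.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the overall median is
    left out of both halves.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

temperatures = [21, 20, 23, 20, 20, 19, 20, 22, 21, 150, 21, 19]
q1, q3 = quartiles_by_halves(temperatures)
print(q1, q3)  # 20.0 21.5
```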
Step 6. Find the "inner fences" for the dataset
The first step is to multiply the difference between Q3 and Q1 (called the interquartile range) by 1.5. In the example, the interquartile range is 21.5 - 20, i.e. 1.5. Multiplying this range by 1.5 gives 2.25. Add this number to Q3 and subtract it from Q1 to build the inner fences. In our example, the inner fences are 17.75 and 23.75.
Any value that lies outside this range is considered a mild outlier. In our example set of values, only the oven temperature, 150 degrees, lies outside the inner fences, so it is at least a mild outlier.
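The inner-fence arithmetic can be checked with a small Python sketch. The quartile values are taken from the example, and the names `iqr`, `margin`, `lower_inner`, `upper_inner` and `outside` are illustrative.

```python
# Quartiles from the worked example.
q1 = 20
q3 = 21.5

iqr = q3 - q1       # interquartile range: 1.5
margin = 1.5 * iqr  # 2.25

lower_inner = q1 - margin  # 17.75
upper_inner = q3 + margin  # 23.75

data = [19, 19, 20, 20, 20, 20, 21, 21, 21, 22, 23, 150]

# Keep only the values that fall outside the inner fences.
outside = [x for x in data if x < lower_inner or x > upper_inner]

print(lower_inner, upper_inner)  # 17.75 23.75
print(outside)                   # [150]
```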
Step 7. Find the "outer fences" for the set of values
You can find them with exactly the same procedure you used for the inner fences, except that the interquartile range is multiplied by 3 instead of 1.5. Multiplying the interquartile range from our example by 3 gives 1.5 * 3 = 4.5. The outer fences are therefore 15.5 and 26.
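A minimal Python sketch of the outer fences follows. It also labels values using the common convention, assumed here, that values beyond the outer fences are called extreme outliers and values between the inner and outer fences are called mild outliers. The function name `classify` and its labels are hypothetical.

```python
def classify(value, q1, q3):
    """Label a value as "extreme", "mild" or "none" using the fences.

    Assumption: beyond the outer fences (3 * IQR) is "extreme", and
    beyond only the inner fences (1.5 * IQR) is "mild".
    """
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild"
    return "none"

# Quartiles from the worked example.
q1, q3 = 20, 21.5

outer_margin = 3 * (q3 - q1)    # 4.5
lower_outer = q1 - outer_margin  # 15.5
upper_outer = q3 + outer_margin  # 26.0

print(lower_outer, upper_outer)  # 15.5 26.0
print(classify(150, q1, q3))     # extreme
print(classify(21, q1, q3))      # none
```

Under this convention the oven reading of 150 degrees lies beyond the upper outer fence of 26, so the sketch labels it "extreme".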