The correlation coefficient, denoted by “r”, measures the linear correlation between two variables, that is, both the strength and the direction of their relationship. It ranges from -1 to +1, and its sign shows whether the correlation is positive or negative. If the correlation coefficient is exactly -1, the two variables have a perfect negative linear relationship; if it is exactly +1, they have a perfect positive linear relationship. For values in between, the variables can have a positive correlation, a negative correlation, or (when r is close to 0) no linear correlation. If you need to find the correlation coefficient, go to Step 1.
Steps
Part 1 of 2: Understanding the Basics
Step 1. Understand the concept of correlation
Correlation refers to the statistical relationship between two quantities. Statisticians often use the correlation coefficient to measure how closely two variables move together along a straight line.
Step 2. Figure out how to find an average
The arithmetic mean, or “mean”, of a data set is calculated by adding all the data values together, and then dividing by the number of values.
The mean of a variable is written as the variable's symbol with a horizontal bar above it, for example x̄ for the mean of x.
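As a minimal sketch of this calculation in Python (the sample values and variable names are illustrative and not taken from the article), the mean can be computed by hand and checked against the standard library:

```python
from statistics import mean

# Illustrative values only; any list of numbers works.
values = [2, 4, 6, 8]

# Add all the values together, then divide by how many there are.
mean_by_hand = sum(values) / len(values)

# statistics.mean performs the same calculation.
mean_from_library = mean(values)

print(mean_by_hand, mean_from_library)
```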
Step 3. Note the importance of the standard deviation
In statistics, the standard deviation measures variation: it shows how widely the values are spread around the mean.
In formulas, the standard deviation is written as Sx, Sy, and so on (Sx is the standard deviation of x, Sy the standard deviation of y, etc.).
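To illustrate how the standard deviation reflects spread, the sketch below compares two made-up data sets that share the same mean. It assumes that Sx denotes the sample standard deviation (dividing by n - 1), which the article does not state explicitly; `statistics.stdev` follows that convention.

```python
from statistics import mean, stdev

# Two illustrative data sets with the same mean but different spread.
tight = [4, 5, 6]
wide = [1, 5, 9]

# stdev is the sample standard deviation (it divides by n - 1).
# Assumption: this is the convention behind Sx and Sy.
s_tight = stdev(tight)
s_wide = stdev(wide)

print("tight:", mean(tight), s_tight)
print("wide: ", mean(wide), s_wide)
```

The second data set has the larger standard deviation because its values lie farther from the shared mean.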
Step 4. Recognize the summation notation
The summation operator is one of the most common operators in mathematics; it indicates that a series of values is to be added together. It is represented with the Greek capital letter sigma, or ∑.
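In code, a summation usually corresponds to a call that adds up a sequence. A small Python sketch, with illustrative values and names that are not from the article:

```python
# Illustrative values only.
x = [1, 2, 3]
y = [4, 5, 6]

# Sum of all x values.
sum_x = sum(x)

# Sum of the squared x values.
sum_x_squared = sum(xi ** 2 for xi in x)

# Sum of the pairwise products of x and y.
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

print(sum_x, sum_x_squared, sum_xy)
```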
Step 5. Learn the basic formula for finding the correlation coefficient
The formula for calculating the correlation coefficient uses means, standard deviations, and the number of pairs in your dataset (represented by n). It appears as in the figure.
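One common way of writing this formula is sketched below. It assumes that Sx and Sy are sample standard deviations (dividing by n - 1); with population standard deviations the leading factor would be 1/n instead, and the resulting r is the same.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{S_x} \right)
    \left( \frac{y_i - \bar{y}}{S_y} \right)
```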
Part 2 of 2: Finding the Correlation Coefficient
Step 1. Collect the data
To calculate a correlation coefficient, first look at your data pairs. It is useful to put them in a table.
For example, suppose you have four pairs of data for x and y. The table will look as shown in the figure.
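The values in the article's table are not given in the text, so the sketch below uses four hypothetical pairs as a stand-in. These numbers are an assumption; they were chosen because they lead to the same coefficient the article reports in Step 6.

```python
# Hypothetical data: four (x, y) pairs standing in for the article's table.
pairs = [(1, 1), (2, 3), (4, 5), (5, 7)]

# Split the pairs into one list per variable.
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]

# Print the pairs as a simple two-column table.
print(f"{'x':>3} {'y':>3}")
for xi, yi in pairs:
    print(f"{xi:>3} {yi:>3}")
```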
Step 2. Calculate the mean of x
To calculate the mean, add all the values of x, then divide by the number of values n, using the following formula: x̄ = (∑ xᵢ) / n
Using the previous example, note that you have four values for x. To calculate the mean, add all the x values, and then divide by 4. Your calculations will look like the figure shown.
Step 3. Find the mean of y
To find the mean of y, follow the same steps, adding all the y values together, then dividing by the number of values n: ȳ = (∑ yᵢ) / n
In the previous example, you have four values for y. Add all these values, then divide by 4. Your calculations should look like those shown in the figure.
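The two mean calculations can be sketched in Python as follows. The data are hypothetical (an assumption, since the article's table is not given in the text), and the variable names are illustrative.

```python
# Hypothetical data standing in for the article's table.
x = [1, 2, 4, 5]
y = [1, 3, 5, 7]
n = len(x)  # number of data pairs

# Add all the values, then divide by the number of values.
mean_x = sum(x) / n  # (1 + 2 + 4 + 5) / 4
mean_y = sum(y) / n  # (1 + 3 + 5 + 7) / 4

print(mean_x, mean_y)
```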
Step 4. Determine the standard deviation of x
Once you have your means, you can calculate the standard deviation. To do this, use the following formula:
- In the example above, your calculations should look as shown in the figure.
- Note that the term xᵢ − x̄ (each x value minus the mean of x) is calculated by subtracting the mean from each value of x in your table.
Step 5. Calculate the standard deviation of y
Using the same basic steps, find the standard deviation of y. Use the following formula:
- In the previous example, your calculations will look as shown in the figure.
- Note, again, that the term yᵢ − ȳ (each y value minus the mean of y) is calculated by subtracting the mean from each value of y in your table.
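A possible Python sketch of both standard deviation calculations is shown below. It rests on two assumptions: the data are hypothetical stand-ins for the article's table, and Sx and Sy are taken to be sample standard deviations (dividing by n - 1).

```python
import math

# Hypothetical data standing in for the article's table.
x = [1, 2, 4, 5]
y = [1, 3, 5, 7]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Subtract the mean from each value to get the deviations.
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Assumed convention: sample standard deviation, dividing by n - 1.
s_x = math.sqrt(sum(d ** 2 for d in dev_x) / (n - 1))
s_y = math.sqrt(sum(d ** 2 for d in dev_y) / (n - 1))

print(dev_x, s_x)
print(dev_y, s_y)
```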
Step 6. Find the correlation coefficient
You now have the means and standard deviations for your variables, so you can proceed to use the formula for the correlation coefficient. Remember that n represents the number of data pairs you have. You have already obtained the information you need in the previous steps.
In the previous example, you will enter your data in the formula for the correlation coefficient and calculate as shown in the figure. Your correlation coefficient is therefore 0.989949. Notice that this number is very close to +1, so you have a very strong, though not perfect, positive correlation.
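The whole procedure can be put together in one short function. This is a sketch under stated assumptions rather than the article's exact calculation: the function name `correlation_coefficient` is illustrative, the data are hypothetical stand-ins for the article's table, and the standard deviations use the n - 1 convention.

```python
import math


def correlation_coefficient(x, y):
    """Return Pearson's r for two equally long lists of paired values."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data pairs are required")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample standard deviations (assumed convention: divide by n - 1).
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable is constant")

    # Standardize each pair, multiply, add up, and divide by n - 1.
    total = sum(
        ((xi - mean_x) / s_x) * ((yi - mean_y) / s_y)
        for xi, yi in zip(x, y)
    )
    return total / (n - 1)


# Hypothetical data standing in for the article's table.
x = [1, 2, 4, 5]
y = [1, 3, 5, 7]
r = correlation_coefficient(x, y)
print(round(r, 6))
```

With these hypothetical pairs the result agrees with the 0.989949 reported above. On Python 3.10 or later, `statistics.correlation(x, y)` should give the same value.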
Advice
- The correlation coefficient is also called the "Pearson correlation coefficient" in honor of its creator, Karl Pearson.
- In general, a correlation coefficient whose absolute value is greater than 0.8 (whether the coefficient is positive or negative) represents a strong correlation; one whose absolute value is less than 0.5 represents a weak one.
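This rule of thumb can be sketched as a small helper. The function name `describe_strength` is illustrative, and the "moderate" label for values between 0.5 and 0.8 is an assumption, since the article does not name that range.

```python
def describe_strength(r):
    """Classify a correlation coefficient using the rule of thumb above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")

    # The sign gives the direction; only the size matters for strength.
    magnitude = abs(r)
    if magnitude > 0.8:
        return "strong"
    if magnitude < 0.5:
        return "weak"
    # Assumed label: the article does not name this middle range.
    return "moderate"


print(describe_strength(0.989949))
print(describe_strength(-0.3))
```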