Correlation and Regression
Correlation measures the degree to which two variables, normally labelled x and y, are related. The variables x and y come from a bivariate distribution, meaning that each unit observed has two separate and distinct measurements. If the value of one variable is related to the value of the other, the two variables are said to be correlated.
There are five descriptions of correlation that we use:
1. Positive – the points slope in an upward direction
2. Negative – the points slope in a downward direction
3. Perfect – all of the pairs of values lie on a straight line
4. Partial – the values form a pattern or trend on the graph without all lying on a line
5. No correlation – no pattern
Correlation can be worked out numerically, but the simplest and most common way to show it is with a scatter graph. The graph is normally drawn with one variable treated as dependent on the other; for example, salary is dependent on the hours worked during a week. As the values of these variables change, the positions of the points on the graph change. The dependent variable is shown on the y axis and the independent variable on the x axis.
Coefficient of Correlation
The most common measure of the coefficient of correlation is Pearson’s product moment coefficient, which is used with quantitative data. In Pearson’s method the correlation coefficient is written r. The value of r always falls between -1 and +1: +1 is perfect positive correlation, -1 is perfect negative correlation, and r = 0 is no linear correlation.
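As a rough illustration of how Pearson’s coefficient can be worked out numerically, the sketch below computes r from the deviations of each value from its mean. The function name pearson_r and the hours and pay figures are hypothetical and chosen only to show the two extreme cases; they are not the data behind the scatter graph in this report.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of the squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not taken from the report)
hours = [10, 20, 30, 40]
pay_rising = [100, 200, 300, 400]   # moves with hours: perfect positive
pay_falling = [400, 300, 200, 100]  # moves against hours: perfect negative

print(pearson_r(hours, pay_rising))
print(pearson_r(hours, pay_falling))
```

With these made-up figures the first pair of lists lies exactly on an upward line and the second exactly on a downward line, so the two results sit at the ends of the -1 to +1 range.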
The scatter graph shows the bivariate distribution and indicates whether there is a pattern of association between the two variables. The strength of this association is measured by the coefficient of correlation; the coefficient does not describe the nature of the relationship and does not indicate causal links.
When the coefficient of correlation is worked out it takes a numeric value between -1 and +1. The closer the coefficient is to +1 or -1, the stronger the association; the closer it is to zero, the weaker the association. A coefficient of zero means there is no linear correlation at all, while a coefficient of +1 or -1 means a perfect association. The sign of the coefficient shows the direction of the association: if the two variables move in the same direction the coefficient is positive, with a value between 0 and +1; if they move in opposite directions the coefficient is negative, with a value between 0 and -1.
The scatter graph that I have produced shows the correlation in this case to be near perfect and positive, as the values lie close to a straight upward-sloping line. The coefficient of correlation works out at 0.987289, which is close to +1 and so indicates a very strong association. As the coefficient is positive, the two variables move in the same direction. What the scatter graph shows straight away is that the older the aircraft, the more it costs to maintain each year.
Regression
Regression predicts the value of one variable from the value of another and is the basis of causal forecasting. A high correlation between two variables enables causal forecasting if the relationship appears reasonably likely, e.g. an expected rise in sales levels from increased levels of advertising.
It is possible to forecast the dependent variable if the independent variable is known; this technique is known as regression, and it expresses the relationship shown in a scatter diagram as a mathematical formula. Correlation is high when the points on the graph show a pattern similar to a straight sloping line. Regression is the method of finding the formula that represents that line.
The regression line
Any straight line on a graph can be written as y = a + bx, where x and y are the variables of a bivariate distribution. The variable to be estimated, y, is called the dependent variable because it depends on the value of the other variable. The variable from which the estimate is made, x, is the independent variable. a is the value of y where the line crosses the y axis and b is the slope of the line. If the line slopes downward the value of b is negative, so the formula is sometimes written as y = a - bx, with b then standing for the size of the slope. The regression line runs through the middle of the points and acts as an average of them.
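One common way of finding a and b is the least squares method, which picks the line with the smallest total squared vertical distance from the points. The sketch below assumes that method; this report does not state which method produced its own regression line. The function name fit_line and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates of a (intercept) and b (slope) in y = a + bx."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x value is the same")
    b = sxy / sxx            # slope of the line
    a = mean_y - b * mean_x  # line passes through the point of means
    return a, b

# Hypothetical illustrative data lying exactly on y = 1 + 2x
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]

a, b = fit_line(x_values, y_values)
print(a, b)
```

Because the made-up points lie exactly on a line, the fitted a and b recover that line; with real data the points scatter around the line and a and b describe its average path.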
The graph shows the regression line to be y = 44.5 + 10.05x. Using the graph and the regression estimate, a 7 year old aircraft would have a maintenance cost of £114.85 per year, and an 11 year old aircraft would cost £155.05 in maintenance per year. Following the regression line on the graph, the line slopes upward, which is reflected in the equation by the positive value of b. The graph shows that the older an aircraft is, the higher its maintenance cost per year, and this is confirmed by the regression prediction.
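The two estimates can be checked by substituting each age into the regression equation. The short sketch below does this with a = 44.5 and b = 10.05 taken from the equation above; the function name predicted_cost is hypothetical.

```python
def predicted_cost(age_years, a=44.5, b=10.05):
    """Estimated yearly maintenance cost in pounds from y = a + bx."""
    return a + b * age_years

# Ages used in the text: 7 and 11 years
for age in (7, 11):
    print(f"{age} years: {predicted_cost(age):.2f}")
```

Rounded to two decimal places, the results agree with the figures quoted for the 7 and 11 year old aircraft.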