...Linear Regression and Correlation Chapter 13 McGraw-Hill/Irwin ©The McGraw-Hill Companies, Inc. 2008 GOALS: Understand and interpret the terms dependent and independent variable. Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate. Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero. Calculate the least squares regression line. Construct and interpret confidence and prediction intervals for the dependent variable. Regression Analysis - Introduction: Recall that Chapter 4 introduced the idea of showing the relationship between two variables with a scatter diagram. In that case we showed that, as the age of the buyer increased, the amount spent for the vehicle also increased. In this chapter we carry this idea further. Numerical measures to express the strength of the relationship between two variables are developed. In addition, an equation is used to express the relationship between variables, allowing us to estimate one variable on the basis of another. Regression Analysis - Uses: Some examples. Is there a relationship between the amount Healthtex spends per month on advertising and its sales in the month? Can we base an estimate of the cost to heat a home in January on the number of square feet in the home? Is there a relationship between the miles per gallon achieved by large...
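The goals above name several quantities that can all be computed from the same sums of squares. The minimal Python sketch below computes them for one small, made-up sample. The data and the function name `regression_summary` are illustrative assumptions and do not come from the chapter.

```python
import math

def regression_summary(x, y):
    """Least squares line, r, r^2, standard error of estimate, and t for H0: rho = 0."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need two equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope of the least squares line
    a = my - b * mx                    # intercept
    r = sxy / math.sqrt(sxx * syy)     # coefficient of correlation
    # Standard error of estimate: residual spread around the line, n - 2 df
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))
    # Test statistic for H0: rho = 0 (undefined if r is exactly +1 or -1)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return {"a": a, "b": b, "r": r, "r2": r ** 2, "se": se, "t": t, "df": n - 2}

summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(summary)
```

For this sample the results are as follows:

* slope 0.6 and intercept 2.2
* r ≈ 0.775 and r² = 0.6
* standard error of estimate ≈ 0.894
* t ≈ 2.12 with 3 degrees of freedom

The two-tailed 5% critical value of t with 3 df is about 3.182, so H0: ρ = 0 would not be rejected for this small sample. The sketch leaves out confidence and prediction intervals, because those also need a t critical value and the Python standard library has no t distribution.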
Words: 2248 - Pages: 9
...Generates the inflation rate in % based on prices pc
Generate / x = log(y): taking logs
Generate / dlx = dlog(x): dlx = log(x) - log(x(-1)), growth rate in continuous time
Generate / y = exp(x): exp(x); as command: series x=0
Trend variable (linear): Generate / t = @trend
Standard normal distributed realizations: Generate / x = nrnd
Lags, lagged variables, taking differences:
Generate / x1 = x(-1): x1(t) = x(t-1), lag 1 of x
Generate / dx = d(x): dx(t) = x(t) - x(t-1) = (1-B)x(t), first difference
Generate / d2x = d(x,2): d2x(t) = dx(t) - dx(t-1) = (1-B)^2 x(t), taking first differences twice
Generate / d12x = d(x,0,12): d12x(t) = x(t) - x(t-12) = (1-B^12)x(t), seasonal difference for monthly data
Generate / d12_1x = d(x,1,12): d12_1x(t) = (1-B)(1-B^12)x(t)
Generation of dummy variables: seasonal dummies, s = 1, 2, 3, ...: Generate / ds = @seas(s); as command: series ds = @seas(s)
Generate / d1 = 0 and manually in View/Spreadsheet use Edit+/-
p-value for x of a test statistic, as command (df ... degrees of freedom):
N-distribution, 1-sided, right: scalar p = 1 - @cnorm(x)
N-distribution, 2-sided: scalar p = 2*(1 - @cnorm(abs(x)))
t-distribution, 1-sided, right: scalar p = 1 - @ctdist(x,df)
Chi2-distribution: scalar p = 1 - @cchisq(x,df)
F-distribution: scalar p = 1 - @cfdist(x,df1,df2)
Determinant of correlation matrix (as command):
group grpx x1 x2 x3 x4
matrix x = @convert(grpx)
group...
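The Python sketch below mimics several of these EViews operations on plain lists, which makes it easy to check by hand what each one computes:

* the lag operator x(-k)
* the difference function d(x,n,s), read here as (1-B)^n (1-B^s) x
* the normal-distribution p-value formulas

The helper names `lag`, `diff`, `p_right` and `p_two_sided` are hypothetical. Missing observations are represented with `None`, roughly as EViews uses NA.

```python
from statistics import NormalDist

def lag(x, k=1):
    """x(-k): the value k periods earlier, None where it does not exist."""
    return [x[t - k] if t >= k else None for t in range(len(x))]

def _subtract_lag(x, k):
    """(1 - B^k) x, propagating None."""
    lagged = lag(x, k)
    return [a - b if a is not None and b is not None else None
            for a, b in zip(x, lagged)]

def diff(x, n=1, s=0):
    """Sketch of d(x, n, s): n ordinary differences, then one seasonal difference of lag s."""
    out = list(x)
    for _ in range(n):
        out = _subtract_lag(out, 1)
    if s > 0:
        out = _subtract_lag(out, s)
    return out

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_right(z):
    """One-sided (right) p-value: 1 - @cnorm(z)."""
    return 1 - _STD_NORMAL.cdf(z)

def p_two_sided(z):
    """Two-sided p-value: 2*(1 - @cnorm(abs(z)))."""
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))

x = [1, 4, 9, 16, 25]
print(diff(x), diff(x, 2), diff(x, 0, 2))
print(p_two_sided(1.96))
```

Note that the two-sided p-value doubles the upper-tail area, 2*(1 - cdf(|z|)). Computing 1 - cdf(|z|)*2 instead would give a negative number.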
Words: 669 - Pages: 3
...SUBJECT REVIEW Regression Methods in the Empiric Analysis of Health Care Data GRANT H. SKREPNEK, PhD ABSTRACT OBJECTIVE: The aim of this paper is to provide health care decision makers with a conceptual foundation for regression analysis by describing the principles of correlation, regression, and residual assessment. SUMMARY: Researchers are often faced with the need to describe quantitatively the relationships between outcomes and predictors, with the objective of explaining trends, testing hypotheses, or developing models for forecasting. Regression models are able to incorporate complex mathematical functions and operands (the variables that are manipulated) to best describe the associations between sets of variables. Unlike many other statistical techniques, regression allows for the inclusion of variables that may control for confounding phenomena or risk factors. For robust analyses to be conducted, however, the assumptions of regression must be understood and researchers must be aware of diagnostic tests and the appropriate procedures that may be used to correct for violations in model assumptions. CONCLUSION: Despite the complexities and intricacies that can exist in regression, this statistical technique may be applied to a wide range of studies in managed care settings. Given the increased availability of data in administrative databases, the application of these procedures to pharmacoeconomics and outcomes assessments may result in...
Words: 9010 - Pages: 37
...BEC1 STUDY GUIDE
INTRODUCTION (CHAPTER 1 – MUNRO E-BOOK)
- Know the definition of population, sample, parameter, & statistic
- Be able to identify and/or provide examples of descriptive statistics & inferential statistics
- Know the properties of & be able to identify or provide examples of quantitative vs. categorical variables
BASIC CONCEPTS (CHAPTER 2 – MUNRO E-BOOK)
- Know the definition of data, individuals, variables, independent variable, dependent variable, random assignment, treatment group, and control group.
- Know the properties of the 4 levels of measurement (nominal, ordinal, interval, ratio)
- Know the properties of discrete and continuous variables
- Know and understand the properties that distinguish experimental methods from correlational methods
DISPLAYING DATA (CHAPTER 2 – MUNRO E-BOOK)
- Know what a distribution is and why examining a distribution can be helpful/useful
- Know how to interpret information from:
  - Simple frequency distributions (grouped & ungrouped*)
  - Relative frequency distributions (proportions* & percents*)
  - Cumulative frequency distributions*
  - Histograms
  - Bar graphs*
  - Stem-and-leaf displays
- You also should know how to construct those with an * beside them
- Know the definition of percentile rank
- Be able to identify and/or describe different shapes of distributions: normal, symmetrical, skewed, unimodal, & bimodal distributions
CENTRAL TENDENCY (CHAPTER 2 – MUNRO E-BOOK)
- Understand conceptually each of the 3 measures of central tendency: Mode, Median & Mean
- Know...
Words: 6621 - Pages: 27
...Strengths and Limitations of Correlational Design Walden University FPSY 6115-3 Understanding Forensic Psychology Research Correlational research designs are used to determine whether a relationship exists between two or more variables and to describe the relationship among them (Stangor, 2011). The data can come from observational research, questionnaires, or experiments. A scatterplot is often used to visualize the collected data, and the patterns of relationships can be described as positive linear, negative linear, nonlinear (independent), or curvilinear (Stangor, 2011). Correlational research identifies relationships between variables but does not explain cause and effect, so the design has both strengths and weaknesses. One strength of correlational research is that it provides a visual image of the relationship between variables in graphical form, which displays the strength of the relationship. The research can be quick and easy, and it can be used when the predictor variables cannot be manipulated. However, while this design can show whether the association between variables is strong or weak, a main weakness is that it does not imply causation, which can lead to inaccurate conclusions (Stangor, 2011). Two research articles were reviewed and will be discussed to obtain a better understanding of correlational research. The first article, Family and Social Factors as Predictors of Drug Misuse and Delinquent Behavior in Juveniles (Sharma, Sharma...
Words: 741 - Pages: 3
...University of Phoenix | Correlation Paper | Amber Kluever | 2/29/2016 | Correlation is a measure of association that tests whether a relationship exists between two variables. It indicates both the strength of the association and its direction, direct or inverse. I am trying to find out whether there is a relationship between A. PTSD and B. AODA. Whether a relationship exists depends on the strength of the association between them. Each method views variables not in isolation, but as systematically and meaningfully associated with, or related to, other variables. For example, the correlation coefficient r indicates the strength of the association between two variables (X, Y) and describes their mutual relationship. An r of 1.0 (positive or negative) indicates a perfect linear relation, while an r of 0 indicates that neither X nor Y can be predicted from the other by a linear equation. When r is positive, X and Y increase together. When r is negative, Y decreases as X increases. Another commonly used method involves the dichotomous variable, a discrete variable that has only two categories. An example would be classifying sound waves into just two categories, one high and the other low. The advantage of correlation is that it can be used in research carried out either through experiments or through surveys. The major advantage...
Words: 351 - Pages: 2
...Simple Linear Regression Previously we tested for a relationship between two categorical variables, but what if our interest involves two quantitative variables? You may recall the equation of a line from algebra: Y = mX + b, where Y is the dependent variable, m is the slope of the line, X is the independent variable, and b is the y-intercept. In algebra you usually had a perfect line, i.e. each X and Y pair fell exactly on the line; in statistics, however, this is rarely the case. The concept begins by considering some dependent variable, e.g. the Final Exam score. Looking at such scores, you realize that not all of them are the same (i.e. not everyone scores the same on the final), so one may wonder what would explain this variation. One possibility is performance on the midterms. To see whether there might be a linear relationship between these two variables, one would start with a scatterplot. From this plot you can see that a potential linear relationship exists: for the most part, higher midterm averages appear to coincide with higher final exam scores. To measure a linear relationship numerically, one would consider the correlation, denoted r. From Minitab this correlation is 0.67. Correlation is thus a measure of the strength of a linear relationship between two quantitative variables. If a perfect linear relationship existed, the correlation would be one. However, not all relationships are positive. For instance, consider the variables Weight and...
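The short Python sketch below shows three cases for r on the same x values:

* A perfect increasing line gives r = 1.
* A perfect decreasing line gives r = -1.
* A perfectly predictable but U-shaped relationship gives r = 0, because r measures only linear association.

The helper `pearson_r` and the data are illustrative.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
perfect_positive = pearson_r(x, [3 * v + 1 for v in x])   # Y = 3X + 1
perfect_negative = pearson_r(x, [-2 * v + 5 for v in x])  # Y = -2X + 5
u_shaped = pearson_r(x, [v * v for v in x])               # Y = X^2, not linear
print(perfect_positive, perfect_negative, u_shaped)
```

In the U-shaped case Y is completely determined by X, yet r is 0. A scatterplot, as recommended above, is therefore worth checking before relying on r.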
Words: 296 - Pages: 2
...TWO-VARIABLE REGRESSION MODEL: THE PROBLEM OF ESTIMATION
* The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Generally, one has a sample of observations from the population and uses the stochastic sample regression function (SRF) to estimate the PRF.
* Two generally used methods of estimation are 1) ordinary least squares (OLS) and 2) maximum likelihood (ML). We will focus on the OLS method.
METHOD OF ORDINARY LEAST SQUARES (OLS)
The statistical properties of OLS make it one of the most attractive and powerful methods used in estimating parameters. The method of ordinary least squares is attributed to Carl Friedrich Gauss, a German mathematician. Recall the two-variable population regression function (PRF):
Eqn 1: $Y_i = \beta_1 + \beta_2 X_i + \mu_i$
However, as you may recall, we rarely have population data, so we have to depend on sample information to estimate the relevant parameters. Therefore we have to estimate the PRF from the sample regression function (SRF):
Eqn 2: $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{\mu}_i$
Eqn 3: $Y_i = \hat{Y}_i + \hat{\mu}_i$
* $\hat{Y}_i$, the estimator of $E(Y \mid X_i)$, is the estimated (conditional mean) value of $Y_i$.
How is the SRF determined? From Equation 3, we can see that
Eqn 3a: $\hat{\mu}_i = Y_i - \hat{Y}_i$
and substituting $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ from Equation 2 yields
Eqn 3b: $\hat{\mu}_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$...
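The excerpt stops at Eqn 3b. The standard next step of OLS, sketched here rather than quoted from the source, chooses $\hat{\beta}_1$ and $\hat{\beta}_2$ to minimize the sum of squared residuals from Eqn 3b. Setting both partial derivatives to zero gives the normal equations and the closed-form estimators. The lowercase deviations $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are notation introduced for this derivation.

```latex
\begin{aligned}
\min_{\hat{\beta}_1,\,\hat{\beta}_2} \sum_{i=1}^{n} \hat{\mu}_i^2
  &= \sum_{i=1}^{n} \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right)^2 \\
\frac{\partial}{\partial \hat{\beta}_1}:\quad
  -2\sum_{i=1}^{n} \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) = 0
  &\;\Rightarrow\; \sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i \\
\frac{\partial}{\partial \hat{\beta}_2}:\quad
  -2\sum_{i=1}^{n} \left(Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i\right) X_i = 0
  &\;\Rightarrow\; \sum X_i Y_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2 \\
\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad
\hat{\beta}_1 &= \bar{Y} - \hat{\beta}_2 \bar{X}
\end{aligned}
```

The first normal equation also implies $\sum \hat{\mu}_i = 0$: the OLS residuals sum to zero whenever the model includes the intercept $\hat{\beta}_1$.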
Words: 3270 - Pages: 14
...Probability, Statistics, and Forecasting OPRE 433 Fall 2013 Regression Report Xie Gehui (gxx24@case.edu) Dec 2, 2013 I. Introduction The given data set contains more than one independent variable, so the goal of our regression analysis is to build an appropriate multiple regression model. To reach this goal, we build a multiple linear regression model and test the regression assumptions: model appropriateness, constant variance, independence, and normality. Where these assumptions are violated, we modify the data set or the model itself until they are satisfied and the model is acceptable. The original data set analyzed in this report contains 20,640 observations of 8 explanatory variables, labeled X1, X2, X3, X4, X5, X6, X7, X8, and 1 dependent variable, labeled Y. All 9 variables are continuous. II. Method of analysis To check the model appropriateness assumption, we need to make sure the functional form is correct; the pattern in the residual plot suggests the form of an appropriate model. To check the validity of the constant variance assumption, we examine residual plots. A residual plot with a horizontal band appearance suggests that the spread of the error terms around 0 does not change much as the value on the horizontal axis increases. Such a plot tells us that the constant variance assumption approximately holds. To check the independence assumption, we need to detect whether any positive autocorrelation...
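The excerpt breaks off before naming the test it uses for positive autocorrelation. One common choice, assumed here rather than taken from the report, is the Durbin-Watson statistic. It is computed from the residuals in observation order; the function name `durbin_watson` is hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2 (assumed diagnostic, not from the report)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

print(durbin_watson([1, -1, 1, -1]))   # alternating signs: large value
print(durbin_watson([1, 1, 1, 1]))     # identical residuals: 0
```

How to read the statistic:

* Values near 2 suggest no first-order autocorrelation.
* Values well below 2 point to positive autocorrelation.
* Values near 4 point to negative autocorrelation.

The statistic is only meaningful when the observations have a natural order, such as time.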
Words: 1536 - Pages: 7
...in a different IT company. The following table gives his yearly salary for the years 1990 to 2003.
| Year | Yearly salary |
|------|---------------|
| 1990 | 7,000 |
| 1991 | 8,000 |
| 1992 | 9,200 |
| 1993 | 10,100 |
| 1994 | 11,000 |
| 1995 | 48,000 |
| 1996 | 50,000 |
| 1997 | 52,000 |
| 1998 | 57,000 |
| 1999 | 63,000 |
| 2000 | 67,000 |
| 2001 | 72,000 |
| 2002 | 103,000 |
| 2003 | 108,000 |
Perform a linear regression on his salary, and provide the linear regression equation, the correlation coefficient and the corresponding graph. Use the equation to predict his salary in the years 2008 and 2010, other factors staying equal. Is the correlation coefficient positive or negative, and what does this represent? Perform the linear regression both in Excel and by hand. Remember that technology often truncates numbers, which may prevent us from using the regression equation as given. Verify the equation and your results by hand. To solve this problem, we set up the data in an MS Excel spreadsheet and use the Chart Wizard to create the regression line. The solution provided by Excel is the following. As you can see, Excel gives y = 7935.2x - 2E+07 with R² = 0.9256. Many software packages, including Excel, truncate the final numbers when these are very long integers...
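To verify Excel's truncated equation "by hand", the Python sketch below fits the least squares line to the table above in full double precision. The helper names `fit_line` and `predict` are illustrative. The fit uses the calendar year as x, as Excel's trendline equation implies.

```python
import math

years = list(range(1990, 2004))
salary = [7000, 8000, 9200, 10100, 11000, 48000, 50000,
          52000, 57000, 63000, 67000, 72000, 103000, 108000]

def fit_line(x, y):
    """Return slope, intercept and correlation r of the least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx   # not truncated, unlike Excel's -2E+07
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = fit_line(years, salary)

def predict(year):
    """Predicted salary for a calendar year, using the untruncated coefficients."""
    return intercept + slope * year

print(f"y = {slope:.4f}x + ({intercept:.2f}), r = {r:.4f}, R^2 = {r * r:.4f}")
print(f"2008: {predict(2008):.0f}  2010: {predict(2010):.0f}")
```

By my hand calculation the fit gives the following:

* slope ≈ 7935.16 and intercept ≈ -15,795,035
* r ≈ 0.962, so R² ≈ 0.9256, matching Excel
* predicted salary of about 138,776 for 2008 and about 154,646 for 2010

The correlation is positive: salary tends to rise with each year. Plugging Excel's displayed intercept of -2E+07 into the equation instead would predict a negative salary for 2008. This is why the by-hand verification matters.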
Words: 413 - Pages: 2
...2001 Submitted by Arun Avudainaygam LINEAR AND ADAPTIVE LINEAR MULTIUSER DETECTION IN CDMA SYSTEMS Project Website: http://arun-10.tripod.com/mud/mud.html SECTION 0 Introduction Multiuser detection is a technology that emerged in the early 1980s. It has since developed into an important, full-fledged field in multi-access communications. Multiuser detection (MUD) is the intelligent estimation/demodulation of transmitted bits in the presence of multiple access interference (MAI). MAI occurs in multi-access communication systems (CDMA/TDMA/FDMA) where simultaneously occurring digital streams of information interfere with each other. Conventional detectors based on the matched filter simply treat the MAI as additive white Gaussian noise (AWGN). However, unlike AWGN, MAI has a nice correlative structure that is quantified by the cross-correlation matrix of the signature sequences. Hence, detectors that take this correlation into account would perform better than the conventional matched-filter bank. MUD is essentially the design of signal processing algorithms that run in the black box shown in figure 0.1. These algorithms take into account the correlative structure of the MAI. 0.1 Overview of the project This project investigates a couple of different approaches to linear multiuser detection in CDMA systems. Linear MUDs are detectors that operate linearly on the received signal statistic, i.e., they perform only linear transformations on the received statistic....
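The introduction argues that exploiting the cross-correlation matrix beats the matched-filter bank. The project's own detectors are not shown in this excerpt, so the Python sketch below is an illustration only. It uses a noise-free, two-user synchronous CDMA model, y = R A b, where y holds the matched-filter outputs. It compares two detectors:

* the conventional detector, sign(y)
* the decorrelating detector, sign(R^-1 y), which is one standard linear MUD

The correlation ρ = 0.7, the amplitudes and the bits are made-up values chosen to create a near-far situation.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sign(v):
    """Hard bit decisions in {+1, -1}."""
    return [1 if value >= 0 else -1 for value in v]

rho = 0.7                          # cross-correlation of the two signature sequences
R = [[1.0, rho], [rho, 1.0]]       # cross-correlation matrix
A = [1.0, 3.0]                     # user 2 is three times stronger (near-far)
bits = [1, -1]                     # transmitted bits

# Matched-filter bank outputs, y = R A b (noise omitted for clarity)
y = matvec(R, [amp * bit for amp, bit in zip(A, bits)])

conventional = sign(y)                             # treats MAI as noise
decorrelator = sign(matvec(inverse_2x2(R), y))     # removes MAI via R^-1
print(y, conventional, decorrelator)
```

Here y ≈ [-1.1, -2.3]. The strong second user's interference flips the conventional decision for user 1, whereas R^-1 y recovers A b = [1, -3] and both bits are correct. With additive Gaussian noise the decorrelator also reshapes the noise: its covariance becomes σ²R^-1 instead of σ²R. That noise enhancement is the usual trade-off of this detector.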
Words: 8974 - Pages: 36
...relationship between one dependent and one independent variable. After performing an analysis, the regression statistics can be used to predict the dependent variable when the independent variable is known. Regression goes beyond correlation by adding prediction capabilities. Types of Regression Analysis: The two most widely used types of regression analysis are:
I. Linear regression analysis: When the regression involves two variables or factors, it is called linear regression analysis.
II. Multiple regression analysis: Multiple regression analysis is a technique for explaining outcomes and predicting future values.
A coefficient of correlation between variables X and Y is a quantitative index of the association between these two variables. In squared form, it becomes the coefficient of determination, which specifies the proportion of the variation in the criterion variable Y that is accounted for by the variation in the predictor variable X. Examples for Linear Regression Analysis: ABC, a manufacturing company, where the production cost depends on the cost of raw materials. For the given set of x (Tk in million) and y (Tk in thousand per unit) values, determine the linear regression, find the slope and intercept, and use them in a regression equation.
| X | Y |
| 50 | 4.2 |
| 51 | 3.1 |
| 52 | 5...
Words: 797 - Pages: 4
...Linear Regression I would like to know whether people who enjoy thrill seeking have tattoos. I believe thrill seeking and tattoos go hand in hand. Most people I know are adventurous risk takers and daredevils, and all of them have tattoos. I have a strong feeling that the two will have a strong positive correlation. X = Tattoos, Y = Thrill Seeking. The scatter plot shows an extremely rough linear pattern, but it does slope upward. Line of best fit: y = 0.9148x + 25.505. Analysis: 1. r = .14: little or no correlation. 2. R^2 = 2% (.14^2 ≈ .0196): 2% of the variance in thrill seeking is accounted for by tattoos. 3. Slope (m) = 0.9148: for every additional tattoo a person has, thrill seeking is expected to increase by 0.9148. Conclusion: There is little or no correlation between these two variables. It was shocking to see that there is essentially no relationship between the two; I truly believed that thrill seekers have tattoos. T-Test Independent 2 Sample My gym teacher believes that males are stronger than females and that this is why males have more tattoos. The scale is determined by the number of tattoos both males and females have. Eighty-four males and one hundred and eleven females responded. The males average 39 (s.d. 1.42) while the females average 38 (s.d. 0.98). At the .10 significance level, test whether there is a difference in the number of tattoos between males and females. Ho: Null Hypothesis: Males equal Females. Ha: Alternative Hypothesis...
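The excerpt stops before the test statistic is computed. As a sketch, the Python code below computes it from the summary figures given above, under two assumptions made here:

* The population variances are unequal, so it uses Welch's test. A pooled-variance test gives t ≈ 5.8 and the same conclusion.
* With about 140 Welch degrees of freedom, the normal distribution approximates t closely enough for the p-value.

The function name `welch_t` is hypothetical.

```python
import math
from statistics import NormalDist

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and Welch degrees of freedom from summary statistics."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Males: mean 39, s.d. 1.42, n = 84; females: mean 38, s.d. 0.98, n = 111
t, df = welch_t(39, 1.42, 84, 38, 0.98, 111)

# Two-sided p-value, normal approximation to t (reasonable for df near 140)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
reject_null = p_two_sided < 0.10
print(f"t = {t:.2f}, df = {df:.1f}, p = {p_two_sided:.2e}, reject Ho: {reject_null}")
```

Here t ≈ 5.53 with about 140 degrees of freedom. That is far beyond the two-tailed critical value at the .10 level (about 1.66), so Ho would be rejected.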
Words: 478 - Pages: 2
...A PRELIMINARY ASSESSMENT OF THE IMPACT OF E-COMMERCE TECHNOLOGIES IN SUPPLY CHAIN MANAGEMENT This empirical study assesses the organizational impact of using eCommerce technologies in supply chain management, using the following constructs: system quality, information quality, system usage, and user satisfaction. A sample data set was collected from maquiladoras in Juárez, Mexico, to investigate relationships among these constructs. A structural equation modeling (SEM) analysis of the dataset was undertaken using AMOS. The analysis revealed statistically significant relationships among some of the constructs. Keywords: Supply Chain Management, eCommerce Technologies, e-Enabled Supply Chain Management INTRODUCTION The use of eCommerce technologies (the Internet/World Wide Web, intranets, and extranets) in supply chain management (SCM) is a relatively recent phenomenon. Accordingly, very few studies have been conducted to date on the extent to which eCommerce technologies have been utilized in SCM and, more importantly, on whether e-enabled supply chain management (eSCM), with the use of such technologies, has brought about improvements in managing supply chains. DeLone and McLean [1] proposed interrelationships among six IS dimensions in what is referred to as the ‘DeLone and McLean (D&M) IS Success Model’. The six dimensions in the D&M model are (1) system quality, (2) information quality, (3) system usage, (4) user satisfaction, (5) individual impact, and (6) organizational impact. While DeLone and...
Words: 2268 - Pages: 10
...What variables affect the difference in crime rates throughout the neighborhoods of a city? By Anna Burns Introduction: This project focuses on how variables such as population, ethnicity, and income affect crime rates across the different neighborhoods of a city. I feel that this information could be useful to many people. For example, if you are looking to buy a new home or even start a new business, you'll probably want it located in a safe neighborhood. This study will help identify the signs of a safe neighborhood. Knowing why crime rates are higher in some areas may also help keep crime rates from rising in other neighborhoods. For example, if crime rates are higher in neighborhoods with a higher percentage of vacant houses, a city might give home buyers incentives to buy houses in those neighborhoods and fill the vacancies. Data and Variables: Most of my data came from the 2010 census, so my sample covers only 2010. With only one year of data, some results may be skewed. I tried getting data from other censuses, but it was not easily retrievable and some parts of the data were missing. The quantitative variables I am using, separated by neighborhood, are median household income, total population, ethnic makeup (divided into White, Hispanic/Latino, Black/African American, etc.), total crimes, and housing units (vacant, occupied). The categorical variables I am using are neighborhood, community (group of neighborhoods...
Words: 2544 - Pages: 11