Bmgt230 Review Sheet

Submitted By dchagares
Bar Graphs & Pie Charts: follow the area principle (the area of each bar or slice matches the value it represents); the slices of a pie chart have to add up to 100%.
Histograms: rectangles drawn over class intervals, with heights based on the frequency in each interval.
Boxplots: less informative than a histogram (you can't tell the shape of the distribution unless it is unimodal).
Statistics: Descriptive statistics are how we cope with numbers: graphical methods (histograms, boxplots) and numerical methods (mean, median).
Variable Types:
QUANTITATIVE (numerical):
Discrete: values are a fixed gap apart, with no decimals in between (number of kids in a house).
Continuous: values can be arbitrarily close together (age, height).
QUALITATIVE (categorical):
Ordinal: there is a natural order associated (ex. letter grades A, B, C, or level of satisfaction: bad, good, great).
Nominal: no order associated (color of eyes).
Time Series Data: values of a variable recorded over time.
Cross-Sectional Data: values recorded at one specific point in time.
Interval Data: no meaningful 0, so you can't divide or multiply, but the difference between 2 values is meaningful.
Ratio Data: unlike interval data, has a meaningful 0, so values can be compared as ratios (height).
Selection Bias: a problem with the sampling scheme; a systematic tendency to exclude one kind of individual (ex. calling at 2 pm every day).
Non-Response Bias: selected subjects don't answer.
Response Bias: subjects lie, or the interviewer influences the answers (interviewer effect).
Undercoverage Bias: a portion of the population is not sampled.
Sampling Error: the sample-to-sample differences that arise because each sample contains different individuals.
Sample: the part of the population of interest that we examine and actually have data on.
Sample Size: the fraction of the population you have sampled doesn't matter; the sample size does.
Parameter: a number describing a characteristic of the population (ex. average student debt); something of interest in the population (unknown).
Statistic: a number describing a characteristic of a sample (known); it comes from samples.
Sampling Frame: the list of the population from which the sample is drawn (phone book, registered voter list). It is the effective population; an SRS picks equally from the whole frame.
Statistical Sampling:
Simple Random (SRS): each individual in the population has an equal chance of being selected.
Stratified Random: divide the population into subgroups (strata) according to a common characteristic, then take an SRS from each subgroup.
Cluster: each group (cluster) is a mix of different people; take an SRS of clusters and look at all individuals in the chosen clusters.
Data: values along with their context.
Variables: the aspect/characteristic that differs from subject to subject, individual to individual.
Marginal Distribution: look at each categorical variable separately in a two-way table by studying the row totals and column totals; expressed in counts or percentages. It is the distribution of one variable without regard to the other one.
Simpson's Paradox: an association or comparison that holds true for all of several groups can reverse direction when the data are combined (aggregated) to form a single group (ex. deciding which hospital is better, where patient condition is the lurking variable).
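
The hospital example for Simpson's Paradox can be checked with a little arithmetic. The sketch below uses made-up counts (they are not from the course material) in which hospital A has the better survival rate within each patient condition, yet hospital B looks better once the conditions are combined, because A treats far more patients in poor condition.

```python
# Made-up counts for illustration only: (survived, total) patients,
# split by hospital and by patient condition (the lurking variable).
counts = {
    "A": {"good": (594, 600), "poor": (1443, 1500)},
    "B": {"good": (592, 600), "poor": (192, 200)},
}


def rate(survived, total):
    """Survival rate within one group."""
    return survived / total


def overall_rate(hospital):
    """Survival rate after combining (aggregating) both conditions."""
    survived = sum(s for s, _ in counts[hospital].values())
    total = sum(t for _, t in counts[hospital].values())
    return survived / total


# Conditional comparison: one patient condition at a time.
for condition in ("good", "poor"):
    a = rate(*counts["A"][condition])
    b = rate(*counts["B"][condition])
    print(f"{condition} condition: A = {a:.3f}, B = {b:.3f}")

# Aggregated comparison: the direction of the comparison reverses.
print(f"overall: A = {overall_rate('A'):.3f}, B = {overall_rate('B'):.3f}")
```

The row and column totals used in overall_rate are the marginal counts of the two-way table; the within-condition rates come from the conditional view.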

Mean & Median: similar for symmetric distributions. The mean moves in the direction of the skew (not robust; highly influenced by outliers); the median is robust.
Symmetric: mean = median. Right Skewed: median < mean. Left Skewed: median > mean.
Range: Max - Min (not robust).
IQR = Q3 - Q1: where the middle 50% of the data is. It is the least sensitive to outliers of any of our measures of spread (it is robust).
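
A quick way to see what "robust" means is to change one value into an outlier and recompute each summary. This is a minimal sketch with made-up numbers; it uses Python's standard statistics module, and the quartile convention is that module's default, which may differ slightly from the one used in class.

```python
import statistics


def summary(values):
    """Center and spread measures from the review sheet."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical data (say, incomes in $1000s), made up for illustration.
incomes = [30, 32, 35, 38, 40, 41, 45]
# Same data with the largest value replaced by an extreme outlier.
with_outlier = incomes[:-1] + [450]

for name, data in (("original", incomes), ("with outlier", with_outlier)):
    print(name, summary(data))
```

The mean and range jump when the outlier is added, while the median and IQR stay put. The outlier also pulls the mean above the median, which is the right-skewed pattern.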

Variance & Standard Deviation: both not robust. Variance measures the spread about the mean: the average of the squared deviations from the mean. Standard deviation is the square root of the variance.
Conditional Distribution: the distribution of one variable among only the cases that satisfy a condition on the other variable; it shows how one variable is related to the other.
Correlation (r): tells us about the strength (scatter) and direction of the linear relationship between two quantitative variables. r never has units; r(x, y) = r(y, x); r is scale invariant (changing the units of x or y does not change r); if r = 0 there is no linear association; r is not robust (highly influenced by outliers). Correlation does not imply causation: it is association, not causation, so even with a good linear model we can't infer that x causes y.
Ecological Correlation: based on rates or averages; tends to overstate the strength of associations.
Confounding Variable: two variables are confounded when their effects on a response variable cannot be distinguished from each other; they can be explanatory variables or lurking variables.
Regression Line: the line that minimizes the sum of the squared residuals; as x increases by 1 unit, the response on average changes by the value of the slope b1.
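
The properties of r and the least-squares line can be verified numerically. The sketch below is only an illustration under stated assumptions: the data are made up, the standard deviation divides by n - 1 (the usual sample version), and the slope is computed with the standard formula b1 = r * sy / sx with intercept b0 = ybar - b1 * xbar, none of which is spelled out on the sheet itself.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Sample standard deviation: square root of the variance (n - 1 divisor)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Average product of standardized x and y values (n - 1 divisor)."""
    mx, my, sx, sy = mean(x), mean(y), std(x), std(y)
    total = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return total / (len(x) - 1)


def least_squares(x, y):
    """Intercept b0 and slope b1 of the least-squares regression line."""
    b1 = correlation(x, y) * std(y) / std(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1


# Hypothetical data, made up for illustration: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation(hours, score)
b0, b1 = least_squares(hours, score)
# Residual = observed - predicted, one per data point.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, score)]

# Changing the units of x (hours -> minutes) leaves r unchanged.
minutes = [60 * h for h in hours]

print(f"r = {r:.3f}, r with x in minutes = {correlation(minutes, score):.3f}")
print(f"predicted score = {b0:.1f} + {b1:.1f} * hours")
print(f"sum of residuals = {sum(residuals):.10f}")
```

Swapping the arguments of correlation gives the same value, matching r(x, y) = r(y, x), and the residuals add up to zero apart from floating-point rounding.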

Residuals: the vertical distance from each point to the least-squares regression line; gives us potentially useful info about the contribution of individual data points to the overall pattern of scatter. Residual = observed - predicted; the sum of the residuals = 0.
Conditions for Regression Model: Linearity, Independence, Equal Spread (residual plots check this).
Residual Plots: plot the residuals (y-observed - y-predicted); a fan or funnel shape means the spread is not constant (no equal spread).
Random: a process is random if individual outcomes are uncertain but there is a regular distribution of outcomes in a large number of repetitions.
Probability: the proportion of times the outcome would occur in a very long series of repetitions.
Probability Models: describe mathematically the outcomes of random processes (two parts): 1. Sample Space: the set or list of all possible outcomes of a random process; an event is a subset of the sample space. 2. A probability for each outcome or event.
Probability Properties: probabilities range from 0 (no chance of the event) to 1 (the event has to happen), so if P(A) = 0, A is impossible. Because some outcome must occur on every trial, the sum of the probabilities for all possible outcomes must be exactly 1.
Mutually Exclusive (also known as disjoint): events that contain no common outcomes; the intersection is empty; they can't both happen.
Simple Events: events that contain exactly one outcome; they are mutually exclusive, and in some models (ex. a fair die) equally likely.
Independence: the events are not related; knowing A happened gives no info about B; A does not affect B (ex. independent: two coin flips, rolling a die twice; not independent: drawing two cards without replacement, weather on two consecutive days).
MUTUALLY EXCLUSIVE EVENTS ARE NOT INDEPENDENT! If one happens, the other cannot, so knowing A tells you about B. Likewise, independent events are not mutually exclusive (as long as both events have nonzero probability).
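
The difference between mutually exclusive and independent events is easy to check on a small sample space. This is a rough sketch, assuming two rolls of a fair die so that all 36 outcomes are equally likely, and using the standard product rule P(A and B) = P(A) * P(B) as the test for independence (the sheet describes independence only in words). The event names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two rolls of a fair die.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under equally likely outcomes: favorable / total."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


def both(e1, e2):
    """The intersection: the event that e1 and e2 both happen."""
    return lambda outcome: e1(outcome) and e2(outcome)


def mutually_exclusive(e1, e2):
    """True when the events share no outcomes (empty intersection)."""
    return prob(both(e1, e2)) == 0


def independent(e1, e2):
    """True when P(e1 and e2) equals P(e1) * P(e2)."""
    return prob(both(e1, e2)) == prob(e1) * prob(e2)


def first_is_six(outcome):
    return outcome[0] == 6


def second_is_six(outcome):
    return outcome[1] == 6


def sum_is_two(outcome):
    return sum(outcome) == 2


# Two separate rolls: independent, and they can both happen.
print(independent(first_is_six, second_is_six),
      mutually_exclusive(first_is_six, second_is_six))

# A six on the first roll rules out a total of 2: disjoint, so not independent.
print(independent(first_is_six, sum_is_two),
      mutually_exclusive(first_is_six, sum_is_two))
```

Using Fraction keeps the probabilities exact, so the equality checks are not affected by floating-point rounding.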
