4: Describing Bivariate Numerical Data
Correlation Coefficient
Denoted \(r\); it measures the strength and direction of a linear relationship only
- Must lie in the interval \([-1,1]\)
Analysis
Linear Relationship
When \(r>0\), there is a positive linear relationship
When \(r<0\), there is a negative linear relationship
When \(r=0\), there is no linear relationship (a nonlinear relationship may still exist)
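As a small illustration of the \(r=0\) case, the sketch below uses made-up points that lie exactly on a parabola. The data and variable names are assumptions chosen for the example; `np.corrcoef` is NumPy's correlation function.

```python
import numpy as np

# Hypothetical data: y depends on x exactly, but not linearly
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = x ** 2

# Off-diagonal entry of the 2x2 correlation matrix is r
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.2f}")  # r is 0 (up to rounding) even though y is fully determined by x
```

So \(r=0\) only rules out a linear pattern, not every pattern.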
Perfect Positive and Negative Correlation
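When \(r=1\), there is perfect positive correlation (every point lies on a line with positive slope)
When \(r=-1\), there is perfect negative correlation (every point lies on a line with negative slope)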
Interpretation
The correlation coefficient of (r-value) indicates a (strong/moderate/weak) (positive/negative) linear relationship between (y-var) and (x-var)
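A minimal sketch of computing \(r\) and filling in the interpretation template. The data set, the helper names `correlation` and `describe`, and the strength cutoffs (0.8 and 0.5) are all assumptions made for illustration; textbooks differ on where weak, moderate, and strong begin.

```python
import numpy as np

def correlation(x, y):
    """Sample r: sum of products of z-scores divided by (n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)  # ddof=1 gives the sample standard deviation
    zy = (y - y.mean()) / y.std(ddof=1)
    return float(np.sum(zx * zy) / (len(x) - 1))

def describe(r, x_name, y_name):
    """Fill in the interpretation sentence (cutoffs are assumed, not standard)."""
    if r == 0:
        return f"There is no linear relationship between {y_name} and {x_name}."
    size = abs(r)
    strength = "strong" if size >= 0.8 else "moderate" if size >= 0.5 else "weak"
    direction = "positive" if r > 0 else "negative"
    return (f"The correlation coefficient of {r:.2f} indicates a {strength} "
            f"{direction} linear relationship between {y_name} and {x_name}.")

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

r = correlation(hours, score)
print(describe(r, "hours studied", "exam score"))
```

For this made-up data \(r\approx0.98\), so the sentence reports a strong positive linear relationship.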
Best Line of Fit (Least Squares Regression Line)
The line that makes the sum of squared residuals, \(\Sigma\text{residual}^2\), as small as possible
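One way to see the "least squares" idea, sketched on made-up data: compute the line from the standard formulas (slope \(=r\cdot s_y/s_x\), intercept \(=\bar{y}-\text{slope}\cdot\bar{x}\)), then check that nudging the slope or intercept only makes \(\Sigma\text{residual}^2\) larger. The data and the helper name `sse` are assumptions for the example.

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([52, 60, 61, 70, 77], dtype=float)

def sse(intercept, slope):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    residuals = score - (intercept + slope * hours)
    return float(np.sum(residuals ** 2))

# Least squares line from r and the sample standard deviations
r = np.corrcoef(hours, score)[0, 1]
slope = r * score.std(ddof=1) / hours.std(ddof=1)
intercept = score.mean() - slope * hours.mean()
best = sse(intercept, slope)

# Try nearby lines: every nudge should give a larger sum of squares
nudged = [
    sse(intercept + da, slope + db)
    for da in (-1, 0, 1)
    for db in (-0.5, 0, 0.5)
    if (da, db) != (0, 0)
]

print(f"y-hat = {intercept:.1f} + {slope:.1f}x, sum of squared residuals = {best:.1f}")
print("every nudged line is worse:", all(v > best for v in nudged))
```

For this data the line works out to \(\hat{y}=46+6x\) with \(\Sigma\text{residual}^2=14\).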
Residuals
A residual is defined as \(\text{actual}-\text{predicted}\), that is, \(y-\hat{y}\)
Residual Plots
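A residual plot graphs the residuals against the x-values after a regression has been fit. Random scatter around 0 with no pattern suggests a linear model is appropriate; a clear pattern (such as a curve) suggests it is not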
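A rough sketch of what a residual plot can reveal, using made-up curved data fit with a straight line. Plotting is left out to keep the example dependency-light; the printed residuals are the values that would go on the vertical axis. The data and variable names are assumptions.

```python
import numpy as np

# Hypothetical curved data: y = x^2, which a straight line cannot capture
x = np.arange(7, dtype=float)
y = x ** 2

# Degree-1 polyfit returns [slope, intercept] of the least squares line
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x
residuals = y - predicted  # actual - predicted

for xi, res in zip(x, residuals):
    print(f"x={xi:.0f}  residual={res:+.1f}")
```

The residuals come out to roughly 5, 0, -3, -4, -3, 0, 5: positive, then negative, then positive again. That U-shape is the kind of pattern that signals a linear model is a poor fit.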
Regressions
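Form
\(\hat{y}=a+bx\), where \(a\) is the y-intercept, \(b\) is the slope, and \(\hat{y}\) is the predicted value of the y-variable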
Interpretation
The slope of the least squares regression line is (slope). This means that for each increase of 1 (unit) in (x-var), the predicted (y-var) (increases/decreases) by (# with unit).
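To connect the slope to its interpretation, the sketch below fits a line to made-up data and checks that raising x by 1 changes the prediction by exactly the slope. The data, the units, and the helper name `predict` are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score in points (y)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([52, 60, 61, 70, 77], dtype=float)

slope, intercept = np.polyfit(hours, score, 1)  # [slope, intercept]

def predict(x):
    """Predicted y from the fitted line."""
    return intercept + slope * x

# Change in the prediction when x goes up by one unit
change = predict(4) - predict(3)

direction = "increase" if slope > 0 else "decrease"
sentence = (
    f"The slope of the least squares regression line is {slope:.1f}. "
    f"This means that for each increase of 1 hour in hours studied, there is "
    f"an {direction} of {abs(slope):.1f} points in the predicted exam score."
)
print(sentence)
```

Note that the statement is about the predicted y, not about any individual observation.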
Influential Points
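High Leverage Points: Points with an extreme x-value (far from the other x-values); they can pull the regression line strongly toward themselves and change it substantially
Outlier: A point whose y-value falls far from the overall pattern (a large residual); its x-value does not have to be unusual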
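A small experiment, on made-up data, comparing the two kinds of unusual points: one added point with an extreme x-value and one added point with an extreme y-value at a typical x. The specific points `(15, 60)` and `(3, 90)` and the helper name `fit` are assumptions chosen to make the contrast visible.

```python
import numpy as np

# Hypothetical base data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]

def fit(x, y):
    """Return (slope, intercept) of the least squares line."""
    slope, intercept = np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)
    return float(slope), float(intercept)

base_slope, base_intercept = fit(hours, score)

# High-leverage point: x = 15 is far from the other x-values
lev_slope, lev_intercept = fit(hours + [15], score + [60])

# Outlier in y only: x = 3 is the mean of x, but y = 90 is far above the pattern
out_slope, out_intercept = fit(hours + [3], score + [90])

print(f"base:          slope={base_slope:.2f}, intercept={base_intercept:.2f}")
print(f"high leverage: slope={lev_slope:.2f}, intercept={lev_intercept:.2f}")
print(f"y outlier:     slope={out_slope:.2f}, intercept={out_intercept:.2f}")
```

Here the high-leverage point drags the slope from 6 down to about 0.15, while the outlier at the mean x leaves the slope at 6 and only raises the intercept. Other placements would give different results; this is one illustration, not a general rule.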
Coefficient of Determination \(r^2\)
The proportion of the variability in the y-variable that is explained by the linear relationship with the x-variable, with \(r^2 \in [0,1]\)
Approximately (\(r^2\) as a percent) of the variability of (y-var) can be explained by the linear relationship between (y-var) and (x-var).
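A short sketch of where \(r^2\) comes from, on made-up data: compare the squared residuals around the regression line with the squared deviations around the mean of y. The names `sse` and `sst` and the data are assumptions for the example.

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam score (y)
hours = np.array([1, 2, 3, 4, 5], dtype=float)
score = np.array([52, 60, 61, 70, 77], dtype=float)

slope, intercept = np.polyfit(hours, score, 1)
predicted = intercept + slope * hours

sse = np.sum((score - predicted) ** 2)     # variability left over after the line
sst = np.sum((score - score.mean()) ** 2)  # total variability of y around its mean
r_squared = 1 - sse / sst

# For a least squares line this matches the square of the correlation coefficient
r = np.corrcoef(hours, score)[0, 1]

print(f"r^2 = {r_squared:.3f}, r squared directly = {r ** 2:.3f}")
print(f"Approximately {r_squared:.0%} of the variability of exam score can be "
      f"explained by the linear relationship between exam score and hours studied.")
```

For this data \(r^2\approx0.963\), so about 96% of the variability in exam score is accounted for by the linear relationship.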