
4: Describing Bivariate Numerical Data

Correlation Coefficient

Denoted by \(r\); it measures the strength and direction of a linear relationship only

  • Always lies in the interval \([-1,1]\)
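A minimal Python sketch of how \(r\) can be computed, assuming the usual sample formula \(r=\frac{1}{n-1}\sum z_x z_y\) (the sum of products of z-scores divided by \(n-1\)). The function name `correlation` and the data are hypothetical, chosen only for illustration.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    # r = (1 / (n - 1)) * sum of products of z-scores
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: the result always falls in [-1, 1]
r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 3))  # 0.775
```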

Analysis

Linear Relationship

When \(r>0\), there is a positive linear relationship.

When \(r<0\), there is a negative linear relationship.

When \(r=0\), there is no linear relationship. This does not mean the data are random: the variables may still be strongly related in a nonlinear (curved) way.
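The point about \(r=0\) can be checked with a small, hypothetical example: below, \(y=x^2\) is a perfect (curved) relationship, yet \(r\) comes out to exactly 0 because the relationship is not linear. The helper `correlation` is a hypothetical illustration using the equivalent sums-of-deviations form of the formula.

```python
import math

def correlation(xs, ys):
    """r written with sums of deviations: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # y is completely determined by x, but not linearly
print(correlation(xs, ys))  # 0.0
```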

Perfect Positive and Negative Correlation

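When \(r=1\), there is perfect positive correlation: every point lies exactly on a line with positive slope.

When \(r=-1\), there is perfect negative correlation: every point lies exactly on a line with negative slope.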

Interpretation

A correlation coefficient of (r-value) indicates a (strong/moderate/weak) (positive/negative) linear relationship between (y-var) and (x-var).

Line of Best Fit (Least Squares Regression Line)

The least squares regression line is the line that makes the sum of squared residuals, \(\Sigma\text{residual}^2\), as small as possible.
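A minimal Python sketch of the least squares idea, using hypothetical data and hypothetical helper names: it computes the slope from the deviation sums \(S_{xy}\) and \(S_{xx}\) (so \(b=S_{xy}/S_{xx}\), which equals \(r\,s_y/s_x\)), then checks that tilting the line away from that slope increases \(\Sigma\text{residual}^2\).

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # equivalent to r * s_y / s_x
    a = mean_y - b * mean_x    # the line passes through (x-bar, y-bar)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)   # about 2.2 and 0.6
best = sum_squared_residuals(xs, ys, a, b)
# Nudging the slope either way makes the sum of squared residuals larger
print(best < sum_squared_residuals(xs, ys, a, b + 0.1))  # True
print(best < sum_squared_residuals(xs, ys, a, b - 0.1))  # True
```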

Residuals

A residual is the actual value minus the predicted value: \(\text{residual}=\text{actual}-\text{predicted}=y-\hat{y}\)
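To make the sign convention concrete, here is a short Python sketch with hypothetical data; it assumes the fitted line \(\hat{y}=2.2+0.6x\), which is the least squares line for these five points. A positive residual means the point lies above the line, and a negative residual means it lies below.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6   # assumed least squares line for this data: y-hat = 2.2 + 0.6x

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual - predicted
print([round(res, 2) for res in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
# For a least squares line, the residuals sum to zero (up to rounding error)
print(abs(sum(residuals)) < 1e-9)  # True
```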

Residual Plots

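A residual plot graphs the residuals against \(x\) (or against the predicted values) after a regression line has been fit. It is used to judge whether a linear model is appropriate: a random scatter with no pattern supports a linear model, while a curve or other clear pattern suggests a linear model does not fit.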
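A rough text-only sketch of reading a residual plot, assuming hypothetical curved data \(y=x^2\) and hypothetical helper names. A line is fit anyway, and the residuals show a clear U shape (\(+,-,-,-,+\)) rather than random scatter, signalling that a linear model is not appropriate. A real residual plot would normally be drawn with a plotting tool; printing one row per \(x\) keeps the sketch dependency-free.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residuals(xs, ys):
    """Residuals (actual - predicted) from the least squares line."""
    a, b = least_squares_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical curved data: a straight line is the wrong model here
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
for x, res in zip(xs, residuals(xs, ys)):
    print(f"x={x:+d}  residual={res:+.1f}")
```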

Regressions

Form

\[ \hat{y}=a+bx \]
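For reference, the standard formulas for the least squares slope \(b\) and intercept \(a\), written with the sample means \(\bar{x},\bar{y}\), sample standard deviations \(s_x,s_y\), and correlation \(r\), are:

\[ b=r\frac{s_y}{s_x}, \qquad a=\bar{y}-b\bar{x} \]

Because \(a=\bar{y}-b\bar{x}\), the least squares line always passes through the point \((\bar{x},\bar{y})\).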

Interpretation

The slope of the least squares regression line is (slope). This means that for each increase of 1 (unit) in (x-var), the predicted (y-var) (increases/decreases) by (# with unit).
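This interpretation follows directly from the form \(\hat{y}=a+bx\): increasing \(x\) by one unit changes the prediction by exactly the slope.

\[ \hat{y}(x+1)-\hat{y}(x)=\bigl(a+b(x+1)\bigr)-\bigl(a+bx\bigr)=b \]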

Influential Points

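High-Leverage Points: A point whose x-value is far from the other x-values (unusually large or small). It can pull the regression line toward itself and greatly change the slope, which makes it influential.

Outlier: A point whose y-value is far from the value predicted by the line, i.e., a point with a large residual (its x-value need not be unusual).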
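A small Python sketch contrasting the two kinds of unusual points, using hypothetical data and a hypothetical `slope` helper. Adding a high-leverage point at \(x=20\) flips the slope from positive to negative. The outlier here is deliberately placed at \(x=3=\bar{x}\), a special case in which the slope does not change at all and only the intercept moves; it is chosen only to make the contrast clear.

```python
def slope(xs, ys):
    """Least squares slope b = Sxy / Sxx."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_original = slope(xs, ys)                  # 0.6

# High-leverage point: x = 20 is far from the other x-values
b_leverage = slope(xs + [20], ys + [2])     # slope turns negative

# Outlier in y placed at x = 3 (the mean of xs): large residual, ordinary x
b_outlier = slope(xs + [3], ys + [15])      # slope stays 0.6

print(round(b_original, 3), round(b_leverage, 3), round(b_outlier, 3))
```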

Coefficient of Determination \(r^2\)

Measures the proportion of the variability in \(y\) that is explained by the linear relationship with \(x\), with \(r^2 \in [0,1]\)

Approximately (\(r^2\), as a percentage) of the variability in (y-var) can be explained by the linear relationship between (y-var) and (x-var).
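A minimal Python sketch, assuming the common definition \(r^2=1-\frac{\text{SSE}}{\text{SST}}\), where SSE is the sum of squared residuals from the least squares line and SST is the total squared deviation of \(y\) from \(\bar{y}\). The helper name and data are hypothetical. For this data \(r=6/\sqrt{60}\approx0.775\), and the result matches \(r^2=0.6\): about 60% of the variability in \(y\) is explained by the linear relationship with \(x\).

```python
def r_squared(xs, ys):
    """Coefficient of determination: 1 - SSE / SST for the least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)                   # total variation in y
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 3))  # 0.6
```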