2: Graphical Methods for Describing Data Distributions
After asking a question and collecting data, the next step is to display and analyze the data
Data Types
Categorical Data
Data recorded as words or labels. Examples include yes/no answers, favourite color, etc.
Numerical Data
Data in numbers
Discrete
Can take only a countable set of values, such as whole-number counts
Continuous
Can take any numeric value within an interval
Displaying Data
Bar Chart
y-axis: frequency or relative frequency (the proportion of the total, often written as a percentage)
The bars are separated and can be in any order
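The bar heights can be computed before any plotting is done. Below is a minimal sketch that assumes the data is a plain list of category labels. The function name `frequency_table` and the survey responses are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, relative frequency)} for categorical data."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical survey responses (not real data)
responses = ["yes", "no", "yes", "yes", "no"]
table = frequency_table(responses)

for category, (freq, rel) in table.items():
    # Each pair gives the two possible bar heights for one category
    print(f"{category}: frequency={freq}, relative frequency={rel:.0%}")
```

Either number in a pair can serve as the bar height. The chart has the same shape either way, only the y-axis scale changes.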
Comparative Bar Chart
Features two or more bars placed side by side for each category so that groups can be compared. The y-axis is typically relative frequency, since the groups may differ in size
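A quick sketch of why relative frequency is preferred when group sizes differ. The two groups below are hypothetical and deliberately have the same split but different sizes. `relative_frequencies` is an illustrative helper name.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: proportion of the group} for one group."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}

# Hypothetical groups of very different sizes
group_a = ["yes"] * 30 + ["no"] * 20   # 50 responses
group_b = ["yes"] * 6 + ["no"] * 4     # 10 responses

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)

# Raw counts of "yes" differ (30 vs 6), but the proportions match
print(rel_a["yes"], rel_b["yes"])
```

With raw frequencies, the bars for group A would tower over those for group B even though both groups answered in the same proportions.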
Describing Bar Charts
Which categories are larger or smaller and, for a comparative chart, what the differences and similarities between the groups are
Numerical Data
Dot Plot
Number line with dots (stack dots if more than one occurrence)
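The stacking rule can be sketched as a text-only dot plot. This assumes small non-negative integer data (each axis label fits in three characters). `dot_plot` is an illustrative name and the data is invented.

```python
from collections import Counter

def dot_plot(values):
    """Return the lines of a text dot plot for small integer data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    height = max(counts.values())
    rows = []
    # Build from the top row down; a value gets a dot on this row
    # only if it occurs at least `level` times
    for level in range(height, 0, -1):
        row = "".join(" * " if counts[x] >= level else "   "
                      for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    # Number line underneath, one three-character cell per value
    axis = "".join(f"{x:^3}" for x in range(lo, hi + 1))
    rows.append(axis.rstrip())
    return rows

rows = dot_plot([1, 2, 2, 3, 3, 3, 5])  # hypothetical observations
print("\n".join(rows))
```

Values with no observations (4 in this example) still get a place on the number line, which keeps gaps in the data visible.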
Histogram
The x-axis is divided into ranges (intervals), and the y-axis is typically frequency or relative frequency. Precise data values are hidden, but the histogram allows for a quick summary. Unlike in a bar chart, the rectangles touch. The x-axis is typically continuous numerical, with each bar covering an interval \([a,b)\). For discrete numerical data, each bar is centered on the label for its value
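To make the \([a,b)\) convention concrete, here is a small sketch of counting values into equal-width intervals. The parameters `start`, `width`, and `n_bins`, and the data itself, are assumptions chosen for illustration.

```python
def bin_counts(values, start, width, n_bins):
    """Count values into n_bins half-open intervals [a, b) of equal width."""
    counts = [0] * n_bins
    for v in values:
        # Floor division puts a value equal to a boundary into the bin on its right
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical continuous measurements
data = [1.0, 2.5, 5.0, 7.2, 9.9, 10.0, 14.9]

# Bins: [0, 5), [5, 10), [10, 15)
counts = bin_counts(data, start=0, width=5, n_bins=3)
relative = [c / len(data) for c in counts]
print(counts)
```

Note that 5.0 is counted in \([5,10)\) and 10.0 in \([10,15)\), so no value is counted twice. The individual values cannot be recovered from `counts`, which is the loss of precision mentioned above.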
Stem and Leaf
Consists of a table in which each stem gives the leading digit(s) shared by its leaves. It typically includes a legend (key) with an example, and leaves are not separated by commas unless they have multiple digits
Stem | Leaf |
---|---|
1 | 2 3 4 |
2 | 5 6 7 |
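A minimal sketch of how a table like the one above could be built. It assumes non-negative two-digit integers, with the tens digit as the stem and the ones digit as the leaf. The data is chosen to reproduce the example rows, and the key line is an assumed reading of the table.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

data = [25, 12, 27, 14, 13, 26]  # hypothetical observations
plot = stem_and_leaf(data)

for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 1 | 2 means 12")
```

Unlike a histogram, this display keeps every individual value recoverable.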
Comparative Stem and Leaf
Two stem and leaf charts with a shared stem column. The leaves of the left group ascend from right to left, outward from the stem
Group A | Stem | Group B |
---|---|---|
6 5 4 | 2 | 1 2 3 |
3 2 1 | 3 | 4 5 6 |
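The ordering of the left-hand leaves can be sketched as follows, again assuming non-negative two-digit integers. The function name and both groups are hypothetical, with values picked to reproduce the table above.

```python
def comparative_stem_and_leaf(group_a, group_b):
    """Return rows of (left leaves, stem, right leaves) around shared stems."""
    stems = sorted({v // 10 for v in group_a + group_b})
    rows = []
    for stem in stems:
        # Left leaves are sorted in reverse so they ascend outward from the stem
        left = sorted((v % 10 for v in group_a if v // 10 == stem), reverse=True)
        right = sorted(v % 10 for v in group_b if v // 10 == stem)
        rows.append((" ".join(map(str, left)), stem, " ".join(map(str, right))))
    return rows

group_a = [24, 25, 26, 31, 32, 33]  # hypothetical
group_b = [21, 22, 23, 34, 35, 36]  # hypothetical

rows = comparative_stem_and_leaf(group_a, group_b)
for left, stem, right in rows:
    print(f"{left} | {stem} | {right}")
```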
Cumulative Relative Frequency Plot
The most noticeable features are that the slope is never negative and that the y-axis typically ends at 100% (or an equivalent, such as 1)
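Both features follow from how the plotted values are computed. A short sketch, assuming the data has already been summarized as a list of interval counts (the counts here are invented):

```python
from itertools import accumulate

def cumulative_relative_frequency(counts):
    """Return the running total of counts as a proportion of the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

counts = [2, 3, 2, 3]  # hypothetical frequencies for four consecutive intervals
cum = cumulative_relative_frequency(counts)
print(cum)
```

Because each step adds a non-negative count, the sequence can never decrease, and the final value is always the whole data set, that is, 1 or 100%.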
Time Series Plots
The x-axis is typically time and the y-axis is the observed value. Consecutive points are connected by line segments
Describing Distributions
Shapes
Unimodal
One central hump. If the two sides are roughly mirror images, the shape can be described as approximately symmetric or bell-shaped
Left Skew
Most of the data lies on the right, with a longer tail extending to the left
Right Skew
Most of the data lies on the left, with a longer tail extending to the right
Bimodal
Two humps
Multi-modal
Multiple humps
SOCS
Shape: Examples include bell-shaped, approximately symmetric, skewed, bimodal, and multimodal (note any outliers as well)
Outliers: Extreme values
Center: Where most of the data is located (usually reported as the median or the mean)
Spread: Whether the data is widely spread out (high spread) or tightly clustered (low spread)
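The center and spread parts of SOCS can be backed with numbers. The sketch below uses Python's `statistics` module. Using the range as the spread measure is an assumption made here for simplicity (the notes do not name one), and the data is invented to include one extreme value.

```python
import statistics

def center_and_spread(values):
    """Return simple numeric summaries for the center and spread of a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),  # assumed spread measure
    }

data = [2, 4, 4, 5, 6, 7, 30]  # hypothetical; 30 is an extreme value
summary = center_and_spread(data)
print(summary)
```

In this example the mean is pulled above the median by the single extreme value, which is one reason to check for outliers before choosing which measure of center to report.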
Bivariate Numerical Data
Scatter Plots
Place the variable thought to explain changes in \(y\) on the x-axis, and plot in the \(y \text{ vs } x\) format