2: Graphical Methods for Describing Data Distributions

After asking a question and collecting data, the next step is to display and analyze the data.

Data Types

Categorical Data

Data recorded as words or labels. Examples include yes/no answers and favourite color.

Numerical Data

Data recorded as numbers

Discrete

Only a countable set of values (for example, whole-number counts)

Continuous

Any numeric value within an interval (for example, a height or a time)

Displaying Data

Bar Chart

y-axis: frequency or relative frequency (the proportion or percentage of observations in each category)

The bars are separated by gaps and can appear in any order
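The heights of the bars come from a frequency table. The following is a minimal sketch of how frequency and relative frequency could be computed; the `responses` list and the `frequency_table` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration)
responses = ["yes", "no", "yes", "yes", "no", "maybe", "yes", "no"]

def frequency_table(values):
    """Return {category: (frequency, relative_frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

table = frequency_table(responses)
for category, (freq, rel) in table.items():
    # One text "bar" per category; its length equals the frequency
    print(f"{category:<6} {'#' * freq:<5} {freq}  {rel:.1%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no category was missed.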

Comparative Bar Chart

Features two or more bars placed side by side for each category so that groups can be compared. The y-axis is typically relative frequency, since the groups may differ in size
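To see why relative frequency is preferred when group sizes differ, consider this small sketch. The two groups and the `relative_frequencies` helper are hypothetical; the point is only that raw counts differ while the proportions agree.

```python
from collections import Counter

# Hypothetical groups of different sizes (made-up data)
group_a = ["red"] * 30 + ["blue"] * 20   # 50 responses
group_b = ["red"] * 6 + ["blue"] * 4     # 10 responses

def relative_frequencies(values):
    """Return {category: proportion of values in that category}."""
    counts = Counter(values)
    total = len(values)
    return {cat: n / total for cat, n in counts.items()}

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)
for cat in sorted(set(rel_a) | set(rel_b)):
    print(f"{cat:<5} A: {rel_a.get(cat, 0):.0%}  B: {rel_b.get(cat, 0):.0%}")
```

Plotting raw counts would make group A's bars five times taller even though both groups have the same preferences.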

Describing Bar Charts

Describe which categories are larger or smaller and, for a comparative chart, the differences and similarities between the groups

Numerical Data

Dot Plot

A number line with one dot per observation (dots are stacked when a value occurs more than once)
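As a rough illustration, a dot plot can be drawn as text. This sketch assumes small, single-digit integer data so that the columns line up; `dot_plot_rows` is a hypothetical helper, not a library function.

```python
from collections import Counter

def dot_plot_rows(values):
    """Return text rows for a dot plot of single-digit integers."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build from the tallest stack down so the plot prints top to bottom
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join("o" if counts[x] >= level else " " for x in axis))
    rows.append(" ".join(str(x) for x in axis))  # the number line
    return rows

rows = dot_plot_rows([1, 2, 2, 3, 3, 3, 5])
print("\n".join(rows))
```

Each column holds one dot per occurrence of that value, so the value 3, which occurs three times, has the tallest stack.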

Histogram

The x-axis is typically a continuous numerical scale divided into intervals (ranges), and the y-axis is frequency or relative frequency. Individual data values are hidden, but the histogram gives a quick summary of the distribution. Unlike in a bar chart, the rectangles touch. Each bar covers an interval of the form \([a,b)\), so a value equal to \(b\) is counted in the next bar. For discrete numerical data, each bar is centered on its value, with the label at the center
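The \([a,b)\) convention can be sketched in code. The `bin_counts` helper, the bin width, and the heights below are all assumptions chosen for illustration; note that 160.0 and 170.0 land in the bin that starts at them, not the one that ends at them.

```python
def bin_counts(values, start, width, n_bins):
    """Count values falling in [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the half-open bin
        if 0 <= i < n_bins:            # values outside all bins are ignored
            counts[i] += 1
    return counts

# Hypothetical heights in cm (made-up data)
heights = [150.0, 152.5, 160.0, 161.2, 164.9, 170.0, 171.3, 178.8]
counts = bin_counts(heights, start=150, width=10, n_bins=3)
total = len(heights)
for i, c in enumerate(counts):
    lo, hi = 150 + 10 * i, 150 + 10 * (i + 1)
    print(f"[{lo}, {hi})  {'#' * c:<4} {c / total:.3f}")
```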

Stem and Leaf

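A table in which each stem (the leading digit or digits) is followed by its leaves (the final digit of each value). It typically includes a legend with an example showing how to read a stem and leaf together. Leaves are written without commas unless a leaf has more than one digit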

Stem | Leaf
1    | 2 3 4
2    | 5 6 7
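A stem-and-leaf display can be built by splitting each value into its tens digit and ones digit. This is a minimal sketch assuming non-negative two-digit integers and the key "1 | 2 represents 12"; both the data and the key are assumptions, since the notes do not state them.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [25, 12, 27, 14, 13, 26]  # hypothetical data
plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 1 | 2 represents 12")  # assumed legend
```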

Comparative Stem and Leaf

Two stem-and-leaf displays that share a stem column. The left group's leaves ascend from right to left, away from the stem

Group A | Stem | Group B
  6 5 4 |  2   | 1 2 3
  3 2 1 |  3   | 4 5 6
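The back-to-back layout can be sketched as follows. The helper names and the data are hypothetical, the values are assumed to be non-negative two-digit integers, and the key is assumed to be "2 | 1 represents 21". The left leaves are reversed so that they grow outward from the shared stem.

```python
from collections import defaultdict

def leaves_by_stem(values):
    """Group sorted ones digits under their tens digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return groups

def back_to_back(group_a, group_b):
    """Return text rows: reversed A leaves | stem | B leaves."""
    a, b = leaves_by_stem(group_a), leaves_by_stem(group_b)
    stems = sorted(set(a) | set(b))
    lefts = [" ".join(str(x) for x in reversed(a.get(s, []))) for s in stems]
    width = max(len(left) for left in lefts)  # right-align the left column
    return [
        f"{left:>{width}} | {s} | {' '.join(str(x) for x in b.get(s, []))}"
        for left, s in zip(lefts, stems)
    ]

rows = back_to_back([24, 25, 26, 31, 32, 33], [21, 22, 23, 34, 35, 36])
print("\n".join(rows))
```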

Cumulative Relative Frequency Plot

The curve never slopes downward, since each point adds to the running total, and the y-axis typically ends at 100% (or the equivalent proportion, 1)
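The running total behind this plot can be sketched with `itertools.accumulate`. The intervals and frequencies below are made up for illustration.

```python
from itertools import accumulate

# Hypothetical class intervals and their frequencies (made-up data)
intervals = ["[0, 10)", "[10, 20)", "[20, 30)", "[30, 40)"]
frequencies = [2, 5, 8, 5]

total = sum(frequencies)
# Running total of frequencies, divided by the overall total
cumulative = [running / total for running in accumulate(frequencies)]

for interval, cum in zip(intervals, cumulative):
    print(f"{interval:<9} {cum:.0%}")
```

Because each frequency is zero or positive, the cumulative values can only stay level or rise, and the last one is always 100%.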

Time Series Plots

The x-axis is typically time and the y-axis is the observed value. Consecutive points are connected by line segments

Describing Distributions

Shapes

Unimodal

One hump. When the hump is in the center and both sides look alike, the shape is described as approximately symmetric or as a bell curve

Left Skew

Most of the data lies on the right, with a longer tail extending to the left

Right Skew

Most of the data lies on the left, with a longer tail extending to the right

Bimodal

Two humps

Multi-modal

Multiple humps

SOCS

Shape: for example bell curve, approximately symmetric, skewed, bimodal, or multimodal (outliers can be noted along with the shape)

Outliers: extreme values that lie far from the rest of the data

Center: where most of the data is located (in most cases, described by the median or the mean)

Spread: how much the data vary (high or low spread)
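The center and spread parts of a SOCS description can be computed directly. This sketch uses Python's `statistics` module on made-up data and uses the range as a simple measure of spread, which is an assumption since the notes do not name a specific measure.

```python
import statistics

# Hypothetical data with one extreme value (made-up)
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]

center_mean = statistics.mean(data)
center_median = statistics.median(data)
data_range = max(data) - min(data)  # simple measure of spread

print(f"mean   = {center_mean}")
print(f"median = {center_median}")
print(f"range  = {data_range}")
```

Here the extreme value 20 pulls the mean above the median, which is typical of right-skewed data, so the median is the more representative center in this case.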

Bivariate Numerical Data

Scatter Plots

Place the variable thought to cause the change on the x-axis and the variable that responds on the y-axis. The plot is typically described in the \(y \text{ vs } x\) format