Problem: Analyzing Dice Rolls
Problem
Consider a problem in which we roll a six-sided die multiple times and analyze the results. Each roll produces one of six possible outcomes: 1, 2, 3, 4, 5, or 6. We will calculate the mean, mode, median, quartiles, maximum, and minimum, and check for outliers. We will also visualize the results with a bar plot, introduce the probability mass function (PMF), and show how to calculate probabilities from the PMF with Python code.
Generating Example Dataset
Let’s generate a dataset representing the outcomes of rolling a six-sided die 100 times:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generating dice rolls data (100 rolls)
np.random.seed(42) # for reproducibility
rolls = pd.Series(np.random.randint(1, 7, size=100))
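# Custom function to calculate the mode (on ties, returns the smallest value)
def calculate_mode(data):
    # np.unique returns the distinct values in sorted order with their counts
    values, counts = np.unique(data, return_counts=True)
    max_count_index = np.argmax(counts)  # first index of the highest count
    return values[max_count_index]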
# Calculating mean, mode, and median
mean_roll = rolls.mean()
mode_roll = calculate_mode(rolls)
median_roll = rolls.median()
# Calculating quantiles
quantiles = rolls.quantile([0.25, 0.5, 0.75])
q1 = quantiles[0.25]
q3 = quantiles[0.75]
# Calculating maximum and minimum
max_roll = rolls.max()
min_roll = rolls.min()
# Calculating interquartile range (IQR) and identifying outliers
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = rolls[(rolls < lower_bound) | (rolls > upper_bound)]
print(f"Mean: {mean_roll}")
print(f"Mode: {mode_roll}")
print(f"Median: {median_roll}")
print(f"Quantiles: \n{quantiles}")
print(f"Max: {max_roll}")
print(f"Min: {min_roll}")
print(f"Lower Bound for Outliers: {lower_bound}")
print(f"Upper Bound for Outliers: {upper_bound}")
print(f"Outliers: \n{outliers}")
Data Analysis
The code above computes the following summary statistics:
- Mean: Average of the dice rolls.
- Mode: The most frequently occurring roll.
- Median: The middle roll when sorted.
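- Quantiles: Cut points that split the sorted data into groups holding equal shares of the observations. The code computes the quartiles: the 25th, 50th (the median), and 75th percentiles.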
- Maximum: The highest roll.
- Minimum: The lowest roll.
- Outliers: Rolls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where the interquartile range (IQR) is Q3 - Q1. Because every roll lies between 1 and 6, this rule typically flags nothing for die rolls.
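The definitions above can be checked by hand on a much smaller sample. The following is a minimal sketch using only Python's standard `statistics` module; the eight-roll `sample` is hypothetical and is not taken from the 100-roll dataset. The `method="inclusive"` option interpolates linearly between sorted values, which should correspond to the default behaviour of `Series.quantile`, though that correspondence is an assumption worth verifying on your own data.

```python
import statistics

# Hypothetical eight-roll sample, small enough to verify by hand.
sample = [1, 2, 2, 3, 4, 4, 5, 6]

mean_value = statistics.mean(sample)      # 27 / 8 = 3.375
median_value = statistics.median(sample)  # average of the two middle values: 3.5
modes = statistics.multimode(sample)      # every value tied for most frequent: [2, 4]

# Quartiles with linear interpolation between sorted values.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")

# 1.5 * IQR rule for flagging outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in sample if x < lower_fence or x > upper_fence]

print(f"Mean: {mean_value}, Median: {median_value}, Modes: {modes}")
print(f"Q1: {q1}, Q3: {q3}, IQR: {iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}], Flagged: {flagged}")
```

Two points stand out. The sample has two modes (2 and 4), whereas a function that returns a single value, such as `calculate_mode`, reports only the smaller one. The fences also fall outside the range 1 to 6, so no roll is flagged as an outlier.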
Plotting Bar Plot
# Plotting Bar Plot
counts = rolls.value_counts().sort_index()  # frequency of each outcome, ordered by face value
plt.figure(figsize=(10, 5))
plt.bar(counts.index, counts.values, color='blue', edgecolor='black')
plt.title('Bar Plot of Dice Rolls')
plt.xlabel('Dice Roll')
plt.ylabel('Frequency')
plt.show()
Bar Plot:
- x-axis: Represents the outcomes of the dice rolls (1, 2, 3, 4, 5, 6).
- y-axis: Represents the frequency (number of occurrences) of each outcome.
- Meaning: The bar plot shows how frequently each outcome appears in the dataset.
Introducing the Probability Mass Function (PMF)
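The probability mass function (PMF) gives the probability that a discrete random variable takes each of its possible values. For a fair die, the theoretical PMF assigns probability 1/6 to each outcome. In this section, the PMF is estimated from the data as the relative frequency of each outcome, so with only 100 rolls the values will deviate somewhat from 1/6.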
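To make the distinction between a theoretical and an estimated PMF concrete, here is a small sketch that builds both. It uses only the standard library and its own simulated rolls (`random.Random(42)`), so the numbers will not match the NumPy-generated dataset used elsewhere in this article. The names `theoretical_pmf` and `empirical_pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
import random

faces = range(1, 7)

# Theoretical PMF of a fair die: each face has probability exactly 1/6.
theoretical_pmf = {face: Fraction(1, 6) for face in faces}

# Empirical PMF: relative frequency of each face in a simulated sample.
rng = random.Random(42)  # seeded for reproducibility
simulated = [rng.randint(1, 6) for _ in range(100)]  # randint includes both endpoints
face_counts = Counter(simulated)
empirical_pmf = {face: face_counts[face] / len(simulated) for face in faces}

for face in faces:
    print(f"{face}: theoretical={float(theoretical_pmf[face]):.3f}, "
          f"empirical={empirical_pmf[face]:.3f}")
```

Both PMFs sum to 1. The empirical values scatter around 1/6 and tend to move closer to it as the number of rolls grows.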
Calculating PMF and Interval Probabilities
# Calculating the empirical PMF (relative frequency of each outcome)
pmf = rolls.value_counts(normalize=True).sort_index()
print(f"PMF: \n{pmf}")
# Visualizing PMF
plt.figure(figsize=(10, 5))
plt.bar(pmf.index, pmf.values, color='blue', edgecolor='black')
plt.title('Probability Mass Function (PMF) of Dice Rolls')
plt.xlabel('Dice Roll')
plt.ylabel('Probability')
plt.show()
# Calculating the probability of a specific outcome (e.g., roll = 4)
prob_of_4 = pmf.loc[4]  # label-based lookup; raises KeyError if 4 was never rolled
print(f"Probability of roll 4: {prob_of_4}")
# Calculating the probability of outcomes in a range (e.g., 3 <= roll <= 5)
prob_3_to_5 = pmf.loc[3:5].sum()  # .loc slices by label and includes both endpoints
print(f"Probability of rolls between 3 and 5 (inclusive): {prob_3_to_5}")
PMF Bar Plot:
- x-axis: Represents the outcomes of the dice rolls (1, 2, 3, 4, 5, 6).
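- y-axis: Represents the estimated probability (relative frequency) of each outcome.
- Meaning: The PMF bar plot shows the estimated probability of each dice roll outcome. The bar heights sum to 1, and for a fair die each should be close to 1/6.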
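The interval calculation amounts to summing the PMF over every outcome in the range. As a rough sketch, the hypothetical helper `interval_probability` below does this for any mapping from outcomes to probabilities. It is applied here to the theoretical fair-die PMF so the result can be checked exactly.

```python
from fractions import Fraction


def interval_probability(pmf, low, high):
    """Sum PMF values for outcomes with low <= outcome <= high (both inclusive)."""
    return sum(p for outcome, p in pmf.items() if low <= outcome <= high)


# Theoretical PMF of a fair die, kept as exact fractions.
fair_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

p_3_to_5 = interval_probability(fair_pmf, 3, 5)  # 1/6 + 1/6 + 1/6 = 1/2
p_4 = interval_probability(fair_pmf, 4, 4)       # single outcome: 1/6
p_none = interval_probability(fair_pmf, 7, 9)    # no outcomes in range: 0

print(f"P(3 <= X <= 5) = {p_3_to_5}")
print(f"P(X = 4) = {p_4}")
print(f"P(7 <= X <= 9) = {p_none}")
```

Because the helper relies only on `.items()`, it should also accept the pandas `pmf` Series. Outcomes that never occurred contribute zero to the sum instead of raising an error.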