Continuous Variables

Calculating Mean, Mode, Median, Quantiles, Maximum, Minimum, and Outliers

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import gaussian_kde, mode
 
# Given grades data
grades = pd.Series([50.0, 50.0, 47.0, 97.0, 49.0, 3.0, 53.0, 42.0, 26.0, 74.0,
                    82.0, 62.0, 37.0, 15.0, 70.0, 27.0, 36.0, 35.0, 48.0, 52.0,
                    63.0, 64.0])
 
# Calculating mean, mode, and median
mean_grade = grades.mean()
mode_grade = grades.mode().iloc[0]  # Series.mode() returns all tied values; take the first
median_grade = grades.median()
 
# Calculating quantiles
quantiles = grades.quantile([0.25, 0.5, 0.75])
q1 = quantiles[0.25]
q3 = quantiles[0.75]
 
# Calculating maximum and minimum
max_grade = grades.max()
min_grade = grades.min()
 
# Calculating interquartile range (IQR) and identifying outliers
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = grades[(grades < lower_bound) | (grades > upper_bound)]
 
print(f"Mean: {mean_grade}")
print(f"Mode: {mode_grade}")
print(f"Median: {median_grade}")
print(f"Quantiles: \n{quantiles}")
print(f"Max: {max_grade}")
print(f"Min: {min_grade}")
print(f"Lower Bound for Outliers: {lower_bound}")
print(f"Upper Bound for Outliers: {upper_bound}")
print(f"Outliers: \n{outliers}")
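
The outlier rule above can also be wrapped in a small reusable function. This is only a sketch: the helper name `iqr_fences` and its parameter `k` are illustrative and do not come from pandas or from the original notes.

```python
import pandas as pd


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Hypothetical helper for illustration; `values` is a pandas Series.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Keep only the observations that fall outside the fences
    outliers = values[(values < lower) | (values > upper)]
    return lower, upper, outliers


grades = pd.Series([50.0, 50.0, 47.0, 97.0, 49.0, 3.0, 53.0, 42.0, 26.0, 74.0,
                    82.0, 62.0, 37.0, 15.0, 70.0, 27.0, 36.0, 35.0, 48.0, 52.0,
                    63.0, 64.0])

lower, upper, outliers = iqr_fences(grades)
print(f"Fences: [{lower}, {upper}]")
print(f"Number of outliers: {len(outliers)}")
```

With pandas' default (linear) quantile interpolation, Q1 is 36.25 and Q3 is 62.75 for these 22 grades, so the fences are -3.5 and 102.5 and no grade is flagged as an outlier.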

Plotting Histogram, Density Plot, and Box Plot

# Plotting Histogram
plt.figure(figsize=(15, 5))
 
plt.subplot(1, 3, 1)
plt.hist(grades, bins=10, edgecolor='black')
plt.title('Histogram of Grades')
plt.xlabel('Grade')
plt.ylabel('Frequency')
 
# Plotting Density Plot
plt.subplot(1, 3, 2)
sns.kdeplot(grades, fill=True)  # `fill` replaces the deprecated `shade` argument
plt.title('Density Plot of Grades')
plt.xlabel('Grade')
plt.ylabel('Density')
 
# Plotting Box Plot
plt.subplot(1, 3, 3)
sns.boxplot(x=grades)  # pass the data as x so the box is drawn horizontally along the grade axis
plt.title('Box Plot of Grades')
plt.xlabel('Grade')
 
plt.tight_layout()
plt.show()

Explanation of Plots

Histogram:
- x-axis: Represents the grade values.
- y-axis: Represents the frequency (number of occurrences) of each grade range.
- Meaning: The histogram shows how frequently each grade range appears in the dataset.
Density Plot:
- x-axis: Represents the grade values.
- y-axis: Represents the probability density.
- Meaning: The density plot is a smoothed version of the histogram. It shows the distribution of grades and helps identify the areas where grades are more concentrated.
Box Plot:
- x-axis: Represents the grade values.
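- y-axis: Not used here, because the plot contains a single horizontal box.
- Meaning: The box plot summarizes the data with the first quartile (Q1), median, and third quartile (Q3) as the box, and whiskers that reach to the most extreme values within 1.5 × IQR of the box. When there are no outliers, the whisker ends are the minimum and maximum, giving the five-number summary. Points beyond the whiskers are drawn individually as potential outliers.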
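
The numbers behind the histogram and the box plot can be computed directly, without drawing anything. The following is a minimal sketch that assumes the same 10 equal-width bins as the histogram and the common 1.5 × IQR whisker convention; the variable names are illustrative.

```python
import numpy as np

grades = np.array([50.0, 50.0, 47.0, 97.0, 49.0, 3.0, 53.0, 42.0, 26.0, 74.0,
                   82.0, 62.0, 37.0, 15.0, 70.0, 27.0, 36.0, 35.0, 48.0, 52.0,
                   63.0, 64.0])

# Histogram: number of grades in each of 10 equal-width bins, plus the bin edges
counts, edges = np.histogram(grades, bins=10)

# With density=True the bar heights are rescaled so the total bar area is 1,
# which puts the histogram on the same scale as a density plot
heights, _ = np.histogram(grades, bins=10, density=True)
total_area = np.sum(heights * np.diff(edges))

# Box plot: whiskers end at the most extreme grades inside the 1.5 * IQR fences
q1, q3 = np.percentile(grades, [25, 75])
iqr = q3 - q1
whisker_low = grades[grades >= q1 - 1.5 * iqr].min()
whisker_high = grades[grades <= q3 + 1.5 * iqr].max()

print("Counts per bin:", counts)
print("Bin edges:", np.round(edges, 1))
print("Total area of density-scaled bars:", total_area)
print("Whiskers:", whisker_low, whisker_high)
```

For this dataset both fences lie outside the observed range, so the whiskers coincide with the minimum (3) and maximum (97).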

Introducing the Probability Density Function (PDF)

The Probability Density Function (PDF) describes the relative likelihood of a continuous random variable taking values near a given point. The probability of any single exact value is zero; probabilities are defined only for intervals. For a given interval, the area under the PDF curve over that interval is the probability that the variable falls within it, and the total area under the curve is 1.
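
These properties can be written compactly. The notation is introduced here for illustration and is not from the original notes: f is the PDF, X the random variable, and a ≤ b are the interval endpoints.

```latex
f(x) \ge 0, \qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad
P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad
P(X = a) = \int_a^a f(x)\,dx = 0 .
```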

Calculating Probability Density and Interval Probabilities

# Estimate the probability density function with a Gaussian kernel density estimate (KDE)
kde = gaussian_kde(grades)
density_at_37 = kde(37)[0]
print(f"Density at 37: {density_at_37}")
 
# Calculate probability between 36 and 38
prob_36_to_38 = kde.integrate_box_1d(36, 38)
print(f"Probability between 36 and 38: {prob_36_to_38:.4f}")

Explanation of PDF and Interval Probability

Probability Density at a Point (e.g., 37):
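- Meaning: The value of the estimated PDF at x = 37 measures how concentrated the grades are around 37, per unit of grade. It is a density, not a probability.
- Example: Density at 37: 0.014779821322454553 does not mean a 1.48% chance of scoring exactly 37. Multiplied by the width of a narrow interval, it approximates the probability of that interval: 0.01478 × 2 ≈ 0.0296 for the interval from 36 to 38.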
Probability over an Interval (e.g., 36 to 38):
- Meaning: The probability of grades falling between 36 and 38 is the area under the PDF curve between these two points.
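- Example: Probability between 36 and 38: 0.0296 means the estimated distribution assigns about a 2.96% probability to a grade falling within this range. This is a model-based estimate from the smoothed KDE, not the observed share of the dataset: 2 of the 22 grades (36 and 37), about 9.1%, actually lie in this range.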
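
The link between density and interval probability can be checked numerically. The sketch below assumes SciPy's default KDE bandwidth (Scott's rule) and compares three routes to the same probability: the built-in `integrate_box_1d`, numerical integration with `scipy.integrate.quad`, and the rough "density × width" rectangle approximation. The variable names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gaussian_kde

grades = np.array([50.0, 50.0, 47.0, 97.0, 49.0, 3.0, 53.0, 42.0, 26.0, 74.0,
                   82.0, 62.0, 37.0, 15.0, 70.0, 27.0, 36.0, 35.0, 48.0, 52.0,
                   63.0, 64.0])
kde = gaussian_kde(grades)  # default bandwidth (Scott's rule)

a, b = 36.0, 38.0

# 1. Area under the estimated PDF between a and b, as computed by SciPy
prob_exact = kde.integrate_box_1d(a, b)

# 2. The same area by general-purpose numerical integration
prob_quad, _ = quad(lambda x: kde(x)[0], a, b)

# 3. Rectangle approximation: density at the midpoint times the interval width
prob_rect = kde((a + b) / 2)[0] * (b - a)

# The area under the whole curve should be 1
total = kde.integrate_box_1d(-np.inf, np.inf)

print(f"integrate_box_1d: {prob_exact:.4f}")
print(f"quad:             {prob_quad:.4f}")
print(f"density x width:  {prob_rect:.4f}")
print(f"total area:       {total:.4f}")
```

The rectangle approximation is close here only because the interval is narrow relative to the KDE bandwidth; over wider intervals the density changes too much for a single rectangle to be reliable.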

Hua Wang
