Statistics

PDF (Probability Density Function) and PMF (Probability Mass Function)

These functions help describe the distribution of random variables and form the backbone of statistical modeling, machine learning, and data-driven decision-making.

Probability Mass Function (PMF) – Used for discrete random variables, e.g., dice rolls.
Probability Density Function (PDF) – Used for continuous random variables, e.g., height measurements.

Probability Mass Function (PMF)

A Probability Mass Function (PMF) is a function that gives the probability of a discrete random variable being exactly equal to some value. It is used for discrete random variables, where the set of possible outcomes is countable.

Key Properties of PMF

The PMF assigns probabilities to each possible outcome.
The sum of all the probabilities for all possible outcomes must equal 1.
It is defined for discrete random variables, such as counting numbers or categories.

Mathematical Definition

Let $X$ be a discrete random variable with possible values $x_1, x_2, \ldots$ The PMF of $X$, denoted as $P(X = x)$, satisfies the following conditions:

$0 \le P(X = x) \le 1$
$\sum_{x} P(X = x) = 1$
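The two conditions can be checked mechanically. The sketch below is a minimal illustration rather than a library routine: `is_valid_pmf` is a hypothetical helper name, and the tolerance `tol` is an assumed value that absorbs floating-point rounding.

```python
import numpy as np

def is_valid_pmf(probabilities, tol=1e-9):
    """Return True if the values satisfy both PMF conditions.

    `is_valid_pmf` and `tol` are illustrative names, not part of any library.
    """
    p = np.asarray(probabilities, dtype=float)
    # Condition 1: every probability lies in [0, 1]
    in_range = np.all((p >= 0) & (p <= 1))
    # Condition 2: the probabilities sum to 1 (up to rounding error)
    sums_to_one = abs(p.sum() - 1.0) <= tol
    return bool(in_range and sums_to_one)

fair_die = [1/6] * 6
not_a_pmf = [0.5, 0.6]  # each value is in [0, 1], but the sum is 1.1

print(is_valid_pmf(fair_die))
print(is_valid_pmf(not_a_pmf))
```

The second list fails only the sum condition, which shows that both checks are needed.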

Example of PMF

Let’s consider a fair six-sided die. The outcome of rolling the die is a discrete random variable that can take values 1, 2, 3, 4, 5, or 6, each with an equal probability of occurring.

For a fair die, the PMF is:

$P(X = x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}$

import numpy as np
import matplotlib.pyplot as plt

# Define the possible outcomes of a fair 6-sided die
outcomes = np.array([1, 2, 3, 4, 5, 6])

# Probability for each outcome 
# (fair die, so each has equal probability)
probabilities = np.array([1/6] * 6)

# Plot the PMF
plt.bar(outcomes, probabilities, width=0.5, color='skyblue')
plt.title("Probability Mass Function of a Fair 6-Sided Die")
plt.xlabel("Outcome")
plt.ylabel("Probability")
plt.show()
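As a sanity check on the theoretical PMF, one can simulate many rolls and compare the observed frequencies with 1/6. This is a sketch under assumed settings: the seed and the number of rolls are arbitrary choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility
n_rolls = 60_000                     # assumed sample size

# Simulate rolls of a fair die: integers from 1 to 6 inclusive
rolls = rng.integers(low=1, high=7, size=n_rolls)

# Count occurrences of each face; index 0 is unused, so drop it
counts = np.bincount(rolls, minlength=7)[1:]
empirical_pmf = counts / n_rolls

for face, freq in zip(range(1, 7), empirical_pmf):
    print(f"P(X = {face}) ~ {freq:.3f} (theoretical: {1/6:.3f})")
```

With a sample this large, each empirical frequency should land close to the theoretical value, and the gap shrinks as `n_rolls` grows.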

Probability Density Function (PDF)

A Probability Density Function (PDF) is used for continuous random variables and describes the probability of the variable falling within a particular range, rather than taking a single value.

Mathematical Definition

For a continuous random variable $X$ with PDF $f(x)$, the probability that $X$ falls within an interval $[a, b]$ is given by:

$P(a \leq X \leq b) = \int_a^b f(x) \, dx$
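To make the integral concrete, the sketch below approximates it numerically for a standard normal density and compares the result with the difference of the cumulative distribution function at the endpoints. The choice of distribution, the interval, and the number of bins are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen as an example
a, b = -1.0, 1.0                      # example interval

# Midpoint-rule approximation of the integral of f(x) over [a, b]
n_bins = 10_000
edges = np.linspace(a, b, n_bins + 1)
midpoints = (edges[:-1] + edges[1:]) / 2
width = (b - a) / n_bins
density = np.exp(-midpoints**2 / 2) / np.sqrt(2 * np.pi)
prob_numeric = float(np.sum(density * width))

# The same probability from the cumulative distribution function
prob_cdf = dist.cdf(b) - dist.cdf(a)

print(f"Numerical integral: {prob_numeric:.4f}")
print(f"CDF difference:     {prob_cdf:.4f}")
```

Both values come out at about 0.6827, the familiar probability of falling within one standard deviation of the mean.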

Key Properties of PDF

$f(x) \geq 0 \quad \text{for all } x$
The total area under the curve is 1:

$\int_{-\infty}^{\infty} f(x) \, dx = 1$

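Unlike a PMF, the value of $f(x)$ is not itself a probability but a density. It can exceed 1, and the probability of any single exact value is zero; only areas under the curve are probabilities.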
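The difference between density and probability is easiest to see with a narrow distribution. In this sketch, a normal distribution with an assumed standard deviation of 0.1 has a density well above 1 at its mean, while the probability of a small interval around the mean stays below 1.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.1)  # small sigma chosen to make the point

# The density at the mean is 1 / (sigma * sqrt(2 * pi)), about 3.99 here
peak_density = narrow.pdf(0.0)

# The probability of a small interval around the mean is still below 1
prob_near_mean = narrow.cdf(0.05) - narrow.cdf(-0.05)

print(f"f(0) = {peak_density:.3f}")
print(f"P(-0.05 <= X <= 0.05) = {prob_near_mean:.3f}")
```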

Example: Normal Distribution

A commonly used PDF in statistics is that of the normal distribution:

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

where:

$\mu$ is the mean,
$\sigma$ is the standard deviation.
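The formula translates almost term by term into code. The sketch below uses a hypothetical helper, `normal_pdf`, and an assumed height distribution (mean 170 cm, standard deviation 10 cm), then compares the result with Python's built-in `statistics.NormalDist` as a reference.

```python
import numpy as np
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal density at x (scalar or array).

    `normal_pdf` is an illustrative helper, not a library function.
    """
    x = np.asarray(x, dtype=float)
    coefficient = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * np.exp(exponent)

mu, sigma = 170.0, 10.0  # hypothetical height distribution in cm
xs = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

ours = normal_pdf(xs, mu, sigma)
reference = np.array([NormalDist(mu, sigma).pdf(float(x)) for x in xs])

print(ours)
print(np.allclose(ours, reference))
```

The density is symmetric around the mean, so the values at 150 and 190 match, as do those at 160 and 180.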

Relevance to Data Science

Understanding PDFs and PMFs is essential for:

Statistical Modeling:
- PMFs are used in discrete models such as Poisson regression and binomial models.
- PDFs are used in continuous models like linear regression (assuming normal errors).
Machine Learning:
- The Gaussian Naïve Bayes classifier assumes a Gaussian (normal) distribution for continuous features, so the PDF is central to how it scores each class.
- Generative models (e.g., Variational Autoencoders) rely on PDFs to model data distributions.
Anomaly Detection:
- Many anomaly detection techniques assume normality in data distribution and use PDFs to detect outliers.
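The anomaly detection idea can be sketched with a simple three-sigma rule: fit a normal distribution to the data and flag points that fall in its low-density tails. The synthetic readings, the injected anomalies, and the threshold of 3 are all assumptions chosen for illustration; real pipelines typically tune the threshold and check the normality assumption first.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic "normal" readings plus two injected anomalies (all values assumed)
readings = rng.normal(loc=50.0, scale=5.0, size=500)
data = np.concatenate([readings, [95.0, 5.0]])

# Fit a normal distribution by estimating its mean and standard deviation
mu, sigma = data.mean(), data.std()

# Flag points more than 3 standard deviations from the mean; under a normal
# PDF this corresponds to the low-density tails (about 0.3% of the mass)
z_scores = (data - mu) / sigma
outliers = data[np.abs(z_scores) > 3.0]

print(outliers)
```

The two injected values are flagged because they lie far outside the bulk of the fitted distribution; an occasional genuine reading may be flagged too, which is expected with any fixed threshold.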

Example in Data Science: Naïve Bayes Classifier

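A Gaussian Naïve Bayes classifier assumes that, within each class $Y$, each continuous feature $X$ follows a normal distribution, so the class-conditional likelihood is given by the normal PDF:

$P(X \mid Y) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$

where $\mu$ and $\sigma$ are the mean and standard deviation of that feature, estimated from the training samples of class $Y$.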

from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

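# Generate synthetic data. With only 2 features, both must be informative:
# the defaults (n_informative=2, n_redundant=2) exceed n_features and raise a ValueError.
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)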

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naïve Bayes Classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predictions
y_pred = gnb.predict(X_test)

# Accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
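To show where the PDF enters the classifier, here is a from-scratch sketch of the Gaussian Naïve Bayes decision rule using only NumPy. It is not scikit-learn's implementation (which, among other things, applies variance smoothing); the synthetic data, the helper names `log_gaussian_pdf` and `predict`, and all parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic two-class data with two features (all parameters are assumed)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

classes = np.unique(y)
# Per-class estimates: prior P(Y), feature means, and feature variances
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])

def log_gaussian_pdf(x, mu, var):
    """Log of the normal density, evaluated feature by feature."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var)

def predict(samples):
    """Pick the class with the highest log prior plus summed log-likelihoods."""
    scores = np.stack(
        [
            np.log(priors[k])
            + log_gaussian_pdf(samples, means[k], variances[k]).sum(axis=1)
            for k in range(len(classes))
        ],
        axis=1,
    )
    return classes[np.argmax(scores, axis=1)]

accuracy = np.mean(predict(X) == y)
print(f"Training accuracy: {accuracy:.2f}")
```

Summing the per-feature log-densities is the "naïve" step: it treats features as independent given the class, so the joint likelihood is a product of one-dimensional normal PDFs. Working in log space avoids numerical underflow when many features are multiplied.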
