Random sampling and probability distributions in NumPy

When working with data, it is often necessary to generate random samples that follow certain probability distributions. NumPy, a popular Python library for scientific computing, provides functions that allow us to easily accomplish this task. In this article, we will explore how to perform random sampling and work with probability distributions using NumPy.

Random sampling

NumPy offers a variety of functions to generate random samples from different probability distributions. These functions live in the numpy.random module, which is available as np.random once NumPy has been imported. Here are some of the commonly used random sampling functions:

Uniform distribution

The numpy.random.uniform function generates samples from a uniform distribution over the half-open interval [low, high). It takes the lower and upper bounds of the range, and an optional size argument that sets the number of samples (or the shape of the returned array).

import numpy as np

samples = np.random.uniform(0, 1, size=10)
print(samples)

Example output (values differ from run to run): [0.83789222 0.48397432 0.72498423 0.06976454 0.85715861 0.15038215 0.6526364 0.54157047 0.13226867 0.38004432]
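As a small illustration of the range and size arguments, the sketch below draws a 3x4 array from [2, 5) and checks that every value falls inside the interval. The bounds and the seed are arbitrary choices made for this example, and np.random.seed is used only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# size may be a tuple, in which case an array of that shape is returned.
grid = np.random.uniform(2.0, 5.0, size=(3, 4))

print(grid.shape)  # (3, 4)

# Every sample should lie in the half-open interval [2, 5).
in_range = bool(((grid >= 2.0) & (grid < 5.0)).all())
print(in_range)
```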

Normal distribution

The numpy.random.normal function generates samples from a normal distribution with a specified mean (loc) and standard deviation (scale). It also accepts an optional size argument for the number of samples.

samples = np.random.normal(0, 1, size=10)
print(samples)

Example output (values differ from run to run): [-0.581825 -0.03870262 -0.21917679 1.86945342 1.78443998 0.53423595 -0.33531442 0.48049153 -0.96408401 -0.43320646]
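One way to confirm that the mean and standard deviation arguments behave as described is to draw a large sample and compare its statistics with the requested parameters. This is a minimal sketch; the values 10 and 2, the sample size, and the seed are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# Draw many samples with mean 10 and standard deviation 2.
samples = np.random.normal(10, 2, size=100_000)

# With this many samples, both statistics should land close to 10 and 2.
sample_mean = samples.mean()
sample_std = samples.std()
print(sample_mean, sample_std)
```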

Binomial distribution

The numpy.random.binomial function generates samples from a binomial distribution. Its first argument is the number of trials n and its second is the probability of success p; each sample is the number of successes in n trials. It also accepts an optional size argument for the number of samples.

samples = np.random.binomial(10, 0.5, size=10)
print(samples)

Example output (values differ from run to run): [4 4 5 6 7 7 5 6 5 4]
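For a binomial distribution with n trials and success probability p, the theoretical mean is n*p and the variance is n*p*(1-p). The sketch below compares these with the statistics of a large sample; the sample size and seed are arbitrary.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

n, p = 10, 0.5
samples = np.random.binomial(n, p, size=100_000)

# Each sample counts successes, so it must lie between 0 and n.
print(samples.min() >= 0, samples.max() <= n)

# Empirical statistics next to the theoretical values.
print(samples.mean(), n * p)
print(samples.var(), n * p * (1 - p))
```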

These are just a few examples of the probability distributions that can be sampled using NumPy. Consult the NumPy documentation for more information on the available distributions and their respective functions.
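The calls shown so far use the module-level functions of numpy.random. Newer NumPy releases also provide a Generator interface, created with np.random.default_rng, which the NumPy documentation recommends for new code. The following is a minimal sketch, assuming NumPy 1.17 or later; the seed 42 is arbitrary.

```python
import numpy as np

# Two generators created from the same seed produce the same stream,
# which makes experiments reproducible.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.uniform(0, 1, size=5)
second = rng_b.uniform(0, 1, size=5)
print(np.array_equal(first, second))  # True

# The Generator offers the same distributions as methods.
normal_samples = rng_a.normal(0, 1, size=5)
binomial_samples = rng_a.binomial(10, 0.5, size=5)
print(normal_samples.shape, binomial_samples.shape)  # (5,) (5,)
```

Passing a seed is optional; without one, each generator starts from fresh entropy and the results differ between runs.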

Probability distributions

Generating samples is only part of working with a probability distribution; it is often also necessary to evaluate its density or its cumulative probability. The numpy.random module does not provide functions for this. They are available in SciPy's scipy.stats module, which operates directly on NumPy arrays.

Probability density function (PDF)

The probability density function (PDF) of a continuous distribution describes the relative likelihood of values near a given point; probabilities are obtained by integrating it over an interval. NumPy has no function for evaluating a PDF (numpy.random.normal only draws samples), but the corresponding SciPy object, scipy.stats.norm, provides a pdf method.

import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-5, 5, 100)
# loc is the mean and scale is the standard deviation
pdf = norm.pdf(x, loc=0, scale=1)

plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('Probability density')
plt.title('Normal distribution PDF')
plt.show()

This code snippet generates a plot of the PDF of the standard normal distribution.
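The normal density has the closed form f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), so it can also be evaluated with plain NumPy array operations. The sketch below defines a hypothetical helper, normal_pdf (not a NumPy function), and compares it with a normalized histogram of samples. The bin count, range, and seed are arbitrary choices.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density, element-wise (hypothetical helper, not part of NumPy)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
samples = np.random.normal(0, 1, size=200_000)

# density=True scales the histogram so that its total area is 1.
heights, edges = np.histogram(samples, bins=50, range=(-4, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Largest gap between the empirical density and the formula.
max_gap = np.max(np.abs(heights - normal_pdf(centers)))
print(max_gap)
```

The gap shrinks as the number of samples grows, which is a quick way to see that the sampler and the density describe the same distribution.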

Cumulative distribution function (CDF)

The cumulative distribution function (CDF) of a probability distribution gives the probability of obtaining a value less than or equal to a particular value. As with the PDF, NumPy itself does not compute CDFs; SciPy provides them through methods such as scipy.stats.norm.cdf and scipy.stats.binom.cdf.

from scipy.stats import norm

x = np.linspace(-5, 5, 100)
cdf = norm.cdf(x, loc=0, scale=1)

plt.plot(x, cdf)
plt.xlabel('x')
plt.ylabel('Cumulative probability')
plt.title('Normal distribution CDF')
plt.show()

This code snippet generates a plot of the CDF of the standard normal distribution.
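A CDF can also be estimated directly from samples with NumPy alone: after sorting, the fraction of samples at or below the k-th smallest value is k/N. The sketch below builds this empirical CDF and compares it with the closed-form normal CDF written in terms of the error function. The helper normal_cdf is hypothetical (it is not a NumPy function), and the sample size and seed are arbitrary.

```python
import math
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
samples = np.sort(np.random.normal(0, 1, size=100_000))

# Empirical CDF: fraction of samples less than or equal to each sorted value.
ecdf = np.arange(1, samples.size + 1) / samples.size

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed-form normal CDF via the error function (hypothetical helper)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# math.erf works on scalars, so evaluate it point by point.
exact = np.array([normal_cdf(v) for v in samples])

max_gap = np.max(np.abs(ecdf - exact))
print(max_gap)
```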

These are just some of the tools available for working with probability distributions. NumPy's numpy.random module covers drawing samples, and SciPy's scipy.stats module covers evaluating densities and cumulative probabilities. Used together, they let you generate random samples, analyze a wide range of distributions, and make statistical inferences about your data.