When working with data, it is often necessary to generate random samples that follow certain probability distributions. NumPy, a popular Python library for scientific computing, provides functions that allow us to easily accomplish this task. In this article, we will explore how to perform random sampling and work with probability distributions using NumPy.
NumPy offers a variety of functions to generate random samples from different probability distributions. These functions reside in the numpy.random module, which is available as np.random once NumPy has been imported with import numpy as np. Here are some of the commonly used random sampling functions:
The numpy.random.uniform function generates samples from a uniform distribution over the half-open interval [low, high), so the lower bound can occur but the upper bound is excluded. It takes the lower and upper bounds as its first two arguments, and an optional size argument for the number (or array shape) of samples to draw.
import numpy as np
samples = np.random.uniform(0, 1, size=10)
print(samples)
Output:
[0.83789222 0.48397432 0.72498423 0.06976454 0.85715861 0.15038215
0.6526364 0.54157047 0.13226867 0.38004432]
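Because the output changes on every run, it can help to seed the generator when reproducible results are needed. The sketch below uses NumPy's newer Generator interface (np.random.default_rng), which the NumPy documentation recommends for new code; it is an addition here, not something the example above uses. The seed 42 and the range [2, 5) are arbitrary values chosen for illustration.

```python
import numpy as np

# Seeded generator: the same seed always yields the same sequence.
rng = np.random.default_rng(42)

# Draw 1,000 samples from the half-open interval [2.0, 5.0).
low, high = 2.0, 5.0
samples = rng.uniform(low, high, size=1000)

# Check that every sample lies inside the requested range.
print(samples.min() >= low, samples.max() < high)

# The sample mean should be close to the midpoint (low + high) / 2.
print(round(samples.mean(), 2))

# Re-creating the generator with the same seed reproduces the samples.
repeat = np.random.default_rng(42).uniform(low, high, size=1000)
print(np.array_equal(samples, repeat))
```

Leaving the seed out (np.random.default_rng()) gives fresh, unpredictable samples on each run, which is usually what you want outside of tests and tutorials.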
The numpy.random.normal function generates samples from a normal (Gaussian) distribution. Its first argument is the mean and its second is the standard deviation (not the variance). It also accepts an optional size argument for the number of samples.
samples = np.random.normal(0, 1, size=10)
print(samples)
Output:
[-0.581825 -0.03870262 -0.21917679 1.86945342 1.78443998 0.53423595
-0.33531442 0.48049153 -0.96408401 -0.43320646]
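One way to sanity-check the parameters is to draw a large sample and compare its mean and standard deviation with the values passed in. The following is a minimal sketch; the mean of 10, the standard deviation of 2, and the seed are arbitrary values chosen here, not taken from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

# loc is the mean, scale is the standard deviation (not the variance).
mu, sigma = 10.0, 2.0
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# With this many samples the estimates should land near mu and sigma.
sample_mean = samples.mean()
sample_std = samples.std(ddof=1)  # ddof=1 applies Bessel's correction
print(f"mean ≈ {sample_mean:.2f}, std ≈ {sample_std:.2f}")

# Roughly 68% of values fall within one standard deviation of the mean.
within_one_sigma = np.mean(np.abs(samples - mu) < sigma)
print(f"fraction within one sigma ≈ {within_one_sigma:.3f}")
```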
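The numpy.random.binomial function generates samples from a binomial distribution. Its first argument is the number of trials and its second is the probability of success on each trial; every sample is the number of successes observed in one run of those trials. It can also accept an optional size argument for the number of samples.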
samples = np.random.binomial(10, 0.5, size=10)
print(samples)
Output:
[4 4 5 6 7 7 5 6 5 4]
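To see how these counts relate to the parameters, the sketch below compares the sample mean and variance with the theoretical values n * p and n * p * (1 - p), and estimates the probability of each outcome. The seed and sample size are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# n is the number of trials per experiment, p the success probability.
n, p = 10, 0.5
samples = rng.binomial(n, p, size=50_000)

# Each sample is an integer count of successes between 0 and n.
print(samples.min(), samples.max())

# Theory: mean = n * p, variance = n * p * (1 - p).
print(f"mean: {samples.mean():.3f} (theory {n * p})")
print(f"variance: {samples.var():.3f} (theory {n * p * (1 - p)})")

# Empirical probability of each outcome 0..n.
counts = np.bincount(samples, minlength=n + 1)
probabilities = counts / samples.size
print(np.round(probabilities, 3))
```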
These are just a few examples of the probability distributions that can be sampled using NumPy. Consult the NumPy documentation for more information on the available distributions and their respective functions.
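As a taste of what else is available, here is a short sketch that samples from a Poisson distribution, an exponential distribution, and a custom discrete distribution. The choice of these three, along with all parameter values and the seed, is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)

# Poisson: counts of events with an average rate of 3 per interval.
poisson_samples = rng.poisson(lam=3.0, size=20_000)

# Exponential: waiting times with mean 2.0 (scale is the mean, 1 / rate).
exponential_samples = rng.exponential(scale=2.0, size=20_000)

# choice: sample from a custom discrete distribution.
outcomes = np.array(["red", "green", "blue"])
weights = [0.5, 0.3, 0.2]  # probabilities must sum to 1
choice_samples = rng.choice(outcomes, size=20_000, p=weights)

print(f"Poisson mean ≈ {poisson_samples.mean():.2f}")
print(f"Exponential mean ≈ {exponential_samples.mean():.2f}")
print(f"Share of 'red' ≈ {np.mean(choice_samples == 'red'):.2f}")
```

The same distributions are also reachable through the older module-level functions such as np.random.poisson, which follow the calling style used in the examples above.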
Generating samples is only half of the picture; we often also need to evaluate a distribution itself. The numpy.random module only draws samples and does not provide density or cumulative distribution functions. For those, the usual companion is the scipy.stats module from SciPy, which is a separate package that builds on NumPy.
The probability density function (PDF) of a continuous probability distribution describes the relative likelihood of values near a given point. In scipy.stats, the normal distribution is represented by the scipy.stats.norm object, and its pdf method evaluates the density.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
# Evaluate the standard normal density (mean 0, standard deviation 1) at each x
pdf = norm.pdf(x, loc=0, scale=1)
plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('Probability density')
plt.title('Normal distribution PDF')
plt.show()
This code snippet generates a plot of the PDF of the standard normal distribution.
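As a cross-check that uses only NumPy, the sketch below evaluates the normal density directly from its closed-form expression and compares it with a histogram of sampled values. The helper name normal_pdf, the seed, and the bin settings are choices made here for illustration.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density evaluated from its closed-form expression."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(3)
samples = rng.normal(0.0, 1.0, size=200_000)

# density=True rescales bar heights so the histogram integrates to 1.
heights, edges = np.histogram(samples, bins=40, range=(-4.0, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# The histogram heights should track the analytical density.
max_gap = np.max(np.abs(heights - normal_pdf(centers)))
print(f"largest gap between histogram and PDF: {max_gap:.4f}")

# A density integrates to 1; approximate the integral with a Riemann sum.
x = np.linspace(-8.0, 8.0, 4001)
area = np.sum(normal_pdf(x)) * (x[1] - x[0])
print(f"area under the PDF ≈ {area:.4f}")
```

A small gap between the histogram and the formula is a quick sign that the samples really follow the intended distribution.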
The cumulative distribution function (CDF) of a probability distribution gives the probability of obtaining a value less than or equal to a particular value. NumPy itself does not provide CDF functions; in SciPy they are available as the cdf method of distribution objects such as scipy.stats.norm and scipy.stats.binom.
from scipy.stats import norm
x = np.linspace(-5, 5, 100)
# Evaluate the standard normal CDF (mean 0, standard deviation 1) at each x
cdf = norm.cdf(x, loc=0, scale=1)
plt.plot(x, cdf)
plt.xlabel('x')
plt.ylabel('Cumulative probability')
plt.title('Normal distribution CDF')
plt.show()
This code snippet generates a plot of the CDF of the standard normal distribution.
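The definition above also suggests a simple empirical estimate: the fraction of samples that are less than or equal to a given value. The sketch below compares that estimate with exact values computed from the standard library, using the error function for the normal case and a sum over the probability mass function for the binomial case. The helper name normal_cdf, the seed, and the evaluation points are assumptions made for this illustration.

```python
import math
import numpy as np

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function: 0.5 * (1 + erf(z / sqrt(2)))."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = np.random.default_rng(5)
samples = rng.normal(0.0, 1.0, size=100_000)

# Empirical CDF at x: the fraction of samples less than or equal to x.
for x in (-1.96, 0.0, 1.0):
    empirical = np.mean(samples <= x)
    print(f"x={x:5.2f}  empirical={empirical:.4f}  analytical={normal_cdf(x):.4f}")

# Binomial CDF: P(X <= 4) for 10 trials with p = 0.5, summed from the PMF.
n, p, k = 10, 0.5, 4
binomial_cdf = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
binomial_samples = rng.binomial(n, p, size=100_000)
binomial_empirical = np.mean(binomial_samples <= k)
print(f"P(X <= {k}): exact={binomial_cdf:.4f}  empirical={binomial_empirical:.4f}")
```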
These are just some of the tools available for working with probability distributions. NumPy generates the random samples, while SciPy's scipy.stats module evaluates the corresponding density and cumulative distribution functions.
Used together, they let you simulate data, compare it against theoretical distributions, and make statistical inferences across a wide variety of practical data scenarios.