# Gaussian distribution from eight points

Given only a few data points from a distribution, we can approximate its shape by fitting a polynomial curve through them. The code below samples eight points from a Gaussian density and plots them together with the fitted curve.

```python
import numpy as np
import matplotlib.pyplot as plt


def gaussian(x, mean, std):
    """Gaussian probability density with the given mean and standard deviation."""
    variance = std**2
    return np.exp(-(x - mean)**2 / (2*variance)) / np.sqrt(2*np.pi*variance)


point_count = 8
mean, std, stddevs = 0, 20, 4

# Sample the density on an interval spanning `stddevs` standard deviations,
# centred on the mean (here: mean - 2*std to mean + 2*std).
mean_diff = std*stddevs/2
mean_diff_sqrt = np.sqrt(mean_diff)
x = np.linspace(mean - mean_diff, mean + mean_diff, point_count)
y = gaussian(x, mean, std)

# Fit a polynomial of degree point_count-1, which passes through every point.
poly_coeffs = np.polyfit(x, y, point_count - 1)
poly_func = np.poly1d(poly_coeffs)

plt.scatter(x, y, c='k', s=10)

# Evaluate the fitted polynomial on a denser grid to draw a smooth curve.
x_new = np.linspace(x[0], x[-1], 40)
y_new = poly_func(x_new)
plt.plot(x_new, y_new, 'k-')

plt.xlim([x[0] - mean_diff_sqrt, x[-1] + mean_diff_sqrt])
plt.ylim([0.5*min(y), 1.1*max(y)])
plt.tight_layout()
plt.show()
```

The result looks like this:

Interestingly, with six points we would still be able to recognize the distribution type, although the shape would be slightly less pronounced. The same idea can be used with almost any other distribution whose density function is continuous. This allows us to pass only a small number of data points to the plotting function instead of tens of thousands, which can make plotting much faster.
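To put a rough number on the difference between six and eight points, one option is to compare the fitted polynomial with the true density on a dense grid. The snippet below is a minimal sketch of that check, not part of the original code: the helper `fit_error` and the 1000-point evaluation grid are assumptions introduced here for illustration, while the mean, standard deviation and sampling interval match the example above.

```python
import numpy as np


def gaussian(x, mean, std):
    """Gaussian probability density with the given mean and standard deviation."""
    variance = std**2
    return np.exp(-(x - mean)**2 / (2*variance)) / np.sqrt(2*np.pi*variance)


def fit_error(point_count, mean=0.0, std=20.0, stddevs=4):
    """Largest absolute gap between the fitted polynomial and the true density.

    Hypothetical helper: fits a polynomial of degree point_count-1 through
    point_count evenly spaced samples, then compares it with the density on
    a dense grid over the same interval.
    """
    half_width = std*stddevs/2
    x = np.linspace(mean - half_width, mean + half_width, point_count)
    y = gaussian(x, mean, std)
    poly_func = np.poly1d(np.polyfit(x, y, point_count - 1))

    # Dense grid (size chosen arbitrarily) on which to measure the deviation.
    x_dense = np.linspace(x[0], x[-1], 1000)
    return np.max(np.abs(poly_func(x_dense) - gaussian(x_dense, mean, std)))


errors = {n: fit_error(n) for n in (6, 8)}
peak = gaussian(0.0, 0.0, 20.0)
for n, err in errors.items():
    print(f"{n} points: max error = {err:.2e} ({err/peak:.1%} of the peak height)")
```

In this setup the eight-point fit should come out closer to the true curve than the six-point one, which matches the impression that the shape is slightly less pronounced with fewer points. The comparison only covers the sampled interval; outside it the polynomial is not expected to follow the density.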