ML04: Kernel Density Estimation

Vaibhav Malhotra · Published in Analytics Vidhya · 3 min read · Oct 13, 2020

This is a continuation of the Mathematics behind Machine Learning series.

Kernel density estimation, also known as KDE, is a non-parametric model: a technique that lets you create a smooth curve from a set of data. KDE centers a kernel function at each data point and sums these smoothed contributions to get a density estimate.

[Image source: https://graphworkflow.com/eda/distributional-form/]

The motivation behind the creation of KDE is that histograms are not smooth: they depend on the width of the bins and the endpoints of the bins. KDE reduces these problems by producing a smooth curve.[1] This can be useful if you want to visualize just the “shape” of some data, as a kind of continuous replacement for the discrete histogram.

Let’s take a deeper dive and understand KDE.

Parametric models have a fixed number of adaptable parameters, independent of the amount of data. Ex: Logistic regression, K-means clustering.

Non-parametric models have a variable number of parameters, i.e. the parameters adapt to the amount of data. In simple terms, in order to make a prediction the model looks at some (often all) of the data points. Ex: Kernel Density Estimators, SVMs, Decision Trees.

Density Estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function[2]. So our goal, for a new point x, is to estimate p(x), the value the underlying probability density function takes at x.

Kernel Density Estimation smooths the data by convolving each point with a ‘kernel’. Each data point becomes the center of a kernel function, and the final curve is the normalized sum (a convolution) of all the kernels evaluated at each point.
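Formally, given n observations x_1, …, x_n, the standard kernel density estimator is

\hat{p}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)

where K is the kernel function and h > 0 is the bandwidth; both are discussed below.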

Comparison of the histogram and the kernel density estimate constructed from the same data (source: Wikipedia). The six individual kernels are the red dashed curves; the kernel density estimate is the blue curve, obtained simply by summing the red curves.
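To make the figure concrete, here is a minimal NumPy sketch of the same construction: one Gaussian bump is centered at each data point, and the estimate is their normalized sum. The six sample values and the bandwidth are illustrative assumptions chosen to mimic the figure, not taken from it.

import numpy as np

def gaussian_kernel(u):
    # standard normal density, used as the kernel K
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde_estimate(x_grid, data, h):
    # center one kernel at each data point, sum them, and normalize by n*h
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2])  # illustrative sample
x_grid = np.linspace(-7, 11, 200)
density = kde_estimate(x_grid, data, h=1.5)  # integrates to ~1 over the grid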

Kernel parameters

1. Kernel function: This is the most important parameter, as it decides how your data will be represented. The most commonly used kernels are ‘gaussian’ and ‘exponential’. (A figure of the available kernel shapes can be found in the scikit-learn documentation.)

2. Kernel function bandwidth (h): Changing the bandwidth changes the shape of the kernel, i.e. it either stretches or squeezes the function. A lower bandwidth means only points very close to the current position are given any weight, which makes the estimate look squiggly; a higher bandwidth means a shallower kernel where distant points also contribute, which makes the estimate smoother (see the sketch after this list).

3. Number of components (k): This is not a parameter of the kernel function itself but of our model: k is the number of bins or buckets into which we divide our data.
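The sketch below illustrates the bandwidth effect described in point 2, using scikit-learn to fit the same toy data with two different values of h (the data values are made up for demonstration):

import numpy as np
from sklearn.neighbors import KernelDensity

data = np.array([-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]).reshape(-1, 1)
x_grid = np.linspace(-7, 11, 200).reshape(-1, 1)

for h in (0.3, 2.0):
    kde = KernelDensity(kernel='gaussian', bandwidth=h).fit(data)
    density = np.exp(kde.score_samples(x_grid))  # score_samples returns log p(x)
    # small h -> spiky, squiggly estimate; large h -> flat, over-smoothed estimate
    print(f"h={h}: peak density {density.max():.3f}")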

It is also very simple to use Kernel Density Estimators in scikit-learn.

# KernelDensity lives in sklearn.neighbors (note the US spelling)
from sklearn.neighbors import KernelDensity
kde = KernelDensity(kernel='gaussian', bandwidth=1.0)
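To actually get densities out of the model, you fit it on an array of shape (n_samples, n_features) and call score_samples, which returns log densities; you can also draw new points with sample. A minimal sketch, with toy data assumed for illustration:

import numpy as np
from sklearn.neighbors import KernelDensity

data = np.random.default_rng(0).normal(size=(100, 1))  # toy 1-D sample
kde = KernelDensity(kernel='gaussian', bandwidth=1.0).fit(data)

x_grid = np.linspace(-4, 4, 50).reshape(-1, 1)
density = np.exp(kde.score_samples(x_grid))  # log p(x) -> p(x)
new_points = kde.sample(5, random_state=0)   # draw samples from the fitted estimate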

I hope this article provides some intuition for how KDE works. KDEs are important for understanding Support Vector Machines, coming up in the next lesson. Hope you enjoyed learning it!

For questions/feedback you can reach me at my LinkedIn or at my Website.

Happy learning!
