A few words on ensembling methods

Have you ever wondered how some people get much better results from their models than others, even when they use the same dataset and the same code? The answer is often ensemble methods: bagging, boosting and stacking, techniques that combine multiple models to improve the results of machine learning and deep learning.

There are countless papers describing various ways of combining the predictions of base models in machine learning. There are also quite a few open-source implementations, such as scikit-learn, XGBoost and Spark MLlib, which provide ensemble methods like bagging, boosting and stacking out of the box.

Machine learning and deep learning projects often train several different models, and the biggest challenge is combining their predictions into a single accurate result. In this blog post we provide an overview of three ensemble methods: bagging, boosting and stacking. We review the definition of each method and provide example code in Python.
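To make the three methods concrete, here is a minimal sketch using scikit-learn (an assumed library choice for illustration; the dataset is a synthetic toy problem, not real data):

```python
# Sketch: bagging, boosting and stacking side by side with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy dataset standing in for real data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Bagging: many trees trained on bootstrap samples, predictions averaged.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=0),
    # Boosting: trees fitted sequentially, each focusing on the
    # examples the previous ones got wrong.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model learns how to combine the base models' outputs.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(random_state=0)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
print(scores)
```

All three share the same `fit`/`score` interface as a single model, which is what makes them easy to drop into an existing pipeline.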

Machine learning may not be a new concept, but it is still complex and requires expertise. Often we know the goal but not the best way to reach it. Enter ensemble methods and models.

When we talk about machine learning and artificial intelligence, we usually mean either deep learning with neural networks, or regression and classification with decision trees, random forests, or methods such as logistic regression and support vector machines. Supervised learning is used for classification problems like crypto sentiment analysis, where we want the model to predict a category from labeled examples. Unsupervised learning is used when there are no labels. A nice introductory blog post on sentiment analysis is here.
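The supervised/unsupervised split above can be sketched in a few lines of scikit-learn (an assumed choice, on synthetic blob data rather than real sentiment text):

```python
# Sketch: supervised learning uses labels, unsupervised learning does not.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two synthetic clusters standing in for two categories.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the labels y are given, the model learns to predict a category.
clf = LogisticRegression().fit(X, y)
supervised_acc = clf.score(X, y)

# Unsupervised: no labels given, the model groups similar points on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = set(km.labels_)

print(supervised_acc, cluster_ids)
```

Note that the clustering model never sees `y`; its cluster ids are arbitrary and need not match the original labels.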

The problem of weak-signal detection is a natural close relative of the well-studied problem of classification. It is also related to regression, clustering, outlier detection and anomaly detection. In this post we review the state of the art in each of these fields and identify essential techniques that can be used to create new ensembling methods.