Links to resources

Our platform BittsAnalytics has set up a website for tracking the status of our dashboard and our API endpoints at

There are several good documentation tools for producing API documentation. We previously used

However, for the latest iteration we selected Slate, available at

If you are interested in machine learning topics, you may find this Telegram channel useful:

Notion is a great tool for project management, and we use it together with Slack. Notion also lets you set up a website with articles; you can find one covering our topics here:

A few cool pictures from our experiments on the topic of style transfer between images can be found at

If you are interested in the difference between three terms (artificial intelligence, machine learning and deep learning), check out our article on

A few words on ensembling methods

Have you ever wondered how some people get much better results when training their models than others, even when they use the same dataset and code? The answer is often ensembling: methods such as bagging, boosting and stacking that combine several models to improve the results of machine learning and deep learning.

There are countless papers describing methods for combining the predictions of base models in machine learning. There are also quite a few open source libraries that can be used for this, such as Spark MLlib, TensorFlow or Theano. They can be used to implement ensemble methods such as bagging, boosting and stacking.

Machine learning and deep learning problems are complex, and they are often tackled with several different methods (or models). The biggest challenge is then finding an accurate way to combine the results from each model. In this blog post we’re going to provide an overview of three ensemble methods: boosting, stacking and bagging. We’ll review the technical definitions for each method as well as provide example code in Python.
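As a rough illustration of the three approaches, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of base models and all parameter values are assumptions made for this example, not a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption, used only for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Bagging: many trees, each fitted on a bootstrap sample; their votes are combined.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=0
    ),
    # Boosting: shallow trees added one after another, each correcting earlier errors.
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    # Stacking: a final model learns how to combine the base models' predictions.
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Which of the three works best depends on the data, so it is usually worth comparing them on a held-out set as done here.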

Machine learning may not be a new concept, but it is still quite complex and requires some expertise. The real problem is that we don’t always know how to achieve our goals in ML. Enter ensemble methods and models.

When we talk about machine learning and artificial intelligence, we usually mean either deep learning with neural networks or regression/classification with decision trees, random forests or other methods such as logistic regression and support vector machines. Supervised learning is used for classification problems like crypto sentiment analysis, where we want the model to predict a category. Unsupervised learning is used when there are no labels. A nice introduction blog on sentiment analysis is here.

The problem of weak signal detection is a close relative of the well-studied problem of classification. It is also related to regression, clustering, outlier detection and anomaly detection. In this sample paper we review the state of the art in each of these fields and identify essential techniques that can be used to create new ensembling methods.

Bitcoin sentiment analysis

Crypto markets are much more susceptible to the hype and mood of investors than, for example, stock markets. In stock markets, a company's value is defined by discounted cash flow valuation and depends largely on something observable: its financial statements. Most of the time spent by financial analysts thus goes into analysing financial statements and predicting revenue, net profit and cash flow, e.g. free cash flow.

In the case of cryptocurrencies there is nothing comparable. This is why the value of a given coin depends much more on the opinion that others, such as investors, hold about the coin.

This has increased the importance of opinions and sentiment about coins for the valuation of a given altcoin or of bitcoin.

Bitcoin sentiment analysis can thus be an important input when assessing or predicting bitcoin price developments.

Bitcoin sentiment analysis is done by regularly collecting tweets and other social media posts about bitcoin and then determining the sentiment (positive or negative) of each post. This is usually done with data science and machine learning techniques: first a machine learning model is trained, e.g. with sklearn, to predict the sentiment (positive or negative, 1 or 0) of a given text. The model is then deployed on the stream of tweets and other social media posts.
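The two stages described above (train a model, then apply it to incoming posts) can be sketched as follows. This is a toy illustration only: the handful of labelled posts, the `post_stream` generator standing in for a live feed, and the choice of logistic regression are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-written training set (hypothetical; 1 = positive, 0 = negative).
train_posts = [
    "bitcoin is going to the moon, love it",
    "great news for bitcoin adoption",
    "very bullish on bitcoin this week",
    "bitcoin rally looks strong and healthy",
    "bitcoin is a scam and will crash",
    "terrible day, bitcoin keeps dumping",
    "another exchange hack, bad for bitcoin",
    "bearish on bitcoin, selling everything",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Stage 1: train a text classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_posts, train_labels)


def post_stream():
    """Hypothetical stand-in for a live feed of social media posts."""
    yield "love this bullish bitcoin rally"
    yield "bitcoin crash after another hack, terrible"
    yield "great adoption news today"


# Stage 2: score each incoming post and aggregate into one sentiment figure.
scored = [(post, int(model.predict([post])[0])) for post in post_stream()]
positive_share = sum(label for _, label in scored) / len(scored)

for post, label in scored:
    print(label, post)
print(f"share of positive posts: {positive_share:.2f}")
```

In practice the training set would contain many thousands of labelled posts, and the aggregate would be computed per time window rather than over three posts.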

The stream needs to be managed with some kind of data pipeline, e.g. using Spark or other libraries built for this purpose.

Sentiment analysis is just one of many text classification tasks. Others include product categorization, news classification and product tagging.

Product categorization is especially important for the eCommerce ecosystem, where online stores often want to determine the categories of the products they sell. This makes it easier for their customers to search for products.

Product tagging, on the other hand, is a more modern variant of product categorization, where online stores assign not a single category to a given product but one or more tags. This allows users of online stores an even more refined search for products.

Sentiment Analysis with Machine Learning, Opinion Mining


Sentiment analysis has become a popular method in recent years for learning what clients think about products and services. It is used both for academic purposes and in commerce.

It is essentially mining for data, which is then evaluated for subjective opinions or sentiments.

Valuable information can come from websites selling products and services, e.g. reviews on Amazon or Tripadvisor. Even larger data sets of sentiment data can be obtained from analysing data produced on social media platforms like Twitter, Instagram and others.

Historically, the first phase of sentiment analysis focused on determining the overall sentiment or sentiment polarity of sentences, paragraphs or entire documents.

However, companies have become more demanding recently, and they are no longer interested only in the overall sentiment of texts about their products and services. They want to know more about what customers are talking about:

  • which specific products are mentioned in customer opinions
  • what aspects of products or services are mentioned (e.g. for a hotel, possible aspects are location, service and price)
  • what the opinion or sentiment on these aspects is, as gathered from customer reviews

Aspect Based Sentiment Analysis

This approach is also known under a specific name – Aspect Based Sentiment Analysis (ABSA).

ABSA is essentially interested in learning more about specific aspects of products or services. ABSA consists of several steps:

  • identification of relevant entities
  • extraction of their features and aspects (also sometimes called aspect extraction)
  • using so-called aspect terms to find out the sentiment about a particular feature or aspect (sentiment polarities are positive, neutral and negative)
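To make these steps concrete, here is a deliberately simple sketch that uses hand-written word lists and a small word window instead of a trained model or a dependency parser. The lexicons `ASPECT_TERMS` and `OPINION_WORDS`, the function name and the window size are all hypothetical choices for illustration.

```python
import re

# Hypothetical, hand-written lexicons; a real system would learn or curate these.
ASPECT_TERMS = {
    "location": {"location", "neighbourhood", "area"},
    "service": {"service", "staff", "reception"},
    "price": {"price", "cost", "rate"},
}
OPINION_WORDS = {
    "great": 1, "friendly": 1, "excellent": 1, "clean": 1,
    "rude": -1, "terrible": -1, "dirty": -1, "expensive": -1,
}


def aspect_sentiments(review, window=2):
    """Map each aspect mentioned in the review to positive, neutral or negative."""
    tokens = re.findall(r"[a-z']+", review.lower())
    totals = {}
    for i, token in enumerate(tokens):
        for aspect, terms in ASPECT_TERMS.items():
            if token in terms:
                # Sum the polarity of opinion words close to the aspect term.
                nearby = tokens[max(0, i - window): i + window + 1]
                score = sum(OPINION_WORDS.get(word, 0) for word in nearby)
                totals[aspect] = totals.get(aspect, 0) + score
    return {
        aspect: "positive" if total > 0 else "negative" if total < 0 else "neutral"
        for aspect, total in totals.items()
    }


review = "The location was great, but the staff were rude and the price is fair."
result = aspect_sentiments(review)
print(result)
# {'location': 'positive', 'service': 'negative', 'price': 'neutral'}
```

A fixed word window breaks down quickly on longer sentences, which is one reason why syntactic approaches such as dependency parsing are used to link opinion words to the right aspect.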

How do we determine the aspects? One can use several approaches, from deep learning to dependency parsing. A great library for dependency parsing and aspect extraction is spaCy. Another frequently used dependency parsing library is Stanford CoreNLP.

Sentiment analysis, i.e. determining the sentiment of aspects or whole sentences, can be done by training machine learning or deep learning models. I will show you code that lets you train a rather large and accurate model for sentiment classification yourself.

Training a sentiment classifier using machine learning involves:

  • preparing a suitable labelled data set (we will use the Stanford labelled data set of tweets)
  • choosing a specific machine learning model, e.g. Support Vector Machines are very suitable for this text classification task
  • training the model on the data set
  • evaluating the results (checking precision, recall, f-score and accuracy)
  • using the model in production to produce insights

There are companies that have built sentiment classification systems and offer sentiment analysis consulting to build you a sentiment classification system that is customised for your needs.

Sentiment analysis can be applied on many types of texts

Sentiment analysis allows you to extract sentiment from a wide array of possible texts:

  • tweets
  • instagram posts
  • product reviews
  • restaurant reviews
  • hotel reviews
  • surveys
  • emails
  • tickets (support)

Sentiment analysis or opinion mining is a great solution for companies that have big data in the form of unstructured texts, e.g. email communications with customers. It allows them to gain valuable information and actionable insights from these repositories of data.

Training a sentiment classifier based on SVM (Support Vector Machines) and using Stanford 140 data set

We will use the Scikit-learn library to train a sentiment classifier based on SVM.

We will be using the Stanford 140 data set. You can download it from this website:

Note the data format. It has 6 fields:

0 – the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
1 – the id of the tweet
2 – the date of the tweet
3 – the query. If there is no query, then this value is NO_QUERY.
4 – the user
5 – the text of the tweet

Let us load the libraries:
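A plausible set of imports for this walkthrough is shown below. It assumes pandas and scikit-learn are installed; the exact selection is a guess based on the steps that follow.

```python
# Standard library: regular expressions for cleaning tweet text.
import re

# pandas for reading the CSV file (assumed to be installed).
import pandas as pd

# scikit-learn pieces used in the steps below: TF-IDF features,
# a linear SVM, a train/test split and evaluation metrics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
```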

The next step is loading the data from the Stanford 140 data set and preprocessing it:

We will use a TF-IDF representation of the tweets before feeding them to the SVM model:
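Here is a minimal sketch of loading and cleaning, assuming the file is a headerless CSV with the six fields listed above. The column names, the helper names `clean_tweet` and `load_tweets`, and the cleaning rules are illustrative choices; the three sample rows are made up so that the snippet runs without the real file.

```python
import io
import re

import pandas as pd

# Column names are our own labels for the six fields described above.
COLUMNS = ["polarity", "id", "date", "query", "user", "text"]


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def load_tweets(source, encoding="latin-1"):
    """Read a headerless 6-field CSV and return cleaned text with 0/1 labels."""
    df = pd.read_csv(source, header=None, names=COLUMNS, encoding=encoding)
    df["polarity"] = df["polarity"].astype(int)
    # Keep negative (0) and positive (4) tweets only; drop neutral (2).
    df = df[df["polarity"].isin([0, 4])].reset_index(drop=True)
    df["label"] = (df["polarity"] == 4).astype(int)
    df["clean_text"] = df["text"].map(clean_tweet)
    return df[["clean_text", "label"]]


# Made-up sample rows in the same layout, so the example is self-contained.
sample = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a",'
    '"@friend my phone died again... http://example.com so sad"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b",'
    '"Loving the sunshine today!"\n'
    '"2","3","Mon Apr 06 22:21:00 PDT 2009","NO_QUERY","user_c",'
    '"Just landed in Boston"\n'
)
df = load_tweets(io.BytesIO(sample.encode("latin-1")))
print(df)
```

With the real data you would pass the path of the downloaded CSV file to `load_tweets` instead of the in-memory sample.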
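A sketch of the TF-IDF step with scikit-learn's `TfidfVectorizer` follows. The four example tweets and the parameter choices (unigrams plus bigrams, sublinear term frequency) are assumptions for illustration, not settings taken from the original experiment.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# A few already-cleaned example tweets (made up for this sketch).
tweets = [
    "loving the sunshine today",
    "my phone died again so sad",
    "so happy with the new phone",
    "worst day ever so sad",
]

# Unigrams and bigrams; sublinear_tf replaces tf with 1 + log(tf).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
X = vectorizer.fit_transform(tweets)  # sparse matrix: one row per tweet

# By default each row is scaled to unit L2 norm.
row_norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())

print(X.shape)                             # (number of tweets, vocabulary size)
print(sorted(vectorizer.vocabulary_)[:5])  # a few of the learned terms
print(row_norms)
```

The vectorizer should be fitted on the training tweets only and then reused, via `transform`, for any new text the model is asked to score.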

We next train the model using the linear SVM from scikit-learn:
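Training could look roughly like this, using `LinearSVC` on top of TF-IDF features in a single pipeline. The eight training sentences are a made-up stand-in for the real tweets, and `C=1.0` is simply the library default rather than a tuned value.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hand-written stand-in for the cleaned tweets (1 = positive, 0 = negative).
train_texts = [
    "what a great day",
    "loving the sunshine today",
    "so happy with the new phone",
    "best concert ever",
    "this is the worst day",
    "my phone died again so sad",
    "i hate waiting in line",
    "feeling sick and tired",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier in one pipeline, so raw text goes in directly.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)
model.fit(train_texts, train_labels)

new_tweets = ["such a great concert", "so sad and tired today"]
predictions = model.predict(new_tweets)
for tweet, label in zip(new_tweets, predictions):
    print(int(label), tweet)
```

Keeping the vectorizer inside the pipeline ensures the same vocabulary and weighting are applied at training and prediction time.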

After training converges, we can evaluate the model by calculating precision, recall and f1-score:
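The metrics can be computed with `sklearn.metrics` as sketched below. The `y_true` and `y_pred` lists are hypothetical placeholders chosen to keep the arithmetic easy to follow; they are not predictions from the model trained on the tweet data.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_recall_fscore_support,
)

# Hypothetical labels and predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

# Precision, recall and F1 for the positive class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
accuracy = accuracy_score(y_true, y_pred)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
# precision=0.80 recall=0.80 f1=0.80 accuracy=0.80

# Per-class breakdown in a readable table.
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```

In a real run, `y_true` would be the labels of a held-out test split and `y_pred` the output of the trained model's `predict` on that split.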

The sentiment classifier trained on Stanford 140 data set has a good accuracy of 82%: