Machine learning in a nutshell – part 3: improving audience segmentation with unsupervised learning

Customer Experience 4 December 2018

In the first article of this series dedicated to machine learning, we saw that the principal goal of machine learning is to automate common tasks with the help of computers. To do so, machine learning algorithms try to mimic human learning based on a mathematical model (if you’re just joining us, you can catch up here).

Machine learning solutions fall into three categories: supervised learning (discussed here), unsupervised learning, and reinforcement learning. Let’s talk about unsupervised learning: how does it work, and when can it be applied?

Supervised versus unsupervised learning

While supervised learning uses past data to predict future events and tries to answer specific questions (such as “what is this object?”), unsupervised learning does not seek to optimise a particular task. Instead, the goal is simple: sort observations into a predetermined number of groups using criteria chosen by the project team. Essentially, unsupervised learning is a way to categorise elements into different groups using their characteristics.

Imagine that you are given objects of different shapes and colours: circles, squares, or stars that are blue, red, green, or yellow. If you are asked to create three groups of objects, you would probably group them by shape. For four groups, you would use colour. For 12 groups, you would use shape and colour together (three shapes × four colours). For any other number, things get a bit trickier and classifications become more specific. Imagine now that you are given a huge box of LEGO with all sorts of shapes and colours. This would definitely complicate things!
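To make the shapes-and-colours example concrete, here is a minimal Python sketch. The object list and the `group_by` helper are illustrative assumptions, not something taken from a particular library; the point is only that the number of groups follows from the characteristics you choose.

```python
from collections import defaultdict
from itertools import product

SHAPES = ["circle", "square", "star"]
COLOURS = ["blue", "red", "green", "yellow"]

# One hypothetical object for every shape/colour combination (12 in total).
objects = [{"shape": s, "colour": c} for s, c in product(SHAPES, COLOURS)]


def group_by(items, keys):
    """Group items by the tuple of values they hold for the given keys."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[k] for k in keys)].append(item)
    return dict(groups)


by_shape = group_by(objects, ["shape"])            # one group per shape
by_colour = group_by(objects, ["colour"])          # one group per colour
by_both = group_by(objects, ["shape", "colour"])   # one group per combination

print(len(by_shape), len(by_colour), len(by_both))  # prints: 3 4 12
```

Here a human picked the grouping criteria by hand, which only works because there are two characteristics with a handful of values each.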

This type of problem can be solved with unsupervised learning algorithms on a much larger scale. You provide a list of various objects and criteria, and a desired number of groups. The algorithm takes care of the rest!
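The article does not name a specific algorithm, so as an assumption the sketch below uses k-means, one common clustering method. It takes a list of objects described by numeric criteria and a desired number of groups, then alternates between assigning each object to its nearest group centre and recomputing the centres. The data points and the deterministic "farthest-first" starting centres are choices made for this illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_first(points, k):
    """Deterministic start: pick points that are far from those already chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=50):
    """Return (labels, centroids) for a basic k-means clustering."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the group with the nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved, so the grouping is stable
            break
        labels = new_labels
        # Update step: each centre moves to the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels, centroids


# Hypothetical objects described by two numeric criteria.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
labels, centroids = k_means(points, k=3)
print(labels)  # prints: [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

On real data the result can depend on the starting centres, which is why library implementations typically try several initialisations.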

Unlike supervised learning, which weighs characteristics against one another to better predict future events, unsupervised learning does not favour one characteristic over another. It does not consider it “better” to categorise by shape than by colour, for example. Of course, a human could intervene so that the algorithm focuses on certain characteristics, but the algorithm will only ever use what it is given. Consequently, the choice of characteristics is crucial to getting meaningful groups. If you wanted to sort the LEGO from earlier, you would group the blocks by shape, colour, or size so that the groups would be usefully organised. Sorting them by the date or time each piece was purchased would be less relevant.
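One way a human might steer a distance-based grouping towards certain characteristics is to weight them. The sketch below is an assumption for illustration: the bricks, their two numeric characteristics, and the weights are all hypothetical. It shows that the brick considered "most similar" to a reference brick can change once one characteristic is given more weight.

```python
def weighted_squared_distance(a, b, weights):
    """Squared Euclidean distance where each characteristic has its own weight."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


# Hypothetical bricks described by (length, height).
reference = (0, 0)
candidates = {
    "brick_x": (1, 3),  # close in length, far in height
    "brick_y": (3, 0),  # far in length, same height
}


def closest(weights):
    """Name of the candidate brick nearest to the reference under the given weights."""
    return min(
        candidates,
        key=lambda name: weighted_squared_distance(reference, candidates[name], weights),
    )


equal_weights = (1, 1)   # no characteristic is favoured
length_first = (5, 1)    # a human decides that length matters five times more

print(closest(equal_weights))  # prints: brick_y
print(closest(length_first))   # prints: brick_x
```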

Applications in digital marketing

While supervised learning responds to a specific question, like “will this user revisit my site in the coming days?”, unsupervised learning serves more descriptive and informative needs. For example, unsupervised learning can help to segment audiences by characteristic, and thus can be useful in creating targeted campaigns aimed at a specific group of users.

Until a few years ago (and often still today!), client databases were evaluated with the RFM method, which judges customer value from a marketing perspective using three criteria: “Recency” (how recently did the customer purchase or interact with the brand?), “Frequency” (how often does the customer purchase over a given period?), and “Monetary value” (how much does the customer spend over a given period?). Depending on how many groups are desired, each RFM criterion can be broken down into smaller groups of common characteristics. For example, to divide a customer base into 8 groups, each criterion would be split into two levels (such as high and low), giving 2 × 2 × 2 = 8 combinations.
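The two-way split per criterion can be sketched in a few lines of Python. The customer records, the field names, and the use of the median as the cut-off are all assumptions made for this illustration; in practice the thresholds would be chosen by the marketing team.

```python
from itertools import product
from statistics import median

# Hypothetical customers; field names are illustrative only.
customers = [
    {"id": "c1", "recency_days": 5, "frequency": 12, "monetary": 900.0},
    {"id": "c2", "recency_days": 40, "frequency": 10, "monetary": 150.0},
    {"id": "c3", "recency_days": 90, "frequency": 1, "monetary": 30.0},
    {"id": "c4", "recency_days": 10, "frequency": 2, "monetary": 700.0},
]


def rfm_segments(customers):
    """Label each customer with a high (+) or low (-) level on R, F and M."""
    r_cut = median([c["recency_days"] for c in customers])
    f_cut = median([c["frequency"] for c in customers])
    m_cut = median([c["monetary"] for c in customers])
    segments = {}
    for c in customers:
        # A recent purchase means FEWER days since the last one.
        r = "R+" if c["recency_days"] <= r_cut else "R-"
        f = "F+" if c["frequency"] >= f_cut else "F-"
        m = "M+" if c["monetary"] >= m_cut else "M-"
        segments[c["id"]] = r + f + m
    return segments


# Two levels on each of three criteria give 2 x 2 x 2 = 8 possible segments.
possible_segments = [r + f + m for r, f, m in product(["R+", "R-"], ["F+", "F-"], ["M+", "M-"])]

segments = rfm_segments(customers)
print(len(possible_segments))  # prints: 8
print(segments["c1"], segments["c3"])  # prints: R+F+M+ R-F-M-
```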


Thanks to available online data, we can now consider a much larger number of criteria (or dimensions) to segment client groups more precisely, such as by user device, user traffic source, number of page views, etc. With so many descriptors available for millions of users, it becomes quite complicated for humans to break down the data into a manageable number of groups.

Once again, the decision over what criteria to use in segmentation is vital. For example, classifying users by their eating habits is probably not very relevant to an online store selling shoes. Other criteria could be helpful in theory but might not be possible to use in practice, depending on the activation scenario. It could be difficult, for example, to target parents via an e-mail campaign if data on family situations was never collected in the first place.

When the variables have been carefully chosen (ideally by a subject matter expert), and the desired number of groups has been defined (or a range has been given, from which the algorithm can choose the best number), the algorithm will deliver a set of rules that will split the users into groups. A heat map will then often be provided to make this information easier to understand. The idea here is to calculate the average value of each characteristic for each group, such as the average number of page views for each group, or the percentage of users on each device for each group. This process results in a data visualisation like the one below.

In this example heat map, a brand’s audience is analysed using 13 variables divided into four categories: familiarity with brand, length and quality of visits, attitude towards sections, and engaging actions. The analysis using these 13 variables identified five user groups, which are sorted from least to most engaged with the brand.
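The numbers behind such a heat map are simply per-group summaries. As a minimal sketch, assuming each user already carries a group label from the clustering step, the code below computes the average number of page views and the share of mobile users for each group. The user records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical users, already assigned to a group by a clustering algorithm.
users = [
    {"group": 0, "page_views": 2, "device": "mobile"},
    {"group": 0, "page_views": 4, "device": "mobile"},
    {"group": 1, "page_views": 10, "device": "desktop"},
    {"group": 1, "page_views": 14, "device": "mobile"},
]


def group_profiles(users):
    """Average each characteristic within each group (one heat map row per group)."""
    by_group = defaultdict(list)
    for user in users:
        by_group[user["group"]].append(user)
    profiles = {}
    for group, members in sorted(by_group.items()):
        profiles[group] = {
            "avg_page_views": mean([u["page_views"] for u in members]),
            "share_mobile": mean([1 if u["device"] == "mobile" else 0 for u in members]),
        }
    return profiles


profiles = group_profiles(users)
for group, profile in profiles.items():
    print(group, profile)
```

Each row of the resulting table can then be coloured by value, so that groups with unusually high or low averages stand out at a glance.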

In digital marketing, unsupervised learning is most useful as a first step to learn more about and better understand your own data (and thus your clients/prospects!). It can also be used to reduce volume by grouping observations together when there is simply too much information. In practice, if you group observations for a specific purpose such as improving click rate, remember to track how your activation performs so that you can judge the segmentation choices you made.

Last, but not least, it is important to remember that results from unsupervised learning – even more than supervised learning – depend heavily on data scientists making choices that will ultimately serve your business goals!


Translated from French to English by Niamh Cloughley
