Data lakes: a treasure within marketers’ reach

Data Architecture 8 January 2019

What’s a data lake again?

“A data lake is a wide database fed by scattered data flows. These flows discharge into an actual lake of data, and the variety of these streams matches the company’s wide array of departments”, states Pierre Harand, Managing Director of fifty-five in France.

The value of a data lake lies in its ability to store all kinds of data, whether online or offline, personal or not, from a variety of functions (marketing, finance, HR, product, and so on). That data can then be explored, reconciled and refined – in other words, enhanced – to meet various business needs.

Plenty of opportunities when it comes to marketing activation

These “data lakes” are excellent ways to integrate data intelligence into all departments of a company, especially marketing.

For more than 10 years, extracting value from consumer data has become widespread, with all sorts of solutions for data collection, native connectors for activation (including between web analytics and DSP tools for media activation), and integrated marketing offers based on cross-platform identifiers for 360° customer knowledge and relationship management. In short, everything is done to support marketing teams in implementing an ROI-focused digital strategy.

The data lake therefore plays a key role, serving both as a container and a playground for data initiatives. It makes it possible to define and deploy various use cases that can be tested and, where relevant, automated.

Segmentation is a prime example. It is a simple use case, yet it produces high added value. Standard but information-rich variables, drawn from analytics and/or CRM tools and enhanced by machine learning algorithms, allow marketing personas to be identified quickly. By then deploying customised communication strategies across all levers (media, CRM, website and app), you optimise the impact and ROI of your campaigns, as well as your customers’ engagement.
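
As a rough illustration, the sketch below assigns customers to personas from two standard variables. It is a rule-based stand-in, not the machine learning approach mentioned above, and the field names (`orders_12m`, `days_since_last_visit`), thresholds and persona labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    orders_12m: int             # hypothetical CRM variable
    days_since_last_visit: int  # hypothetical analytics variable


def assign_persona(c: Customer) -> str:
    """Rule-based persona assignment; thresholds are illustrative assumptions."""
    if c.orders_12m >= 5 and c.days_since_last_visit <= 30:
        return "loyal"
    if c.orders_12m >= 1 and c.days_since_last_visit > 90:
        return "lapsing"
    if c.orders_12m == 0:
        return "prospect"
    return "occasional"


def segment(customers):
    """Group customer ids by persona."""
    segments = defaultdict(list)
    for c in customers:
        segments[assign_persona(c)].append(c.customer_id)
    return dict(segments)


customers = [
    Customer("c1", orders_12m=7, days_since_last_visit=3),
    Customer("c2", orders_12m=2, days_since_last_visit=120),
    Customer("c3", orders_12m=0, days_since_last_visit=10),
    Customer("c4", orders_12m=1, days_since_last_visit=45),
]
segments = segment(customers)
```

Each resulting group can then receive its own message on each lever. In a real project, the hand-written rules would typically give way to segments learned from the data.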

Thanks to user scoring techniques, you can also identify high-potential prospects. This way, you commit media spend only where it is highly likely to generate conversions, which dramatically reduces acquisition and lead generation costs.
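
One way to picture user scoring is a logistic score turned into a targeting decision. The sketch below is only an assumption-laden example: the feature names, the weights, the bias and the 0.5 threshold are all hypothetical, and in practice the weights would be fitted on historical conversion data.

```python
import math

# Hypothetical weights; in practice they would be fitted on past conversions.
WEIGHTS = {"visits_30d": 0.3, "cart_adds_30d": 0.8, "newsletter_opt_in": 0.5}
BIAS = -3.0
THRESHOLD = 0.5  # assumed cut-off above which media spend is committed


def conversion_score(features: dict) -> float:
    """Logistic score in (0, 1); missing features count as 0."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def should_target(features: dict) -> bool:
    """Commit media spend only for users scoring at or above the threshold."""
    return conversion_score(features) >= THRESHOLD


hot = {"visits_30d": 6, "cart_adds_30d": 2, "newsletter_opt_in": 1}
cold = {"visits_30d": 1, "cart_adds_30d": 0, "newsletter_opt_in": 0}
```

With these made-up numbers, `should_target(hot)` is true and `should_target(cold)` is false, so only the first user would be added to a paid media audience.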

Let us not forget product or content recommendations, customer journey personalisation, the retargeting of users following shopping cart abandonment, churn analysis and predictions… All these ROI-oriented use cases can be rapidly adopted following the implementation of a data lake.

Deploying a data lake: a myriad of challenges…

Although it is becoming mainstream for data teams to build data lakes in order to support business functions such as marketing, several challenges do arise. It is necessary to:

  • Identify and prioritise business needs that later turn into technical ones.
  • Take into account the existing context, whether it be the organisation of teams, technologies in place and/or initiatives that have already been launched.
  • Build and rely on a team that blends business, legal, technical and analytical skills, but also statistical and IT-oriented ones, while defining a clear framework for corporate governance.
  • Guarantee the quality of incoming data, in keeping with the “garbage in, garbage out” principle: if a system’s incoming data is of poor quality, its output will likely be too.
  • Ensure data security and meet the GDPR’s personal data requirements.
  • Achieve all this at a reasonable cost!
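
To make the “garbage in, garbage out” point concrete, here is a minimal sketch of a quality gate applied to records before they enter the lake. The field names (`customer_id`, `email`, `order_amount`) and the rules themselves are assumptions chosen for illustration, not a prescribed schema.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record: dict) -> list:
    """Return the list of problems found; an empty list means the record passes."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    email = record.get("email")
    if email is not None and not EMAIL_RE.match(email):
        problems.append("malformed email")
    amount = record.get("order_amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid order_amount")
    return problems


def split_by_quality(records):
    """Separate clean records from rejected ones, keeping the rejection reasons."""
    accepted, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


records = [
    {"customer_id": "c1", "email": "a@example.com", "order_amount": 42.0},
    {"customer_id": "", "email": "not-an-email", "order_amount": -5},
]
accepted, rejected = split_by_quality(records)
```

Keeping the rejection reasons alongside each rejected record makes it easier to report quality issues back to the team that owns the source.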

… that you can overcome!

To meet these challenges, tools are sometimes necessary, but what you really need to do is identify and organise the contributions and responsibilities of each team member. Here are some guidelines on which fifty-five’s approach partly relies, and which have contributed to the success of the data lake projects carried out with our clients:

  • Build a cross-functional team dedicated to the project: a Project Manager, a Data Engineer, a Data Analyst and, if need be, a Data Scientist. Also plan regular, focused meetings to share everyone’s progress and keep objectives aligned. Such a team may include people from inside or outside the company.
  • Choose a versatile project manager who has sharp business skills, a fair understanding of the technical challenges at hand and an advanced knowledge of the company’s data framework. Driven by corporate goals and aware of the technical complexities and constraints of the existing context, he or she acts as the guarantor of suitable governance and architecture. This role is paramount, because business objectives should define the infrastructure to be deployed, and not the other way around.
  • If the data team decides to receive support from a partner, because one or several skills are lacking in-house, make sure the partner’s teams are present day to day, that you remain independent from the partner’s solutions and that expertise is transferred to your own teams. These are key elements of mid- and long-term success.
  • Use the market’s cloud technologies; with Amazon Web Services (AWS) and Google Cloud Platform, for instance, you can quickly activate a large set of solutions with little commitment. These tools can relieve teams of infrastructure-related constraints so that they can focus on business issues and on their own data framework. Defining and prioritising business goals comes first once more. Furthermore, these solutions are available at highly competitive costs!
  • Develop a simple and pragmatic architecture: 3 priority data sources (maximum!), native connectors, a storage and processing solution, schedulers*, and a first data set that can be activated for a POC (Proof of Concept) on a simple use case with measurable performance; then, iterate!
  • If you start off with a simple architecture and build by iterations, managing data quality and deploying security measures become far easier. The first version is operational within a few months at most, and the cost stays reasonable (tens of thousands of euros), since the return on investment determines whether the use case should be industrialised and which iterations should follow.
  • And, above all, don’t hesitate to change course if a path you’ve taken leads to a dead end.

In a nutshell, everyone can build and exploit a data lake, so long as it is pragmatic, iterative and driven by business objectives.

In 2018, only 30% of large corporations actually had a data lake. Even so, data lakes are a treasure within marketers’ reach, making simple and ROI-focused activations possible, with performance that can be measured directly. So, go for it!

Once you’ve achieved this first level of maturity, you’ll be in a position to tackle more complex data science issues. Measuring the impact of your influencers to optimise your media mix, content recommendations based on the user’s history log or cognitive profile, chatbots or voice recognition to better understand and engage your customers – all these innovative topics will, as early as tomorrow, make your brand experience even more unique!

*Schedulers: solutions that launch processes automatically (for example continuously, at a set hour each day, or every week).
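
As a small illustration of what a scheduler has to work out, the sketch below computes the next daily or weekly run time of a process. It is only a toy built on the standard library, with hypothetical function names; a real project would rely on a managed scheduling service or an existing orchestration tool.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int) -> datetime:
    """Next occurrence of `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_weekly_run(now: datetime, weekday: int, hour: int) -> datetime:
    """Next occurrence of `weekday` (Monday=0) at `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


now = datetime(2019, 1, 8, 9, 30)          # a Tuesday morning
daily = next_daily_run(now, hour=6)         # 06:00 has passed, so tomorrow
weekly = next_weekly_run(now, weekday=0, hour=6)  # the following Monday
```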


Translated from the original French by Charles Rogers.
