Ahhh, big data. This concept has revolutionised the business world, and it shapes the strategies of many companies, which must follow the trend as best they can. But big data has also created a whole new range of experts. A whole new series of jobs has emerged, including data analysts, data scientists, and data engineers.
Though the Harvard Business Review called being a data scientist “the Sexiest Job of the 21st Century” (and data analysts are not the only ones mentioned, Antoine), the job remains mysterious. So what does being a data scientist mean today? And what can a data scientist bring to a business?
First of all, what purpose does data science serve?
Today, collecting and analysing data can be applied to many domains, which creates a variety of challenges for professionals across industries. Online merchants might wish to create personalised discount coupons or product recommendations, while energy providers, insurers, and telecom firms might want to identify clients likely to terminate their contracts (or “churn”). Data science can help, though everyone involved must keep a critical eye on the limitations of the models that can be built. In particular, exploiting data raises ethical questions for the general public and professionals alike, especially on subjects like automation and artificial intelligence. In the United States, for example, a pair of Stanford researchers claimed that they had managed to determine whether or not a person is homosexual using an algorithm that analyses facial images with “deep neural networks”. This example shows that certain data, especially personal data, can be used for any purpose, for better or for worse. When algorithms are used to help with decision-making, we must therefore be careful about how their results are interpreted and used.
Where does the data scientist come in?
Data scientists can be seen as “data whisperers”. Once the raw data has been gathered, it must be processed and analysed to see what it “says” in order to guide decision-making. A data scientist therefore does more than just apply a statistical model, as he or she must learn from the data. The skills of a data scientist combine maths (statistics and machine learning), computer science, a good understanding of the business, and a grounding in marketing.
I’ve been a data scientist at fifty-five for over two years now, and have worked on a wide range of issues. Each new project has its own specificities, but the methodology remains pretty similar. First of all, we have to understand what our client needs, defining the stakes and the limitations of the subject with business teams and marketers. Next comes the feasibility study for the project, carried out with a team of engineers that is in charge of collecting the data and verifying that it is reliable. If data isn’t collected properly, or if the volume is insufficient, it will be impossible to draw any conclusions. Incidentally, this is where some companies are not yet mature enough. Our job is to show them that data quality is paramount to getting a data project off the ground.
What happens once the project has been approved?
A descriptive analysis phase allows us to identify outliers and to get an initial idea of the major trends. For example, if a user has browsed 1,000 pages of an online store in one day, even though the average is 10 pages per day per user, it could indicate that this user was a robot, and the related data can be excluded from our dataset. It is essential that we are in close contact with the company during this step, as they have the best knowledge of their own industry. Our knowledge of banking, insurance, or retail cannot come close to the expertise of our clients. It is by working together that we can identify the best paths to explore.
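To make the robot example a little more concrete, here is a minimal sketch of how such a check might look. The user ids, the page counts, and the rule itself (flag anyone above 20 times the typical daily count) are assumptions made up for illustration, not the rule we apply on real projects; in practice the threshold would be agreed with the client.

```python
from statistics import median


def flag_suspected_bots(pages_per_user, ratio=20.0):
    """Return the ids of users whose daily page count exceeds ratio x the median.

    The ratio of 20 is an arbitrary, illustrative threshold.
    """
    typical = median(pages_per_user.values())
    threshold = ratio * typical
    return {user for user, pages in pages_per_user.items() if pages > threshold}


# Hypothetical daily page counts for six users of an online store.
daily_pages = {"u1": 8, "u2": 12, "u3": 10, "u4": 1000, "u5": 9, "u6": 11}

suspected = flag_suspected_bots(daily_pages)

# Keep only the users that were not flagged.
cleaned = {user: pages for user, pages in daily_pages.items() if user not in suspected}
```

The sketch compares each user to the median rather than the mean, because the mean is itself pulled upwards by the very outliers we are trying to spot.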
Lastly, once the data has been analysed and cleaned up, the modelling can begin. Modelling consists of using machine learning algorithms to extract decision rules from our database and to automate them.
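As a toy illustration of what “extracting a rule” can mean, the sketch below learns a single threshold from labelled examples, which is about the simplest model there is (sometimes called a decision stump). The churn setting, the `logins` variable, and all the figures are invented for the example; a real project would use richer data and a proper machine learning library.

```python
def fit_stump(values, labels):
    """Find the threshold t for which the rule 'value <= t means True'
    makes the fewest mistakes on the labelled examples."""
    best_threshold, best_errors = None, len(labels) + 1
    for t in sorted(set(values)):
        # Count the examples where the rule disagrees with the known label.
        errors = sum((v <= t) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors


# Hypothetical history: logins last month, and whether the customer then churned.
logins = [0, 1, 2, 3, 8, 10, 12, 15]
churned = [True, True, True, False, False, False, False, False]

threshold, errors = fit_stump(logins, churned)


def predict_churn(login_count):
    """Apply the learned rule to a new customer."""
    return login_count <= threshold
```

Once learned, the rule can be applied automatically to every new customer, which is the “automating” half of the modelling step.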
Let’s look at a real example, to clarify.
An advertiser can come to fifty-five if it wants to qualify its audience – that is to say, if it wants to enrich the quantity and quality of the information that it has about this audience. For example, perhaps the advertiser wants to learn the engagement level for each user, or find socio-demographic patterns. To do this, the data scientist might use clustering, which creates homogeneous user groups based on defined variables, pre-selected with the advertiser. Once these groups have been created, we can define targeted activation strategies. Using browsing data, an online merchant could for instance group its users into “families” based on their profiles and their interest in a certain product type, in order to offer a customised user experience.
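For readers who want to see what clustering looks like in code, here is a minimal sketch of k-means, one common clustering algorithm (the text above does not say which algorithm is used on a given project, so take this as one possible choice). The two variables, the six users, and the choice of three groups are all assumptions for illustration, and the deterministic “farthest point” starting step is a simplification that keeps the example reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iterations=50):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical users: (visit frequency, share of views on one product type),
# both already scaled to the 0-1 range.
users = [(0.9, 0.9), (0.8, 0.85), (0.85, 0.1), (0.9, 0.2), (0.1, 0.5), (0.15, 0.4)]

labels, centroids = kmeans(users, k=3)
```

Each label identifies the “family” a user belongs to. The variables should be put on comparable scales beforehand, as they are here, since otherwise the one with the largest range dominates the distance calculation.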
Each client request has more than one solution! Analysing the results helps data scientists and their teams decide which avenues to pursue and which to set aside, depending on the client’s objectives and strategy, and this is what makes each project unique. So, do you think being a data scientist is sexy? You’re the judge!