How we predicted the box-office revenue of “The Dark Tower”

Netflix is well known for its data-driven approach to content creation, as seen in the success of the $100m production “House of Cards” and the Oscar-nominated documentary “The Square”. This video-on-demand service provider, which has disrupted the market in recent years, can rely on the data its 104 million subscribers generate. But who says you need 104 million subscribers to predict the success of a movie?

Using publicly available open data, we worked with MSc Business Analytics students at Imperial College Business School in London to try to predict the commercial performance and critical reception of a movie.

How did we do it?
And, more interestingly, could content creators easily use open data to adopt a Netflix-like data-driven approach?

Step 1: We collected, structured and processed masses of data in a time-efficient manner

First of all, let’s define what we mean by “open data.” Open data is data that can be accessed freely by anybody and that generally does not contain personally identifiable information (PII). The open data sources we used for this project are IMDB, The Numbers Databases, Box Office Mojo, and FXTOP for currency conversion. From these sources, we collected data points from 11,000 movies and classified them based on more than 300 criteria, such as:

  • popularity of actors and directors, based on number of movies and awards, number of likes and retweets, or career momentum
  • movie genre and size of the target audience; thrillers and dramas have wider mass appeal than documentaries and film noir
  • actor face recognition on promotional posters
  • production studios’ past successes
  • number of trailers
  • country of origin
  • age restriction
  • film duration
  • keywords extracted from text mining the movie description
  • contextual trends linked to release date and concurrent exchange rates
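To make the list above more concrete, here is a minimal sketch of how a handful of these criteria might be stored as one row of a training table. Every field name and value below is a hypothetical illustration, not the schema we actually used, and the film is invented. The only processing step shown is converting a budget to US dollars with an exchange rate, which is the kind of job a currency source such as FXTOP would serve.

```python
from dataclasses import dataclass, asdict


@dataclass
class MovieFeatures:
    """One movie as a flat record. All field names are hypothetical."""
    title: str
    genre: str
    country: str
    age_restriction: str
    duration_min: int
    n_trailers: int
    studio_past_hits: int
    budget_local: float   # budget in the production's own currency
    usd_rate: float       # assumed exchange rate to USD around release


def to_row(movie: MovieFeatures) -> dict:
    """Flatten the record into a dict and add the budget converted to USD."""
    row = asdict(movie)
    row["budget_usd"] = movie.budget_local * movie.usd_rate
    return row


# An invented film, used only to show the shape of a row.
example = MovieFeatures(
    title="Example Film",
    genre="thriller",
    country="UK",
    age_restriction="15",
    duration_min=100,
    n_trailers=2,
    studio_past_hits=3,
    budget_local=10_000_000.0,
    usd_rate=1.25,
)
row = to_row(example)
print(row["budget_usd"])
```

With more than 300 criteria, a real table would be far wider, and features such as text-mined keywords or poster face recognition would need their own extraction steps before they could fit into a row like this.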

Step 2: We used machine learning to build and iterate on our predictive model

Where human expertise and manpower would have been limited, these millions of data points were processed in just a few seconds using machine learning, and the algorithm drew correlations between parameters that could not have been seen manually.

Step 3: We applied the model to the US adaptation of Stephen King’s novel “The Dark Tower”

A few weeks later, we came up with a model based on a hundred variables that was supposedly twice as good as a simple rule-based model (e.g. based solely on past success of actors, directors and genres of the movie) in predicting box office success. In a fraction of a second, the model could predict the US box office performance of any given movie.
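For readers wondering what a “simple rule-based model” could look like, here is a minimal sketch, assuming the rule is nothing more than “predict the average past gross of films in the same genre”. This is an illustration of the idea, not the baseline we actually compared against; the function name and the toy history are made up.

```python
from collections import defaultdict
from statistics import mean


def fit_genre_baseline(history):
    """Build a predictor from a list of (genre, us_gross_usd) pairs.

    The rule is deliberately naive: return the mean gross of past films
    in the same genre, or the overall mean for a genre never seen before.
    """
    by_genre = defaultdict(list)
    for genre, gross in history:
        by_genre[genre].append(gross)

    genre_means = {genre: mean(values) for genre, values in by_genre.items()}
    overall_mean = mean([gross for _, gross in history])

    def predict(genre):
        return genre_means.get(genre, overall_mean)

    return predict


# Toy numbers, invented for illustration only.
toy_history = [
    ("thriller", 40_000_000),
    ("thriller", 60_000_000),
    ("documentary", 2_000_000),
]
predict = fit_genre_baseline(toy_history)
print(predict("thriller"))   # mean of the two thrillers
print(predict("western"))    # unseen genre, falls back to the overall mean
```

A rule like this ignores everything except genre, which is why a model drawing on a hundred variables has room to do better.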

We decided to give it a go on a movie that had not been released yet: “The Dark Tower” by Nikolaj Arcel, starring Idris Elba and Matthew McConaughey, and with a budget of $60 million.

We predicted a total gross revenue of $70 million in the US. However, three months later the film had earned only $50 million.
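To put that gap in perspective, the arithmetic can be sketched in a few lines. The two figures are the ones quoted above; expressing the miss relative to the actual gross is our choice of convention here, and the variable names are illustrative.

```python
predicted_usd = 70_000_000   # our model's prediction for the US gross
actual_usd = 50_000_000      # US gross three months after release

# Absolute miss, and the miss as a share of what the film really earned.
error_usd = predicted_usd - actual_usd
relative_error = error_usd / actual_usd

print(f"Overestimate: ${error_usd:,} ({relative_error:.0%} of the actual gross)")
```

In other words, the prediction was $20 million too high, or 40% above what the film actually earned.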

We’re willing to admit that ours might not be as sophisticated as Netflix's model, which can dig into huge amounts of behavioural data... But does this mean that our model is faulty? Not necessarily, but to be sure we would have to test it out on hundreds of movies to assess its actual performance. Perhaps we should have taken more data history into account, or even added more data sources.
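Testing on hundreds of movies would mean scoring the model on films it never saw during training and summarising the misses in one number. Below is a minimal sketch, assuming mean absolute percentage error (MAPE) as the metric; we do not claim this is the metric behind the “twice as good” figure, which the project did not define here. Apart from the first pair, which is “The Dark Tower”, the numbers are invented.

```python
from statistics import mean


def mean_absolute_percentage_error(pairs):
    """Average of |predicted - actual| / actual over (predicted, actual) pairs.

    Every actual value must be non-zero.
    """
    return mean([abs(predicted - actual) / actual for predicted, actual in pairs])


# (predicted_usd, actual_usd); only the first pair comes from this article.
held_out = [
    (70_000_000, 50_000_000),   # "The Dark Tower"
    (30_000_000, 40_000_000),   # invented
    (12_000_000, 10_000_000),   # invented
]
mape = mean_absolute_percentage_error(held_out)
print(f"MAPE over {len(held_out)} films: {mape:.1%}")
```

Running the same function on a baseline's predictions for the same films would give a like-for-like comparison, and a single miss such as ours would then count as one data point among many rather than a verdict on the model.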

In short: building a predictive model is not easy, and there’s no magic recipe! It is a continuous process of iteration, testing, and learning, and it takes time. But every little helps, so if you’re reading this from the States would you please go and buy your ticket for “The Dark Tower”? :)

Want to learn more? Get in touch!

07-12-2017

Publication

Lan Anh Vu Hong

Photo credits

Mats Carduner, Adobe Stock & Unsplash
