Box office forecasting using machine learning algorithms based on SNS data

Taegu Kim, Jungsik Hong, Pilsung Kang

Research output: Contribution to journal › Article › peer-review

89 Citations (Scopus)


We propose a novel approach to the box office forecasting of motion pictures using social network service (SNS) data and machine learning-based algorithms. We begin by providing a comprehensive survey of the forecasting algorithms and explanatory variables used in the motion picture domain. Because of the importance of forecasting in early periods, we develop three sequential forecasting models for predicting the non-cumulative and cumulative box office earnings: (1) prior to, (2) a week after, and (3) two weeks after release. The numbers of SNS mentions and their weekly trends are used as input variables in addition to the screening-related information. A genetic algorithm is adopted for determining significant input variables, whereas three machine learning-based nonlinear regression algorithms and their combinations are employed for building forecasting models. Experimental results show that the use of SNS data, machine learning-based algorithms, and their combination noticeably improved the forecasting accuracy of all three models.
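The variable-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three regression algorithms or give the genetic algorithm's settings, so everything below is an assumption made for illustration: a k-nearest-neighbour regressor stands in for a nonlinear regression algorithm, the data are synthetic, and the function names (`make_synthetic`, `knn_predict`, `validation_error`, `genetic_selection`) are hypothetical. It is not the authors' implementation.

```python
import random

def make_synthetic(n_rows, n_vars, seed=0):
    """Synthetic data (assumption): only variables 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_vars)] for _ in range(n_rows)]
    y = [10 * row[0] + 5 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y

def knn_predict(train_X, train_y, x, k=3):
    """Stand-in nonlinear regressor: average target of the k nearest rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)

def select(rows, mask):
    """Keep only the columns whose mask bit is True."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]

def validation_error(mask, train_X, train_y, val_X, val_y):
    """Mean absolute validation error using only the selected variables."""
    if not any(mask):
        return float("inf")  # an empty variable set cannot forecast
    tr, va = select(train_X, mask), select(val_X, mask)
    errors = [abs(knn_predict(tr, train_y, x) - t) for x, t in zip(va, val_y)]
    return sum(errors) / len(errors)

def genetic_selection(train_X, train_y, val_X, val_y,
                      pop_size=12, generations=10, mutation_rate=0.1, seed=0):
    """Search over variable masks with a basic genetic algorithm."""
    rng = random.Random(seed)
    n = len(train_X[0])

    def fitness(mask):
        return validation_error(mask, train_X, train_y, val_X, val_y)

    # Each chromosome is a list of booleans, one per candidate input variable.
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = min(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = make_synthetic(60, 5, seed=1)
    mask, err = genetic_selection(X[:40], y[:40], X[40:], y[40:])
    print("selected variables:", [i for i, keep in enumerate(mask) if keep])
    print("validation MAE:", round(err, 3))
```

In this sketch each candidate variable set is scored by its error on a held-out validation split, and elitism keeps the best mask found so far from being lost between generations. A real application would replace the synthetic columns with the SNS mention counts, weekly trends, and screening-related variables, and would plug in whichever regression algorithms are under study.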

Original language: English
Pages (from-to): 364-390
Number of pages: 27
Journal: International Journal of Forecasting
Issue number: 2
Publication status: Published - 2015 Apr 1

Bibliographical note

Funding Information:
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT, and Future Planning (NRF-2014R1A1A1004648).

Publisher Copyright:
© 2014 International Institute of Forecasters.


Keywords

  • Box office earning forecast
  • Forecast combination
  • Genetic algorithm
  • Machine learning
  • Social network service

ASJC Scopus subject areas

  • Business and International Management

