Private attribute inference from Facebook's public text metadata: A case study of Korean users

Daeseon Choi, Younho Lee, Seokhyun Kim, Pilsung Kang

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)


Purpose - As the number of users on social network services (SNSs) continues to increase at a remarkable rate, privacy and security issues keep arising. Although users may not want to disclose their private attributes, these can be inferred from their public behavior on social media. To investigate the severity of this kind of private-information leakage, this paper presents a method to infer undisclosed personal attributes of users based only on the data available on their public Facebook profiles.
Design/methodology/approach - Facebook profile data consisting of 32 attributes were collected for 111,123 Korean users. Inferences were made for four private attributes (gender, age, marital status, and relationship status) using five machine-learning classification algorithms and three regression algorithms.
Findings - Experimental results showed that users' gender can be inferred very accurately, whereas marital status and relationship status can be predicted more accurately with the authors' algorithms than with a random model. Moreover, the average difference between the actual and predicted ages of users was only 0.5 years. The results show that some private attributes can be easily inferred from only a few pieces of user profile information, which can jeopardize personal information and may put users' dignity at risk.
Research limitations/implications - In this paper, the authors utilized only each user's own profile data, especially text information. Since users in SNSs are directly or indirectly connected, inference performance could be improved if the profile data of a given user's friends were also considered. Moreover, utilizing non-text profile information, such as profile images, could help increase inference accuracy. The authors could also report more generalizable inference performance if a larger data set of Facebook users were available.
Practical implications - A private attribute leakage alarm system based on the inference model would be helpful for users who do not want their private attributes disclosed on SNSs. SNS service providers can measure and monitor the risk of privacy leakage in their systems to protect their users, and can optimize targeted marketing based on the inferred information if users agree to its use.
Originality/value - This paper investigates whether private attributes of SNS users can be inferred from a few pieces of publicly available information even when users are unwilling to disclose them. The experimental results showed that gender, age, marital status, and relationship status can be inferred by machine-learning algorithms. Based on these results, an early warning system was designed to help both service providers and users protect the users' privacy.
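The abstract does not name the classification algorithms used or describe the alarm system's design, so the following is only a minimal sketch of the general idea: train a text classifier on public profile tokens, compare it with a trivial baseline, and flag a user when an undisclosed attribute is predicted correctly. Multinomial naive Bayes is swapped in here purely as an illustrative stand-in; the token names, the toy records, and the functions `train_naive_bayes`, `predict`, `majority_baseline_accuracy`, and `leakage_alarm` are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(profiles, labels):
    """Fit a multinomial naive Bayes model on bag-of-token profiles (illustrative stand-in)."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(profiles, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def predict(model, tokens):
    """Return the label with the highest Laplace-smoothed log-probability."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def majority_baseline_accuracy(labels):
    """Accuracy of always guessing the most common label (a trivial reference model)."""
    return max(Counter(labels).values()) / len(labels)

def leakage_alarm(model, public_tokens, true_value):
    """Hypothetical alarm: True if the undisclosed attribute is recoverable from public tokens."""
    return predict(model, public_tokens) == true_value

# Toy, entirely made-up training records: public profile tokens -> marital status.
train_profiles = [
    ["interest:parenting", "work:manager"],
    ["interest:parenting", "interest:cooking"],
    ["interest:gaming", "school:university"],
    ["interest:gaming", "interest:travel"],
]
train_labels = ["married", "married", "single", "single"]
model = train_naive_bayes(train_profiles, train_labels)

# Toy held-out users whose marital status is not disclosed publicly.
test_profiles = [["interest:parenting"], ["interest:gaming", "interest:travel"]]
test_labels = ["married", "single"]
predictions = [predict(model, p) for p in test_profiles]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
baseline = majority_baseline_accuracy(test_labels)
```

On real data the comparison between `accuracy` and `baseline` would be the quantity of interest; on this toy set it only demonstrates the mechanics.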

Original language: English
Pages (from-to): 1687-1706
Number of pages: 20
Journal: Industrial Management and Data Systems
Issue number: 8
Publication status: Published - 2017

Bibliographical note

Funding Information:
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03930729), National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Science, ICT & Future Planning (MSIP)) (NRF-2015R1A2A2A04007359, NRF-2016R1A4A1011761), and Institute for Information & communications Technology Promotion (IITP) grant funded by the Korean Government (MSIP) (No. 2017-0-00349, Development of media streaming system with machine learning using quality of experience (QoE)).

Publisher Copyright:
© 2017 Emerald Publishing Limited.


Keywords

  • Age
  • Facebook
  • Gender
  • Machine learning
  • Marital/relationship status
  • Private attribute

ASJC Scopus subject areas

  • Management Information Systems
  • Industrial relations
  • Computer Science Applications
  • Strategy and Management
  • Industrial and Manufacturing Engineering

