TY - GEN
T1 - Label Quality in AffectNet
T2 - 6th Asian Conference on Pattern Recognition, ACPR 2021
AU - Kim, Doo Yon
AU - Wallraven, Christian
N1 - Funding Information:
Acknowledgments. This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP; No. 2019-0-00079, Department of Artificial Intelligence, Korea University) and a National Research Foundation of Korea (NRF; NRF-2017M3C7A1041824) grant funded by the Korean government (MSIT).
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - AffectNet is one of the most popular resources for facial expression recognition (FER) on relatively unconstrained, in-the-wild images. However, given that each image was annotated by only one annotator with limited consistency checks on the data, label quality and consistency may be limited. Here, we take an approach similar to a study that re-labeled another, smaller dataset (FER2013) with crowd-based annotations, and report results from a re-labeling and re-annotation of a subset of difficult AffectNet faces by 13 people, covering both expression labels and valence and arousal ratings. Our results show that the human labels overall have medium to good consistency, whereas the human ratings, especially for valence, are in excellent agreement. Importantly, however, the crowd-based labels shift significantly towards the neutral and happy categories, and the crowd-based affective ratings form a consistent pattern that differs from the original ratings. ResNets fully trained on the original AffectNet dataset do not predict the human voting patterns, but weakly-trained ResNets do so much better, particularly for valence. Our results have important ramifications for label quality in affective computing.
AB - AffectNet is one of the most popular resources for facial expression recognition (FER) on relatively unconstrained, in-the-wild images. However, given that each image was annotated by only one annotator with limited consistency checks on the data, label quality and consistency may be limited. Here, we take an approach similar to a study that re-labeled another, smaller dataset (FER2013) with crowd-based annotations, and report results from a re-labeling and re-annotation of a subset of difficult AffectNet faces by 13 people, covering both expression labels and valence and arousal ratings. Our results show that the human labels overall have medium to good consistency, whereas the human ratings, especially for valence, are in excellent agreement. Importantly, however, the crowd-based labels shift significantly towards the neutral and happy categories, and the crowd-based affective ratings form a consistent pattern that differs from the original ratings. ResNets fully trained on the original AffectNet dataset do not predict the human voting patterns, but weakly-trained ResNets do so much better, particularly for valence. Our results have important ramifications for label quality in affective computing.
KW - AffectNet
KW - Affective computing
KW - Crowd annotation
KW - Facial expression recognition
UR - http://www.scopus.com/inward/record.url?scp=85130309381&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-02444-3_39
DO - 10.1007/978-3-031-02444-3_39
M3 - Conference contribution
AN - SCOPUS:85130309381
SN - 9783031024436
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 518
EP - 531
BT - Pattern Recognition - 6th Asian Conference, ACPR 2021, Revised Selected Papers
A2 - Wallraven, Christian
A2 - Liu, Qingshan
A2 - Nagahara, Hajime
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 9 November 2021 through 12 November 2021
ER -