Can ChatGPT-4v Recognize Emotion Expressions?
ChatGPT-4v, a recent iteration of OpenAI's advanced AI chatbot, has taken the world by storm with its ability to generate human-like text responses. While it excels in various text-based tasks, including scoring above most human test-takers on standardized exams such as the bar exam, the LSAT, and the SAT, its capabilities in image-based tasks, particularly emotion recognition, remain uncertain. Here, I provide a first examination of whether ChatGPT-4v can serve as a sufficient replacement for humans in tasks involving the interpretation of emotion expressions.
In three studies, I explore ChatGPT-4v's ability to recognize emotion expressions, including facial expressions (Study 1), bodily expressions (Study 2), and multimodal expressions (i.e., face-and-body; Study 3). The studies include expressions posed by trained actors demonstrating scientifically based nonverbal displays (Studies 1 and 2) and spontaneous real-world expressions captured in the wild (Study 3).
Basic Method Across All Studies
In each study, ChatGPT-4v was queried using OpenAI’s API. Each query included an image from an emotion expression database, paired with the following prompt:
“What is the emotion label that best characterizes the expression being displayed in this image? Select from these options, and only respond with one word: Fear, Anger, Sadness, Surprise, Happiness, Disgust, Neutral, None of the above”
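For readers who want to reproduce this setup, below is a minimal sketch of how such a query could be issued with the OpenAI Python SDK (v1.x). The model identifier, image path, and token limit here are illustrative assumptions, not the exact settings used in these studies.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "What is the emotion label that best characterizes the expression being "
    "displayed in this image? Select from these options, and only respond with "
    "one word: Fear, Anger, Sadness, Surprise, Happiness, Disgust, Neutral, "
    "None of the above"
)

def classify_expression(image_path: str) -> str:
    """Send one stimulus image plus the fixed prompt; return the one-word label."""
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed vision-capable model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}},
            ],
        }],
        max_tokens=5,  # the response should be a single word
    )
    return response.choices[0].message.content.strip()
```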
Each study reports ChatGPT-4v’s emotion recognition accuracy for each emotion category in each database. I compare ChatGPT-4v’s recognition performance to that of humans, retrieved from past research. In the graphs that follow, ChatGPT-4v’s performance is visualized as blue columns, human performance measured on the same stimuli is shown as red horizontal lines, and horizontal dotted lines represent the accuracy ChatGPT-4v would be expected to achieve if it were selecting randomly from the provided answer choices (i.e., “chance”).
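As a rough illustration of how these accuracy figures can be computed, the following sketch tallies per-category accuracy from collected (true label, response) pairs and the chance rate implied by the answer options; the function name and data format are illustrative rather than the exact analysis code used here.

```python
from collections import defaultdict

def accuracy_by_category(results):
    """results: iterable of (true_label, predicted_label) pairs, one per image."""
    correct, total = defaultdict(int), defaultdict(int)
    for true_label, predicted in results:
        total[true_label] += 1
        if predicted.strip().lower() == true_label.lower():
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}

# Chance rate given the 8 answer options in the prompt
# (6 emotions + Neutral + "None of the above"): 1/8 = 12.5%
chance_rate = 1 / 8
```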
Study 1: Posed Facial Expressions
The first study focused on ChatGPT-4v's ability to recognize prototypical facial expressions of emotion. Images in this study were retrieved from the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP; Olszanowski et al., 2015), a high-quality database of facial expressions coded using the Facial Action Coding System (FACS). Examples of photographs from this database can be found below:
Results
ChatGPT-4v recognized fear, anger, and sadness at rates significantly lower than humans, but recognized disgust, happiness, and surprise at rates comparable to human performance. ChatGPT-4v was especially poor at recognizing fear expressions (23% accuracy), which it most often misidentified as surprise (77%).
Study 2: Posed Bodily Expressions
The second study examined ChatGPT-4v's ability to recognize emotions from posed bodily expressions of emotion. Images in this study were retrieved from the Bodily Expressive Action Stimulus Test (BEAST; de Gelder & Van den Stock, 2011). This database includes whole-body posed expressions (with the face blurred out) of four emotions: anger, fear, happiness, and sadness. Examples of photographs from this database can be found below:
Results
ChatGPT-4v’s ability to recognize bodily expressions of emotion was inferior to that of humans for all emotion categories. ChatGPT-4v failed to recognize bodily expressions of anger, happiness, and sadness (< 6% accuracy), instead misclassifying them as neutral (82%, 76%, and 93%, respectively). Although fear was recognized at a somewhat higher rate (46%), it was still often confused with surprise or neutral.
Study 3: Multimodal in-the-wild Expressions
The third study examined ChatGPT-4v's ability to recognize multimodal (i.e., face-and-body) real-world emotion expressions of fear and anger. This expression database, created by Abramson et al. (2009), includes photos of emotion expressions captured in real-life situations, including fan brawls, protests, and haunted houses. Examples of photographs from this database can be found below:
Results
ChatGPT-4v recognized multimodal expressions of anger and fear at only modest rates (65% and 38%, respectively), significantly below human performance. Consistent with the previous studies, ChatGPT-4v frequently confused fear with surprise.
Conclusion
Overall, ChatGPT-4v lags behind human performance in recognizing emotion expressions. This gap was especially pronounced for bodily and multimodal expressions, and for fear expressions in any modality, which it consistently misidentified as surprise. These findings serve as a cautionary tale for integrating ChatGPT-4v into applications requiring nuanced emotion perception. As multimodal LLMs continue to evolve, ongoing research will be crucial for bridging the gap between human and AI capabilities in emotion recognition.