Abstract
Understanding emotional states is pivotal for the development of next-generation human-machine interfaces. Human behavior in social interactions arises from psycho-physiological processes shaped by perceptual inputs; efforts to understand brain function and human behavior could therefore catalyze the development of AI models with human-like attributes. In this study, we introduce a multimodal emotion dataset comprising 30-channel electroencephalography (EEG), audio, and video recordings from 42 participants. Each participant engaged in a cue-based conversation scenario designed to elicit five distinct emotions: neutral, anger, happiness, sadness, and calmness. Each participant contributed 200 interactions, encompassing both listening and speaking, for a cumulative total of 8,400 interactions across all participants. We evaluated baseline emotion-recognition performance for each modality using established deep neural network (DNN) methods. The Emotion in EEG-Audio-Visual (EAV) dataset is the first public dataset to incorporate these three primary modalities for emotion recognition in a conversational context. We anticipate that it will contribute significantly to the modeling of the human emotional process, from both fundamental neuroscience and machine learning viewpoints.
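To make the dataset's shape concrete, the sketch below builds a minimal five-class EEG classifier in PyTorch. This is not the paper's baseline: only the channel count (30) and the five-class label set come from the abstract; the window length, implied sampling rate, and architecture are illustrative assumptions.

```python
# Minimal sketch of a 5-class EEG emotion classifier matching the EAV setup
# described in the abstract: 30 EEG channels, five emotion classes (neutral,
# anger, happiness, sadness, calmness). Window length, sampling rate, and
# architecture are illustrative assumptions, not the authors' baseline.
import torch
import torch.nn as nn

N_CHANNELS = 30   # 30-channel EEG, per the abstract
N_CLASSES = 5     # neutral, anger, happiness, sadness, calmness
N_SAMPLES = 500   # assumed window length (e.g., 2 s at an assumed 250 Hz)

class EEGBaseline(nn.Module):
    """Small temporal-convolution classifier over (batch, channels, time)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(N_CHANNELS, 64, kernel_size=7, padding=3),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=7, padding=3),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to one value
        )
        self.classifier = nn.Linear(128, N_CLASSES)

    def forward(self, x):                      # x: (batch, 30, time)
        z = self.features(x).squeeze(-1)       # -> (batch, 128)
        return self.classifier(z)              # logits: (batch, 5)

if __name__ == "__main__":
    model = EEGBaseline()
    dummy = torch.randn(8, N_CHANNELS, N_SAMPLES)  # one mini-batch of windows
    print(model(dummy).shape)                      # torch.Size([8, 5])
```

In practice, separate branches of a similar depth would be trained per modality (EEG, audio, video) to reproduce per-modality baselines of the kind the abstract reports.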
| Original language | English |
|---|---|
| Article number | 1026 |
| Journal | Scientific Data |
| Volume | 11 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Dec 2024 |
Funding
This work was supported by the Ministry of Science and Higher Education of the Republic of Kazakhstan through the grant to Prof. Dr. Adnan Yazıcı titled “Smart-Care: Innovative Multi-Sensor Technology for Elderly and Disabled Health Management” (AP23487613, duration 2024-2026), and by the Faculty Development Competitive Research Grant Program of Nazarbayev University under reference number 20122022FD4109, “Intention Estimation from Behavior and Emotional Expression”. It was also partially supported by the National Research Foundation of Korea (NRF) grant funded by the MSIT (No. 2022-2-00975, MetaSkin: Developing Next-generation Neurohaptic Interface Technology that enables Communication and Control in Metaverse by Skin Touch) and by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Artificial Intelligence Graduate School Program (Korea University)).
| Funders | Funder number |
|---|---|
| National Research Foundation of Korea | |
| Artificial Intelligence Graduate School Program | |
| Korea University | |
| Ministry of Science and Higher Education of the Republic of Kazakhstan | AP23487613 |
| MSIT | 2022-2-00975 |
| Nazarbayev University | 20122022FD4109 |
| Institute of Information & Communications Technology Planning & Evaluation (IITP) | 2019-0-00079 |
ASJC Scopus subject areas
- Statistics and Probability
- Information Systems
- Education
- Computer Science Applications
- Statistics, Probability and Uncertainty
- Library and Information Sciences