
My Role
Team lead -- Architecture Design, Data Preparation, Model Training
Team
-
Tech Stacks
Python | TensorFlow | Pandas | CNN | LSTM
Overview
Develop a speech emotion recognition model using two approaches, a convolutional neural network (CNN) and a Long Short-Term Memory (LSTM) network, and compare the results of these two deep learning methods.
Architecture
The system classifies emotions in speech by combining data augmentation, preprocessing with MFCC (Mel-Frequency Cepstral Coefficient) feature extraction, and two classifiers: an LSTM and a CNN. Data collection, processing, and training are integrated into a single pipeline, which keeps development fast and scalable while maintaining accuracy on real-world audio.

Diagram of the Speech Emotion Recognition system model showcasing data flow and model components.
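As a rough sketch of the MFCC preprocessing stage described above, the pipeline can be reduced to framing, windowing, a power spectrum, a mel filterbank, a log, and a DCT. The frame sizes and filter counts below are assumptions, not values from this project, which may well use a library such as librosa instead:

```python
import numpy as np

def mfcc_features(signal, sr=16000, frame_len=400, hop=160,
                  n_mels=26, n_mfcc=13):
    """Simplified MFCC extraction: frame -> FFT -> mel filterbank -> log -> DCT."""
    # Split the waveform into overlapping frames and window them
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power spectrum

    # Triangular mel filterbank (mel scale spaces filters like human hearing)
    n_bins = power.shape[1]
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    hz_pts = 700 * (10 ** (np.linspace(0, mel_max, n_mels + 2) / 2595) - 1)
    bins = np.floor(frame_len * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)

    log_mel = np.log(power @ fb.T + 1e-10)

    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# Example: one second of a synthetic 440 Hz tone
sr = 16000
t = np.arange(sr) / sr
feats = mfcc_features(np.sin(2 * np.pi * 440 * t), sr=sr)
```

The result is a (frames, coefficients) matrix that both the LSTM (as a sequence over frames) and the CNN (as a 2-D feature map) can consume.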
Model Performance: LSTM
The LSTM model achieves an accuracy of 61.02%, demonstrating its ability to classify emotions in speech data. Its reported precision, recall, and F1 score are 0.6806, 0.5533, and 0.6806 respectively, with precision noticeably higher than recall. The training and testing loss and accuracy curves indicate consistent learning, though there is room to improve generalization.

Graphs displaying training/testing loss and accuracy over epochs, along with performance metrics table.
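The precision, recall, and F1 figures quoted for both models can in principle be recomputed from the predictions. A minimal sketch, assuming macro-averaging over the emotion classes (the page does not state which averaging the project used):

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall, and F1 over all emotion classes."""
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p, r = np.mean(precisions), np.mean(recalls)
    f1 = 2 * p * r / (p + r) if p + r else 0.0       # harmonic mean of p and r
    return p, r, f1

# Toy labels for three hypothetical emotion classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
p, r, f1 = macro_metrics(y_true, y_pred, n_classes=3)
```

Note that F1 is the harmonic mean of precision and recall, so it always lies between the two.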
Model Performance: CNN
The CNN model achieves an accuracy of 96.52%, substantially outperforming the LSTM on this task. With precision, recall, and F1 scores of 0.9664, 0.9643, and 0.9653 respectively, it performs strongly and consistently across the key metrics. The training and testing loss curves show smooth convergence, and the accuracy curves indicate robust generalization.

Graphs displaying training/testing loss and accuracy over epochs, along with performance metrics table.