Understanding Emotions in Speech

People express their feelings not only through words but also through the way they speak: the tone of their voice and how fast or slow they talk. Studies have shown that across different cultures, basic emotions like anger, joy, and sadness are expressed in similar ways through speech. To teach our computer program to recognize these emotions, we use spectrograms, which are pictures of speech that show how strong each sound frequency is at each moment in time.
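
To make the idea of a spectrogram concrete, here is a minimal sketch in Python with NumPy. It is our own illustration, not the project's code: the frame length (400 samples) and hop (160 samples), which correspond to 25 ms windows with a 10 ms step at an assumed 16 kHz sample rate, are illustrative choices. The signal is cut into short overlapping frames, each frame is windowed, and the magnitude of its Fourier transform becomes one column of the picture.

```python
import numpy as np

SR = 16000         # assumed sample rate (Hz)
FRAME_LEN = 400    # assumed window length: 25 ms at 16 kHz
HOP = 160          # assumed step between frames: 10 ms at 16 kHz

def magnitude_spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return |STFT| with shape (frequency_bins, time_frames)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft gives frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1)).T

# One second of a 440 Hz tone as a stand-in for a speech recording
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * 440 * t)
spec = magnitude_spectrogram(tone)
print(spec.shape)                      # (201, 98)

# Each bin is SR / FRAME_LEN = 40 Hz wide, so the loudest bin should be 440 Hz
peak_bin = spec.mean(axis=1).argmax()
print(peak_bin * SR / FRAME_LEN)       # 440.0
```

In practice the magnitudes are usually put on a logarithmic (decibel) scale, and often a mel frequency scale, before being fed to a network; the document does not say which variant the project used.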


In our project, we use deep learning techniques to figure out how people feel based on how they talk. We trained a Convolutional Neural Network (CNN) to recognize seven different emotions from speech recordings. This matters because emotions are not just about what you say, but also how you say it.
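
The sketch below shows, in plain NumPy, how a CNN turns a spectrogram into one probability per emotion. It is a hypothetical, untrained toy and not the project's architecture: the 8 filters of size 3x3, the single convolution layer, global average pooling, and the random weights are all assumptions made for illustration. A real model would stack more layers and learn its weights with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EMOTIONS = 7  # the seven emotion classes mentioned above

def conv2d(x, kernels, bias):
    """'Valid' 2-D convolution (cross-correlation, as in deep learning libraries).
    x: (H, W); kernels: (C, kh, kw); bias: (C,); returns (C, H-kh+1, W-kw+1)."""
    c, kh, kw = kernels.shape
    h, w = x.shape
    out = np.empty((c, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[i:i + kh, j:j + kw]
            # Every filter looks at the same local patch of the spectrogram
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return out

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def tiny_cnn_forward(spec, params):
    feats = np.maximum(conv2d(spec, params["k"], params["b"]), 0.0)  # conv + ReLU
    pooled = feats.mean(axis=(1, 2))                # global average pooling -> (C,)
    logits = params["W"] @ pooled + params["c"]     # dense layer -> (NUM_EMOTIONS,)
    return softmax(logits)

# Hypothetical, randomly initialised (untrained) parameters
params = {
    "k": rng.normal(scale=0.1, size=(8, 3, 3)),
    "b": np.zeros(8),
    "W": rng.normal(scale=0.1, size=(NUM_EMOTIONS, 8)),
    "c": np.zeros(NUM_EMOTIONS),
}
fake_spectrogram = rng.random((64, 64))
probs = tiny_cnn_forward(fake_spectrogram, params)
print(probs.shape, round(float(probs.sum()), 6))  # (7,) 1.0
```

Because the weights are random, the predicted probabilities are meaningless here; training adjusts the filters so that patterns in the spectrogram linked to each emotion push up the matching output.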


Challenges Faced During Model Training

  • Making Computers Understand Feelings: Teaching computers to understand emotions from speech is like teaching a robot to understand human feelings. It’s not just about the words; it’s about how they’re said.
  • Different-Length Speech Samples: Imagine trying to compare drawings of different sizes. Speech recordings also come in different lengths, but our network expects every input to have the same size, which makes recordings of different lengths hard to compare and process together.
  • Not Enough Emotion Examples: We don't have many recordings of people expressing each emotion. It's like trying to learn about many kinds of animals from only a few pictures of each one.
  • Similar-Sounding Emotions: Some emotions sound very similar when people talk. It’s like trying to tell the difference between laughing and crying just by listening to the sounds.
  • Technical Stuff: We also had to deal with practical engineering issues, such as making sure the audio is processed correctly and that our programs run smoothly. It's like keeping the machines happy so they can do their job well.
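
One common way to handle recordings of different lengths is to zero-pad short clips and cut long ones to a fixed number of samples. The sketch below is a minimal NumPy illustration under that assumption; the function name `fix_length` and padding at the end are our own choices, and the document does not say which target length or padding position the project used.

```python
import numpy as np

def fix_length(signal, target_len):
    """Hypothetical helper: zero-pad at the end or truncate so the clip
    has exactly target_len samples."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= target_len:
        return signal[:target_len]                         # too long: cut the tail
    return np.pad(signal, (0, target_len - len(signal)))   # too short: add silence

clips = [np.ones(3), np.ones(5), np.ones(8)]  # three "recordings" of different lengths
batch = np.stack([fix_length(c, 5) for c in clips])
print(batch.shape)  # (3, 5)
print(batch[0])     # [1. 1. 1. 0. 0.]
```

With real audio the target length would be chosen in seconds (for example, a few seconds times the sample rate), so that every spectrogram ends up with the same number of time frames.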

How We Trained Our Model

  • Diverse Sound Mix-Up: We created varied versions of our sound samples so the computer could learn emotions from a wider range of examples. It's like giving it a variety of toys to play with so it can learn more.
  • Equal Sound Sizes: We made all the speech recordings the same length so the computer always receives inputs of the same size. It's like making all the puzzle pieces the same size so they fit together perfectly.
  • Create More Emotion Examples: We made more recordings of people expressing different emotions. It’s like collecting more pictures of different animals to learn about them better.
  • Fine-Tune Listening Skills: We taught the computer to listen more carefully to pick up subtle differences between similar-sounding emotions. It’s like training your ears to distinguish between different types of music.
  • Technical Tweaks: We made sure the computers could handle the sounds properly and fixed any problems with our programs. It’s like making sure the machines are well-oiled and running smoothly so they can do their job properly.
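
The document does not name the exact techniques behind "mixing up" sounds or creating more examples. One common approach is data augmentation: making slightly altered copies of each recording that keep the same emotion label. The sketch below assumes three widely used perturbations (added background noise, a small time shift, and a loudness change); the function names, the 20 dB signal-to-noise ratio, and the 0.1 s maximum shift are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def add_noise(signal, snr_db):
    """Add white Gaussian noise at a chosen signal-to-noise ratio (in dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def time_shift(signal, max_shift):
    """Circularly shift the waveform by a random number of samples."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)

def change_gain(signal, low=0.8, high=1.2):
    """Scale the loudness by a random factor."""
    return signal * rng.uniform(low, high)

def augment(signal, n_copies=3):
    """Return n_copies randomly perturbed versions of one recording.
    Each copy keeps the emotion label of the original."""
    return [change_gain(time_shift(add_noise(signal, snr_db=20), max_shift=1600))
            for _ in range(n_copies)]

sr = 16000
clip = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # stand-in for a 1 s recording
extra = augment(clip)
print(len(extra), extra[0].shape)  # 3 (16000,)
```

Perturbations like these should be mild enough that a listener would still hear the same emotion; stronger changes, such as large pitch shifts, could alter the very cues the model is meant to learn.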

After experimenting with many different approaches, we found one method that worked especially well on the RAVDESS dataset, where it improved our program's ability to recognize emotions. It helped less on another dataset, TESS, which our program already found relatively easy, so there was less room for improvement.


Our project shows that computers can learn to recognize emotions from speech, but there's still a lot of work to be done. In the future, we want to study more natural speech and use bigger datasets to improve our program further. We also want to try out new techniques to make our program more accurate.