Affective – 3-D emotional model for speech

Related work: Lui, S. 2013. “A preliminary analysis of the continuous axis value of the three-dimensional PAD speech emotional state model”. The 16th edition of the International Conference on Digital Audio Effects (DAFx), Maynooth, Ireland. [session chair]

The traditionally used emotional model is two-dimensional, with two axes: Arousal (Energy) and Valence.

When the 2-D model is used to classify the Big Six emotions (Joy, Anger, Fear, Disgust, Boredom, Sadness) plus Neutral, it performs poorly on Fear and Disgust.

We therefore propose a 3-D PAD model whose third axis is Aggressiveness. Based on several preliminary experiments, we define this axis as the fluctuation of the 2nd to 6th Log Frequency Power Coefficients (LFPCs).
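The sketch below illustrates one possible reading of this definition; it is not the formula from the paper. It assumes that LFPCs are log energies (in dB) in 12 logarithmically spaced frequency bands of each Hamming-windowed frame, and that "fluctuation" means the standard deviation over frames of each coefficient, averaged over coefficients 2 to 6. The function names `lfpc` and `aggressiveness` and all default parameters (frame length, hop size, band count, 200 Hz lower edge) are hypothetical choices.

```python
import numpy as np


def lfpc(signal, sr, frame_len=512, hop=256, n_bands=12, f_min=200.0, f_max=None):
    """Hypothetical LFPC sketch: log power (dB) in log-spaced bands for each frame."""
    if f_max is None:
        f_max = sr / 2
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    edges = np.geomspace(f_min, f_max, n_bands + 1)  # logarithmically spaced band edges
    n_frames = 1 + (len(signal) - frame_len) // hop
    coeffs = np.empty((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
            # small floor avoids log10(0) for silent frames or empty bands
            coeffs[t, b] = 10.0 * np.log10(power[in_band].sum() + 1e-12)
    return coeffs


def aggressiveness(coeffs, first=2, last=6):
    """Assumed 'fluctuation': std over frames of LFPCs first..last, averaged.

    first/last are 1-based coefficient numbers, so LFPC 2..6 are columns 1..5.
    """
    selected = coeffs[:, first - 1 : last]
    return float(np.std(selected, axis=0).mean())


if __name__ == "__main__":
    sr = 16000
    n = np.arange(sr)  # one second of audio
    tone = np.sin(2 * np.pi * 500 * n / sr)
    # same tone, but 20 dB quieter in the second half
    gain = np.where(n < sr // 2, 1.0, 0.1)
    print("steady tone:    ", aggressiveness(lfpc(tone, sr)))
    print("loudness change:", aggressiveness(lfpc(tone * gain, sr)))
```

A steady tone gives an aggressiveness close to 0, while the same tone with a 20 dB loudness step gives a much larger value, because every selected coefficient then alternates between two levels.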

We used a German speech database of 800 clips for training. The results are as follows:

Figure 1. 3-D view of the average values of the 800 emotional German speech clips.

Figure 2. Another view of Figure 1.

Figure 3. Aggressiveness of the four negative emotions (400 clips).

Figure 4 shows that the overall classification accuracy is around 81%, with a significant improvement on Fear and Disgust. Using only the arousal (energy) and valence axes of the 2-D model, most other approaches can already classify every emotion except Fear and Disgust, because these two lie at almost the same position in the 2-D space. The third axis we defined separates Fear from Disgust, so we do much better on these two emotions than approaches limited to the 2-D model.

Figure 4. Classification result.
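The classifier behind Figure 4 is not described here. As a hypothetical illustration of why a third axis helps, the sketch below applies a simple nearest-centroid rule (a swapped-in technique, not necessarily the one used in the paper) to made-up centroids in which Fear and Disgust almost coincide in arousal and valence but differ in aggressiveness. All coordinates are invented for illustration and are not values from the paper.

```python
import numpy as np

# Made-up class centroids as (arousal, valence, aggressiveness); NOT values from the paper.
CENTROIDS = {
    "fear": np.array([0.60, -0.50, 0.80]),
    "disgust": np.array([0.55, -0.45, 0.20]),
    "sadness": np.array([-0.60, -0.50, 0.30]),
}


def nearest_centroid(point, dims):
    """Return the label whose centroid is closest (Euclidean) using only the given axes."""
    best_label, best_dist = None, np.inf
    for label, centre in CENTROIDS.items():
        dist = np.linalg.norm(point[dims] - centre[dims])
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    sample = np.array([0.56, -0.46, 0.75])  # a made-up clip that is really "fear"
    print("2-D (arousal, valence):", nearest_centroid(sample, [0, 1]))     # disgust
    print("3-D (+ aggressiveness):", nearest_centroid(sample, [0, 1, 2]))  # fear
```

With only the first two axes the sample lands closer to the Disgust centroid; adding the aggressiveness axis moves the decision to Fear.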

Figure 5 shows that the three axes are largely orthogonal to one another, although there is still room for improvement.

Figure 5. Orthogonality of the three axes, measured by the Pearson correlation coefficient (PCC).
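A minimal sketch of a PCC orthogonality check, assuming each clip is summarized by one value per axis (arousal, valence, aggressiveness) in an N x 3 array. Off-diagonal correlations near 0 indicate nearly uncorrelated, and in that sense orthogonal, axes. The helper names are hypothetical, and the random array below is placeholder data, not the paper's measurements.

```python
import numpy as np


def axis_correlations(features):
    """Pearson correlation matrix between the columns (axes) of an N x 3 array."""
    # rowvar=False: each column is one variable (axis), each row is one clip
    return np.corrcoef(features, rowvar=False)


def max_off_diagonal(corr):
    """Largest absolute correlation between two different axes."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[mask]).max())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # placeholder data: 800 clips x (arousal, valence, aggressiveness)
    features = rng.normal(size=(800, 3))
    corr = axis_correlations(features)
    print(np.round(corr, 3))
    print("max |PCC| between axes:", max_off_diagonal(corr))
```

Reporting the largest off-diagonal |PCC| gives a single number for how far the axes are from perfect orthogonality.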