On Eigendecomposition-based Algorithms as Feature Extraction Techniques Used with Hidden Markov Model for the Detection of Whale Vocalisations

Whales use various unique sound signals for communication, echolocation, and social activities. These sounds are gathered using passive acoustic monitoring (PAM). Scientists have developed automated methods to analyse PAM data and identify different whale species. One common approach uses hidden Markov models (HMMs). In his dissertation, Dr Ayinde Usman, under the supervision of Professor Jaco Versfeld, explores a new way to improve HMM-based whale detection by adding feature extraction techniques based on eigendecomposition (ED).

Background and Methodology

The research applied principal component analysis (PCA) and dynamic mode decomposition (DMD) to uncover hidden patterns in whale sounds from PAM data. These feature extraction techniques were then extended using kernel methods, giving kernel PCA (kPCA) and kernel DMD (kDMD).
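To make the idea of eigendecomposition-based feature extraction concrete, here is a minimal sketch of PCA computed from the eigendecomposition of a covariance matrix. It is an illustration only, not the dissertation's implementation: the function name `pca_features`, the frame length of 16, and the synthetic random data standing in for framed PAM audio are all assumptions.

```python
import numpy as np

def pca_features(frames, n_components):
    """Project frames (n_frames x n_dims) onto the top principal components."""
    centered = frames - frames.mean(axis=0)
    cov = np.cov(centered, rowvar=False)        # (n_dims x n_dims) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]              # leading eigenvectors as columns
    return centered @ components, eigvals[order]

# Synthetic stand-in for framed audio (assumption, not real PAM data).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 16))
features, variances = pca_features(frames, n_components=4)
```

Each row of `features` is a reduced feature vector that could be passed to an HMM in place of the raw frame; the number of components corresponds to the feature dimension discussed in the results.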

The research also introduced new “ED-based hidden Markov models” (ED-HMMs), namely PCA-HMM, kPCA-HMM, DMD-HMM, and kDMD-HMM. These models are grouped by the algorithm used for feature extraction (FE):

  • PC-based hidden Markov models (PC-HMMs): PCA-HMM and kPCA-HMM
  • DMD-based hidden Markov models (DM-HMMs): DMD-HMM and kDMD-HMM
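For the DMD-based group, the sketch below shows the standard "exact DMD" decomposition on a toy signal. It is only meant to show where the eigendecomposition enters; how the dissertation turns DMD output into HMM feature vectors is not described here. The function name `exact_dmd` and the decaying-rotation test system are assumptions chosen so the result can be checked against known eigenvalues.

```python
import numpy as np

def exact_dmd(snapshots, rank):
    """Exact DMD of a (n_dims x n_snapshots) matrix; returns eigenvalues and modes."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # time-shifted snapshot pairs
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]    # truncate to the chosen rank
    V = Vh.conj().T
    Y_V_Sinv = (Y @ V) / s                            # Y V S^{-1}
    A_tilde = U.conj().T @ Y_V_Sinv                   # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)               # eigendecomposition step
    modes = Y_V_Sinv @ W                              # DMD modes
    return eigvals, modes

# Toy system (assumption): a rotation by theta that decays by 0.95 per step.
theta = 0.3
A = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
states = [np.array([1.0, 0.0])]
for _ in range(49):
    states.append(A @ states[-1])
snapshots = np.column_stack(states)                   # shape (2, 50)
eigvals, modes = exact_dmd(snapshots, rank=2)
```

For this toy system the recovered eigenvalues have magnitude 0.95 and phase ±theta, that is, they encode the decay rate and oscillation frequency of the signal.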

The research tested each model on PAM datasets containing sounds from southern right whales (SRW) and humpback whales (HW). The performance of each model was evaluated using:

  • True positive rate (TPR)
  • Precision (PREC)
  • Error rate (ERR)
  • F1 scores
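As a reference for how these four metrics relate, the sketch below computes them from detection counts. The definitions are the common textbook ones; in particular, taking ERR as the fraction of all decisions that are wrong is an assumption, since the dissertation's exact definition is not given here. The function name and the example counts are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute TPR, PREC, ERR and F1 from true/false positive/negative counts."""
    tpr = tp / (tp + fn)                      # share of real calls that were detected
    prec = tp / (tp + fp)                     # share of detections that were real calls
    err = (fp + fn) / (tp + fp + tn + fn)     # assumed definition: share of wrong decisions
    f1 = 2 * prec * tpr / (prec + tpr)        # harmonic mean of PREC and TPR
    return {"TPR": tpr, "PREC": prec, "ERR": err, "F1": f1}

# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=100, fn=5)
```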

The research found that the models’ performance can vary depending on several factors, including the size of the feature vectors, the amount of training data available, and the specific whale species being analysed.

Research Results

The models performed well across various evaluation metrics. Within the PC-HMMs, the kPCA-HMM surpassed the PCA-HMM in terms of true positive rate (TPR) and precision (PREC), while also having a lower error rate (ERR). However, the kPCA-HMM is computationally more expensive than the PCA-HMM.
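One way to see why a kernel variant tends to cost more is that kernel PCA eigendecomposes an n × n kernel matrix over the training frames, rather than a covariance matrix whose size depends only on the frame dimension. The sketch below illustrates this with an RBF kernel. The choice of kernel, the `gamma` value, the function name, and the synthetic data are all assumptions; the kernel used in the dissertation is not stated here.

```python
import numpy as np

def rbf_kernel_pca(frames, n_components, gamma):
    """Kernel PCA projections of the training frames, using an RBF kernel."""
    sq = np.sum(frames ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * frames @ frames.T
    K = np.exp(-gamma * sq_dists)                     # (n_frames x n_frames) Gram matrix
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # centre in feature space
    eigvals, eigvecs = np.linalg.eigh(K_c)            # eigendecomposition of an n x n matrix
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(eigvals)       # projections of the training frames

# Synthetic stand-in for framed audio (assumption, not real PAM data).
rng = np.random.default_rng(1)
frames = rng.normal(size=(60, 16))
kpca_features = rbf_kernel_pca(frames, n_components=3, gamma=0.05)
```

With 60 frames of length 16, linear PCA would decompose a 16 × 16 matrix, while this sketch decomposes a 60 × 60 one, and the gap widens as the training set grows.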

For the DM-HMMs, the kDMD-HMM likewise outperformed the DMD-HMM in TPR and PREC, with both a lower ERR and a lower computational cost.

The comparison revealed that PC-HMMs reached stable performance at lower feature dimensions than DM-HMMs, suggesting that PC-HMMs are less complex in terms of dimension. DM-HMMs nevertheless achieved better overall performance, although they required higher dimensions to do so.

The reliability of all developed models was confirmed by their F1 scores, with each model achieving an F1 score greater than 0.9 at its optimal dimension. Lastly, the results of the proposed ED-HMMs were compared with those of existing FE techniques used with HMMs in the literature for the detection of whale vocalisations.

Fig 1: The waveform and spectrogram views of humpback whale vocalisations

Recommendations and Future Research

The ED-HMMs demonstrated superior performance compared to existing HMM methods. Generally, all models performed better when trained with a larger number of samples, suggesting the use of large window sizes during training.

The experimental results highlighted that a model’s performance should be evaluated specifically for each species. It’s also crucial that the training data is either a subset of the testing data or, at the very least, comes from recordings in the same region. This helps prevent bias caused by natural variations in vocalisations within the same species.

The proposed ED-HMMs can be tested on other whale vocalisations to confirm their effectiveness further. Additionally, researchers studying the automatic detection of other vocalising animal species may find these models valuable for their work.

Download and read Dr Usman’s complete research at https://scholar.sun.ac.za/items/b51adf63-b495-4bf8-9771-aa7c079f4c46
