Performance Evaluation of Some Machine Learning Models for Music Genre Classification
Abstract: Music genre classification is a challenging task in the field of music information retrieval due to the overlapping characteristics of certain genres and the variability in audio quality. Several techniques have been developed to accurately classify music genres; however, these techniques have not been adequately analysed and compared. Hence, this study investigates the comparative performance of Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Random Forest (RF) models in music genre classification.
Mel-Frequency Cepstral Coefficients (MFCCs) were extracted from the audio samples using the Librosa library. Next, the three models (CNN, SVM, and RF) were trained. The CNN model was designed with multiple convolutional and pooling layers, along with dropout for regularization. The SVM model was used to create an optimal separating hyperplane for classification, while the RF model utilized an ensemble of decision trees. Finally, the models were evaluated and compared using accuracy, precision, recall, and F1 score.
The results of the evaluation and comparison indicate that CNN achieved 95% accuracy, 93% precision, 92% recall, and 91% F1 score. SVM achieved 93% accuracy, 90% precision, 80% recall, and 70% F1 score, while RF achieved 77% accuracy, 77% precision, 72% recall, and 60% F1 score.
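The four metrics reported above can be computed with scikit-learn. The toy labels below are illustrative (the study's predictions are not available), and macro averaging is an assumed choice for the multi-class setting:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Toy ground truth and predictions for a 3-genre problem (hypothetical).
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 1, 2]

acc = accuracy_score(y_true, y_pred)
# Macro averaging: compute each metric per genre, then average, so every
# genre counts equally regardless of how many clips it has.
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```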
The results demonstrate that CNN outperformed SVM and RF in terms of accuracy, precision, recall, and F1 score; CNN is therefore recommended for music genre classification. This finding underscores the effectiveness of CNN in addressing the challenging tasks in the field of music information retrieval, advancing automated music classification systems, and improving the accessibility and enjoyment of digital music libraries.
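The kind of head-to-head comparison described above can be sketched with scikit-learn. The synthetic 13-dimensional "MFCC-like" feature vectors, class separation, and hyperparameters below are all assumptions for illustration, not the study's actual data or settings:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-clip MFCC feature vectors: three "genres"
# whose class means are separated so both models have something to learn.
rng = np.random.default_rng(0)
n_per_class, n_classes, dim = 60, 3, 13
X = np.vstack([rng.normal(loc=c * 2.0, scale=1.0, size=(n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# SVM: finds an optimal separating hyperplane (RBF kernel assumed here).
svm = SVC(kernel="rbf").fit(X_tr, y_tr)
# RF: an ensemble of decision trees; 100 trees is the library default.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

svm_acc = svm.score(X_te, y_te)
rf_acc = rf.score(X_te, y_te)
print("SVM accuracy:", svm_acc)
print("RF accuracy:", rf_acc)
```

On real, noisier genre data the gap between models is what matters; here both classifiers score highly because the synthetic classes are well separated.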

This work is licensed under a Creative Commons Attribution 4.0 International License.