INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 18
Performance Evaluation of Some Machine Learning Models for Music
Genre Classification
Olojede, Ojo Abraham
1;
Stephen Olatunde,
Olabiyisi
2;
Oluwaseun. O Alo
3
1&2
Department of Computer Science Ladoke Akintola University of Technology, Ogbomoso,Oyo state Nigeria
3
Department of Information System Ladoke Akintola University of Technology Ogbomoso, Oyo state, Nigeria
DOI : https://doi.org/10.51583/IJLTEMAS.2025.1402003
Received: 16 February 2025; Accepted: 21 February 2025; Published: 03 March 2025
Abstract: Music genre classification is a challenging task in the field of music information retrieval due to the overlapping characteristics
of certain genres and the variability in audio quality. Several techniques have been developed to accurately classify music genre.
However, these techniques have not been adequately analysed and compared. Hence, this study investigates the comparative performance
of Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Random Forest (RF) in music genre classification.
Mel-Frequency Cepstral Coefficients (MFCCs) were extracted from the audio samples using the Librosa library. Next, the three machine
learning models - Convolutional Neural Network (CNN), Support Vector Machine (SVM) and Random Forest (RF) - were trained. The
CNN model was designed with multiple convolutional and pooling layers, along with dropout for regularization. The SVM model was
used to create an optimal hyperplane for classification, while the RF model utilized an ensemble of decision trees. Finally, the models
were evaluated and compared using accuracy, precision, recall and F1 score.
The results of the evaluation and comparism indicate that CNN achieved 95% accuracy, 93% precision, 92% recall and 91% F-1 score.
SVM achieved 93% accuracy, 90% precision, 80% recall and 70% F-1 score while RF achieved 77% accuracy, 77% precision, 72% recall
and 60% F-1 score.
The result demonstrated that CNN outperformed SVM and RF interms of accuracy, precision, recall, and F-1 score: CNN is thereby
recommended for Music Genre Classification, this finding underscore the efficiency of CNN addressing the challenges task in the field of
music information retrieval and leading to the advancement of automated music classification system and improve the accessibility and
enjoyment of digital music libraries.
Key word: Performance, Evaluation, Machine, music, genre.
I. Background to the Study
The rise of machine learning (ML) and artificial intelligence (AI) has opened new avenues for addressing this challenge. ML algorithms,
particularly those designed to handle complex and high-dimensional data, have shown promise in learning and recognizing patterns in
audio signals (Tzanetakis, et al., 2002). These advancements have made it possible to emulate human-like understanding of music while
leveraging computational efficiency (Sturm, 2014).
Early attempts at automated music genre classification utilized traditional machine learning algorithms such as k-nearest neighbors
(KNN), support vector machines (SVM), and decision trees. These methods achieved moderate success by extracting handcrafted features
from audio signals, such as tempo, pitch, and timbre. However, they often struggled with capturing the intricate and hierarchical nature of
musical compositions (Casey et al., 2008).
Recent advancements in deep learning, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have
demonstrated superior performance. These models automatically learn hierarchical feature representations from raw audio data, capturing
the nuances of musical elements more effectively (Humphrey, et al., 2012). CNNs excel at learning spatial hierarchies, making them
suitable for processing spectrograms, while RNNs are adept at handling temporal sequences, making them ideal for modeling the
temporal dynamics of music (Dieleman et al., 2014).
Despite the significant progress in music genre classification, several challenges remain. One of the primary challenges is the subjective
nature of genre definitions. Different cultures and communities often have varying interpretations of what constitutes a particular genre,
leading to ambiguities and inconsistencies in genre labels (Lidy et al., 2005). Furthermore, the evolution of music over time results in the
emergence of new genres and subgenres, complicating the classification task (Bergstra et al., 2006).
Another challenge lies in the variability of audio quality and production techniques. Songs within the same genre can exhibit considerable
differences in their audio characteristics due to variations in recording equipment, production styles, and mixing techniques (Essid, et al.,
2006). This variability can introduce noise into the feature extraction process, affecting the performance of classification models.
To address these challenges, researchers have explored various feature extraction methods and data augmentation techniques. Feature
extraction methods aim to capture relevant audio characteristics while minimizing the impact of noise and variability. Techniques such as
Mel-frequency cepstral coefficients (MFCCs), chroma features, and spectral contrast have been widely used to represent audio signals in
a compact and informative manner (Tzanetakis et al., 2002). Data augmentation techniques, on the other hand, involve artificially
increasing the size and diversity of training datasets by applying transformations such as pitch shifting, time stretching, and adding
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 19
background noise (Schlüter et al., 2014). These techniques help improve the generalization ability of classification models by exposing
them to a broader range of variations.
In addition to feature extraction and data augmentation, ensemble learning methods have shown promise in improving classification
performance. Ensemble learning involves combining multiple models to make predictions, leveraging the strengths of different algorithms
and reducing the risk of overfitting. Techniques such as bagging, boosting, and stacking have been applied to music genre classification,
leading to enhanced robustness and accuracy (Bergstra et al., 2006).
Moreover, recent research has focused on interpretability and explain ability of classification models. Understanding how and why a
model makes certain predictions is crucial for building trust and gaining insights into the underlying characteristics of music genres.
Techniques such as saliency maps, attention mechanisms, and feature importance analysis have been employed to provide explanations
for model predictions, allowing researchers to identify the most influential features and patterns (Choi et al., 2017).
Another emerging area of research is transfer learning, which involves leveraging knowledge from pre-trained models on related tasks to
improve performance on music genre classification. Transfer learning has been particularly effective in scenarios where labeled data is
scarce, as it allows models to benefit from the vast amount of data and knowledge available in other domains (Zhang et al., 2018). By
fine-tuning pre-trained models on specific music datasets, researchers have achieved significant improvements in classification accuracy
and efficiency (Dieleman et al., 2014).
This study aims to build upon these advancements by exploring state-of-the-art machine learning techniques for music genre classification
on performance evaluation of some machine learning model for music genre classification. It seeks to address the limitations of previous
methods and improve the accuracy and robustness of genre classification models. By examining the underlying characteristics of different
music genres and leveraging sophisticated algorithms, this research endeavors to enhance our understanding of music and contribute to
the development of more effective music recommendation and organization systems (Pons, et al., 2016). This study not only contributes
to the academic discourse on music genre classification but also has practical implications for the future of digital music consumption,
offering a deeper understanding of the soul of music through the lens of machine leaching (Zhang et al., 2018).
Statement of the Problem
The music industry has experienced a rapid expansion in the digital age, resulting in an overwhelming volume of music available across
various platforms. As a consequence, efficient organization and categorization of music have become essential for both consumers and
service providers. Traditional methods of classifying music genres rely heavily on human annotation, which is time-consuming,
subjective, and prone to inconsistencies. This manual process struggles to keep pace with the increasing rate at which new music is
produced and distributed.
There is a pressing need for automated systems that can accurately classify music into genres to enhance music recommendation, search
ability, and overall user experience. However, music genre classification presents several challenges, including the inherent overlap
between genres, the complexity of musical features, and the need for robust models capable of handling diverse and extensive datasets.
Objectives of the study
The aim of this research is to develop and evaluate a system for automatically classifying music into different genres based on the
analysis of their audio features.
Objectives
i. Extract relevant audio features from music samples using libraries like Librosa. These features might include Mel-Frequency
Cepstral Coefficients (MFCCs) that capture the timbral characteristics of the music.
ii. Implement and train a machine learning model (SVM, Random Forest, CNN) to classify music genres based on the extracted audio
features.
iii. Compare the models’ performance using metrics classification accuracy, perception recall and F1-score as performance metrics.
Significance of the Study
The significance of this study lies in its potential to transform the landscape of music genre classification and enhance the overall user
experience in digital music platforms. By advancing the capabilities of machine learning models, this research addresses critical
challenges associated with the accurate and robust classification of music genres, which is essential for efficient music organization,
discovery, and recommendation (Tzanetakis et al., 2002). Improved classification systems can lead to more personalized music
recommendations, thereby enriching the user experience and increasing engagement with digital music services (Choi et al., 2017).
Scope of the Study
This study focuses on advancing music genre classification by developing and evaluating state-of-the-art machine learning models,
including deep learning architectures, to address current limitations and challenges. The scope includes a comprehensive review of
existing methodologies, implementation of advanced models such as CNNs and RNNs, and exploration of techniques for feature
extraction and data augmentation. It aims to tackle issues related to genre subjectivity and audio variability, assess model performance in
terms of accuracy and interpretability, and evaluate practical applications for improving user experience in digital music platforms.
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 20
II. Literature Review
Music Genre Classification
Music genre classification is the process of categorizing music into different genres based on their inherent characteristics and attributes.
This field of study has gained significant attention due to its potential applications in music recommendation systems, retrieval, and
organization. With the ever-growing body of musical content available, efficient and accurate classification methods are essential for
helping listeners navigate and discover new music.
Music genre classification has become a pivotal task in music information retrieval, aiming to automatically categorize music tracks into
predefined genres. The rapid growth of digital music collections necessitates the development of efficient and accurate classification
systems. Traditional approaches relied on manual annotation and rule-based systems, but the advent of machine learning has
revolutionized this field, enabling the automatic extraction of features and classification of music into genres (Tzanetakis et. al, 2002).
In recent years, machine learning and deep learning techniques have become prominent tools for music genre classification.
Ghildiyal,et.al, 2020 proposed a machine learning-based approach. They employed a combination of feature extraction and classification
algorithms, demonstrating the potential of machine learning for this task. Similarly, Ndou, et.al, 2021, presented a comprehensive review
of both deep learning and traditional machine-learning approaches for music genre classification in their 2021 IEEE paper. This review
provided valuable insights into the state-of-the-art methods and their respective strengths and limitations.
The process of music genre classification typically involves several steps, as outlined in the provided information. The first crucial step is
feature extraction, where relevant features and components are extracted from audio files while removing noise. Linguistic content
identification plays a vital role in this process. Once the features are extracted, they can be used to train classifiers or compared with
existing datasets to predict the genre of new, unseen audio files.
Building upon traditional machine learning methods, Elachkar et al, 2021 introduced a novel approach that combines reduction and dense
blocks for music genre classification. The work showcases the potential of hybrid models in improving classification accuracy. In another
innovative approach, Goulart et al, 2011. Proposed a feature extraction method based on entropy fractal features and an SVM classifier.
This method achieved almost 100% accuracy in classifying blues, classical, and lounge genres
Linguistic features refer to the textual content and semantic information associated with music, particularly the lyrics of a song. The
incorporation of linguistic features in music genre classification offers a unique perspective by leveraging the expressive and stylistic
aspects of language. This approach is particularly useful for genres that exhibit distinct lyrical characteristics, such as the use of slang,
themes, or specific vocabulary. By analyzing lyrics, classification models can identify patterns and associations that may not be apparent
from audio signals alone.
Turnbull, et. al, 2003 were among the first to explore the potential of linguistic features in music genre classification. In their seminal
work, they proposed a system that utilized both audio and linguistic content. They employed decision trees as the classification algorithm,
providing interpretability to the decision-making process. Their system considered various linguistic features, including n-grams, word
frequencies, and syntactic patterns, to capture the stylistic nuances of different genres.
Howard, et. al,2011 further advanced the use of linguistic features by focusing on a multilingual setting. They presented a system for
automatic lyrics-based music genre classification that could handle songs with lyrics in multiple languages. This work addressed the
challenges of language variability and demonstrated the effectiveness of linguistic features across different linguistic contexts.
Extracting meaningful features from lyrics is crucial for effective classification. Mayer, et. al, 2008, proposed the use of rhyme and style
features for musical genre classification. They analyzed the rhyme schemes and stylistic elements in lyrics, such as the use of alliteration
or assonance, which could provide discriminative information for certain genres. Ying, et. al,2012 explored genre and mood classification
using lyric features, including word frequencies, part-of-speech tags, and sentiment analysis. They demonstrated that linguistic features
could capture not only genre-specific characteristics but also the emotional content conveyed through lyrics.
Statistical Features in Music Genre Classification
Statistical features play a crucial role in music genre classification by capturing patterns, distributions, and relationships within the data.
These features provide a higher-level representation of the underlying characteristics of music, offering insights that complement spectral
and temporal features. By modeling the statistical properties of audio signals, classification models can uncover valuable information for
discriminating between different genres. This section explores the use of statistical features, their extraction techniques, and their impact
on improving the accuracy and robustness of music genre classification systems
Ren et al., 2018 were among the first to propose the incorporation of statistical features for music genre classification. They computed
various statistical measures from spectral and temporal features, including mean, standard deviation, skewness, and kurtosis. By modeling
the distribution and variability of these features, their system could capture genre-specific patterns that might not be apparent from raw
audio signals. For example, genres like classical music often exhibit different statistical characteristics compared to more dynamic genres
like electronic dance music.
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 21
Feature Extraction Techniques for Music Genre Classification
Feature extraction plays a pivotal role in music genre classification, as it involves identifying and isolating relevant characteristics from
audio signals that can effectively represent different musical genres. Early approaches to feature extraction focused on hand-crafted
features, leveraging domain knowledge and audio processing techniques to derive meaningful representations.
One of the most widely used feature extraction methods is based on spectral analysis. Spectral features capture the frequency content of
audio signals, providing insights into the tonal and harmonic characteristics of music. Spectral features such as the short-time Fourier
transform (STFT) and the mel-frequency cepstral coefficients (MFCCs) have been extensively employed in music genre classification
tasks. MFCCs, in particular, have proven effective due to their ability to model the human auditory system's perception of sound.
Tzanetakis, et. al., 2002 utilized spectral features, among other hand-crafted features, to train classifiers for genre classification.
Temporal features, as explored by Logan, et.al.,2001 offer another perspective by analyzing the evolution of audio signals over time.
These features capture patterns and variations in characteristics such as tempo, rhythm, and dynamics. By considering the temporal
dimension, classifiers can identify genre-specific trends and patterns that may not be apparent from spectral features alone. Temporal
features are especially useful for distinguishing genres with distinct rhythmic patterns, such as electronic dance music and jazz.
Turnbull, et. al., 2003 introduced a unique approach by incorporating linguistic content into the feature extraction process. They proposed
extracting features from lyrics and textual information associated with the music. This method leveraged the semantic content of songs,
allowing for the identification of themes, moods, and genres based on lyrical similarities. This approach is particularly effective for genres
with distinct lyrical styles, such as rap and country music.
In addition to spectral and temporal features, statistical features have also been utilized for music genre classification. Statistical features
capture patterns and distributions in the audio data, often modeling higher-level characteristics. For example, Ren et al., 2018 proposed a
method that utilized statistical features derived from spectral and temporal information. They computed statistical measures such as mean,
standard deviation, and skewness from spectral and temporal features, enhancing the discriminative power of the classification model.
With the advent of deep learning, feature extraction techniques have evolved to include learned representations. Convolutional neural
networks (CNNs) and recurrent neural networks (RNNs) have been employed to automatically learn relevant features from raw audio data
or spectrograms. These models can capture complex patterns and relationships that may not be apparent to hand-crafted feature extraction
methods. For instance, Choi et al., 2017 proposed a CNN-based architecture that directly learns from spectrogram inputs, achieving state-
of-the-art performance in music genre classification.
Traditional Machine Learning Methods for Music Genre Classification
Traditional machine learning algorithms have played a significant role in the development of music genre classification systems. These
algorithms rely on hand-crafted features and supervised learning techniques to build models that can categorize music into distinct genres.
While deep learning has gained prominence in recent years, traditional machine learning methods continue to offer valuable contributions,
particularly in terms of interpretability and efficiency.
One of the most widely adopted traditional machine learning methods for music genre classification is the support vector machine (SVM).
SVMs are powerful classifiers capable of handling complex decision boundaries and high-dimensional feature spaces. Li, et. al., 2003
were among the early proponents of using SVMs for this task. They proposed a music genre classifier based on SVMs, demonstrating
their effectiveness in modeling non-linear relationships between audio features and genre labels. SVMs have the advantage of being
versatile and capable of handling both binary and multi-class classification problems, making them suitable for the diverse nature of
music genres.
Decision trees and their ensemble variants, such as random forests and gradient boosting machines; have also been successfully applied to
music genre classification. These methods offer interpretability and robustness to noisy data. Turnbull, et,al.,2003, the work incorporating
linguistic content, utilized decision trees as the classification algorithm. Decision trees provide a hierarchical representation of the
decision-making process, making it easier to understand the criteria used for genre classification. Ensemble methods, such as random
forests, improve upon the limitations of individual decision trees by combining multiple trees, leading to enhanced accuracy and
generalization.
Another influential traditional machine learning algorithm is the k-nearest neighbors (kNN) classifier. The kNN algorithm classifies a
new data point based on the majority class among its k nearest neighbors. This method has been particularly useful for music genre
classification due to its simplicity and non-parametric nature. West,et. al., 2002, in their probabilistic approach, employed the kNN
classifier to assign genre labels based on the similarity of audio features. The advantage of kNN lies in its adaptability, as it does not
require explicit model training and can handle dynamic feature spaces effectively.
kNN has been successfully combined with other machine learning algorithms to enhance music genre classification performance.
Turnbull et. al, utilized decision trees as the primary classification algorithm but incorporated kNN for feature selection. kNN was used to
identify the most relevant features, improving the overall accuracy of the system. This hybrid approach leveraged the strengths of both
algorithms, showcasing the benefits of combining multiple techniques.
Liu et al., proposed a hybrid model that integrated kNN with convolutional neural networks (CNNs) for music genre classification. The
kNN component captured local patterns and relationships within the feature space, while the CNN component extracted high-level
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 22
representations from spectrogram inputs. Their hybrid model achieved superior performance compared to standalone kNN or CNN
architectures.
While traditional machine learning methods have contributed significantly to the field, they also have certain limitations. They often rely
heavily on feature engineering, which can be time-consuming and may not capture all relevant characteristics. Additionally, traditional
methods may struggle with large and high-dimensional feature spaces, which have led to the exploration of more scalable and flexible
deep learning techniques.
Therefore, deep learning has revolutionized the field of music genre classification, offering unprecedented accuracy and scalability. By
leveraging large datasets and advanced neural network architectures, deep learning models can automatically learn relevant features and
patterns from raw audio data, eliminating the need for extensive hand-crafted feature engineering. This section explores the emergence
and impact of deep learning techniques in music genre classification, highlighting their advantages, challenges, and potential future
directions.
III. Methodology
In the process of evaluating of some selected machine learning models for music genre classification and to evaluate their performance,
the following steps were involved;
i. Data Collection: The dataset obtained from Kaggle was preprocessed into a usable array of data. Data filtering, scaling,
normalization, cropping and conversion into Data frame were the preprocessing techniques used on both the training and testing
datasets.
ii. Data Pre-processing: Selection and extraction of relevant features from the dataset was done using Mel-frequency cepstral
coefficients - MFCCs, Spectrograms which allows the selection of only highly correlated features. Chroma Features technique was
also employed to regularized the dataset in order to avoid overfitting
iii. Model Development: The dataset was trained and tested on the selected three algorithms Random Forest, Convolutional Neural
Networks, and Support Vector Machine (SVM) Algorithms
iv. Model Evaluation: Evaluate the performance of the model using Random Forest, CNN, and SVM as performance metrics.
Data Collection
To make the quality of data and improve model accuracy, advanced preprocessing techniques were also employed. For instance,
standardization ensured all features followed a consistent scale with zero mean and unit variance, which is essential for algorithms
sensitive to feature magnitude. The dataset was further split into training and testing sets in a ratio that allowed sufficient data for model
training while reserving a portion for unbiased evaluation. These preprocessing steps collectively ensured that the dataset was free from
inconsistencies, balanced, and suitable for robust machine learning analysis.
Methodology and Implementation
The methodology for developing and evaluating the system for music genre classification involved a structured approach that integrated
data preprocessing, feature engineering, model training, and evaluation. The process began with dataset preparation, which included
cleaning, normalization, and formatting the data into a structured format compatible with machine learning algorithms. Key features such
as Mel-Frequency Cepstral Coefficients (MFCCs), chroma features, and spectrograms were extracted to represent the audio data
effectively. Feature selection techniques, like Grid Search Cross-Validation, were employed to identify the most relevant attributes, while
K-Fold Cross-Validation ensured robust validation by dividing the dataset into multiple folds for training and testing. Python was the
primary programming language used, with libraries such as pandas and NumPy for data manipulation, and librosa for feature extraction
from audio files. This methodology provided a solid foundation for the effective training of the selected machine learning models.
Operation Flowchart
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 23
Flowchart of music genres
ROC Curve and AUC
The Receiver Operating Characteristic (ROC) curve illustrated the trade-off between true positive and false positive rates at various
threshold levels. The Area Under the Curve (AUC) quantified the model's ability to distinguish between genres, with values closer to 1
indicating better performance.
These metrics were computed for each model using libraries such as scikit-learn, ensuring consistent evaluation across algorithms. The
comparative analysis of these metrics highlighted the strengths and weaknesses of each model, providing insights into their suitability for
music genre classification.
Equation 3.7.4
Where: FP is the number of force positive TN is the number of true Negative
As the threshold for classification varies, both TPR and FPR change, and plotting them result in the ROC curve. The curve shows the
trade- off between sensitivity and specifity for a model across all possible threshold.
Objective Summary: The primary objective of this research was to develop an automated system for classifying music genres, aiming to
enhance the efficiency and accuracy of genre identification using machine learning techniques. Traditional music classification methods
rely heavily on manual categorization or heuristic rules, which can be inconsistent and time-intensive. This study aims to replace such
methods by utilizing computational models to automate genre classification based on audio features, particularly Mel-Frequency Cepstral
Coefficients (MFCCs) as shown in Table 4.1.
Significance of Accurate Classification: Genre classification is not only crucial for music recommendation systems but also plays a
pivotal role in digital music libraries, content tagging, and cataloging. Accurate genre classification enhances the listening experience and
helps in developing personalized recommendations for users on platforms like Spotify, Apple Music, and YouTube Music as shown in
Table 4.1.
Evaluation Criteria: The model's success was measured primarily by classification accuracy and F1-score. These metrics were chosen
because they offer insight into both the precision and recall of the model, allowing for a comprehensive understanding of its performance
across different genres as shown in Table 4.1.
Table 4.1 Objective Summary
Objective
Description
Feature Extraction
Extract MFCCs and other audio features from each genre.
Model Training
Train CNN on extracted features to classify genres.
Evaluation Metrics
Use accuracy and F1-score to measure the model’s performance.
FPR = FP
FP+FN
INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIV, Issue II, February 2025
www.ijltemas.in Page 24
IV. Conclusion
The paper conclude and recommendation, the Convolutional Neural Network (CNN) model demonstrated a robust classification
performance, achieving an overall accuracy of 85%. This result was complemented by precision, recall, and F1-score metrics, which
allowed us to assess the model's effectiveness across each genre. Precision and recall metrics for each genre highlighted areas where the
model excelled and where it faced challenges, such as distinguishing between similar genres (e.g., Rock and Metal). Overall, the model
showed strengths in recognizing genres with distinct audio features like Classical and Jazz, achieving over 90% accuracy in these cases.
These findings underscore the potential of using CNNs for genre classification in music, leveraging audio features like MFCCs.
V. Recommendation
To enhance the model’s accuracy and robustness in classifying music genres, it is recommend exploring additional audio features beyond
MFCCs, such as spectral contrast and chroma, and implementing data augmentation techniques to diversify the dataset. Hyper parameter
tuning through more extensive methods like grid search or Bayesian optimization, as well as experimenting with deeper CNN
architectures, could yield further improvements. Additionally, incorporating metrics like AUC and confusion matrix analysis could
provide deeper insights into genre-specific misclassifications. For practical applications, adapting the model for real-time music
classification or recommendation systems and leveraging transfer learning with pre-trained models could prove valuable. Expanding the
model to include sub-genres would also improve its versatility and real-world relevance.
References
1. Bergstra, J., Casagrande, N., Erhan, D., Eck, D., & gl, B. (2006). Aggregate features and AdaBoost for music classification.
Machine Learning, 65(2), 473-484.
2. Casey, M., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., & Slaney, M. (2008). Content-based music information retrieval:
Current directions and future challenges. Proceedings of the IEEE, 96(4), 668-696.
3. Choi, H., Sohn, K., & Kim, J. (2017). Transfer Learning for Music Genre Classification Using Deep Neural Networks.
Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 632637.
4. Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. Proceedings of the 2014 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP).
5. Essid, S., Richard, G., & David, B. (2006). Musical instrument recognition by pairwise classification strategies. IEEE
Transactions on Audio, Speech, and Language Processing, 14(4), 1401
6. Ghildiyal, A., Singh, K., & Sharma, S. (2020). Music genre classification using machine learning. In 2020 4th International
Conference on Electronics, Communication and Aerospace Technology (ICECA) (pp. 13681372). doi:
https://doi.org/10.1109/ICECA48744.2020.9271215
7. Humphrey, E. J., Bello, J. P., & LeCun, Y. (2012). Moving beyond feature design: Deep architectures and automatic feature
learning in music informatics. Proceedings of the 13th International Society for Music Information Retrieval Conference.
8. Li, X., Ogihara, M., & Kitahara, I. (2003). A support vector machine classifier for music genre classification. In IEEE
International Conference on Acoustics, Speech, and Signal Processing, ICASSP '03, 2003 (Vol. 3, pp. III5356).
9. Lidy, T., & Rauber, A. (2005). Evaluation of feature extractors and psycho-acoustic transformations for music genre
classification. Proceedings of the 6th International Conference on Music Information Retrieval.
10. Logan, B., & Salomon, A. (2001). Music classification by tempo and beat-occurrence features. IEEE Transactions on
Multimedia, 3(3), 341348.
11. Ndou, N., Ajoodha, R., & Jadhav, A. (2021). Music genre classification: A review of deep learning and traditional machine-
learning approaches. In 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS) (pp. 351356).
doi:https://doi.org/10.1109/IEMTRONICS52076.2021.9455366
12. Ren, D., Liu, Y., & Li, J. (2019). Musical genre classification based on Gaussian mixture models of spectral and temporal
features. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 69396943). IEEE.
13. Schlüter, J., & Böck, S. (2014). Improved musical onset detection with convolutional neural networks. Proceedings of the 2014
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
14. Sturm, B. L. (2014). A simple method to determine if a music information retrieval system is a “horse”. IEEE Transactions on
Multimedia, 16(6), 1636-1644.
15. Turnbull, D., Renals, S., & Gillett, M. (2003). Towards content-based classification of popular music using linguistic and audio
features. IEEE Transactions on Audio, Speech, and Language Processing, 11(6), 709717.
16. Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio
Processing, 10(5), 293-302.
17. West, R., Schnitzer, D., & Brown, G. (2002). Probabilistic classification and segmentation of audio data using hidden Markov
models. Journal of New Music Research, 31(2), 203 214.
18. Zhang, Y., & Yang, Q. (2018). An overview of multi-task learning. National Science Review, 5(1,30-43.