INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIII, Issue X, October 2024
and dependencies among characteristics in the data. If the purpose of the analysis is to categorize data by class, then the new
information consists of the classes to which the data belong. Accordingly, the algorithms are classified into two categories:
supervised and unsupervised algorithms.
Supervised and unsupervised algorithms
The output conditions are not explicitly represented in the data set when mining is "unsupervised" or "undirected": the task of an
unsupervised algorithm is to automatically discover the patterns inherent in the data, without prior information about which class
the data could belong to and without any supervision. The purpose of such a model is to find patterns across a large number of
input variables. Although unsupervised learning algorithms were not developed for prediction tasks, the models they produce can
sometimes be used for them. Clustering methods and association rules belong to this category.
Supervised algorithms, by contrast, take data from a known class to build models and then, based on the model, predict the class
to which new data will belong. Classification methods belong to this category. Data classification is the process of learning a
function that maps data into one of several predetermined classes. Any classification method based on inductive learning is given
an input data set consisting of vectors of attribute values and their corresponding classes. The purpose of a classification technique
is to build a model.
The model allows future data points to be classified automatically based on a set of predefined features. Such systems take as input
a set of cases, each of which belongs to one of a small number of classes and is described by the values of a fixed set of attributes.
As output they produce a classifier that can reliably predict the class to which a new case belongs. The most common classification
approaches are decision trees, induction or classification rules, probabilistic or Bayesian networks, neural networks, and hybrid
techniques. In this study, we investigated the impact of three algorithms for intelligent data analysis: C4.5, Multilayer Perceptron,
and Naive Bayes. Many different classifiers exist in the literature, and it is impossible to choose the best one, because they differ in
many aspects such as learning rate, the amount of training data required, classification speed, and robustness (Wu and Kumar,
2009). These methods are used to build classification models whose goal is to predict the class (student success) to which a new
unlabeled sample will belong. The three classification strategies were chosen in order to determine the best method for predicting
student performance.
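As a minimal sketch of such a prediction model, the following implements a frequency-based Naive Bayes classifier (one of the three methods examined here) applied to invented student records; the attribute names ("attendance", "prior grade") and the training data are hypothetical, chosen only to illustrate the train-then-predict workflow.

```python
# Hedged sketch of supervised classification: a frequency-based Naive
# Bayes classifier predicting a hypothetical "passed"/"failed" outcome
# from categorical student attributes. All data here is invented.
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Estimate class priors and per-attribute conditional frequencies."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (attribute index, value) -> class counts
    for sample, label in zip(samples, labels):
        for i, value in enumerate(sample):
            cond[(i, value)][label] += 1
    return priors, cond

def predict(priors, cond, sample):
    """Pick the class maximizing P(class) * prod_i P(value_i | class)."""
    total = sum(priors.values())
    best_class, best_score = None, -1.0
    for cls, count in priors.items():
        score = count / total
        for i, value in enumerate(sample):
            # Laplace smoothing; "+ 2" assumes two possible values per
            # attribute (an illustrative simplification)
            score *= (cond[(i, value)][cls] + 1) / (count + 2)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical training data: (attendance, prior grade) -> outcome
X = [("high", "good"), ("high", "poor"), ("low", "poor"),
     ("low", "poor"), ("high", "good")]
y = ["passed", "passed", "failed", "failed", "passed"]
priors, cond = train_naive_bayes(X, y)
print(predict(priors, cond, ("high", "good")))  # prints "passed"
```

The model is built from cases whose class is already known, and the class of a new, unlabeled case is then read off from the learned frequencies, which is the supervised workflow described above.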
The Naive Bayes algorithm (NB)
A simple classification approach based on probability theory, specifically Bayes' theorem (Witten and Frank, 2000). It is called
naïve because it simplifies the problem by relying on two key assumptions: the predictive attributes are conditionally independent
given the class, and no hidden attributes influence the prediction process. This classifier is a promising approach to probabilistic
knowledge discovery, as well as a highly efficient data classification technique.
The Multilayer Perceptron algorithm (MLP)
The multilayer perceptron (MLP) is one of the most widely used and popular neural networks. The network consists of an input
layer of sensory elements, one or more hidden layers of processing elements, and an output layer of processing elements (Witten
and Frank, 2000). MLP is particularly well suited to approximating a classification function that assigns an example, described by
a vector of attribute values, to one or more classes (where the relationship between input and output attributes is unknown).
The C4.5 algorithm
C4.5 is the most prevalent and, in many respects, the most frequently used decision tree algorithm today. Professor Ross Quinlan
created the C4.5 decision tree method in 1993 as the outcome of research dating back to the ID3 algorithm (originally proposed by
Quinlan in 1986). C4.5 includes, among other features, the handling of missing values, classification of continuous attributes,
decision tree pruning, and rule generation. The basic design of the C4.5 algorithm uses a divide-and-conquer strategy to build an
appropriate tree.
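The split step at the heart of that divide-and-conquer strategy can be sketched as follows: the attribute with the highest gain ratio is chosen, and the data is partitioned on its values. The dataset and attribute names below are invented, and the sketch covers only a single split, not the full recursive tree construction or pruning that C4.5 performs.

```python
# Rough sketch of one C4.5-style split: rank candidate attributes by
# gain ratio and pick the best. Dataset and attributes are illustrative.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(samples, labels, attr):
    """Information gain of splitting on attr, normalized by split info."""
    total = len(labels)
    partitions = {}
    for sample, label in zip(samples, labels):
        partitions.setdefault(sample[attr], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([sample[attr] for sample in samples])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical records: (attendance, enrolled_online) -> outcome
X = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
y = ["passed", "passed", "failed", "failed"]
best = max(range(2), key=lambda a: gain_ratio(X, y, a))
print(best)  # prints 0: attendance perfectly separates the classes
```

Dividing on the chosen attribute and recursing on each partition until the subsets are pure (or pruning stops the growth) yields the decision tree.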
III. Results and Discussion
This package was written in the Java programming language and is now widely regarded as the most capable and complete bundle
of machine learning algorithms available in academic and nonprofit settings. When predicting student success, it is common to
study the influence of the input factors, i.e., the impact of specific input variables of the model on the output variable, in order to
gain a better understanding of the relevance of the input variables. Four tests were used to evaluate the input variables: the
Chi-square test, the OneR test, the Info Gain test, and the Gain Ratio test. The following metrics were used to record the results of
each test: Attribute (name of the attribute), Merit (measure of goodness), Merit dev (deviation of the measure of goodness), Rank
(average position occupied by the attribute), and Rank dev (deviation of the attribute's position).
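Combining the per-test rankings into the Rank and Rank dev metrics can be sketched as follows; the attribute names and the orderings assigned to each test below are invented for illustration, not results from the study.

```python
# Sketch of rank aggregation: each evaluation test orders the attributes,
# and the final Rank is the average position across tests, with Rank dev
# as its deviation. All rankings here are invented.
from statistics import mean, pstdev

rankings = {
    "ChiSquared": ["grade_avg", "attendance", "age"],
    "InfoGain":   ["grade_avg", "attendance", "age"],
    "GainRatio":  ["attendance", "grade_avg", "age"],
    "OneR":       ["grade_avg", "age", "attendance"],
}

# Collect the position each attribute occupies under each test
positions = {}
for order in rankings.values():
    for pos, attr in enumerate(order, start=1):
        positions.setdefault(attr, []).append(pos)

# Report attributes from best (lowest) to worst average rank
for attr, pos_list in sorted(positions.items(), key=lambda kv: mean(kv[1])):
    print(f"{attr}: Rank {mean(pos_list):.2f}, Rank dev {pstdev(pos_list):.2f}")
```

An attribute ranked highly by every test ends up with a low average Rank and a small Rank dev, while disagreement between tests shows up as a larger deviation.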
Different algorithms produce substantially different results, i.e., each accounts for attribute relevance in its own way. Instead of
picking one method and trusting it alone, the final attribute ranking is taken as the average over all the algorithms. Three supervised
data mining algorithms were applied to preoperative assessment data to predict course outcome (passed or failed), and the
performance of the learning techniques was assessed in terms of prediction accuracy, ease of learning, and user-friendliness. The
results show that the Naïve Bayes classifier outperforms the decision tree and neural network approaches for prediction. It has also
been noted that an effective classifier model must be both accurate and understandable to instructors. Because the data mining
techniques were applied after the data had been collected, this study was based on typical classroom situations. This approach,
however, may be utilized