INTERNATIONAL JOURNAL OF LATEST TECHNOLOGY IN ENGINEERING,
MANAGEMENT & APPLIED SCIENCE (IJLTEMAS)
ISSN 2278-2540 | DOI: 10.51583/IJLTEMAS | Volume XIII, Issue VII, July 2024
www.ijltemas.in Page 175
AI-Driven Model for Contract Law Cases
Nnaemeka C. Onyemelukwe, Ogochukwu C. Okeke
Computer Science Department, Chukwuemeka Odumegwu Ojukwu University, Nigeria
DOI: https://doi.org/10.51583/IJLTEMAS.2024.130721
Received: 04 July 2024; Accepted: 13 July 2024; Published: 20 August 2024
Abstract: The subjective nature of human analysis results in inconsistent decision-making, which seriously jeopardizes the
accuracy and fairness of verdicts in contract disputes. As the number of contracts keeps increasing, legal practitioners face the
difficult task of organizing and evaluating numerous precedent and statute case materials promptly. This time constraint
not only makes it more difficult to resolve contract issues on time but also makes the legal system more complicated and unclear.
There has never been a greater need for contract litigation to undergo extensive change. The researcher proposes two
complementary methods for retrieving legal documents: BM25 and an aggregated Bidirectional Long Short-Term Memory
(BiLSTM) model with a Convolutional Neural Network (CNN).
Keywords: Legal Retrieval, BiLSTM, Litigation, BM25, CNN
I. Introduction
Contract litigation, a cornerstone of the legal system, plays a pivotal role in resolving disputes and upholding the sanctity of
agreements. In this domain, the accurate interpretation of contractual terms and conditions is paramount, as it influences the
outcomes of legal proceedings [1]. However, the processes governing contract litigation have long grappled with inherent
challenges, including inefficiency, subjectivity, limited scalability, and the risk of errors. The manual analysis of precedent
cases and interpretation of complex legal documents (statute cases) is not only labour-intensive but also prone to human error.
This often results in delays in resolving disputes and escalates costs for all parties involved.
To broaden the range of analytical approaches available for legal documents, the researcher has focused on effective
artificial intelligence (AI) and machine learning (ML) technologies. These technologies have opened up significant new research
avenues for specialists in computer science and law; one such direction is replacing legal regulations with executable code.
A legal document records and validates an agreement between two or more parties once all parties have signed it.
The model is based on the integration of two complementary methods for retrieving legal documents: Best Match 25 (BM25) and
an aggregated Bidirectional Long Short-Term Memory (BiLSTM) model with a Convolutional Neural Network (CNN). BM25 is an
information retrieval technique that efficiently ranks documents by how often the query terms occur in each document, how rare
those terms are across the collection, and document length [3]. The CNN-BiLSTM model, in turn, extracts contextual information
and semantic correlations from the text [7]. Combining the two merges conventional keyword-based retrieval with deep learning.
This hybrid technique, which is especially useful for retrieving legal information, combines the strengths of both approaches to
improve the relevance and accuracy of document ranking.
II. Statement of the Problems
Based on the literature review, most studies on legal contract litigation have concentrated on preprocessing techniques and
models, leaving room for recall- and precision-based performance improvements. Such improvements would have significant
implications for legal practice in light of the following problems:
A. Compliance risk and insufficient precedent/statute retrieval: Legal professionals struggle with compliance risk and
restricted access to precedent and statute cases, which makes it difficult to build strong arguments based on prior cases.
This weakens their overall case and increases the risk of noncompliance if legal obligations are not properly identified
and addressed during the development of litigation machine learning models.
B. Limited scalability for timely retrieval: Most existing approaches, such as keyword search techniques, struggle to
manage and analyze the growing number of contract litigation cases as contract volumes rise. This limitation impedes the
timely resolution of contract disputes.
Aim and Objectives of the Study
The study aims to leverage AI and ML techniques to develop an advanced contract analysis system that can efficiently extract
relevant information from legal documents, identify key clauses, and analyze contract terms and conditions. A further objective
is to design a hybrid model that combines the capabilities of AI technologies with the expertise of legal professionals. This
integration aims to enhance the accuracy and reliability of contract analysis by leveraging the strengths of both human judgment
and AI capability. Additionally, the study seeks to improve the efficiency and speed of contract litigation processes by automating routine tasks
through AI-driven contract analysis. This objective aims to reduce the time and effort required for contract review, enabling faster
resolution of disputes.
By incorporating AI technologies, the research also aims to address inconsistencies in contract analysis. The objective is to
develop a system that provides consistent and impartial analysis, leading to fair and reliable decision-making in contract
litigation.
III. Literature Review
Artificial Intelligence (AI) and Machine Learning (ML) in Legal Applications
Artificial Intelligence (AI) and Machine Learning (ML) are increasingly being applied in various aspects of the legal field to
streamline processes, enhance efficiency, and provide better insights [10].
Related work
Several studies [1], [5], [6], [9] have developed methods to improve legal outcomes. Hassan [5] used data mining to develop a
technology-assisted review for the Supreme Court of Pakistan and the Pakistan Bar Council. Jelali, Fersini, and Messina [6] and
Sampath and Thenmozhi [9] used precedent retrieval methods to retrieve comparable previous case documents. Ahmad et al. [1]
used a hybrid CNN + BiLSTM deep learning-based decision support system to forecast court outcomes.
According to this body of research, many legal discourse and argumentation formats are built around precise concepts that serve
as useful logical models. Some aspects of addressing legal problems, such as determining the meaning of legal jargon in everyday
usage, do not readily fit into a logical framework and are better suited to empirical research. The complementary roles of
logical reasoning with rules and semantic reasoning with case facts gave rise to a variety of hybrid reasoning systems that combine
rule-based and case-based reasoning [2], [4] or other forms of semantic analysis [8]. Hybrid approaches aim to mimic the ability
of human attorneys to construct arguments that combine precedent-setting facts with normative standards, such as laws and
regulations.
IV. System Analysis and Methodology
The approach combines BM25 with two neural networks, namely a CNN and a BiLSTM.
For the hybridized model, the research proposes the following architectural stages:
Data pre-processing
Data pre-processing prepares the raw dataset for modelling, including filtering the dataset. It involves the following stages:
a) cleaning and filtering the data
b) text lower-casing
c) punctuation and special character removal
d) stop word removal
e) stemming / lemmatization
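The pre-processing stages a) to e) listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list is a hypothetical toy list, and `naive_stem` is a crude suffix stripper standing in for a real stemmer or lemmatizer (for example NLTK's Porter stemmer). All function names are illustrative.

```python
import re

# Hypothetical toy stop-word list for illustration only; a real pipeline would
# use a complete list (for example NLTK's English stop words).
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "be", "by", "on"}

# Suffixes removed by the naive stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in SUFFIXES:
        # Strip only if at least three characters of the stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    # a) basic cleaning: normalize runs of whitespace
    text = " ".join(text.split())
    # b) lower-casing
    text = text.lower()
    # c) replace punctuation and special characters with spaces
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # d) stop-word removal
    words = [w for w in text.split() if w not in STOP_WORDS]
    # e) stemming with the naive stemmer above
    return [naive_stem(w) for w in words]


print(preprocess("The Seller shall deliver the goods by 1 March."))
# ['seller', 'shall', 'deliver', 'good', '1', 'march']
```

Note that "shall" is deliberately kept: in contracts it signals an obligation, so generic stop-word lists may need adjusting for legal text.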
Tokenization
Precedent and statute case documents and queries are tokenized and then represented numerically.
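One common way to represent tokens numerically is to build a vocabulary that maps each token to an integer id, then truncate or pad every sequence to a fixed length so that batches can be fed to a CNN. The sketch below assumes that convention; the paper does not state which encoding it used, and the reserved ids and the names `build_vocab` and `encode` are hypothetical.

```python
from collections import Counter

PAD, UNK = 0, 1  # hypothetical reserved ids for padding and unknown tokens


def build_vocab(token_lists, min_freq=1):
    """Map every token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    # Most frequent tokens get the smallest ids; ties are broken alphabetically
    # so that the mapping is deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Turn a token list into exactly `max_len` ids (truncate or right-pad)."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


docs = [["breach", "contract"], ["contract", "damage"]]
vocab = build_vocab(docs)
print(encode(["contract", "breach", "remedy"], vocab, max_len=5))  # [2, 3, 1, 0, 0]
```

The ids are then typically looked up in an embedding table, producing the input matrix for the convolution stage.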
CNN feature extraction by applying filters to the input
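Convolution:
$$Z_{x,y} = \sum_{i}\sum_{j} W_{i,j}\, h_{x+i,\,y+j}$$
Max Pooling:
$$h_{x,y} = \max_{(i,j)} h_{x+i,\,y+j}$$
Connected Layer (fully connected, with weights $W_{fc}$ and bias $b_{fc}$ applied to the flattened pooled feature map $\mathrm{vec}(h)$):
$$o = W_{fc}\,\mathrm{vec}(h) + b_{fc}$$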
Where:
h represents the input feature map, which could be an image (for text, the matrix of token embeddings) or a feature map from a previous layer.
W represents the convolutional kernel or filter. These are small matrices used for feature extraction; during training, the kernels
are learned to detect various patterns or features within the input data.
Z represents the output feature map obtained by convolving the input feature map with the kernel.
$h_{x,y}$ represents the value at position (x, y) in the output feature map after applying max pooling.
$h_{x+i,y+j}$ represents the value at position (x + i, y + j) in the input feature map.
The notation $\max_{(i,j)}$ denotes taking the maximum value over a specific region of the input feature map. Typically, this region is
defined by a small window (the pooling kernel).
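The convolution and max-pooling operations can be checked with a small pure-Python sketch. It computes a "valid" convolution (no padding, stride 1), written as cross-correlation as most deep learning libraries do, and max pooling over a square window. The function names and the `stride` parameter are assumptions for illustration. For text, `h` would be the (sequence length × embedding dimension) matrix of token embeddings, and kernels usually span the full embedding width.

```python
def conv2d_valid(h, W):
    """Z[x][y] = sum_i sum_j W[i][j] * h[x+i][y+j] (no padding, stride 1)."""
    kh, kw = len(W), len(W[0])
    out_h, out_w = len(h) - kh + 1, len(h[0]) - kw + 1
    return [
        [
            sum(W[i][j] * h[x + i][y + j] for i in range(kh) for j in range(kw))
            for y in range(out_w)
        ]
        for x in range(out_h)
    ]


def max_pool2d(h, size, stride=1):
    """out[x][y] = max of h over the size-by-size window at (x*stride, y*stride).

    stride=1 matches the indexing h[x+i][y+j] of the max-pooling equation;
    stride=size gives the non-overlapping pooling common in practice.
    """
    out_h = (len(h) - size) // stride + 1
    out_w = (len(h[0]) - size) // stride + 1
    return [
        [
            max(h[x * stride + i][y * stride + j] for i in range(size) for j in range(size))
            for y in range(out_w)
        ]
        for x in range(out_h)
    ]


h = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
W = [[1, 0], [0, 1]]        # toy 2x2 kernel
Z = conv2d_valid(h, W)      # [[6, 8], [12, 14]]
print(Z, max_pool2d(Z, 2))  # [[6, 8], [12, 14]] [[14]]
```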
BiLSTM integration
In this stage, the features extracted by the CNN are fed into the BiLSTM layer, whose forward and backward LSTM networks
process the sequence simultaneously; each LSTM cell involves multiple gates and operations.
󰇛󰇜󰇟 󰇛󰇝󰇞 󰇝󰇞󰇝 󰇞 󰇜󰇠
󰇛󰇜󰇟 󰇛󰇝󰇞 󰇝󰇞󰇝 󰇞 󰇜󰇠
󰇛󰇜󰇟 󰇛󰇝󰇞 󰇝󰇞󰇝 󰇞 󰇜󰇠
󰇛󰇜󰇟 󰇛󰇝󰇞 󰇝󰇞󰇝 󰇞 󰇜󰇠
󰇛󰇜󰇟 󰇛󰇜󰇠
Where,
X_t is the input at time step t,
H_t is the hidden state at time step t,
Sigma is the sigmoid activation function,
Represents element-wise multiplication,
W and b are the weight matrix and bias vector parameters.
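To make the gate computations concrete, a single LSTM step and a bidirectional pass can be sketched in plain Python. The sketch follows the standard LSTM formulation: sigmoid forget, input, and output gates and a tanh candidate state, each computed from the concatenation [h_{t-1}, x_t], followed by the cell-state and hidden-state updates. It runs one LSTM forward and one backward over the sequence and concatenates their hidden states at each step. The random initialization, the toy input, and all function names are hypothetical; in practice a framework layer such as Keras's `Bidirectional(LSTM(...))` would be used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def init_params(hidden, n_in, seed=0):
    """Hypothetical small random weights for gates f, i, c (candidate), o."""
    rng = random.Random(seed)
    return {
        gate: ([[rng.uniform(-0.1, 0.1) for _ in range(hidden + n_in)] for _ in range(hidden)],
               [0.0] * hidden)
        for gate in ("f", "i", "c", "o")
    }


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; every gate acts on the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t  # list concatenation

    def gate(name, act):
        W, b = params[name]
        return [act(s + bias) for s, bias in zip(matvec(W, z), b)]

    f = gate("f", sigmoid)          # forget gate
    i = gate("i", sigmoid)          # input gate
    c_tilde = gate("c", math.tanh)  # candidate cell state
    o = gate("o", sigmoid)          # output gate
    c = [fv * cp + iv * ct for fv, cp, iv, ct in zip(f, c_prev, i, c_tilde)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def run_lstm(xs, params, hidden):
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
        outputs.append(h)
    return outputs


def bilstm(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    fwd = run_lstm(xs, fwd_params, hidden)
    bwd = run_lstm(xs[::-1], bwd_params, hidden)[::-1]  # re-align with input order
    return [hf + hb for hf, hb in zip(fwd, bwd)]


# Toy input: 4 time steps of 3-dimensional CNN features (made-up values).
xs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, 0.0], [0.2, 0.0, 0.1]]
out = bilstm(xs, init_params(2, 3, seed=0), init_params(2, 3, seed=1), hidden=2)
print(len(out), len(out[0]))  # 4 4
```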
Combining BM25 with CNN-based BiLSTM (hybrid model)
The BM25 weights are multiplied with the word embeddings obtained from the CNN-BiLSTM model. The BM25 weight of term t in document d is:
󰇛 󰇜 󰇛󰇜󰇛󰇛 󰇜 󰇛 󰇜󰇜󰇛󰇛 󰇜  󰇛   󰇜󰇜



 
󰇛



󰇜
󰇛 󰇜
󰇛








󰇜
󰇛 󰇜
Where
$$\mathrm{IDF}(t) = \log\frac{N - n + 0.5}{n + 0.5}$$
N: Total number of documents in the collection.
n: Number of documents containing term t.
BM25(d,t): BM25 score of document d for a given term t; the score of d for a multi-term query is the sum of BM25(d,t) over the query terms.
IDF(t): Inverse document frequency of term t.
tf(t,d): Term frequency of term t in document d, i.e. how many times term t occurs in document d.
k1: A parameter that controls the scaling of term frequencies, typically set empirically (values between 1.2 and 2.0 are common).
b: A parameter that controls the strength of document length normalization, also set empirically (commonly 0.75).
|d|: Length of document d, usually measured as the total number of terms.
avgdl: Average document length in the document collection.
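The BM25 formula, and the step of multiplying BM25 weights with word embeddings, can be sketched as follows. Assumptions: documents are already tokenized, the defaults k1 = 1.5 and b = 0.75 are illustrative (the paper sets them empirically), and the vectors passed to `weight_embeddings` stand in for the CNN-BiLSTM outputs. All function names are hypothetical. With this IDF, a term occurring in exactly half of the documents gets weight zero and a more common term gets a negative weight; some libraries add 1 inside the logarithm to avoid this.

```python
import math
from collections import Counter


def corpus_stats(docs):
    """N, avgdl and document frequencies n(t) for a list of tokenized documents."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))
    return N, avgdl, df


def bm25_term(t, doc, stats, k1=1.5, b=0.75):
    """BM25(d, t) for a single term t in tokenized document d."""
    N, avgdl, df = stats
    n, tf = df.get(t, 0), doc.count(t)
    if n == 0 or tf == 0:
        return 0.0
    idf = math.log((N - n + 0.5) / (n + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))


def bm25(query, doc, stats, k1=1.5, b=0.75):
    """Score of a document for a multi-term query: sum of BM25(d, t) over query terms."""
    return sum(bm25_term(t, doc, stats, k1, b) for t in query)


def weight_embeddings(doc, embeddings, stats):
    """Multiply each token's embedding vector by that token's BM25 weight in doc."""
    return [[bm25_term(t, doc, stats) * v for v in vec] for t, vec in zip(doc, embeddings)]


docs = [
    ["breach", "contract", "damage"],
    ["contract", "lease"],
    ["tort", "negligence"],
    ["lease", "rent", "deposit", "tenant"],
]
stats = corpus_stats(docs)
print([round(bm25(["breach"], d, stats), 3) for d in docs])
```

Here "contract" occurs in two of the four documents, so its IDF, and hence its embedding weight, is exactly zero.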
Fig. 1 Data flow diagram of the Hybrid Model
Advantages of the model
Improved Precision and Recall: Deep learning models like CNN and BiLSTM, when combined with BM25, can achieve higher
precision and recall in information retrieval and analysis tasks compared to traditional keyword-based or rule-based approaches.
Contextual Understanding: Traditional methods often struggle to capture the contextual nuances and dependencies present in
legal documents. The combination of deep learning models allows for a better understanding of the context, making it more
suitable for tasks like contract interpretation.
Multimodal Data Handling: In cases where legal analysis involves various data types, such as text, images, or tables, the
combined model could be extended to process and integrate this multimodal information, which is a challenge for many traditional
methods.
Customization: The model can be fine-tuned and customized to specific legal domains, making it adaptable to the unique
requirements of contract litigation, while traditional methods may lack this flexibility.
Scalability: Deep learning models can be scaled to handle a large volume of legal documents, providing the advantage of
processing vast amounts of data efficiently.
Reduced Manual Intervention: Traditional methods often require extensive manual efforts in document search and analysis.
The combined model can automate and expedite many of these tasks.
Interpretability: While traditional methods are often more interpretable, newer models are making efforts to improve their
interpretability through techniques like attention mechanisms and explainable AI, thus narrowing the gap in this aspect.
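Precision and recall, the metrics behind the first advantage listed above, are usually computed per query over the top k retrieved documents and then averaged across queries. The following is a minimal sketch, assuming each query has a known set of relevant document ids; the case ids and function names are hypothetical.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision@k and recall@k for a single query."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def mean_precision_recall(runs, k):
    """Average P@k and R@k over a list of (ranked_ids, relevant_ids) pairs."""
    pairs = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return (sum(p for p, _ in pairs) / len(pairs), sum(r for _, r in pairs) / len(pairs))


# Hypothetical ranking of case ids for one query and its relevant set.
print(precision_recall_at_k(["c3", "c7", "c1", "c9"], {"c1", "c2"}, k=3))
```

In this example one of the top three documents is relevant, giving precision 1/3, and one of the two relevant documents is retrieved, giving recall 0.5.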
V. System Design and Implementation
System Implementation
Hardware Requirements
High-performance CPU (Intel Core i5 or AMD Ryzen 5)
8 GB to 16 GB of random access memory (RAM) and an SSD of 256 GB or more are typically sufficient, but larger capacities
may be needed depending on the size of the dataset
Program Model Specification
The researcher used Anaconda Navigator as the environment, with Jupyter Notebook as the integrated development
environment (IDE).
Programming Language: Python
Machine Learning Libraries: TensorFlow, PyTorch, scikit-learn, or Keras for building and training machine learning models.
Data Processing Tools: Pandas, NumPy, Seaborn, Matplotlib, etc.
Development Environments: Jupyter Notebook, PyCharm, or Visual Studio
VI. Results and Findings
The researcher compared several deep learning models and found that the BM25-CNN-BiLSTM model performs better than
conventional techniques. A random sample of the IR-LegPrec dataset was used for training and validation.
Table 1: Training/Validation Accuracy & Loss
Accuracy      MLP       LSTM      Hybrid model (BM25-CNN-BiLSTM)
Training      0.8750    0.8250    0.9333
Validation    0.9333    0.8000    0.9667
The accuracies of the multilayer perceptron (MLP) neural network and the Long Short-Term Memory (LSTM) network in Table 1 show
that these models find the training query-document set and the object-case documents more difficult to handle. This suggests that
these approaches are less suitable for training on the larger Task 1 and Task 2 sets, as they fall short of accurately capturing the
relationship between inputs and outputs.
In the hybrid model, the training and validation accuracies are both high and differ only slightly (0.9333 and 0.9667). This small gap
indicates consistent behaviour, and the researcher reports reproducible findings and improved accuracy when training is executed multiple times.
VII. Summary
A major development in the field of legal document retrieval is the BM25 with CNN-based BiLSTM model. This methodology
provides a solid answer to the problems associated with contract law disputes by integrating the advantages of deep contextual
awareness offered by neural networks with the capabilities of classic retrieval methods. Because it not only improves the relevance
and accuracy of document retrieval but also fosters deeper insights into legal texts, it gives legal practitioners improved tools for
analysis and decision-making.
VIII. Conclusion
The study's conclusions suggest that document analysis models that combine CNN-BiLSTM architectures with BM25 offer
significant benefits in legal contract litigation. Best Match 25 (BM25) assigns a relevance score to each precedent and
statute case document in a collection with respect to a query, based on the frequency of the query terms in each document and
their rarity throughout the collection. The combined model uses BM25 to produce initial document rankings based on term frequency
and relevance. The CNN's extracted features are then used to refine these scores, and the BiLSTM contextualizes them. This
integration allows legal documents to be ranked more accurately and with more nuance, taking into account both the context in
which each term appears and the significance of individual terms.
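The two-stage pipeline described above, in which BM25 produces an initial ranking that neural features then refine, can be illustrated with a simple re-ranking sketch. The fusion rule is an assumption, not the paper's stated method: BM25 scores of the top-k candidates are min-max normalized and linearly interpolated with the cosine similarity between a query vector and each document vector, using a hypothetical weight `alpha`. The vectors stand in for CNN-BiLSTM encodings, and all names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1]; all zeros if every score is equal."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]


def hybrid_rerank(bm25_scores, query_vec, doc_vecs, top_k=100, alpha=0.5):
    """Return document indices re-ranked by a fused lexical + semantic score.

    1. Keep the top_k documents by BM25 (first-stage ranking).
    2. Score each candidate as alpha * normalized BM25 + (1 - alpha) * cosine.
    """
    candidates = sorted(range(len(bm25_scores)), key=lambda i: -bm25_scores[i])[:top_k]
    lex = min_max([bm25_scores[i] for i in candidates])
    sem = [cosine(query_vec, doc_vecs[i]) for i in candidates]
    fused = {i: alpha * l + (1 - alpha) * s for i, l, s in zip(candidates, lex, sem)}
    return sorted(candidates, key=lambda i: -fused[i])


print(hybrid_rerank([2.0, 1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], top_k=3))
```

Min-max normalization puts the unbounded BM25 scores on a scale comparable to cosine similarity; setting `alpha = 1.0` reproduces the pure BM25 ranking, while smaller values let the semantic signal reorder the lexical candidates.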
These models demonstrated higher accuracy than the MLP and LSTM baselines (Table 1) and are expected to offer gains in precision
and efficiency over traditional methods, leading to more effective contract analysis, case review, and legal research. Adopting
these models could therefore significantly enhance the capabilities and performance of legal professionals in handling contract
litigation cases. Further research and development efforts should focus on optimizing model parameters, expanding training
datasets, and addressing domain-specific challenges to maximize the potential impact of these models in the legal domain.
Application Areas
Financial Contracts Analysis:
Analyzing financial contracts, such as loan agreements, insurance policies, and investment contracts, to extract terms, conditions,
and clauses for risk assessment, compliance monitoring, and decision-making.
Healthcare Document Analysis:
Processing medical documents, including patient records, clinical trials, and research papers, to extract relevant information, such
as diagnoses, treatments, and medical history, for healthcare management, research, and decision support.
Government Document Processing:
Analyzing government documents, such as legislative texts, policy documents, and regulatory filings, to extract key provisions,
regulations, and compliance requirements for policy analysis, regulatory compliance, and public administration.
Business Contracts Review:
Reviewing business contracts, such as vendor agreements, partnership agreements, and service contracts, to identify obligations,
liabilities, and contractual terms for risk assessment, negotiation support, and contract management.
Future Study and Research
Transformer models: Investigate the use of transformer-based models such as BERT, RoBERTa, or LegalBERT for a deeper
understanding of legal texts. Hybrid approaches that fuse state-of-the-art deep learning models with traditional information
retrieval techniques can also be explored. Continued research is needed to better understand the unique characteristics of
legal language and the most effective ways to model them.
References
1. Ahmad, S., Asghar, M. Z., Alotaibi, F. M., & Al‐Otaibi, Y. D. (2022). A hybrid CNN + BILSTM deep learning-based
DSS for efficient prediction of judicial case decisions. Expert Systems With Applications, 209, 118318.
https://doi.org/10.1016/j.eswa.2022.118318
2. Branting, L. K. (2000). Reasoning with Rules and Precedents: A Computational Model of Legal Analysis. Kluwer Academic
Publishers, Dordrecht/Boston/London.
3. Gomede, E. (2023, September 2). Understanding the BM25 Ranking Algorithm. Medium.
https://medium.com/@evertongomede/understanding-the-bm25-ranking-algorithm-19f6d45c6ce
4. Governatori, G., Bench-Capon, T., Verheij, B., Araszkiewicz, M., Francesconi, E., & Grabmair, M. (2022). Thirty years
of Artificial Intelligence and Law: the first decade. Artificial Intelligence and Law, 30(4), 481-519.
https://doi.org/10.1007/s10506-022-09329-4
5. Hassan, M. U. (2022). Technology Assisted Review of Legal Documents. RIT Scholar Works.
https://scholarworks.rit.edu/theses/11395/
6. Jelali, S. E., Fersini, E., & Messina, E. (2015). Legal retrieval as support to eMediation: matching disputant’s case and
court decisions. Artificial Intelligence and Law, 23(1), 1-22. https://doi.org/10.1007/s10506-015-9162-1
7. Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BILSTM model for Document-Level Sentiment
Analysis. Machine Learning and Knowledge Extraction, 1(3), 832-847. https://doi.org/10.3390/make1030048
8. Publications - Dr. B.J.G. (Bas) Testerink - Utrecht University. (2022).
https://www.uu.nl/staff/BJGTesterink/Publications
9. Sampath, K., & Thenmozhi, D. (2022). PReLCaP: Precedence Retrieval from Legal Documents Using Catch Phrases.
Neural Processing Letters, 54(5), 3873-3891. https://doi.org/10.1007/s11063-022-10791-z
10. Thomson Reuters Corporation. (2024, January 25). How AI and machine learning are shaping legal strategy.
https://www.thomsonreuters.com/en/careers/careers-blog/how-ai