A Scalable Attention-Driven Architecture for Modeling Complex Disease Transmission Dynamics at Population Scale

K. Madhavi; Potharaju Gouthami; Pabbath Reddy Pavan Kumar Reddy; K. Sri Ram Venkata Mani Pavan; Mohammad Sammad

doi:10.64751/ajmimc.2026.v5.n2(2).388

Keywords:

Clinical Text Classification, Medical Transcription Analysis, Natural Language Processing (NLP), Ensemble Learning, Healthcare Text Analytics, Synthetic Minority Over-sampling Technique (SMOTE).

Abstract

The rapid growth of unstructured clinical text in healthcare systems has created a need for automated methods that extract meaningful information for clinical decision support. Medical transcription data records patient conditions, treatments, and the medical domains they belong to, but its unstructured form makes accurate interpretation difficult. This study addresses the automatic classification of clinical narratives into the appropriate medical specialties. In a manual workflow, healthcare professionals or administrative staff read each document individually and assign a category based on their own expertise, which is time-consuming, inconsistent, and unsuitable for large-scale data processing. The manual approach scales poorly, depends heavily on human judgment, is prone to error, and lacks standardization, which motivates an efficient automated system. To overcome these limitations, the proposed framework combines natural language processing techniques with machine learning methods. First, the text is preprocessed with tokenization, stopword removal, and lemmatization to produce clean input. Contextual feature representations are then extracted with ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), which captures semantic relationships within the text. To handle class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) and K-Means SMOTE (KMeansSMOTE) are applied. The resulting features are used to train multiple classifiers, namely the Adaptive Boosting Classifier (ABC), Extremely Randomized Trees Classifier (ETC), Random Forest Classifier (RFC), and Tree Alternating Optimization Classifier (TTC), enabling a comparative analysis of their predictive performance. The proposed system improves classification accuracy, reduces manual workload, and offers a scalable, reliable solution for healthcare text analytics.
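
The class-imbalance step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy 2-D vectors (standing in for ELECTRA embeddings) are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D "embeddings" for an under-represented specialty (hypothetical values).
minority_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_embeddings, n_new=6, k=2)
```

In practice a library implementation would likely be used instead, for example the `SMOTE` and `KMeansSMOTE` classes in imbalanced-learn. Oversampling is normally applied to the training split only, so that synthetic samples do not leak into evaluation data.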