Scalable Multilingual Clinical Trial Text Classification using Transformer Embeddings with Real-Time Redis and Telegram Integration

Authors

  • B. Laxmi Pathi
  • Rejintal Shivashankar
  • Dounde Yash
  • Badavath Mahesh

DOI:

https://doi.org/10.64751/ajmimc.2026.v5.n2(1).293

Keywords:

Natural Language Processing, Language-Agnostic BERT Sentence Embeddings (LaBSE), Ensemble Oblique Trees (EOT), Oblique Decision Trees, Random Forests.

Abstract

For decades, clinical trial management has depended on the systematic extraction of information from unstructured clinical narratives to support patient safety monitoring and eligibility assessment. Traditionally, this process required extensive manual effort from clinical experts, who categorized protocol deviations and screened participants against complex documentation. With the wider adoption of Natural Language Processing (NLP) in the early 2010s, statistical approaches such as TF-IDF and Word2Vec enabled the first wave of automation in structuring clinical text. However, oncology data remains highly complex, characterized by dense unstructured narratives, nested logical conditions (AND/OR/NOT), and specialized domain terminology. Conventional machine learning systems often fail to capture the high-dimensional semantic relationships needed for robust classification, so systemic signals in protocol deviations can be overlooked. More recent approaches include axis-parallel decision tree ensembles, such as Random Forests, and cloud-based Large Language Models (LLMs). While effective in certain settings, axis-parallel models split on one feature at a time and can only approximate diagonal decision boundaries in embedded semantic spaces through many stair-step splits, which reduces performance on tilted or non-linear clusters. Conversely, LLMs such as GPT-4 offer strong reasoning capabilities but raise concerns about patient data privacy, operational cost, and limited multilingual robustness without translation pipelines. To address these limitations, this work proposes a privacy-preserving, multilingual framework combining Language-Agnostic BERT Sentence Embeddings (LaBSE) with Ensemble Oblique Trees (EOT). Because oblique hyperplanes split on linear combinations of features rather than a single feature, the model partitions high-dimensional embedding spaces more effectively.
The proposed LaBSE–EOT system enables lightweight, locally deployable, and interpretable classification, improving cross-lingual clinical trial oversight while reducing dependence on cloud infrastructure and supporting scalability across global healthcare settings.
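To make the axis-parallel versus oblique distinction concrete, the following is a minimal sketch on hypothetical two-dimensional toy data. It does not use the paper's LaBSE embeddings or clinical corpus. The true class boundary is the diagonal x + y = 0. The sketch compares the best single axis-parallel threshold, which is the kind of split a Random Forest node makes, with one oblique split w·p + b > 0. The weights w = (1, 1) and b = 0 are chosen by hand rather than learned, and all function names are illustrative assumptions.

```python
import random


def make_diagonal_data(n, seed=0):
    """Hypothetical toy data: points in [-1, 1]^2, label 1 when x + y > 0."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        points.append((x, y))
        labels.append(1 if x + y > 0 else 0)
    return points, labels


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def best_axis_parallel_stump(points, labels):
    """Exhaustively search one-feature thresholds (a single axis-parallel split)."""
    best = 0.0
    for feature in (0, 1):
        for t in sorted({p[feature] for p in points}):
            preds = [1 if p[feature] > t else 0 for p in points]
            acc = accuracy(preds, labels)
            # 1 - acc corresponds to assigning the positive class to the other side.
            best = max(best, acc, 1 - acc)
    return best


def oblique_split_accuracy(points, labels, w=(1.0, 1.0), b=0.0):
    """A single oblique split: predict 1 when w . p + b > 0 (hand-set weights)."""
    preds = [1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0 for p in points]
    return accuracy(preds, labels)


points, labels = make_diagonal_data(400)
print(f"best axis-parallel stump: {best_axis_parallel_stump(points, labels):.2f}")
print(f"single oblique split:     {oblique_split_accuracy(points, labels):.2f}")
```

On a uniform square, a single axis-parallel split can reach only about 75% accuracy in expectation, because approximating the diagonal requires a staircase of many such splits. One oblique hyperplane matches the boundary exactly. In an actual oblique tree ensemble, the hyperplane weights would be fitted at each node, which this sketch does not attempt.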


Published

2026-04-23

How to Cite

B. Laxmi Pathi, Rejintal Shivashankar, Dounde Yash, & Badavath Mahesh. (2026). Scalable Multilingual Clinical Trial Text Classification using Transformer Embeddings with Real-Time Redis and Telegram Integration. American Journal of Management and IOT Medical Computing, 5(2), 575-586. https://doi.org/10.64751/ajmimc.2026.v5.n2(1).293