Decoding Cardio-Respiratory Acoustic Patterns via Multi-Dimensional Deep Representation Learning for Clinical Insight

B. Ram Mohan; Mettu Sathwika; Lingala Shilpa; Dasari Venu

doi:10.64751/ajmimc.2026.v5.n2(1).278

Keywords:

Biomedical Acoustic Signal Processing, Digital Auscultation, Time–Frequency Domain Analysis, Acoustic Feature Engineering

Abstract

Cardio-respiratory disorders account for approximately 30% of total mortality in India, and their incidence is rising steadily owing to increasing urban pollution, sedentary lifestyles, and delayed clinical diagnosis, as highlighted by the Indian Council of Medical Research. Conventional stethoscope auscultation depends heavily on the clinician’s expertise and experience, which can lead to variability and potential inaccuracies in diagnosis. To address these limitations, this study presents an automated cardio-respiratory sound classification framework that uses machine learning to improve early detection and diagnostic consistency. The proposed system provides a Graphical User Interface (GUI) with role-based functionality: administrators manage model training, while end-users perform real-time predictions. Audio signals are preprocessed with the Librosa library to extract Mel-Frequency Cepstral Coefficients (MFCC), Chroma features, and Mel Spectrogram representations, which together characterize both heart and lung sounds. A range of machine learning models, including Quadratic Discriminant Analysis (QDA), Gradient Boosting Classifier (GBC), Naïve Bayes Classifier (NBC), and Logistic Regression Classifier (LRC), are developed and comparatively evaluated. In addition, a hybrid deep learning architecture combining a Bi-directional Convolutional Neural Network (BiCNN) with a Bi-directional Gated Recurrent Unit (BiGRU) is introduced to capture both spatial (spectral) and temporal dependencies in the audio data. The system performs dual classification, identifying heart sound types and lung sound types simultaneously. Its performance is assessed using standard evaluation metrics, including accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (ROC-AUC), demonstrating its potential as a reliable and efficient tool for clinical decision support.
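To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how accuracy and macro-averaged precision, recall, and F1-score could be computed for one classification task (for example, heart sound type). It is an illustration under stated assumptions, not the authors' implementation: the function names `per_class_counts` and `macro_metrics`, the choice of macro averaging, and the class labels in the usage note are all hypothetical. ROC-AUC is left out because it requires predicted scores rather than hard labels.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging (unweighted mean over classes); the source
    does not state which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # A class with an empty denominator contributes 0.0 by convention here.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For a dual-output system of the kind described, one plausible usage is to call `macro_metrics` once per task, for instance `macro_metrics(heart_true, heart_pred)` and `macro_metrics(lung_true, lung_pred)`, where each argument is a list of hypothetical labels such as `"normal"` or `"murmur"`, and to report the two result dictionaries separately.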