Level of Education of Students Involved
Graduate
Faculty Sponsor
Hugh Gong
College
College of Engineering (COE)
Discipline(s)
Machine Learning Model
Presentation Type
Poster Presentation
Symposium Date
Spring 4-24-2025
Abstract
This project detects fraudulent health insurance claims using machine learning. The dataset includes patient attributes, claim amounts, and medical details; it was preprocessed by handling missing values, encoding categorical features, and standardizing numerical features. Fraud detection was formulated as a binary classification problem in which claims in the top 5% of the cost distribution were labeled as potentially fraudulent. Several models, including Logistic Regression, Random Forest, and XGBoost, were trained and evaluated, with Random Forest performing best after tuning.
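The labeling rule described above can be illustrated with a minimal sketch. It is not the project's actual code: the function name `label_top_cost_claims`, the parameter `top_fraction`, and the sample amounts are assumptions made for illustration, and only the Python standard library is used.

```python
import math


def label_top_cost_claims(amounts, top_fraction=0.05):
    """Label claims in the top `top_fraction` of the cost distribution.

    Returns (cutoff, labels), where labels[i] is 1 if amounts[i] is at or
    above the cutoff (potentially fraudulent) and 0 otherwise.
    Hypothetical helper; the name and signature are assumptions.
    """
    if not amounts:
        return None, []
    # Number of claims to flag, rounded up so at least one claim is flagged.
    n_flagged = max(1, math.ceil(len(amounts) * top_fraction))
    # The cutoff is the smallest amount among the n_flagged costliest claims.
    cutoff = sorted(amounts, reverse=True)[n_flagged - 1]
    # Ties at the cutoff are all flagged, so slightly more than
    # n_flagged claims can receive label 1.
    labels = [1 if amount >= cutoff else 0 for amount in amounts]
    return cutoff, labels


# Illustrative claim amounts (made up, not taken from the project's dataset).
sample_amounts = [1200.0, 850.0, 430.0, 9800.0, 1500.0, 610.0, 720.0, 2100.0,
                  990.0, 1310.0, 560.0, 47000.0, 880.0, 1940.0, 1020.0, 760.0,
                  3050.0, 1180.0, 640.0, 2750.0]
cutoff, labels = label_top_cost_claims(sample_amounts)
print(cutoff, sum(labels))
```

Because the label is derived from the claim amount itself, the claim amount (and anything computed directly from it) would normally be kept out of the model's input features; otherwise the classifier can simply relearn the cutoff.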
To gain deeper insight, multiple visualizations were created to analyze fraud patterns by age, region, smoking habits, and blood pressure levels, along with feature correlations. The initial models overfit; feature selection, SMOTE class balancing, and adjustment of the fraud detection threshold improved generalization. The final optimized model achieved a balance between high precision and recall, making it suitable for real-world applications. Although deployment was initially considered, the project concluded with a locally usable model for fraud prediction that supports data-driven decision-making in healthcare fraud detection.
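The effect of adjusting the fraud detection threshold can be sketched as follows. This is an illustrative example under stated assumptions, not the project's implementation: the helper names `precision_recall_at` and `sweep_thresholds`, the toy labels, and the toy scores are all hypothetical, and the scores stand in for the fraud probabilities a trained classifier would output.

```python
def precision_recall_at(y_true, scores, threshold):
    """Compute precision and recall when scores >= threshold are flagged as fraud.

    Hypothetical helper: `scores` stands in for predicted fraud probabilities.
    """
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
    # Guard against division by zero when nothing is flagged or no fraud exists.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep_thresholds(y_true, scores, thresholds):
    """Return a list of (threshold, precision, recall) tuples."""
    return [(t, *precision_recall_at(y_true, scores, t)) for t in thresholds]


# Toy labels and scores (made up for illustration only).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.10, 0.60, 0.40, 0.90, 0.20, 0.75, 0.55, 0.05]
for threshold, precision, recall in sweep_thresholds(y_true, scores, [0.3, 0.5, 0.7]):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold flags more claims, which tends to raise recall at the cost of precision; raising it does the opposite. A sweep like this is one way to pick an operating point that matches the cost of missed fraud versus false alarms.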
Keywords:
- Insurance Claims Classification
- Predictive Analytics
- Fraud Detection
- Machine Learning Models
- Random Forest Classifier
- Claim Amount Prediction
- Data Preprocessing
- Feature Importance Analysis
- Logistic Regression
- XGBoost
Recommended Citation
Katam, Geetha Reddy, "Fraud Detection in Health Insurance Claims using Machine Learning" (2025). Symposium on Undergraduate Research and Creative Expression (SOURCE). 1478.
https://scholar.valpo.edu/cus/1478
Biographical Information about Author(s)
The author is a master’s student specializing in analytics and modeling, with a background in electronics and communication engineering. Her professional experience spans data analytics, quality assurance, and machine learning applications in healthcare and insurance. She has worked on automated claim processing, chatbot development, and predictive analytics at R1 and Omega Healthcare. She is currently focused on applying machine learning to insurance claims classification to improve fraud detection and operational efficiency. Her goal is to become a professional data analyst, applying predictive modeling techniques in healthcare and insurance analytics. This project stems from her hands-on experience in claims processing, automation, and data-driven decision-making, and reflects her interest in solving real-world industry challenges.