Level of Education of Students Involved

Graduate

Faculty Sponsor

Hugh Gong

College

College of Engineering (COE)

Discipline(s)

Machine Learning Model

Presentation Type

Poster Presentation

Symposium Date

Spring 4-24-2025

Abstract

This project focuses on detecting fraudulent health insurance claims using machine learning techniques. The dataset includes patient attributes, claim amounts, and medical details, which were preprocessed by handling missing values, encoding categorical features, and standardizing numerical features. Fraud detection was framed as a binary classification problem in which claims in the top 5% of the cost distribution were labeled as potentially fraudulent. Several models, including Logistic Regression, Random Forest, and XGBoost, were trained and evaluated; after tuning, Random Forest performed best.
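
The labeling rule and preprocessing steps described above can be sketched as follows. The abstract does not name the libraries used, so this is a minimal standard-library Python sketch under stated assumptions: the helper names (`fraud_labels`, `impute_median`, `standardize`, `one_hot`) are hypothetical, missing values are assumed to be filled with the column median, and "top 5%" is taken to mean amounts strictly above the 95th percentile.

```python
import statistics

# Hypothetical helpers illustrating the steps in the abstract; not the project's code.


def fraud_labels(amounts, top_fraction=0.05):
    """Label a claim 1 if its amount lies above the (1 - top_fraction) quantile, else 0.

    top_fraction is assumed to be a multiple of 0.01 (e.g. 0.05 for the top 5%).
    """
    # With n=100, quantiles() returns the 99 cut points at 1%, 2%, ..., 99%.
    cuts = statistics.quantiles(amounts, n=100, method="inclusive")
    cutoff = cuts[round((1 - top_fraction) * 100) - 1]
    return [1 if a > cutoff else 0 for a in amounts]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def one_hot(values):
    """Encode a categorical column (e.g. region) as one 0/1 indicator column per category."""
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in sorted(set(values))}


# Toy data for illustration only, not the project's dataset.
amounts = list(range(1, 101))
print(sum(fraud_labels(amounts)))  # 5
print(standardize(impute_median([1.0, None, 3.0])))
print(one_hot(["north", "south", "north"]))
```

Because the label in this sketch is derived from the claim amount, the amount itself would presumably need to be excluded from the model's input features; otherwise a classifier could simply learn the cutoff.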

To gain deeper insight, several visualizations were created to analyze fraud patterns by age, region, smoking habits, and blood pressure level, along with feature correlations. Although the initial models overfit, feature selection, SMOTE oversampling to balance the classes, and adjusting the fraud-detection threshold improved generalization. The final optimized model achieved a balance of high precision and recall, making it suitable for real-world applications. Although deployment was initially considered, the project concluded with a locally usable fraud-prediction model that supports data-driven decision-making in healthcare fraud detection.
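
The balancing and threshold-adjustment steps mentioned above can be sketched in standard-library Python. This is an illustration under assumptions, not the project's code: `smote`, `precision_recall`, and `best_threshold` are hypothetical names; `k=5` mirrors the default neighbor count of imbalanced-learn's `SMOTE`; and choosing the threshold that maximizes F1 is one common criterion, assumed here because the abstract does not say how the threshold was picked.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class rows (SMOTE-style oversampling).

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic


def precision_recall(y_true, scores, threshold):
    """Precision and recall when claims with score >= threshold are flagged as fraud."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the highest F1 (harmonic mean of precision and recall)."""
    def f1(t):
        p, r = precision_recall(y_true, scores, t)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)


# Toy values for illustration only.
print(len(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=4)))  # 4
print(best_threshold([0, 0, 0, 1, 1], [0.1, 0.3, 0.45, 0.4, 0.9], [0.3, 0.4, 0.5]))  # 0.4
```

In practice the oversampling would be applied to the training split only, and the threshold chosen on held-out data, so that the evaluation set keeps the original class balance.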

Keywords:

Insurance Claims Classification
Predictive Analytics
Fraud Detection
Machine Learning Models
Random Forest Classifier
Claim Amount Prediction
Data Preprocessing
Feature Importance Analysis
Logistic Regression
XGBoost

Biographical Information about Author(s)

A master’s student specializing in Analytics and Modeling, with a background in electronics and communication engineering. Her professional experience spans data analytics, quality assurance, and machine learning applications in healthcare and insurance. She has worked on automated claim processing, chatbot development, and predictive analytics at R1 and Omega Healthcare. Currently, she is focused on applying machine learning to insurance claims classification to improve fraud detection and operational efficiency. Her future goal is to become a professional data analyst, applying predictive modeling techniques in healthcare and insurance analytics. This project stems from her hands-on experience in claims processing, automation, and data-driven decision-making, and reflects her passion for solving real-world industry challenges.

Recommended Citation

Katam, Geetha Reddy, "Fraud Detection in Health Insurance Claims using Machine Learning" (2025). Symposium on Undergraduate Research and Creative Expression (SOURCE). 1478.
https://scholar.valpo.edu/cus/1478


Symposium on Undergraduate Research and Creative Expression (SOURCE)

Fraud Detection in Health Insurance Claims using Machine Learning

Valparaiso University Library