"Fraud Detection in Health Insurance Claims using Machine Learning" by Geetha Reddy Katam
 

Level of Education of Students Involved

Graduate

Faculty Sponsor

Hugh Gong

College

College of Engineering (COE)

Discipline(s)

Machine Learning Model

Presentation Type

Poster Presentation

Symposium Date

Spring 4-24-2025

Abstract

This project focuses on detecting fraudulent health insurance claims using machine learning techniques. The dataset includes patient attributes, claim amounts, and medical details, which were preprocessed by handling missing values, encoding categorical features, and standardizing numerical data. Fraud detection was formulated as a binary classification problem in which claims in the top 5% of the cost distribution were labeled as potentially fraudulent. Several models, including Logistic Regression, Random Forest, and XGBoost, were trained and evaluated, with Random Forest providing the best performance after tuning.
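The labeling rule described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name `label_high_cost_claims`, the `percentile` parameter, and the choice of the "inclusive" quantile method are assumptions made for the example.

```python
import statistics


def label_high_cost_claims(claim_amounts, percentile=95):
    """Label claims above the given cost percentile as potentially fraudulent.

    Returns (cutoff, labels), where labels[i] is 1 if claim i lies strictly
    above the cutoff and 0 otherwise. Hypothetical helper for illustration.
    """
    # quantiles(n=100) returns 99 cut points; index percentile - 1 is the
    # requested percentile (index 94 for the 95th percentile).
    cut_points = statistics.quantiles(claim_amounts, n=100, method="inclusive")
    cutoff = cut_points[percentile - 1]
    labels = [1 if amount > cutoff else 0 for amount in claim_amounts]
    return cutoff, labels
```

Because the label is derived from claim amount alone, any model trained on it learns to flag unusually expensive claims, which is a proxy for fraud rather than a confirmed fraud outcome.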

To gain deeper insights, multiple visualizations were created to analyze fraud patterns by age, region, smoking habits, and blood pressure levels, along with feature correlations. The initial models overfit the training data; feature selection, SMOTE balancing, and adjustment of the fraud detection threshold improved generalization. The final optimized model balanced precision and recall, making it a candidate for real-world use. Although deployment was initially considered, the project concluded with a locally usable model for fraud prediction that supports data-driven decision-making in healthcare fraud detection.
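Threshold adjustment, one of the techniques mentioned above, can be illustrated as follows. This is a sketch under stated assumptions: the helper names, the candidate threshold list, and the use of F1 as the selection criterion are hypothetical and are not taken from the project.

```python
def precision_recall_at(y_true, scores, threshold):
    """Compute (precision, recall) when scores >= threshold are flagged as fraud."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_f1_threshold(y_true, scores, candidates):
    """Return the candidate threshold with the highest F1 score (assumed criterion)."""
    def f1(threshold):
        p, r = precision_recall_at(y_true, scores, threshold)
        return 2 * p * r / (p + r) if (p + r) else 0.0
    # max() keeps the first candidate in case of ties.
    return max(candidates, key=f1)
```

In practice the threshold would be chosen on a validation split rather than on the training data, so that the selected value does not simply reflect overfitting.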

Keywords:

  • Insurance Claims Classification
  • Predictive Analytics
  • Fraud Detection
  • Machine Learning Models
  • Random Forest Classifier
  • Claim Amount Prediction
  • Data Preprocessing
  • Feature Importance Analysis
  • Logistic Regression
  • XGBoost

Biographical Information about Author(s)

Geetha Reddy Katam is a master’s student specializing in analytics and modeling, with a background in electronics and communication engineering. Her professional experience spans data analytics, quality assurance, and machine learning applications in healthcare and insurance. She has worked on automated claim processing, chatbot development, and predictive analytics at R1 and Omega Healthcare. She is currently focused on applying machine learning to insurance claims classification to improve fraud detection and operational efficiency. Her goal is to become a professional data analyst, applying predictive modeling techniques in healthcare and insurance analytics. This project stems from her hands-on experience in claims processing, automation, and data-driven decision-making, and from her interest in solving real-world industry challenges.
