When a U.S. insurance company observed a worrying trend of rising claims and potential misuse, they enlisted Statistique to build and deploy an advanced fraud detection system. Using an automated approach that leveraged both predictive analytics and machine learning, this cutting-edge solution provided near-real-time alerts on fraudulent claims—ultimately protecting the insurer’s bottom line and its customers from inflated premiums.
Project Overview
- Client: A reputable insurance provider headquartered in the United States
- Objective: Detect and alert on suspicious or fraudulent claims in Medicare and other health-related insurance plans
- Scope: Integrate the fraud detection model into the company’s existing claims management system for automated and accurate screening
Key Steps in Our Solution
1. Feature Engineering
Healthcare fraud frequently involves collusion among physicians, providers, and beneficiaries. To capture this complexity:- Grouping by Provider Level: We aggregated numerical features—such as total claim amounts and frequency of visits—at the provider level. This approach highlighted unusual billing patterns that might slip through conventional screening.
- Fraud Ring Discovery: By examining how multiple actors interact (e.g., the same provider and group of beneficiaries appearing in suspicious claim patterns), we spotted potential fraud “hotspots.”
2. Logistic Regression Classifier
To interpret the probability of fraud clearly and efficiently, we used a Logistic Regression (LR) model:- Model Choice: LR was selected for its balance of accuracy, computational efficiency, and explainability—allowing the insurer to see exactly which features weighed most in labeling a claim suspicious.
- Performance Tuning: With adjustable thresholds, the insurer could tailor the model’s sensitivity (recall) and precision according to business needs.
3. Accuracy and Performance
Our model delivered:- ~0.90 Accuracy
- ~0.80 AUROC Score
- ~0.55 Kappa Score
Further Enhancements
- Expanded Fraud Data Continuously feeding the model with newly identified cases of fraud improves its ability to adapt, catching ever-evolving schemes.
- Ensemble Techniques & Parameter Tuning Combining Logistic Regression with additional algorithms (e.g., random forests or gradient boosting) can yield more robust results, particularly as fraud tactics become more sophisticated.
- ICD-9 Code Vectorization Converting medical codes into numerical vectors (e.g., via Count Vectorizer) can reveal deeper patterns in how procedures and diagnoses are related—boosting accuracy and uncovering niche fraud scenarios.
Business Impact & Recommendations
- Real-Time Alerts By integrating our model into their claims workflow, the client now receives automated red flags on suspicious claims, significantly reducing the manual audit burden.
- Provider Scrutiny & Oversight High-risk providers are easily isolated, enabling the client to perform targeted investigations and curtail malicious activity more rapidly.
- Regulatory Collaboration Insights from the system help government authorities and policymakers tighten rules and regulations around health claims—ultimately driving down fraud across the wider healthcare ecosystem.
- Economic Benefits Rooting out fraudulent activity keeps insurance premiums stable and prevents undue inflation within the healthcare sector, ultimately benefiting patients, businesses, and taxpayers alike.