Fraud Scoring without Supervised Data: A Junior Dev's Dilemma

As a junior dev, I’ve been tasked with implementing fraud scoring in our rules-based fraud detection system. The catch? I don’t have access to production transaction data due to policy restrictions. This means I need to get creative with generating fake data and finding the right approach to scoring.
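To make the fake-data idea concrete, here is a minimal sketch of a generator I could start from. Everything in it is an assumption for illustration: the function name `make_fake_transactions`, the three features (amount, hour of day, transactions in the last hour), the 5% fraud rate, and the distributions are all invented and do not reflect our real data.

```python
import random

def make_fake_transactions(n=1000, fraud_rate=0.05, seed=0):
    """Return (rows, labels); each row is [amount, hour, tx_last_hour].

    All features and distributions are hypothetical placeholders.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    rows, labels = [], []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        if is_fraud:
            # Assumed pattern: larger amounts, night hours, bursts of activity
            amount = rng.lognormvariate(6.0, 1.0)
            hour = rng.randrange(0, 6)
            tx_last_hour = rng.randint(3, 10)
        else:
            amount = rng.lognormvariate(3.5, 1.0)
            hour = rng.randrange(0, 24)
            tx_last_hour = rng.randint(0, 3)
        rows.append([round(amount, 2), hour, tx_last_hour])
        labels.append(int(is_fraud))
    return rows, labels

rows, labels = make_fake_transactions(n=500, fraud_rate=0.1, seed=1)
print(len(rows), sum(labels))
```

The obvious weakness is that any model trained on this only learns the patterns I wrote into the generator, so it would need to be re-evaluated once real data becomes available.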

I’ve done some research and narrowed down my options to regression, classification with predict_proba(), or isolation forest. But I’m not convinced any of these will give me the results I need.

Has anyone else faced a similar challenge? How did you overcome the lack of labeled data? What other techniques or models should I consider? I’m eager to learn from your experiences and avoid rework down the line.

One of my main concerns is ensuring my model can accurately predict the level of fraud risk. I’ve thought about using regression with a target value between 0 and 1, but I’m worried the model might predict values outside this range. Classification with predict_proba() seems like a good alternative, but I’m not sure if it’s the best approach. Isolation forest is another option, but I’m not familiar with its applications in fraud scoring.
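Here is a rough sketch of how I am comparing the three options with scikit-learn on made-up data. The features, distributions, and fraud rate are assumptions, and `LinearRegression` and `LogisticRegression` are just stand-ins for "a regressor" and "a classifier". Clipping the regression output and min-max scaling the isolation forest output are my own workarounds for getting a 0 to 1 score, not established practice.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical synthetic data: 5% fraud, two invented features.
rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.05).astype(int)
amount = np.where(y == 1, rng.lognormal(6.0, 1.0, n), rng.lognormal(3.5, 1.0, n))
tx_last_hour = np.where(y == 1, rng.poisson(6, n), rng.poisson(1, n))
X = np.column_stack([np.log1p(amount), tx_last_hour])

# 1) Regression on a 0/1 target. Raw predictions can fall outside [0, 1],
#    so they are clipped here.
reg_raw = LinearRegression().fit(X, y).predict(X)
reg_score = np.clip(reg_raw, 0.0, 1.0)

# 2) Classification. predict_proba() is bounded to [0, 1] by construction;
#    column 1 is the probability of the positive (fraud) class.
clf = LogisticRegression(max_iter=1000).fit(X, y)
clf_score = clf.predict_proba(X)[:, 1]

# 3) Isolation forest. Needs no labels. score_samples() returns lower values
#    for more abnormal points, so negate it, then min-max scale to [0, 1].
iso = IsolationForest(random_state=0).fit(X)
iso_raw = -iso.score_samples(X)
iso_score = (iso_raw - iso_raw.min()) / (iso_raw.max() - iso_raw.min())

print("regression raw range:", reg_raw.min(), reg_raw.max())
print("mean classifier score, fraud vs legit:",
      clf_score[y == 1].mean(), clf_score[y == 0].mean())
```

Only the classifier output can be read as a probability. The scaled isolation forest score is a relative anomaly ranking within this data set, and it is the only one of the three that does not depend on the labels I invented.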

Any advice or guidance would be greatly appreciated.
