Fraud Detection Case Study

This is a two-day case study on fraud detection. The goal of the sprint was to build an end-to-end prediction platform. First, we were split into teams of three.

Our team began with feature selection and engineering. Some of the features we engineered were:

Count of NaNs (missing values) per column
Percentage of uppercase characters in each title
Event duration
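
The features listed above could be computed roughly as follows. This is a minimal sketch in plain Python rather than the project's own code: the field names `name`, `event_start`, and `event_end`, and the assumption that start and end are epoch seconds, are hypothetical and chosen only for illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing data."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(records):
    """Count missing values per field (column) across a list of dict records."""
    counts = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + int(is_missing(value))
    return counts


def uppercase_pct(title):
    """Percentage of characters in the title that are uppercase."""
    if not title:
        return 0.0
    return 100.0 * sum(ch.isupper() for ch in title) / len(title)


def event_duration_hours(record):
    """Event duration in hours, assuming epoch-second start/end fields
    (hypothetical field names)."""
    return (record["event_end"] - record["event_start"]) / 3600.0
```

For example, `uppercase_pct("Ab")` gives `50.0`, and a record that starts at `0` and ends at `7200` has a duration of `2.0` hours.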

Based on the assumption that misclassifying a true fraud case costs significantly more than misclassifying a true non-fraud case, we modeled to minimize false negatives. After a train/test split, we iteratively tested the random forest model and kept the features that gave us the best result.
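
One way to act on that cost assumption is to pick the probability cutoff that minimizes total misclassification cost on the test split. The sketch below is not the approach taken in this sprint (we tuned by feature selection); the cost values `fn_cost` and `fp_cost` and the candidate grid are illustrative assumptions.

```python
def confusion_counts(y_true, fraud_probs, threshold):
    """Return (tp, fp, fn, tn) when flagging fraud at prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, fraud_probs):
        flagged = prob >= threshold
        if label and flagged:
            tp += 1
        elif label:
            fn += 1  # true fraud that slipped through
        elif flagged:
            fp += 1  # legitimate case flagged as fraud
        else:
            tn += 1
    return tp, fp, fn, tn


def pick_threshold(y_true, fraud_probs, fn_cost=10.0, fp_cost=1.0, candidates=None):
    """Choose the candidate threshold with the lowest total cost.

    fn_cost and fp_cost are made-up illustrative weights; ties go to the
    lowest threshold because min() keeps the first minimum it sees.
    """
    if candidates is None:
        candidates = [i / 20.0 for i in range(1, 20)]  # 0.05 .. 0.95

    def total_cost(threshold):
        _, fp, fn, _ = confusion_counts(y_true, fraud_probs, threshold)
        return fn_cost * fn + fp_cost * fp

    return min(candidates, key=total_cost)
```

With a high `fn_cost`, the selected threshold drops so that fewer true fraud cases are missed, at the price of more false alarms.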

The model was designed to take one instance, classify it as fraud or not with an associated probability score, then save the result to a Mongo database. We then set up a site on our local machine designed to receive one request and run it through the steps described above.
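
The classify-then-store step might look something like the sketch below. It assumes `model` exposes scikit-learn's `predict_proba` and `collection` exposes PyMongo's `insert_one`; the function name, the `feature_names` argument, the output keys, and the 0.5 default cutoff are hypothetical, not taken from the project code.

```python
def classify_and_store(record, model, collection, feature_names, threshold=0.5):
    """Score one record, label it as fraud or not, and persist the result."""
    # Build a single-row feature matrix in a fixed column order.
    features = [[record[name] for name in feature_names]]

    # predict_proba returns one row per sample; column 1 is the fraud
    # probability when the model's classes are ordered [0, 1].
    fraud_prob = float(model.predict_proba(features)[0][1])

    result = dict(record)  # copy so the caller's record is not modified
    result["fraud_probability"] = fraud_prob
    result["is_fraud"] = fraud_prob >= threshold

    # With a real PyMongo collection, insert_one also adds an "_id" key
    # to the document it is given.
    collection.insert_one(result)
    return result
```

Passing the model and collection in as arguments keeps the function easy to try out with stand-in objects before wiring it to a live database.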

A server sent live requests (unseen data in JSON format) to the site we set up. We classified those new requests and stored them in the Mongo database. We built a dashboard on the splash page of the site for a quick view of essential info. Essentially, we wanted to make potentially fraudulent cases accessible at a glance.
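
As a rough illustration of the "at a glance" idea, the sketch below summarizes stored documents for a dashboard view. It assumes each document carries a `fraud_probability` key (a hypothetical name) and works on any iterable of dicts, such as the documents returned by a PyMongo `find()` call; it is not the dashboard code from this repo.

```python
def dashboard_summary(documents, threshold=0.5, top_n=10):
    """Summarize stored predictions: totals plus the riskiest cases first."""
    docs = list(documents)
    flagged = [d for d in docs if d.get("fraud_probability", 0.0) >= threshold]
    # Highest fraud probability first, so the worst cases lead the page.
    flagged.sort(key=lambda d: d["fraud_probability"], reverse=True)
    return {
        "total": len(docs),
        "flagged": len(flagged),
        "top_cases": flagged[:top_n],
    }
```

The returned dict could then be handed to a page template to render the counts and the list of top cases.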

Dashboard example:

![Dashboard Example](https://github.com/drewrice2/fraud-detection-case-study-DSI/blob/master/Dashboard_example.png)

Technologies used:

Python 2.7
scikit-learn's RandomForestClassifier and train_test_split
MongoDB, via PyMongo
Flask
pandas, NumPy

Future steps would include:

Grid searching to optimize the model
Cleaning up the database
Testing other models
NLP on event title and description
Making the dashboard look freakin’ sweet

NOTE: due to the nature of the sprint, some of the code is a bit hacky, so beware...

Scott Contri, Clay Porter, Drew Rice, 2016.
