Credit Card Fraud Detection

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

This machine learning project uses anomaly detection algorithms to classify transactions as fraudulent or normal.

I created a deep neural network to classify the credit card transactions, reaching a test accuracy of about 89%.
Among the anomaly detection models, Isolation Forest performed best, with the highest accuracy.
Isolation Forest made 57 errors, versus 97 for Local Outlier Factor (LOF) and 1420 for Support Vector Machine (SVM). Its precision on the fraud class was also much higher: 42%, against 2% for LOF and 3% for SVM.
To improve the performance of the models, you can:
- Use deep learning models to detect fraudulent transactions more accurately
- Use more complex anomaly detection models
- Use more samples of the fraud class. The data set has a severe class imbalance, which is why the models do not perform better.

Exploratory Data Analysis (EDA)

First, I plotted the number of instances per class to decide how to handle the data. This revealed a severe class imbalance: 284315 normal transactions versus only 492 fraudulent ones.
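To put the imbalance in numbers, here is a minimal arithmetic sketch using only the two class counts reported above. The "all-normal" baseline is a hypothetical classifier introduced for illustration, not something from the project.

```python
# Class counts as reported in the text above.
normal_count = 284315
fraud_count = 492

total = normal_count + fraud_count
fraud_fraction = fraud_count / total
imbalance_ratio = normal_count / fraud_count

# Accuracy of a hypothetical baseline that labels every transaction as normal.
baseline_accuracy = normal_count / total

print(f"total transactions:  {total}")
print(f"fraud share:         {fraud_fraction:.4%}")
print(f"normal per fraud:    {imbalance_ratio:.1f}")
print(f"all-normal accuracy: {baseline_accuracy:.4%}")
```

Fraud makes up roughly 0.17% of the data, so a model that never flags fraud would still score about 99.8% accuracy. This is why precision and F1 on the fraud class say more than raw accuracy on the full data set.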

Second, I plotted the frequency of transactions over time for each class.

Finally, I made a scatter plot of transaction amount over time for each class.

Data Visualization

t-SNE

t-distributed Stochastic Neighbor Embedding (t-SNE) is a tool for visualizing high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and those of the high-dimensional data. The t-SNE cost function is not convex, so different initializations can give different results.
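For reference, the cost function described above can be written out as follows. This is the standard t-SNE formulation, not anything specific to this project: the symbols are introduced here for illustration, with p_ij the joint probabilities in the high-dimensional space, q_ij those in the embedding, and y_i the low-dimensional points.

```latex
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

Because C is minimized by gradient descent from a random starting layout and is not convex, two runs with different seeds can settle in different local minima.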

Class 0 - Normal Transactions
Class 1 - Fraud Transactions

Using Deep Learning to Classify the Credit Card Transactions

Using a convolutional neural network and balancing the classes by undersampling the normal class to match the fraud class, I got a testing accuracy of 89%.
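The balancing step can be sketched as random undersampling: keep every sample of the smallest class and draw an equally sized random subset from each larger class. The function name `undersample` and the toy data are hypothetical, and this is a plain-Python illustration under those assumptions rather than the project's actual preprocessing code.

```python
import random


def undersample(features, labels, seed=0):
    """Randomly drop samples so every class is as small as the smallest one."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # Every class is reduced to the size of the smallest class.
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))

    # Shuffle so the classes are interleaved in the result.
    rng.shuffle(kept)
    return [features[i] for i in kept], [labels[i] for i in kept]


# Toy data: 1000 "normal" rows (label 0) and 10 "fraud" rows (label 1).
toy_features = [[float(i)] for i in range(1010)]
toy_labels = [0] * 1000 + [1] * 10

balanced_features, balanced_labels = undersample(toy_features, toy_labels, seed=42)
print(len(balanced_labels), balanced_labels.count(0), balanced_labels.count(1))
```

Applied to the class counts reported in this project, the same idea would leave 492 + 492 = 984 transactions. The trade-off is that most of the normal transactions are discarded, so the network sees only a small part of the normal behaviour.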

The training curve for the model's accuracy:

[Figure: training accuracy curve]

The confusion matrix for the testing set shows 176 instances predicted correctly and only 21 incorrectly.

[Figure: confusion matrix for the testing set]
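As a quick check, the reported testing accuracy can be recomputed from the two confusion matrix totals. The sketch uses only the counts stated above; the per-class breakdown is not given in the text, so precision and recall cannot be derived this way.

```python
# Counts reported for the test-set confusion matrix.
correct = 176
incorrect = 21

test_size = correct + incorrect
accuracy = correct / test_size

print(f"test instances: {test_size}")
print(f"accuracy:       {accuracy:.1%}")
```

This gives 197 test instances and an accuracy of about 89.3%, which matches the reported 89%.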

Machine Learning Models Performance

The following are the classification reports for each model:

Isolation Forest

Detected Errors: 57
Precision Detecting Fraud Class: 42%
Macro F1 Score: 71%

Local Outlier Factor

Detected Errors: 97
Precision Detecting Fraud Class: 2%
Macro F1 Score: 51%

Support Vector Machine

Detected Errors: 1420
Precision Detecting Fraud Class: 3%
Macro F1 Score: 52%
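A comparison like the one above could be produced along the following lines with scikit-learn. This is a sketch under several assumptions, not the project's code: the data is synthetic, the hyperparameters are illustrative, the SVM is taken to be `OneClassSVM` (the usual SVM variant for anomaly detection), and "errors" is taken to mean the number of misclassified transactions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, precision_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (an assumption, not the project's data set):
# 980 "normal" points near the origin and 20 "fraud" points far away.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(980, 2)),
    rng.normal(6.0, 1.0, size=(20, 2)),
])
y = np.array([0] * 980 + [1] * 20)
contamination = float(y.mean())  # expected share of outliers, here 0.02

models = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "One-Class SVM": OneClassSVM(kernel="rbf", nu=contamination, gamma="scale"),
}

results = {}
for name, model in models.items():
    # scikit-learn detectors return +1 for inliers and -1 for outliers.
    raw = model.fit_predict(X)
    y_pred = np.where(raw == -1, 1, 0)  # map to 0 = normal, 1 = fraud
    results[name] = {
        "errors": int((y_pred != y).sum()),
        "fraud_precision": float(precision_score(y, y_pred, zero_division=0)),
        "macro_f1": float(f1_score(y, y_pred, average="macro")),
    }

for name, scores in results.items():
    print(name, scores)
```

The labels are used only for scoring. All three detectors are fitted without them, which is what makes this an anomaly detection setup rather than supervised classification.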

As shown above, the CNN achieves a high accuracy, and the best-performing anomaly detection model is the Isolation Forest. To improve classification further, you can use more complex anomaly detection models or more complex neural networks.

Link to GitHub Repository