The Accuracy Paradox: Why you may be measuring your Financial Crime Risk Model Wrong

Daniel Fernandez
6 min read · Jun 8, 2020



Fighting financial crime is hard. Not only are financial institutions and governments around the world trying to find a needle in a haystack, but the needle also keeps moving as we look for it. Predictive Algorithms and Machine Learning have become commonplace in fighting financial crime, but the Accuracy Paradox tells us we may be measuring the wrong things to determine the effectiveness of our models.

The business implications are significant. By law, every alert generated must be reviewed, and that review requires additional resources that are hard to come by. The industry must recognize, however, that reducing false positives carries its own risks, and that a fine line must be walked when choosing how aggressively to optimize for accuracy. Financial institutions are rightly more concerned with missing actual issues, since a miss carries reputational and financial damage. As an industry, we should be less fixated on reducing false positives and more focused on making sure we detect all true positives.

What is the Paradox?

The Accuracy Paradox, a well-established concept among analytics experts, states that accuracy may not be the best metric for judging a model's effectiveness. In other words, the model with the highest accuracy may have less predictive power. At first glance this sounds counterintuitive, but I'll explain how it plays out in the financial crime world.

One of the challenges of building predictive models for financial crime is the lack of labeled data. When we work with imbalanced datasets, meaning datasets where instances of risk are few and far between, saying that a model is 90% accurate at classifying a risk pattern is misleading. As an industry, we need to strive to find the right balance between precision and recall.

What is precision?

Precision refers to the fraction of relevant instances among the retrieved instances. In other words, of all the alerts we raise, how many are true hits?

What is recall?

Recall refers to the fraction of the total relevant instances that were retrieved. In other words, of all the true risk events out there, how many do we actually catch?
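
The two definitions above can be sketched directly from confusion-matrix counts. The alert-review numbers here are invented purely for illustration:

```python
# Precision and recall from confusion-matrix counts.
# tp = true positives, fp = false positives, fn = false negatives.

def precision(tp: int, fp: int) -> float:
    """Fraction of flagged alerts that were truly risky."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of truly risky events that were flagged."""
    return tp / (tp + fn)

# Suppose a month of monitoring produced:
tp, fp, fn = 40, 960, 10   # 40 true hits, 960 false alarms, 10 misses

print(f"precision = {precision(tp, fp):.3f}")  # 0.040
print(f"recall    = {recall(tp, fn):.3f}")     # 0.800
```

A system like this catches 80% of true risk but buries it under false alarms, which is exactly the tradeoff discussed next.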

When building predictive models, we have to decide how many false positives we are willing to accept in order to obtain one or more true positives. In other words, are we comfortable reducing the false-positive rate if it also reduces the likelihood of identifying true risk by the same factor?

It’s clear that the accuracy of the model cannot be the only metric used to determine its effectiveness.
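
To make the paradox concrete, here is a minimal sketch with invented numbers: on a dataset where only 1% of events are risky, a "model" that flags nothing at all scores 99% accuracy while catching zero risk.

```python
# Why accuracy misleads on imbalanced data: a trivial classifier
# that predicts "no risk" for everything looks highly accurate.
# The class balance (10 risky events in 1,000) is illustrative.

labels = [1] * 10 + [0] * 990          # 1 = risky, 0 = benign
predict_nothing = [0] * len(labels)    # the all-clear "model"

accuracy = sum(p == y for p, y in zip(predict_nothing, labels)) / len(labels)
tp = sum(p == 1 and y == 1 for p, y in zip(predict_nothing, labels))
recall = tp / sum(labels)

print(f"accuracy = {accuracy:.2%}")  # 99.00%
print(f"recall   = {recall:.2%}")    # 0.00%
```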

The Accuracy Paradox Across Analytics Approaches

One should be careful when drastically reducing false positives, since doing so may harm true-positive detection. In the process of developing an optimal analytics strategy, the industry has created two main modeling techniques: rules-based and machine learning based. While each technique has unique advantages and disadvantages, both may yield unintended results if model accuracy is measured incorrectly or emphasized in the wrong way.

Rules-Based Analytics

Before Machine Learning became popular, business logic was built into rules. These models are still prevalent and, in fact, power a large portion of our financial system's risk detection. Rules are relatively easy to write and can be updated by staff with varying levels of skill. However, the primary shortcoming is that rules and their thresholds become stale, yielding less relevant alerts. The visualization below shows how the value of this analytics type declines over time: as transactions and behavior change, rules left untouched eventually become ineffective. To address this issue, financial institutions must be able to change parameters quickly and effectively. As the Accuracy Paradox suggests, even a highly accurate rules-based system may be ineffective, because it only identifies static historical patterns and is unable to detect rapid fluctuations in the data.
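
A rules-based monitor can be sketched as a set of named predicates over each transaction. The rule names and thresholds below are invented for illustration; the point is that they are static, which is exactly why they go stale as behavior shifts:

```python
# A minimal rules-based monitor: each rule is a static predicate,
# and a transaction is flagged with every rule name it trips.

RULES = {
    "large_cash": lambda tx: tx["type"] == "cash" and tx["amount"] > 10_000,
    "rapid_movement": lambda tx: tx["in_out_within_hours"] is not None
                                 and tx["in_out_within_hours"] < 24,
}

def evaluate(tx: dict) -> list[str]:
    """Return the names of every rule this transaction trips."""
    return [name for name, rule in RULES.items() if rule(tx)]

tx = {"type": "cash", "amount": 15_000, "in_out_within_hours": None}
print(evaluate(tx))  # ['large_cash']
```

Updating such a system means editing `RULES` by hand, which is why the ability to change parameters quickly matters so much.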

Machine Learning

Machine Learning has taken the financial crime management industry, like many others, by storm. The promise is that, unlike traditional rules-based systems, the models will update automatically and never become stale. The theory, borne out in other industries, is that by continually building and using training data, the system's performance increases rapidly while avoiding false positives. A few years into the machine learning revolution, we know this is not completely true.

Even machine learning models, depending on the type, may require constant retraining to remain relevant. Achieving a steady flow of labeled risk events, in an industry where true risk is rarely captured, makes the time to effectiveness longer than desired. In addition, given regulatory constraints, the ability to deploy new model versions after training is not truly dynamic, since multiple stakeholders often need to sign off on changes to the models.

All things considered, machine learning brings a unique advantage to financial crime mitigation and when used in combination with rules-based detection it can significantly improve the quality of the results.

There are at least two major types of Machine Learning:

· Supervised

· Unsupervised

Supervised

This refers to models in which we provide a desired output to the algorithm, and the system generates the rules needed to identify the same outcome in the future. For financial crime management, this is a major improvement: instead of writing detection rules ourselves, we let the machine derive them from the data we feed it. The approach is still limited by that data, however, and in financial crime that means the few instances of risk we know of. The algorithm can only identify what we have fed into the system, so if a new type of risk emerges, it won't be detected. Unlike rules-based systems, supervised learning models get significantly better over time: the more data we feed them, the better they become. But, as with the rules-based approach, adopting a highly accurate model could mean we only identify the exact same instances of risk we saw in the past; the algorithm will not have any built-in variability if it's not present in the dataset.
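
The limitation above can be shown with a toy supervised learner, here a nearest-centroid classifier on a single invented feature (transaction amount). In the labeled history, only large transfers were ever marked risky, so the model has no way to recognize a novel pattern of many small transfers:

```python
# Toy supervised learning: learn one centroid per label from
# labeled history, then classify by nearest centroid.
# All amounts and labels are invented for illustration.

def train(samples):
    """samples: list of (amount, label); returns label -> mean amount."""
    by_label = {0: [], 1: []}
    for amount, label in samples:
        by_label[label].append(amount)
    return {lbl: sum(v) / len(v) for lbl, v in by_label.items()}

def classify(centroids, amount):
    """Assign the label whose centroid is closest to the amount."""
    return min(centroids, key=lambda lbl: abs(centroids[lbl] - amount))

# Labeled history: only *large* transfers were ever flagged risky.
history = [(50, 0), (80, 0), (120, 0), (9_500, 1), (12_000, 1)]
centroids = train(history)

print(classify(centroids, 11_000))  # 1 -- matches a known risk pattern
print(classify(centroids, 95))      # 0 -- a novel structuring pattern of
                                    #      many small transfers is missed
```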

Unsupervised

As explained earlier, the supervised technique only finds known instances. This leaves us exposed to ever-changing risk patterns: bad actors continue to evolve and become more sophisticated by the day, which leaves even the best supervised machine learning model in the dust. Unsupervised machine learning models are different in that they don't expect a predefined outcome as input. Instead, some algorithms perform anomaly detection. Anomaly detection models start with a fresh brain every time they run, increasing the likelihood that they identify new types of risk we didn't know existed. The accuracy threshold for an unsupervised model must also be set carefully: a setting that is too narrow would flag only the extreme cases, excluding some instances of risk. Similarly, since the anomaly detection process applies varying weights depending on when it runs, accuracy measurements must be made in context. Anomaly detection models can also be more flexible, since they can identify both acceptable anomalies and illegal activity, both of which may be of interest not only to compliance and risk teams but to business stakeholders as well.
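
A minimal sketch of unsupervised anomaly detection is a z-score test against the population, with no labels required. The amounts are invented; note how the threshold choice mirrors the caution above, since raising it to 3.0 would let this outlier slip through entirely:

```python
# Unsupervised anomaly detection via z-scores: flag any value more
# than z_threshold standard deviations from the population mean.
import statistics

def anomalies(values, z_threshold=3.0):
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

amounts = [100, 120, 95, 110, 105, 98, 5_000]   # one obvious outlier

print(anomalies(amounts, z_threshold=2.0))  # [5000]
print(anomalies(amounts, z_threshold=3.0))  # [] -- too strict, risk missed
```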

Multi-Dimensional Analytics

Being able to combine the capabilities of all three analytics types (rules-based, supervised, and unsupervised) is the closest we can get to highly performant financial crime detection. All industry participants must understand the tradeoff between increasing precision and its effect on identifying true risk. In general, it's good to be skeptical of a model that advertises close-to-perfect results. Instead, we should spend our efforts developing each of the analytics processes mentioned above and, more importantly, integrating their results. Finally, traceability of results is key, especially in machine learning use cases where thresholds may vary over time.
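
One simple way to integrate the three signal types is an alerting policy in which any sufficiently strong signal raises an alert. The weights and cutoffs below are illustrative assumptions, not a recommended configuration:

```python
# Combining the three analytics types: a rules-based flag list,
# a supervised model score, and an unsupervised anomaly score.
# Cutoff values (0.8, 0.9) are invented for illustration.

def combined_alert(rule_hits: list[str],
                   supervised_score: float,
                   anomaly_score: float) -> bool:
    """Alert if any single signal is strong enough on its own."""
    return bool(rule_hits) or supervised_score > 0.8 or anomaly_score > 0.9

print(combined_alert([], 0.85, 0.2))             # True  (model signal)
print(combined_alert(["large_cash"], 0.1, 0.1))  # True  (rule signal)
print(combined_alert([], 0.3, 0.3))              # False (all quiet)
```

In practice the integration logic would be weighted and tuned, and every decision logged for the traceability mentioned above.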

If you want to learn more about how to incorporate these technologies into your Financial Crime Detection process, you can visit us at NICE Actimize Compliance.

Written by Daniel Fernandez

Product Manager in Infosec. Cybersecurity Graduate Student. https://linktr.ee/dnlfdz
