Introduction: If money laundering were an industry, it would be the third biggest in the world, after foreign exchange and the car industry. According to the United Nations Office on Drugs and Crime, the amount of money laundered worldwide is 2 to 5% of world GDP, or between $500 billion and $1 trillion. Banks are the conduit through which most laundered money flows, so regulators and central banks have been demanding that financial institutions implement stronger Anti-Money Laundering (AML) controls to monitor transactions, with the aim of identifying and stopping money laundering. In response, US financial firms are spending close to $25 billion per year on financial compliance, according to Thomson Reuters, with banks in Europe and Asia catching up. However, the AML controls currently implemented in banks, which are mostly rule-based, have a false positive rate of more than 90% and have been ineffective in stemming the increase in the amount of money laundered through the financial system. As a case in point, banks have been fined more than $30 billion since 2008 for breaches in Know Your Customer (KYC) and sanctions screening controls.
Current State: The current AML controls are not only ineffective but also expensive, because every false positive has to be investigated by compliance staff. The high false positive rate stems from the controls' reliance on pre-set thresholds and rules for detection. These rules have not scaled to the volume and velocity of transactional data, and they fail to capture the context of customer transactions. As a result, more than 90% of alerts are dismissed after the first round of investigation by compliance staff, and of the remaining 10% only around 1-2% prove credible enough to require informing regulators through the filing of Suspicious Activity Reports (SARs), according to an article in Risk magazine.
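To make the threshold problem concrete, here is a minimal sketch of a static rule next to a rule that looks at each customer's own baseline. The threshold, the customers, the `typical_amount` field and the `ratio` parameter are all hypothetical, chosen only for illustration; real rule sets differ by institution.

```python
from dataclasses import dataclass

# Hypothetical fixed threshold; real limits and rule sets vary by institution.
THRESHOLD = 10_000.0

@dataclass
class Transaction:
    customer: str
    amount: float
    typical_amount: float  # assumed per-customer baseline, for illustration only

def static_rule(tx: Transaction) -> bool:
    """Flag any transaction at or above the fixed threshold."""
    return tx.amount >= THRESHOLD

def contextual_rule(tx: Transaction, ratio: float = 5.0) -> bool:
    """Flag only when the amount is far above the customer's own baseline."""
    return tx.amount >= ratio * tx.typical_amount

transactions = [
    Transaction("payroll_company", 50_000.0, 48_000.0),  # routine for this customer
    Transaction("student_account", 9_500.0, 200.0),      # unusual, but under the threshold
    Transaction("retail_shop", 12_000.0, 11_000.0),      # routine for this customer
]

static_alerts = [tx.customer for tx in transactions if static_rule(tx)]
contextual_alerts = [tx.customer for tx in transactions if contextual_rule(tx)]

print("static rule alerts:    ", static_alerts)
print("contextual rule alerts:", contextual_alerts)
```

In this toy data the static rule raises two alerts on routine activity and misses the one transaction that is out of character for its customer, which is the kind of context a fixed threshold cannot see.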
Why ML for AML?: Ineffective AML controls, vast amounts of transaction data and the complexity of money laundering schemes together create a strong opportunity for Artificial Intelligence (AI) and Machine Learning (ML) methods, which have been shown to detect patterns and trends in large datasets more effectively than humans. Recognizing this promise, even the regulators are encouraging innovative approaches in AML. In December 2018, the main governing body for AML in the US, the Financial Crimes Enforcement Network (FinCEN), and its regulatory partners (the Board of Governors of the Federal Reserve System, the Federal Deposit Insurance Corporation, the National Credit Union Administration, and the Office of the Comptroller of the Currency) issued a joint statement to “encourage banks and credit unions to take innovative approaches to combating money laundering”. This was a game-changer: a very public call to action encouraging financial institutions to experiment with new approaches. Such a statement is unusual coming from regulators, and it reflects the gravity of the situation.
Use Cases: The primary areas of AML where ML can help most are: segmenting user profiles to identify riskier clients; enriching customer transactional data with internal and external data to capture context and relationships between transactions and users; and mining the enriched data to capture patterns and trends in the transactions. The following sections summarize the ML techniques being explored in these areas of AML, with references to some relevant research papers:
- Customer Segmentation using Clustering: Clustering splits a dataset into smaller groups based on a pre-defined measure of similarity, so that the members of a group are more similar to each other than to members of other groups. In AML, k-means clustering can be used to divide bank account transactions into k groups in order to identify the riskiest transactions. As an example, in a recent paper the authors used k-means clustering to analyze investment transactions and group similar transactions into suspicious profiles, and then used classification to segment customers into pre-defined categories of risk. Similarly, support vector machines (SVM), a supervised classification method, can be used on labelled transactional data to separate incoming transactions into two classes: suspicious and non-suspicious. As an example, in one paper an SVM was used in place of a traditional pre-defined rule-based system for filtering suspicious transaction data, with promising results. A full list of data mining techniques with relevant papers has been painstakingly assembled by Salehi et al., whose survey is an invaluable summary.
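The clustering step can be sketched in a few lines of plain Python. This is a minimal illustration of k-means (Lloyd's algorithm), not the method of any paper cited here. The two features (amount in thousands, transactions per day), the data points and the rule for picking a cluster to review are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical transactions: (amount in thousands, transactions per day).
points = [
    (1.0, 2.0), (9.0, 14.0), (1.2, 1.0),
    (0.8, 3.0), (9.5, 15.0), (8.8, 13.0),
]

labels, centroids = kmeans(points, k=2)

# Illustrative choice only: send the cluster with the highest mean amount for review.
review_cluster = max(range(len(centroids)), key=lambda c: centroids[c][0])
flagged = [i for i, lab in enumerate(labels) if lab == review_cluster]

print("cluster labels:", labels)
print("transactions flagged for review:", flagged)
```

In practice the features would be scaled before clustering, and k and the initial centroids would be chosen more carefully, for example by running several random starts.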
- Risk Scoring: In another recent paper, an inherent risk scoring method was developed to distinguish between customer KYC (Know Your Customer) profiles of high and low inherent risk. In the absence of labelled data, the method used relative comparisons (high/low) through conjoint analysis. To compensate for the lack of available data, the model was trained and tested on choice-based questionnaire responses built from synthetic examples (generated using optimal experimental design and Monte Carlo simulation). The model achieved an accuracy of 89% on a test set of customers when evaluating money laundering risk, a marked improvement over the current method.
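The idea of learning a risk score from relative comparisons rather than labels can be sketched as follows. This is not the paper's conjoint analysis: it swaps in a simple logistic model on pairwise comparisons (in the style of a Bradley-Terry model), fitted by gradient ascent. The attribute names and the comparison data are hypothetical.

```python
import math

# Hypothetical binary KYC attributes, for illustration only.
ATTRIBUTES = ["cash_intensive", "high_risk_country", "politically_exposed"]

# Each pair is (profile judged riskier, profile judged less risky).
# These judgments are synthetic, standing in for questionnaire responses.
comparisons = [
    ((1, 0, 0), (0, 0, 0)),
    ((0, 1, 0), (0, 0, 0)),
    ((0, 0, 1), (0, 0, 0)),
    ((0, 1, 0), (1, 0, 0)),
    ((0, 0, 1), (0, 1, 0)),
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_weights(pairs, lr=0.1, epochs=1000):
    """Batch gradient ascent on the log-likelihood of
    P(a riskier than b) = sigmoid(w . (a - b))."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for riskier, safer in pairs:
            diff = [r - s for r, s in zip(riskier, safer)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            for j, dj in enumerate(diff):
                grad[j] += (1.0 - p) * dj
        w = [wi + lr * gj for wi, gj in zip(w, grad)]
    return w

def risk_score(profile, w):
    """Inherent risk score of a profile: weighted sum of its attributes."""
    return sum(wi * xi for wi, xi in zip(w, profile))

weights = fit_weights(comparisons)
for name, weight in zip(ATTRIBUTES, weights):
    print(f"{name}: {weight:.2f}")
```

The fitted weights give every profile a score on one scale, so a profile that never appeared in a comparison can still be ranked against the others.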
- Transaction Monitoring using Graph Convolutional Networks (GCN): Transaction monitoring is similar to financial forensics and uses anomaly detection to identify a very small number of illicit transactions in a vast transactional database. GCNs are a type of deep learning algorithm for analyzing datasets composed of graphs and networks, and they lend themselves very well to financial forensic analysis for AML purposes. (For an excellent introduction to GCNs, refer to the following link.) Graphs and networks are useful in the AML context because, instead of analyzing individual transactions in isolation with a rule-based approach, a GCN lets compliance staff visualize and analyze the relationships between transactions and accounts and assess their context, which is needed to identify money laundering. In a recent paper published by MIT-IBM, GCNs were used to analyze bitcoin transactions and distinguish legitimate from illicit transactions with a significant success rate, outperforming traditional methods like logistic regression.
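How a GCN uses relationships can be seen in a single layer's forward pass. The sketch below applies the standard propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) to a tiny graph in plain Python. The three-account chain, the single feature and the untrained weight are hypothetical placeholders, and this is not the model from the MIT-IBM paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical graph: accounts 0-1-2 in a chain (0 pays 1, 1 pays 2), treated as undirected.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# One illustrative feature per account: 1.0 marks an account already known to be illicit.
features = [[1.0], [0.0], [0.0]]
# Untrained placeholder weight; a real model would learn this from labelled data.
weights = [[1.0]]

hidden = gcn_layer(adjacency, features, weights)
for account, row in enumerate(hidden):
    print(f"account {account}: {row[0]:.3f}")
```

After one layer, account 1 picks up signal from its flagged neighbour while account 2, two hops away, does not. Stacking a second layer would carry the signal one hop further, which is how the network builds up context around each node.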
Summary: Money laundering is a massive drain on the world’s financial, legal and economic institutions, and current rule-based AML controls, with a false positive rate of more than 90%, are not adequate to detect and monitor it. AML is ripe for disruption and innovation through the use of Artificial Intelligence (AI) and Machine Learning (ML), and even the regulators are encouraging it. Recent published papers show AI and ML working in three key areas of AML: customer segmentation, using clustering (k-means) and classification (support vector machines); risk scoring; and transaction monitoring, using deep learning (graph convolutional networks). These approaches give us a glimpse of the near-future state of AML controls and of how new technology can help solve the seemingly insurmountable problem of money laundering as it exists now.
United Nations Office on Drugs and Crime, “UNODC Annual Report 2014,” 2014. [Online]. Available: https://www.unodc.org/documents/AnnualReport2014/Annual Report 2014 WEB.pdf (accessed Jul. 10, 2015).
Liu Keyan and Yu Tingting, “An Improved Support-Vector Network Model for Anti-Money Laundering,” 2011 Fifth International Conference on Management of e-Commerce and e-Government, 2011, pp. 193-196.
Jun Tang and Jian Yin, “Developing an intelligent data discriminating system of anti-money laundering based on SVM,” 2005, vol. 6, pp. 3453-3457, doi: 10.1109/ICMLC.2005.1527539.
Ahmad Salehi, Mehdi Ghazanfari and Mohammed Fathian, “Data Mining Techniques for Anti Money Laundering,” 2017.