3.1: Calibrating models: Safety in numbers?

When it comes to model calibration, banks are caught between a rock and a hard place. Calibrate too aggressively and they are swamped with alerts; calibrate too loosely and they risk fines. The regulators are on the case, too.

One of the most significant problems faced by organisations trying both to comply with regulations on surveillance and to identify genuine instances of misconduct is the number of false positives generated by monitoring systems. Millions of alerts are generated, and every single alert needs to be assessed at some level. At the end of the review process, fewer than 0.01% of these alerts are submitted as a STOR (Suspicious Transaction and Order Report).

This creates a huge resource demand and raises a number of questions about the whole surveillance process – whether it should focus only on high-risk alerts, whether all alerts need to be reviewed, and whether AI and other new technologies can be used to reduce the problem. Another fundamental question concerns system calibration: are the alert parameters in these (usually third-party) solutions being tuned correctly?

Regulators are concerned
This is certainly a situation that concerns the regulators. In February 2019, Julia Hoggett, director of market oversight at the FCA, raised the issue: “Firstly, in the context of firms using off-the-shelf calibration settings for their alert parameters and using ‘average peer alert volumes as a measure of the appropriateness of their calibration’, we observe firms taking comfort from the perception that ‘others are also failing’ in the same way that they are.” 

“This can become an incredibly self-referential cycle – ‘I match my standards to others and then say I am not failing any worse than any others because I match my standards to theirs.’ As a regulator, I must say there is something rather depressing about that logic.”

And in Marketwatch 56, the FCA[1] has also stated publicly: “Relying on peer standards, such as popular ‘out of the box’ alert settings, average peer parameters and average peer output volumes, will not necessarily satisfy MAR requirements. In particular, firms may not meet the requirement that each firm’s surveillance arrangements, systems and procedures are appropriate and proportionate to the scale, size and nature of their business activity.”

Easier said than done
The problem is that this understates the difficulty of such calibration. Clearly, if firms are swamped with benign alerts, it may mean that their systems are calibrated too aggressively and that they should relax their criteria to focus only on the higher-risk alerts. However, few firms are willing to dial back calibration for fear of being penalised the other way.

So, for example, in the US in 2017, J.P. Morgan Securities[2] accepted an $800,000 fine for inadequate pre-trade controls and post-trade surveillance relating to transactions between 2012 and 2016. Part of the ruling stated that: “During 2015, JPMS used a series of post-trade surveillance reports run by a commercial non-proprietary third-party surveillance system to monitor and review customer trading activity to detect, escalate and ultimately prevent potentially violative or manipulative trading activity, including layering and spoofing. Pursuant to the parameters in the third-party surveillance system utilized by the firm, several thresholds must be met in order to generate layering and spoofing alerts on the firm’s exception reports. Certain of these thresholds, however, were set at levels that were unreasonable to detect activity that may be indicative of layering and spoofing activity.”

Not-so-happy medium?
Caught between the rock of the alert factory and the hard place of that kind of enforcement action – both arguably positions created by regulatory requirements not sophisticated enough for today’s markets – it is easy to see why banks have taken the option of using average peer parameters.

Third-party solutions have hundreds of customisable parameters, and many vendors collect information across their user bases and indicate where users are setting alerts. So, for example, in a particular market, a significant trade size for Bank A might be $100 million, and for Bank B it might be $50 million. If the average setting across all users of the system is $100 million, and Bank B simply accepts that average, there is a mismatch: Bank B may never trade at those levels, and so generate no alerts at all.
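The effect of that mismatch can be sketched in a few lines of code. The figures and the threshold logic below are purely illustrative – they are not any vendor’s actual parameters – but they show how a peer-average “significant trade size” setting can silence an alert entirely for a smaller trading book.

```python
# Illustrative only: hypothetical trade sizes (in $m), not any vendor's
# real parameters. A size-based alert fires when a trade meets the threshold.

def alerts_over_threshold(trade_sizes, threshold):
    """Count trades that would trip a size-based alert at this threshold."""
    return sum(1 for size in trade_sizes if size >= threshold)

# Bank B's trades mostly sit in the $40-60m range.
bank_b_trades = [38, 45, 52, 60, 47, 55, 41, 58, 49, 44]

peer_average_threshold = 100   # peer-average setting ($100m)
own_calibrated_threshold = 55  # tuned to Bank B's own activity

print(alerts_over_threshold(bank_b_trades, peer_average_threshold))   # 0
print(alerts_over_threshold(bank_b_trades, own_calibrated_threshold)) # 3
```

With the peer-average setting, none of Bank B’s activity is ever examined; with a threshold tuned to its own book, the outlying trades are surfaced for review.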

This is not uncommon practice outside the large institutions. According to one global head of monitoring, surveillance and controls at a leading global bank: “What you will find is that many of the banks would have just taken that middle figure and said: ‘We’re safe in numbers, we’ll stay in the pack here and stick on that.’ And they will run their alerts on that basis.” 

Dynamic calibration
The leading banks do not take that approach. They calibrate continually for their own business, according to the asset class and its characteristics, including liquidity and the type of client.

One says: “We employ individuals who have shown a particular aptitude in surveillance and who understand the mechanism of how a particular alert operates. They assess which ones work particularly well and which ones don’t. Some banks switch all 120 on because they feel they have to; we took a snapshot and said: ‘Okay what are the scenarios that concern us?’”.

Surveillance functions need to work rigorously to analyse the output of their alert systems and to work with vendors to ensure that alert parameters continue to match the bank’s activities, but also to spot patterns that may indicate issues to be investigated and fed back into the model.
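One simple, hypothetical way to keep a parameter anchored to the bank’s own activity rather than a peer average is to reset it periodically from the bank’s recent trading data – for example, at a high percentile of recent trade sizes. This is a minimal sketch of that idea, not any specific vendor’s method:

```python
# A minimal sketch of data-driven recalibration (hypothetical approach):
# periodically reset a size-alert threshold to a high percentile of the
# bank's own recent trade sizes, so the parameter tracks the business.

import statistics

def recalibrated_threshold(recent_trade_sizes, percentile=99):
    """Set the alert threshold at the given percentile of recent activity."""
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    ranked = statistics.quantiles(recent_trade_sizes, n=100)
    return ranked[percentile - 1]
```

Run, say, every three to six months as part of the calibration review the practitioner below describes, this keeps the threshold proportionate to the firm’s actual scale of activity rather than to the pack’s.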

“Calibration of third-party solutions is a challenge,” says the head of electronic trading risk and controls at a global bank. “What should the exact calibrations be? How often should you revisit those calibrations? In our view, every three to six months you should be running a calibration review.”

1LOD versus 2LOD
To do this, 1st line expertise may be needed as an input into calibration, even though these systems sit in the 2nd line, and the 1st line also plays a key role in real-time monitoring. As one surveillance chief says: “In an ultra-fast market, if you’ve got someone that’s going to chuck 500,000 orders at you in half a second, there’s no point compliance coming along tomorrow to say: ‘I think there was a problem yesterday at 10.30 a.m.’ There needs to be a way to flag unusual activity, unusually rapid increases in trade activity and unusual ratios in the 1st line, so that they can press a button and put a stop to it.”
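The kind of real-time 1st line check described above can be sketched as a simple sliding-window order-rate monitor. The window length and limit below are illustrative assumptions, not regulatory values, and a production control would be far richer:

```python
# Hypothetical sketch of a 1st-line real-time check: flag a flow whose
# order count in a short sliding window breaches a limit, so that a human
# (or an automated control) can intervene immediately rather than next day.

from collections import deque

class OrderRateMonitor:
    def __init__(self, window_seconds=0.5, max_orders=1000):
        self.window = window_seconds      # sliding window length (illustrative)
        self.max_orders = max_orders      # orders allowed per window (illustrative)
        self.timestamps = deque()

    def on_order(self, ts):
        """Record an order at time ts; return True if the limit is breached."""
        self.timestamps.append(ts)
        # Evict orders that have fallen out of the sliding window.
        while self.timestamps and ts - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_orders
```

The point of the sketch is the placement, not the sophistication: the check runs in the order flow itself, where the 1st line can “press a button and put a stop to it”, rather than in a next-day compliance report.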

As well as the complexity of potentially having to involve the 1st line and vendors, dynamic calibration runs into the problem of internal model governance, driven by a range of regulations that were initially prompted by several notorious, large loss-making events at systemically important banks. Over time, the idea of a ‘model’ has been extended, so that at a number of banks surveillance models must now be overseen by formal governance. Any change to the parameters of a surveillance model is likely to have to go through a formal review process. That is a significant brake on the calibration process and makes truly dynamic calibration problematic.

A calibration-free future?
At one large bank, the controls head has high hopes for a pattern recognition-based alert system that does away with the need for a traditional calibration process, saying: “The system that we have developed in-house [to proof-of-concept level] does not need calibration. The pattern recognition approach identifies suspect trades and passes those alerts on to be validated or not.” 

If these systems turn out to work better than, or at least as well as, existing systems, then not only might they begin to solve the problem of the alert factory, they may also consign traditional calibration to the dustbin of history.

We’ll be hosting a Boardroom Debate: Trade Surveillance at the Surveillance Summit, March 18th, London. Participating in the debate are Hammad Hanif, Head, Trade Surveillance & Benchmarking Monitoring, Markets, Private Side & Treasury, Lloyds Banking Group, and Michael Jones, Managing Director, Global Head of Independent Review Group, Deutsche Bank. Find out more here.

Based on an international benchmarking survey collecting the views of industry leading experts from 15 of the largest financial institutions globally, the 2020 Surveillance Benchmark Report provides a unique insight into the maturity and development of surveillance functions over the last 12 months, as well as predictions for the future. Including in-depth commentary from regulators, practitioners, consultants and technology experts, it is the only report for professionals in the industry.
