14th May 2020
Why Data Collection Relies on Speed
Data is made up of qualitative and quantitative pieces of research, updated every second of every day, all over the world. Covid-19 has highlighted the importance of organizations being able to utilize data quickly and effectively to make urgent business decisions in times of adversity.
When collecting data, there are many different methods organizations could use. The most common are human-led approaches such as keyword searches using search engines. These can fall short when trying to cover different versions of the same words with minimal context (we’ve gone into detail on that here).
The most effective method is machine learning, and there are several reasons why it is so important for data collection.
Collecting and reviewing data across adverse media, sanctions, PEPs and RCAs can be difficult and time-consuming, even more so when it is human-led. Depending on people for operational efficiency can increase the risk of false positives, especially when organizations rely on manual input of information to get results.
Moving compliance to remote working during the pandemic has created its own difficulties for some industries. Tools that rely on researchers to build their data manually have seen a 200% increase in false positives where providers have been unable to fall back on trustworthy machine learning tools.
Data collection with AI and machine learning tools is an alternative way to effectively gather the information organizations need in an automated way – ensuring business continuity during times of uncertainty.
Why keyword-only search tools are ineffective
It can be difficult to visualize why keywords are ineffective on their own. When onboarding clients, you need to be able to refer to an adverse media database that has no glaring errors or gaps.
If a simpler system that only applies keyword filtering is used instead, only a select number of media sources can be captured and analyzed, which doesn’t necessarily provide the whole picture. Because only articles containing the keywords will be flagged, relevant stories phrased differently will inevitably slip through the net, while irrelevant articles that happen to contain a keyword create high numbers of false-positive results.
Following on from the above, once this limited set of media has been keyword-filtered, you are left with results containing some genuinely adverse media and some that is not adverse at all. This approach produces a lot of noise, as not all search results are applicable or relevant, and a compliance team needs extra time and resources to filter through them all.
As for the adverse media that isn’t included in the selected sources to begin with, you won’t be able to detect it at all. You won’t even know it exists. You’ve left an open door for those entities to enter your business.
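As a rough sketch (using made-up article text and a hypothetical keyword list), the two failure modes above look like this: a keyword match flags an irrelevant story, while genuinely adverse coverage that uses different wording is never surfaced at all.

```python
# Toy illustration with hypothetical data: keyword-only filtering over a
# fixed source list produces both false positives and silent misses.

ADVERSE_KEYWORDS = {"fraud", "laundering", "bribery"}

def keyword_flag(text: str) -> bool:
    """Flag an article if it contains any adverse keyword as a substring."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in ADVERSE_KEYWORDS)

# Articles from the sources we happen to ingest.
ingested = [
    "Local bakery wins fraud-themed escape room contest",  # flagged, but not adverse
    "Regulator fines ACME Corp for money laundering",      # flagged, genuinely adverse
]
print([keyword_flag(article) for article in ingested])  # [True, True]

# Genuinely adverse coverage using different wording is never flagged.
print(keyword_flag("Court convicts ACME Corp director of embezzlement"))  # False
```

Both results in the first list come back flagged, even though only one is truly adverse; the conviction story produces no flag at all because its wording isn’t in the keyword list.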
Is using keywords in combination with analysts more efficient?
Alongside keyword searches, analysts can be used for manual data filtering and checking, taking information at the ‘chainlink fence stage’ and filtering the data.
Unfortunately, this approach also has its limitations. Because analysts manually input and produce the results, data cannot be updated in real time, and analysts can introduce their own biases when filtering through onboarding profiles. They may miscategorize some articles as adverse (or not), and individuals who have been wrongly labeled may face ongoing financial issues as a result.
Machine Learning and Automation
It is apparent that machine learning is superior to the methods mentioned above. Using machine learning to create an adverse media database and filter entities effectively minimizes human error. The limit here is actually the amount of media produced on any given day by media sources: there is no need to select just a portion of media to examine, because all of it can be reviewed.
Reducing human intervention at this stage also means that adverse media can be identified continuously and without time limits, irrespective of external factors.
Once the adverse media is identified, the information is extracted and assigned to profiles matching real-world entities without creating duplicates. Then, when entities with negative news articles attempt to onboard your business, they’re easily identified and rejected as necessary by compliance teams.
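A minimal sketch of the profile-assignment step described above (with hypothetical data structures, and a deliberately simplistic normalizer standing in for real name matching): mentions are keyed on a normalized form of the entity name, so repeat mentions of the same real-world entity land on one profile instead of creating duplicates.

```python
# Sketch only: group extracted adverse-media mentions into entity profiles
# keyed on a normalized name, so the same real-world entity is not duplicated.
from collections import defaultdict
import unicodedata

def normalize(name: str) -> str:
    """Strip accents, lowercase, and collapse whitespace.

    A stand-in for real name matching, which is far more involved.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode()
    return " ".join(ascii_only.lower().split())

profiles = defaultdict(list)  # normalized name -> list of article ids

mentions = [
    ("José Pérez", "article-1"),
    ("jose  perez", "article-2"),  # same entity, different spelling
    ("ACME Corp", "article-3"),
]

for name, article_id in mentions:
    profiles[normalize(name)].append(article_id)

print(len(profiles))           # 2 profiles, not 3
print(profiles["jose perez"])  # ['article-1', 'article-2']
```

The three mentions collapse into two profiles because the two spellings of the same person normalize to the same key.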
Creating an Effective Adverse Media Database
Taking advantage of machine learning (such as our AIM product) is necessary for many reasons. It creates an effective adverse media database by examining all the data available, works constantly to identify the data that matters, and presents that data in an easily accessible format. It can also be used in conjunction with our advanced name-matching and person-identification systems.
Machine learning provides entity profiles that can rapidly be used to decide whether or not to onboard a customer according to the risk appetite of the individual business.
Without machine learning-based solutions, businesses will never be able to identify all of the potential customers who pose a threat to their compliance obligations.
But machine learning tools are not just useful for adverse media products. Sanctions and PEPs information also changes regularly, and businesses need to make sure their data is up to date to avoid a breach or overly risky decisions.
Sanctions data sees a constant stream of change. It’s a critical data input for screening that operates at a binary level: if an entity is sanctioned, the business cannot work with it.
However, despite its importance, there is no unified structure for how sanctions data is delivered to businesses. Lists can be changed with little or no notice, and amendments can be similarly chaotic in delivery.
Our adverse media product is used in conjunction with manual reviews, creating the most accurate data set, updated automatically and on an ongoing basis at unprecedented speed. Essentially, with both review methods working together, workload can be reduced.
Politically exposed persons (PEPs) are constantly changing positions and ranks, and new players are constantly entering the field. PEP data collection needs to move as fast as politics and is in frequent need of updates.
Coverage of critical PEPs is key to every financial services business. Using machine learning, we are able to obtain information from all 245 countries and jurisdictions for critical class 1 PEPs across the executive, legislative and judiciary branches (there’s more on that here).
Machine learning allows us to monitor these positions continuously and be aware of changes in real-time. We also perform automatic updates every 30 days to ensure that nothing is missed. This means that unannounced changes are discovered immediately and in case of significant changes, such as an upset election, we’re able to have new and updated PEP data coverage in a matter of hours after the results have been confirmed.
When it comes to PEP coverage in particular, speed is key. If businesses move too slowly, they may accidentally onboard a client who is too high-risk for their approach.
Manual and Automated processing – Fingerprints and Sanctions
Through manual monitoring, we can also control the deletion of sanctioned entities and update data in a timely fashion for all clients. Our Fingerprints tool allows us to support multiple websites and sources.
The Fingerprints tool monitors manual sanctions sources, scanning each source every couple of hours and sending a notification to trigger an update whenever something is altered online.
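The core idea behind this kind of change detection can be sketched as content hashing: store a digest of each source on every scan, and trigger an update whenever the digest differs from the last one seen. (This is an illustrative sketch, not the actual Fingerprints implementation; the URL and page contents are made up.)

```python
# Illustrative sketch: detect changes to a monitored source by hashing its
# content and comparing against the digest recorded on the previous scan.
import hashlib

def fingerprint(content: bytes) -> str:
    """Return a stable digest of the source's raw content."""
    return hashlib.sha256(content).hexdigest()

def check_source(url: str, content: bytes, last_seen: dict) -> bool:
    """Return True (trigger an update) if the source changed since the last scan."""
    digest = fingerprint(content)
    changed = last_seen.get(url) != digest
    last_seen[url] = digest
    return changed

last_seen = {}
page_v1 = b"<html>Sanctions list, 1,234 entries</html>"
page_v2 = b"<html>Sanctions list, 1,235 entries</html>"

print(check_source("https://example.org/sanctions", page_v1, last_seen))  # True (first scan)
print(check_source("https://example.org/sanctions", page_v1, last_seen))  # False (no change)
print(check_source("https://example.org/sanctions", page_v2, last_seen))  # True (changed)
```

In a real deployment a scheduler would fetch each source on an interval (every couple of hours, as described above) and route any `True` result into the update workflow.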
Once Fingerprints has triggered an update, we verify through a proprietary manual review tool that only correct data is drawn into the production sanctions dataset.
This allows us to identify, track, prompt, and log the activity. Using this process, we have total control over the sanctions data before it reaches production, which allows us to identify mistakes made by regulators and, if needed, prevent delays to manual sanctions source updates.
If you’re working with inaccurate or incomplete data, no matter how good your internal processes are, your reviews will suffer. Human intervention is a necessary step when filtering niche data such as sanctions lists and PEPs to ensure quality. But without machine learning to monitor and react to data changes, no human-led research can compare with the speed, breadth, depth, and accuracy of automated data collection.