The essential guide to AML data: 9 key data types that drive smarter decision-making

AML programs are driven by data. They rely on it to detect suspicious activities, identify potential risks, and ensure compliance with evolving regulatory obligations. But what kind of data is needed to help your compliance team see beyond the surface and make informed decisions about your customers and their transactions?

To help answer this question, this article will cover:

The importance of comprehensive data.
The 9 key data types that, when viewed together, help paint an accurate risk picture.
What “good” looks like in relation to these data types.
The kind of insights that can be produced from them.
Key questions you can ask to assess a vendor’s data retention and governance capabilities.

Why does comprehensive data matter?

A main component of quality data is its comprehensiveness: not just the breadth and depth of information collected, but also how seamlessly it integrates with other data points to enrich them. Viewed holistically, comprehensive data allows firms to construct a full, accurate picture of potential risk, turning disparate pieces of information into insights that analysts can act on. This integration is necessary for generating meaningful outputs, such as real-time risk scores, which empower firms to take a risk-based approach to alerts.

On the other hand, gaps in data coverage can leave firms vulnerable. Missing or incomplete information may result in undetected instances of money laundering and regulatory breaches. For example, in 2023, a subsidiary of a major commercial bank was fined $25 million by the Financial Crimes Enforcement Network (FinCEN) for failing to fully integrate crucial customer data from the know your customer (KYC) process into its risk assessment and transaction monitoring systems. This data gap prevented the firm from detecting suspicious activity and filing suspicious activity reports (SARs) within the required timeframe.

Compliance failures like this often stem from siloed data and disparate platforms that fail to integrate various types of relevant AML data, a challenge identified as the major limitation for compliance leaders in our State of Financial Crime 2025 survey. In fact, the top three concerns revealed by compliance leaders were:

Siloed datasets (45 percent).
Lack of real-time visibility into risks (45 percent).
Comprehensiveness and/or quality of data (44 percent).

Figure: survey responses to the question “What are the main limitations to your organization’s current approach to financial crime detection?”

These issues highlight a critical problem: the inability to connect high-quality data and quickly draw inferences from it.

If your firm is actively looking to combat this challenge, understanding which AML data types are essential to creating a comprehensive risk picture is a good place to start.

AML data types across key compliance stages

While various data types are often used across multiple stages of the AML process, the table below shows how nine different kinds of data relate to some of the larger compliance activities that occur throughout the client lifecycle.

Stage in the AML process	Data required	AML purpose
Customer onboarding	Customer information; BOI; PEPs & RCA data; geographic risk data	Customer identification, KYC/KYB onboarding, and risk profiling: Used to verify customer identity, assess ownership structure, and identify higher-risk individuals like PEPs or RCAs.
Sanctions screening and risk checks	Sanctions & watchlist data; adverse media	Sanctions screening, name matching, and customer risk assessment: Ensures customers, payments, and counterparties are not on sanctions lists or linked to negative media or high-risk individuals like PEPs.
Ongoing monitoring	Transaction data; behavioral data	Transaction monitoring and anomaly detection: Tracks customer transactions and behavior to detect suspicious patterns, unusual activity, and deviations from expected behavior.
Historical review & investigations	Historical data	Audit trails and regulatory investigations: Provides a historical view of customer activity, profile changes, and past transactions to support audits and regulatory investigations.

But what are the hallmarks of these data types? What does “good” look like in each case and what insights can typically be derived from the information provided? The next sections consider these questions in relation to each of the following:

Customer information.
Beneficial ownership information (BOI).
PEPs & RCA data.
Geographic risk data.
Sanctions & watchlist data.
Adverse media.
Transaction data.
Behavioral data.
Historical data.

1. Customer information

Know your customer (KYC) regulations make up the foundation of AML compliance. The process begins with collecting accurate and comprehensive information directly from the customer. Without this initial step, the onboarding process may stall before it truly begins.

Key customer information typically includes:

Full legal name.
Date of birth.
Residential address.
Nationality.
Occupation.
Unique identification numbers (e.g., passport number, national ID).

Once this data is collected, your compliance team’s expertise comes into play. Meticulous verification of the provided information is essential to ensure its authenticity and identify any possible risk factors. In some cases, additional inquiries into the customer’s source of funds (SoF) or source of wealth (SoW) may be necessary to build a comprehensive risk profile.

2. Beneficial ownership information (BOI)

Ultimate beneficial owners (UBOs) are individuals who ultimately own or control a company and benefit from its financial activity. However, identifying these individuals can be challenging due to complex ownership structures often designed to obscure their identities.

Key beneficial ownership information to collect includes:

The nature and extent of ownership interest.
The chain of ownership.
Percentage of shares or voting rights held.
Date when beneficial ownership was acquired.
Details about intermediate entities.
Relationships between different owners.
Any trust arrangements.
Indirect ownership structures.
Customer’s authority to appoint or remove officers or directors.
Other key decision-makers within the company.

The challenge lies not just in collecting this information but in interpreting it correctly. Your compliance team should be well-trained in recognizing red flags, such as unnecessarily complex structures or ownership chains that lead to high-risk jurisdictions. By thoroughly mapping out beneficial ownership, your team can better assess the risk associated with a business relationship and make informed decisions about customer onboarding and ongoing due diligence.
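The chain-of-ownership and indirect-ownership points above lend themselves to a small worked example. The sketch below is illustrative only: the entity names, the holdings, and the 25 percent reporting threshold are assumptions chosen for the example, not values taken from any regulation or from this article. It multiplies stakes along each ownership path and sums them per owner, which is one common way to estimate effective ownership.

```python
from collections import defaultdict

# Hypothetical direct holdings: (owner, owned entity, fraction of shares held).
HOLDINGS = [
    ("HoldCo A", "Target Ltd", 0.60),
    ("Person X", "HoldCo A", 0.50),
    ("Person Y", "HoldCo A", 0.50),
    ("Person X", "Target Ltd", 0.10),
]


def effective_ownership(holdings, target):
    """Return {owner: effective fraction of `target`} across all ownership paths."""
    # Index the direct owners of every entity.
    owners_of = defaultdict(list)
    for owner, owned, fraction in holdings:
        owners_of[owned].append((owner, fraction))

    totals = defaultdict(float)

    def walk(entity, share, path):
        for owner, fraction in owners_of.get(entity, []):
            if owner in path:
                # Circular holding: stop rather than loop forever.
                continue
            stake = share * fraction
            totals[owner] += stake
            walk(owner, stake, path | {owner})

    walk(target, 1.0, {target})
    return dict(totals)


def owners_above(holdings, target, threshold=0.25):
    """List owners whose effective stake meets an assumed reporting threshold."""
    stakes = effective_ownership(holdings, target)
    return sorted(owner for owner, stake in stakes.items() if stake >= threshold)


if __name__ == "__main__":
    for owner, stake in sorted(effective_ownership(HOLDINGS, "Target Ltd").items()):
        print(f"{owner}: {stake:.0%}")
```

In this example Person X holds 10 percent directly and a further 30 percent through HoldCo A, so a check on direct holdings alone would understate their stake. A real implementation would also need to account for voting rights, trust arrangements, and control exercised by other means, which simple share arithmetic does not capture.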

3. Politically exposed persons (PEPs) & relatives and close associates (RCAs)

Due to their prominent public functions, PEPs are considered higher risk for potential involvement in bribery, corruption, or money laundering. This risk often extends to their family members and close associates. Effective PEP and RCA screening requires comprehensive and up-to-date data. While the process can be complex, focusing on key elements can significantly enhance the quality and usefulness of PEP-related information. The hallmarks of “good” PEP data include:

Detailed positional information: Understanding the specific nature of a PEP’s familial and professional connections provides the context needed for evaluating potential vulnerabilities and exposure to illicit activities. It’s also pivotal for targeted risk assessments.
Data coverage: Effective PEP data management should encompass a wide range of sources and jurisdictions to ensure no critical information is missed. This includes global databases that are regularly updated to reflect new appointments, changes in status, and other relevant developments.
Transactional behavior analysis: Monitoring the financial activities of PEPs helps identify patterns that may indicate suspicious or illicit activity. This can include large or unusual transactions, frequent transfers to high-risk jurisdictions, or transactions that do not align with the PEP’s known sources of income.

To learn more about what constitutes “good” PEP data, read the dedicated blog written by our Regulatory Affairs Practice Lead, Iain Armstrong.

4. Geographic risk data

A potential customer’s location factors into their risk status because some jurisdictions have comparatively weak AML/CFT legislation, are known offshore financial havens, or have high levels of corruption, drug trafficking, and other predicate offenses for money laundering.

While there is no definitive global approach to identifying high-risk geographic locations, ties to a jurisdiction on lists such as the Financial Action Task Force (FATF) ‘black’ and ‘grey’ lists are enough for an entity to be deemed higher-risk.

Beyond FATF lists, your compliance team may consider data provided by:

Transparency International’s Corruption Perceptions Index.
The Basel AML Index.
The Financial Secrecy Index.
The prevalence of specific predicate offenses (e.g., human trafficking, drug production).
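As a rough illustration of how these sources might be combined, the sketch below assigns a jurisdiction to a risk tier. Everything specific in it is an assumption made for the example: the `JurisdictionData` structure, the equal weighting, the 0.4 and 0.6 cut-offs, and the placeholder jurisdiction codes and scores. The only details taken from the sources themselves are that FATF list membership is treated as sufficient for a higher-risk rating, that the Corruption Perceptions Index runs from 0 (highly corrupt) to 100 (very clean), and that the Basel AML Index is assumed here to run from 0 (low risk) to 10 (high risk).

```python
from dataclasses import dataclass


@dataclass
class JurisdictionData:
    code: str                 # placeholder code, not a real country
    on_fatf_black_list: bool
    on_fatf_grey_list: bool
    cpi_score: float          # 0 (highly corrupt) to 100 (very clean)
    basel_aml_score: float    # assumed scale: 0 (low risk) to 10 (high risk)


def geographic_risk_tier(j: JurisdictionData) -> str:
    """Return 'high', 'medium', or 'low' using illustrative weights and cut-offs."""
    # Presence on either FATF list is treated as sufficient on its own.
    if j.on_fatf_black_list or j.on_fatf_grey_list:
        return "high"

    # Put both indices on a 0-1 scale where higher means riskier.
    corruption_risk = (100 - j.cpi_score) / 100
    aml_risk = j.basel_aml_score / 10
    blended = 0.5 * corruption_risk + 0.5 * aml_risk  # assumed equal weighting

    if blended >= 0.6:
        return "high"
    if blended >= 0.4:
        return "medium"
    return "low"


if __name__ == "__main__":
    samples = [
        JurisdictionData("XX", False, True, 80.0, 3.0),
        JurisdictionData("YY", False, False, 30.0, 7.0),
        JurisdictionData("ZZ", False, False, 85.0, 3.0),
    ]
    for sample in samples:
        print(sample.code, geographic_risk_tier(sample))
```

The weights and thresholds should reflect your firm’s own risk appetite and be reviewed whenever the underlying lists and indices are updated.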

5. Sanctions & watchlist data

Staying on top of sanctions and watchlist updates is a critical yet increasingly challenging task for compliance teams. With new sanctions designations being introduced at a rapid pace, your ability to access accurate and comprehensive data is essential for maintaining compliance. However, not all sanctions data providers offer the same level of quality, and gaps in data can lead to significant risks.

High-quality sanctions data can be evaluated using several key factors:

Accuracy: Error-free data is fundamental to effective compliance. Even minor inaccuracies can lead to missed sanctions matches or unnecessary false positives.
Coverage: Comprehensive coverage across all relevant jurisdictions ensures no critical information is overlooked. This includes sourcing data from global sanctions lists and other watchlists that align with your firm’s geographic and customer profile.
Currency: Access to up-to-date information is non-negotiable. Real-time updates to sanctions lists allow firms to respond quickly to new designations and reduce exposure to potential violations.
Networks: Understanding the broader networks surrounding sanctions targets is increasingly important. Family ties, business relationships, and other connections can reveal attempts to evade sanctions through intermediaries or proxies.
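To make the accuracy point concrete, the sketch below shows a very simple form of fuzzy name matching using only the Python standard library. It is an assumption-laden illustration rather than a description of how any screening product works: the watchlist entries are fictional, the 0.85 threshold is arbitrary, and production systems typically handle transliteration, aliases, and secondary identifiers such as dates of birth.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Strip accents and punctuation, lowercase, and sort name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = without_accents.casefold().replace(",", " ").replace(".", " ")
    # Sorting tokens makes "Surname, Given" and "Given Surname" compare equal.
    return " ".join(sorted(cleaned.split()))


def screen(name: str, watchlist: list[str], threshold: float = 0.85):
    """Return (entry, score) pairs whose similarity meets the threshold."""
    candidate = normalize(name)
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, candidate, normalize(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Fictional entries used purely for illustration.
    watchlist = ["José Álvarez Example", "Maria Placeholder"]
    print(screen("Alvarez, Jose Example", watchlist))
    print(screen("John Smith", watchlist))
```

Lowering the threshold catches more spelling variants at the cost of more false positives, which is the trade-off the accuracy criterion above describes.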

6. Adverse media

Adverse media information consists of negative news or content about individuals or organizations spread through various media channels. This includes:

News articles highlighting financial irregularities, unethical practices, or scandals.
Social media posts criticizing products, services, or individuals.
Regulatory reports identifying violations of industry regulations.
Legal filings alleging wrongdoing.
Blog posts, forum discussions, and other online content related to financial crime or negative reputational issues.
Government reports and court documents containing adverse information.
Information from watchlists and blacklists.

When screening for negative news, one of the main challenges analysts face is sifting through vast amounts of data to identify relevant information, much of which is irrelevant or noisy. For example, searching for “Tiffany Palmer” on Google returns over 70,000 results, even when using specific keywords like fraud or money laundering. Another challenge is keeping track of a customer’s risk information over time while assessing the quality and credibility of the data in question.

Adopting a machine learning (ML) approach to adverse media screening can help combat these challenges, giving your team access to unstructured data that has been pre-analyzed, categorized, and consolidated into comprehensive profiles. However, as with every other screening process, acquiring high-quality, relevant, and diverse data is crucial for training effective ML models. While various datasets exist, solving specific AML problems often requires millions of carefully curated training examples. Not every data provider will have access to or utilize such extensive datasets. In contrast, vendors that leverage their own proprietary data can offer enriched insights.

7. Transaction data

Transaction data largely consists of information referring to:

Account identifiers, such as customer account numbers, sort codes, and IBANs.
Transaction details, including the transaction ID and type (e.g., debit, credit, transfer).
The payment rail used for the transaction (e.g., ACH, FedNow, SEPA ICT).
Amount.
Currency.
Date and timestamp.
Balance information.
Counterparty information, including the merchant name or payee details.
Where available, information about where the transaction took place.

The Financial Action Task Force (FATF)’s recommendations emphasize the importance of capturing all relevant transaction data, including the originator’s and beneficiary’s details, to improve traceability. To ensure all information is complete for your risk analysis, make sure your teams are monitoring the quality of the transaction data they receive and are trained on the appropriate action to take when essential details are missing.
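One way to monitor transaction data quality is to check each record for required fields before it reaches the monitoring rules. The sketch below is a minimal example under stated assumptions: the field names and the choice of which fields count as required are hypothetical and would need to match your own data model and regulatory obligations.

```python
# Hypothetical set of fields a firm might treat as essential for traceability.
REQUIRED_FIELDS = (
    "transaction_id",
    "amount",
    "currency",
    "timestamp",
    "originator",
    "beneficiary",
)


def missing_fields(txn: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if txn.get(field) in (None, "")]


def triage(transactions: list[dict]):
    """Split records into complete ones and ones needing follow-up."""
    complete, incomplete = [], []
    for txn in transactions:
        gaps = missing_fields(txn)
        if gaps:
            incomplete.append((txn.get("transaction_id"), gaps))
        else:
            complete.append(txn)
    return complete, incomplete


if __name__ == "__main__":
    sample = [
        {
            "transaction_id": "T1", "amount": 250.0, "currency": "EUR",
            "timestamp": "2025-01-15T10:30:00Z",
            "originator": "Acct-001", "beneficiary": "Acct-002",
        },
        {
            "transaction_id": "T2", "amount": 90.0, "currency": "EUR",
            "timestamp": "2025-01-15T11:00:00Z",
            "originator": "Acct-003", "beneficiary": "",
        },
    ]
    complete, incomplete = triage(sample)
    print(f"{len(complete)} complete, follow-up needed for: {incomplete}")
```

Records in the incomplete list can then be routed to whatever follow-up process your team has defined for missing originator or beneficiary details.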

8. Behavioral data

While transaction data provides the raw facts of financial activities, this data alone may not reveal the full picture of potential money laundering activities. Behavioral information, on the other hand, adds crucial context by analyzing patterns and trends in customer activities over time. Behavioral data typically includes:

Transaction patterns.
Account usage.
Changes in behavior.
Peer group comparison.
Network analysis.
Device and channel information.

When behavioral data patterns are analyzed, your team can then craft customized rulesets that align with your specific customer base and risk appetite. These tailored rules enable more accurate detection of suspicious activities while reducing false positives. For example, TransferMate was able to work with ComplyAdvantage to tailor-make a rule that would detect key behavioral indicators for child sexual exploitation. Additionally, after receiving key updates from law enforcement in the field, they were able to immediately refine the rule and account for behaviors indicating abuse of younger victims. With other solutions, making the change could have taken six months or more.
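A simple way to picture a "change in behavior" check is to compare a new transaction with the customer’s own history. The sketch below uses a z-score for this. It is an illustration only: the sample amounts, the threshold of three standard deviations, and the function names are assumptions, and real rulesets usually combine several indicators rather than relying on amount alone.

```python
from statistics import mean, stdev


def deviation_score(history: list[float], new_amount: float):
    """Return how many standard deviations new_amount sits from the baseline.

    Returns None when there is too little history to form a baseline.
    """
    if len(history) < 2:
        return None
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past amount was identical; any difference is a full departure.
        return 0.0 if new_amount == baseline else float("inf")
    return abs(new_amount - baseline) / spread


def is_unusual(history: list[float], new_amount: float, threshold: float = 3.0) -> bool:
    """Flag a transaction whose deviation exceeds an assumed threshold."""
    score = deviation_score(history, new_amount)
    return score is not None and score > threshold


if __name__ == "__main__":
    past_amounts = [100.0, 110.0, 90.0, 105.0, 95.0]  # hypothetical history
    for amount in (102.0, 500.0):
        print(amount, is_unusual(past_amounts, amount))
```

The same pattern extends to peer group comparison by building the baseline from similar customers instead of the individual’s own history.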

9. Historical data

Historical data in AML screening is a longitudinal view of customer interactions and financial activities over an extended period. It provides a consolidated, time-based perspective that allows compliance teams to:

Establish long-term behavioral baselines.
Identify gradual changes in customer financial patterns.
Understand cumulative risk indicators.
Track the evolution of a customer’s financial profile over time.

Historical data is also vital during audits and regulatory investigations, as it acts as a log of your team’s decisions.

What is data retention in AML, and why is it important?

In AML compliance, “data retention” refers to the practice of storing customer-related data for a specified period of time. As well as providing evidence for any investigations, data retention also allows firms to monitor and analyze activity for potential money laundering or terrorist financing.

Depending on the jurisdiction in which your firm operates, the period of time companies have to retain customer data can vary. For example:

The Money Laundering Regulations (MLRs) in the UK require CDD documents to be kept for at least five years from the date on which the transaction is completed or the business relationship comes to an end.
The Fourth AML Directive of the European Union mandates a minimum retention period of five years for personal data. However, it allows for an additional retention period of up to five years (totaling 10 years) if provided for under local legislation, but only if necessary for prevention, detection, or investigation of money laundering or terrorist financing.
Australia’s Anti-Money Laundering and Counter-Terrorism Financing Act 2006 (AML/CTF Act) requires firms to retain CDD and transaction records for seven years from the date of the transaction or the end of the customer relationship.
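The retention periods listed above can be turned into a simple date calculation. The sketch below is a minimal illustration: the jurisdiction keys and function names are made up for the example, the periods are the minimums quoted in this article, and it does not model the EU’s optional extension or any other local rule. Treat it as a starting point for discussion with your legal team rather than a compliance tool.

```python
from datetime import date

# Minimum retention periods in years, as quoted in the list above.
RETENTION_YEARS = {"UK": 5, "EU": 5, "AU": 7}


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, rolling 29 February forward to 1 March."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return start.replace(year=start.year + years, month=3, day=1)


def retention_end(jurisdiction: str, relationship_end: date) -> date:
    """Earliest date on which the minimum retention period has elapsed."""
    return add_years(relationship_end, RETENTION_YEARS[jurisdiction])


if __name__ == "__main__":
    closed_on = date(2021, 6, 30)  # hypothetical end of a business relationship
    for code in RETENTION_YEARS:
        print(code, retention_end(code, closed_on).isoformat())
```

Where a firm operates in several jurisdictions, applying the longest applicable period to a given record is a common conservative choice, though the right approach also depends on data protection rules that limit how long personal data may be kept.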

Strong data retention practices help ensure your team can meet its regulatory requirements, conduct effective investigations, and maintain accurate records for long-term compliance. Some key questions you can ask to assess a vendor’s data retention and governance capabilities include:

How do you maintain the accuracy, security, and integrity of retained data over time?
Can your system scale to accommodate increasing volumes of data as our organization grows?
What tools or processes could you provide to help us access historical data quickly for audits or investigations?

Maximize compliance efficiency with global AML data coverage

ComplyAdvantage is one of very few RegTech providers to hold its own financial crime risk data, alongside the software and UI layers. This means firms can access their full AML stack from one provider – no need to purchase data separately.

As a specialist in financial crime, we are experts in providing the sanctions, PEPs and adverse media data compliance teams need. Chartis’ latest analysis of the KYC Data market showed us as the sole ‘best-of-breed’ vendor, reflecting our specialist expertise in financial crime risk intelligence. Specifically, we were the only firm to receive best-in-class scores in both the ‘sanctions and watchlist data’ and ‘negative news and PEPs’ categories.

Explore how the ComplyAdvantage Mesh platform turns our proprietary data into AML risk intelligence

A cloud-based compliance platform, ComplyAdvantage Mesh combines industry-leading AML risk intelligence with actionable risk signals to screen customers and monitor their behavior in near real-time.

Originally published 04 February 2025, updated 11 February 2025

Disclaimer: This is for general information only. The information presented does not constitute legal advice. ComplyAdvantage accepts no responsibility for any information contained herein and disclaims and excludes any liability in respect of the contents or for action taken based on this information.
