Reference Data Management

A Reference Data Management (RDM) framework is a structured approach to managing reference data within an organization. Reference data is relatively static data that provides context or codes for other data in a system and changes infrequently. Examples include country codes, currency codes, product classifications, and other standardized lists. Implementing a Reference Data Management framework helps organizations maintain consistency, accuracy, and reliability in their reference data, leading to improved overall data quality and better decision-making processes. The framework also aligns reference data policies with overall data governance principles and keeps reference data consistent across different types of data.

What is Reference Data?

Reference data is essentially the information used to categorize other data within a system or organization. It serves as a framework for organizing and understanding different types of data. This could include things like codes, classifications, identifiers, or any other data elements that provide context or meaning to other data.

In the example below, two source systems hold inconsistent values that a business user cannot readily understand. The source system values therefore need cleaning up, and this is where reference data can help. The reference tables on the right of the diagram show how the source system values map to a list of valid values.
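The mapping from source system values to valid values can be sketched in a few lines of Python. This is a minimal illustration only: the system names (`crm`, `billing`), the raw values, and the reference codes are all hypothetical and are not taken from the diagram.

```python
# Hypothetical crosswalk: (source system, raw value) -> enterprise reference code.
# System names and codes are illustrative assumptions.
CROSSWALK = {
    ("crm", "M"): "MALE",
    ("crm", "F"): "FEMALE",
    ("billing", "1"): "MALE",
    ("billing", "2"): "FEMALE",
}

# The list of valid values, with labels a business user can read.
VALID_VALUES = {"MALE": "Male", "FEMALE": "Female", "UNKNOWN": "Unknown"}


def to_reference_code(source_system: str, raw_value: str) -> str:
    """Map a raw source value to a reference code, defaulting to UNKNOWN."""
    key = (source_system.lower(), raw_value.strip().upper())
    return CROSSWALK.get(key, "UNKNOWN")


def describe(source_system: str, raw_value: str) -> str:
    """Return the business-friendly label for a raw source value."""
    return VALID_VALUES[to_reference_code(source_system, raw_value)]
```

Both `describe("crm", "F")` and `describe("billing", "2")` resolve to the same label, which is the point of the reference table: downstream consumers see one valid value regardless of how each source system encodes it.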

Why do I need Reference Data?

Below are some key reasons as to why Reference Data is important as part of any data platform.

  • Reference data lists are distinct sets of values stored in systems across the enterprise, often exposed as drop-down or pick lists from which users select individual values in the application user interface.
  • Reference data plays a fundamental role in data management, system interoperability, and maintaining consistency and accuracy in various applications.
  • Reference data serves as a critical foundation for decision-making, analysis, and information exchange across different domains.
  • Reference data enhances data analytics by providing a consistent and standardized framework for data management, integration, and analysis. These benefits ultimately lead to more informed decision-making, improved operational efficiency, and better customer experiences.
  • Reference data validation (conformance) keeps local values aligned with the Enterprise List values and standard, and change management ensures that reference data is appropriately updated and integrated into the local system or process and into impacted downstream areas.
  • Reference data quality measurement is embedded in ongoing data quality measurement and monitoring.

Business Benefits

Some of the key business benefits of well-governed reference data are shown below.

  • Data Consistency and Accuracy – Consistent reference data helps prevent data errors and discrepancies, ensuring that analytics results are accurate and reliable.
  • Enhanced Data Integration – Reference data simplifies data integration processes, reducing the time and effort required to combine data from various sources.
  • Improved Decision-Making – Standardized reference data enables more informed and timely decision-making, particularly in scenarios involving financial, market, or operational data.
  • Efficient Data Governance – Reference data supports data governance efforts by enforcing data standards, facilitating data lineage tracking, and ensuring data privacy and security.
  • Enhanced Customer Experience – Leveraging reference data in analytics leads to a better understanding of customer behaviour and improves the end-user experience.
  • Cost Reduction – Reference data-driven analytics reduces the need to refactor hard-coded queries that embed reference codes, which saves cost and improves efficiency.
  • Compliance and Risk Management – By incorporating reference data into compliance processes, organizations can reduce the risk of regulatory fines and reputational damage.

Characteristics

Reference data exhibits the following defining characteristics:

  • Semantically Stable – reference data codes rarely change their business meaning.
  • Relatively Static – reference data codes are usually static and change mainly due to business process changes.
  • Constrained – reference data codes belong to a defined domain or set of permissible values against which they can be validated.
  • Cross domain – reference data is often shared across multiple business domains.
  • Limited cardinality – reference data sets contain a small number of unique codes.
  • Sourcing – reference data lists can be sourced externally or created internally.
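
The "Constrained" characteristic is what makes reference data checkable: a local list can be compared against its set of permissible values. The sketch below assumes a hypothetical governed list holding a small subset of ISO 4217 currency codes; the function name and the report layout are illustrative.

```python
from collections import Counter

# Hypothetical governed list: a small subset of ISO 4217 currency codes.
GOVERNED_CURRENCY_CODES = {"AUD", "EUR", "GBP", "USD"}


def check_conformance(codes, governed):
    """Return the non-conforming and duplicated codes found in a local list."""
    counts = Counter(codes)
    return {
        # Codes that are not in the permissible domain (comparison is case-sensitive).
        "non_conforming": sorted(c for c in counts if c not in governed),
        # Codes that appear more than once in the local list.
        "duplicates": sorted(c for c, n in counts.items() if n > 1),
    }


local_list = ["USD", "EUR", "usd", "GBP", "EUR", "ZZZ"]
report = check_conformance(local_list, GOVERNED_CURRENCY_CODES)
```

Here `report` flags `usd` and `ZZZ` as non-conforming and `EUR` as a duplicate. Whether a lowercase variant should be rejected or normalised is a policy decision; this sketch simply rejects it.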

Categories

The main categories of reference data are:

  • Enterprise Governed Reference Data – Enterprise reference data governed lists (ERDGL) managed centrally by EDG. These lists are mainly of two types:
    • Conformance Checked
    • General Reference
  • System Reference Data – Reference data lists managed locally within source systems.
  • Reporting Reference Data – Reference data lists managed within analytic systems for reporting hierarchies, filtering, grouping and dimensional analysis.
  • Operations Reference List – Reference data lists used by platform operations to categorise audit logs, control processes, etc.
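
As an illustration of how reporting reference data supports grouping and roll-ups, the sketch below stores a hierarchy as parent links and sums leaf-level amounts into a chosen level. The product codes, the `PARENT` structure, and the function names are hypothetical; a real platform may hold several such hierarchies for the same list.

```python
# Hypothetical parent links for a product classification list (None marks the root).
PARENT = {
    "LAPTOP": "COMPUTERS",
    "DESKTOP": "COMPUTERS",
    "COMPUTERS": "ELECTRONICS",
    "PHONE": "ELECTRONICS",
    "ELECTRONICS": None,
}


def ancestors(code):
    """Return the path from a code up to the root of the hierarchy."""
    path = [code]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def roll_up(amounts, level):
    """Sum amounts into the ancestor at the given depth (0 = root).

    Codes that sit above the requested depth are reported under themselves.
    """
    totals = {}
    for code, amount in amounts.items():
        path = ancestors(code)[::-1]  # root first
        group = path[min(level, len(path) - 1)]
        totals[group] = totals.get(group, 0) + amount
    return totals


sales = {"LAPTOP": 10, "DESKTOP": 5, "PHONE": 7}
by_category = roll_up(sales, 1)
by_root = roll_up(sales, 0)
```

Keeping the hierarchy in reference data rather than in report logic means a regrouping only requires a change to the parent links, not to each query.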

Business Requirements

The table below lists high-level business requirements for reference data management.

| # | Requirement | Category | Guidelines & Examples |
|---|---|---|---|
| 1 | Ability to record reference data code lifecycle events, such as activation, deactivation and expiry dates. | Lifecycle Management | Reference codes can be created, activated, deactivated, migrated and retired. These lifecycle events should be recorded with sufficient detail to enable accurate point-in-time reporting. |
| 2 | Ability to manage complex hierarchies within reference data lists for reporting purposes. | Hierarchy Management | There can be multiple hierarchies of the same reference list, required by different business areas for reporting roll-ups, drill-downs, etc. |
| 3 | Ability to maintain versions of reference code changes made by source systems, to enable accurate historical reporting. | Version Management | These changes can include merging or splitting of reference codes over time. Changes need to be tracked over time for accurate historical representation. |
| 4 | Ability to maintain code mapping between equivalent reference codes from different systems across business areas. | Integration | The same reference code values can often have different interpretations or semantic meanings in different business areas and need some form of mapping or equivalence. This can also occur when legacy applications are migrated to new systems. Equivalent code mappings can take the form of code crosswalk or translation tables. |
| 5 | Ability to monitor and record data quality of reference data lists, based on specified rules, criteria and metrics. | Quality Management | Reference data quality monitoring can include issues like invalid, duplicate and non-conforming reference codes. |
| 6 | Ability to link reference data lists and individual codes to related business terms defined in the Enterprise Business Glossary. | Governance | Business terms can be linked to reference lists at both table and individual code level. For example, a customer status list may be linked to the business term "Customer Status", but a specific code value "Active" may also be defined as the business term "Active Customer" if it has an important shared business meaning. |
| 7 | Ability to demonstrate conformance with governed reference lists, including any approved exceptions. | Governance | Reference lists need to be checked for conformance with enterprise governed lists where applicable. Any code variations need to be approved and recorded as exceptions. |
| 8 | Ability to provide notifications to reference list data term owners or data custodians for any specified lifecycle events. | Lifecycle Management | Reference data term owners and data custodians need to be notified of any major changes to reference lists, such as new lists sourced from external providers, migration of legacy applications, new custom hierarchies added to a list, etc. |
| 9 | Ability to provide role-based access control to reference data lists. | Governance | Reference data term owners and data custodians need to approve access to their reference lists, managed via role-based access control (RBAC). |
| 10 | Ability to identify and record a data domain and related data owner for each reference data list. | Governance | Reference data lists should have data accountability and stewardship, by linking them to identified data domains, data owners and data custodians. |
| 11 | Ability to identify reference data lists used by critical business processes. | Governance | Reference data is heavily used for data classification purposes by analytics processes such as BI reporting and machine learning models. Any reference lists used by processes that contribute to critical business decisions should be clearly identified. |
| 12 | Ability to describe reference data lists by attaching sufficient metadata. | Governance | Reference data lists should have minimum mandatory technical and business metadata as per standards. This can be supplemented by additional metadata as required by the platform. |
| 13 | Ability to share reference data lists with consumers in a standard format. | Consumption | Reference data lists should be accessible to authorised users and processes, by means of a publish/subscribe or a push distribution model in a standard machine-readable format. |
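
The lifecycle and version management requirements can be made concrete with a small sketch of point-in-time lookup. It assumes each version of a code carries a `valid_from` date and an optional, exclusive `valid_to` date; the class, the field names, and the customer tier codes are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeVersion:
    code: str
    description: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the version is still active

    def is_active_on(self, as_of: date) -> bool:
        """True if this version was active on the given date (valid_to is exclusive)."""
        return self.valid_from <= as_of and (self.valid_to is None or as_of < self.valid_to)


# Hypothetical history: PLAT is retired and merged into GOLD on 2023-07-01.
HISTORY = [
    CodeVersion("GOLD", "Gold tier", date(2020, 1, 1), date(2023, 7, 1)),
    CodeVersion("GOLD", "Gold tier (merged with Platinum)", date(2023, 7, 1)),
    CodeVersion("PLAT", "Platinum tier", date(2020, 1, 1), date(2023, 7, 1)),
]


def codes_as_of(history, as_of):
    """Return the codes and descriptions that were active on a given date."""
    return {v.code: v.description for v in history if v.is_active_on(as_of)}


before_merge = codes_as_of(HISTORY, date(2022, 1, 1))
after_merge = codes_as_of(HISTORY, date(2023, 7, 1))
```

A report run as of 2022 still sees both tiers, while one run after the merge sees only the surviving code. Treating `valid_to` as exclusive avoids a date on which two versions of the same code are active at once.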
