domain platform Commons: 3/5

Canonical Data Model

Also known as: Common Data Model

1. Overview

The Canonical Data Model is a design pattern that addresses the challenge of integrating multiple systems that have different data formats. Instead of creating a direct mapping between each pair of systems, which leads to a combinatorial explosion of translators, the Canonical Data Model introduces a common, standardized data format. Each system then only needs to be able to translate its data to and from this canonical format. This pattern is a form of enterprise application integration (EAI) and is often used in the context of message-based middleware and Enterprise Service Buses (ESBs) [1, 2]. The historical origins of this pattern can be traced back to the need to simplify the increasingly complex integration landscape in large enterprises.

2. Core Principles

The core principles of the Canonical Data Model pattern are:

  • Standardization: A single, common data model is defined for the entire enterprise or a specific business domain.
  • Abstraction: The canonical model is independent of any specific application’s data model, providing a layer of abstraction.
  • Mediation: The canonical model acts as an intermediary, with each application responsible for mapping its data to and from the canonical format.
  • Reduced Coupling: By decoupling the applications from each other’s data formats, the overall system becomes less brittle and easier to maintain.

3. Key Practices

In a distributed system landscape, applications and services often have their own, proprietary data formats. When these systems need to communicate and exchange data, a direct translation between each pair of systems is required. As the number of systems grows, the number of required translators increases quadratically (O(n²)), leading to a complex and unmanageable integration architecture. This tight coupling between systems makes the overall solution brittle, as changes in one system’s data format can have a cascading effect on all other connected systems.

4. Implementation

The Canonical Data Model pattern solves this problem by introducing a shared, common data model that is used for communication between all systems. Each application is then responsible for creating a translator that can convert its own data format to the canonical format and vice versa. This reduces the number of required translators to a linear function of the number of systems (O(n)). The canonical model acts as a lingua franca, enabling seamless communication and data exchange between disparate systems without them needing to know the specifics of each other’s data formats.

5. 7 Pillars Assessment

Pillar Score (1-5) Rationale
Purpose 3 Serves a clear technical purpose in system design
Governance 3 Can be governed through standard engineering practices
Culture 3 Supports engineering culture of reliability and quality
Incentives 3 Aligns incentives toward system stability
Knowledge 4 Well-documented pattern with extensive community knowledge
Technology 4 Directly applicable to modern technology stacks
Resilience 4 Contributes to overall system resilience
Overall 3.4 A valuable technical pattern that supports commons infrastructure

The primary trade-off of the Canonical Data Model pattern is the upfront investment required to design and agree upon a common data model. This can be a complex and time-consuming process, especially in large and diverse organizations. For a small number of systems, the initial effort might seem to outweigh the benefits. However, as the number of integrated systems grows, the pattern’s advantages in terms of reduced complexity and maintenance overhead become increasingly apparent. Another consideration is the potential for the canonical model to become a bottleneck or a single point of failure if not designed and managed carefully.

6. When to Use

  • Financial Services: In the financial industry, the Canonical Data Model pattern is often used to integrate various trading, risk management, and accounting systems. A common data model for financial instruments, trades, and counterparties allows for seamless data flow and consistent reporting across the enterprise.
  • Telecommunications: Telecommunication companies use this pattern to integrate their billing, customer relationship management (CRM), and network provisioning systems. A canonical model for customer data, service subscriptions, and usage records ensures data consistency and simplifies business processes.
  • Healthcare: In healthcare, the HL7 (Health Level Seven) standard can be seen as a form of a canonical data model for exchanging clinical and administrative data between different healthcare providers and systems.

7. Anti-Patterns & Gotchas

In the cognitive era, with the rise of artificial intelligence (AI) and machine learning (ML), the Canonical Data Model pattern becomes even more relevant. AI and ML models often require large volumes of clean, consistent data for training and inference. A canonical data model can provide a standardized and reliable source of data for these models, ensuring data quality and reducing the data preparation effort. Furthermore, as AI-powered services become more prevalent, the need for seamless integration between these services and traditional enterprise systems will grow, making the Canonical Data Model a crucial enabler for building intelligent and interconnected platforms.

8. References

The Canonical Data Model pattern aligns well with the principles of the Commons, particularly in the context of building open and interoperable platforms. By promoting a shared, standardized data model, the pattern encourages collaboration and reduces the barriers to entry for new participants. This aligns with the principles of Shared Resource and Equitable Access. The governance of the canonical model itself can be a form of Democratic Governance, where stakeholders from different parts of the ecosystem come together to define and evolve the common data format. The long-term Sustainability of the platform is enhanced by the reduced maintenance overhead and increased flexibility that the pattern provides. Finally, by enabling seamless integration and data exchange, the Canonical Data Model contributes to the overall Community Benefit by fostering a more vibrant and innovative ecosystem.

References

[1] Enterprise Integration Patterns. (n.d.). Canonical Data Model. Retrieved from https://www.enterpriseintegrationpatterns.com/patterns/messaging/CanonicalDataModel.html

[2] Wikipedia. (n.d.). Canonical model. Retrieved from https://en.wikipedia.org/wiki/Canonical_model