A Canonical Data Model, the missing link within a Service Oriented Environment

Chris Judson gave an interesting presentation on the Canonical Data Model within a Service Oriented Architecture.

First he gave an example of the different aspects and problems you can face when describing the existing architecture and business flows within an organisation.

One of the things needed to accomplish this is getting IT and business to collaborate with each other, so that both have a clear understanding of today’s architecture and the goals defined for the future.

The Canonical Data Model defines a common format for describing business entities across the enterprise, for business as well as IT.

Takeaways from this session:

  • The CDM reduces interface maintenance and encapsulates business logic in one central place
  • Put the CDM on the bus: you can plug in new applications that listen to existing events without defining a new format for each new consumer, and business and IT share a common understanding of the data model
  • Use the 80/20 rule to define a CDM: start with all the unique identifiers, combined with a superset of the data that most consumers will use. In other words, if 80% of the consumers find the data they need in the CDM, the other 20% can be served using the enrichment pattern, without enlarging the CDM payload
  • Managing change is hard within such a model, because the dependencies between applications are usually high. The 80/20 rule applies here as well: when 80% of the consumers need new attributes, changes to existing attributes, …, the CDM can be changed. The other consumers can be given the same functionality using the enrichment pattern again.
  • For schema versioning, the Format Indicator pattern is the one most often used
  • Use generic XML types in the XSD instead of DB-specific types
  • Use declarative namespaces to manage data domains, so that a generic, enterprise-wide data definition strategy is in place
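
To make the 80/20 rule and the enrichment pattern a bit more concrete, here is a minimal sketch in Python. It is not taken from the presentation: the `CanonicalOrder` type, its fields, the `enrich` helper and the `CUSTOMERS` lookup table are all made up for illustration. The canonical message carries only identifiers and the commonly used data, and a consumer that needs more resolves the business key itself.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class CanonicalOrder:
    """Hypothetical canonical Order: identifiers plus the data most consumers need."""
    order_id: str
    customer_key: str  # business key only, no embedded customer record
    total: float
    extras: Dict[str, Any] = field(default_factory=dict)  # filled by enrichment


def enrich(order: CanonicalOrder,
           lookups: Dict[str, Callable[[CanonicalOrder], Any]]) -> CanonicalOrder:
    """Return a copy of the order with extra data attached.

    Each lookup resolves a business key against some external source.
    The canonical fields themselves are left unchanged.
    """
    extras = dict(order.extras)
    for name, lookup in lookups.items():
        extras[name] = lookup(order)
    return replace(order, extras=extras)


# Stand-in for a customer service or database (made-up data).
CUSTOMERS = {"C-1": {"name": "Acme", "status": "gold"}}

order = CanonicalOrder(order_id="O-42", customer_key="C-1", total=99.5)
enriched = enrich(order, {"customer": lambda o: CUSTOMERS[o.customer_key]})
```

Most consumers would work with `order` as it is. Only the minority that needs customer details pays for the extra lookup, and the canonical payload stays small.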

Chris's presentation was very enlightening, because a lot of these tips & tricks are valuable for any design or implementation that uses XML data and service enablement.
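
One of the tips, schema versioning with a format indicator, could look roughly like the sketch below. It assumes the indicator is a `version` attribute on the root element. That attribute, the element names and the two parser functions are invented for this example, and a namespace per version would be another way to carry the same information.

```python
import xml.etree.ElementTree as ET


def parse_v1(root):
    # Hypothetical version 1 layout: <id> and <customer>
    return {"order_id": root.findtext("id"),
            "customer_key": root.findtext("customer")}


def parse_v2(root):
    # Hypothetical version 2 layout: elements were renamed
    return {"order_id": root.findtext("orderId"),
            "customer_key": root.findtext("customerKey")}


# One parser per known format version.
PARSERS = {"1": parse_v1, "2": parse_v2}


def read_order(xml_text):
    """Read the format indicator first, then pick the matching parser."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported format version: {version!r}")
    return parser(root)
```

A consumer can then accept both old and new messages during a migration, and it fails loudly on a version it does not know instead of silently misreading the payload.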


7 thoughts on “A Canonical Data Model, the missing link within a Service Oriented Environment”

  1. Hi Nathalie,

     Nice post! I have a question concerning the representation of relationships in a canonical data model and in entity-centric services…

     Say you defined several re-usable entity-centric services (Customer, Order, Product, …). These services encapsulate the domain logic for the specific entity, are application-independent and offer coarse-grained, document-centric operations to their consumers. The document-centric messages, used to interact with these services, use the canonical data model as much as possible.

     So how would you implement inter-service relationships?

     Basically, the Customer and the Product are pretty self-contained but the Order isn't. Every order is associated with 1 customer and 1 or more products. So, when designing the Order entity in the canonical data model, we have to make a decision on how to represent these relationships. I think the options are:

     - use of business key: the Order type contains a business key for the representation of the customer and products.
     - type reference: the Order type contains a reference to the Customer type and a list of Product types (these types are owned by other departments and defined in another namespace)
     - type redefinition: the Order type still contains a reference to the Customer type and a list of Product types, but the Customer type and Product type are defined in the order namespace and only contain the actual elements that are relevant for the Order service

     The first option (business key) seems to reduce the coupling but might also result in a more "chatty" model, since an Order might have to be "enriched" with the actual customer and product information in order to process the order (for example: if the discount for an order depends on the customer status, we would need to fetch the customer separately before processing the order, or if you want to fetch a list of orders with their customer information).

     The second option (type reference) seems to be very tightly coupled, since every update of the Customer type or Product type will result in a new version of the Order type. Furthermore, the Customer type and Product type contain much more information than required by the Order type…

     The third option (type redefinition) seems to solve some of the coupling issues encountered in the second option, since the Order would define how a customer and a product are represented. The problem with this is that additional transformations are required and the model becomes less canonical…

     I guess we can also try to keep our canonical model as simple as possible and offer enrichment services?

     Furthermore, since we try to use the canonical data model for the messages of the entity-centric services, this also impacts how the service implementation stores the relationship… Would it be recommended to just store the business key in the entity-service order database (and possibly retrieve the actual referenced entity in real time), or would it be better to store a copy of the referenced entity (and set up a message-based data replication strategy -> MDM style?).

     Do you know of best practices for this?

     Thanks!
     Stijn

  2. Hi Stijn,

     You've actually answered your questions yourself. It was very interesting to read your comment and your thinking process ;o)

     I would say that indeed you could simplify the canonical data model, using business keys when needed, and add the needed customer or product data through the enrichment pattern where needed.

     That means you can keep the complexity of the canonical data models to a minimum, use the enrichment pattern where more information is needed and afterwards, when more consumers are interested in this information, you can hook them up at the level required. If the consumer is only interested in the order or product, only that information is given using the simplified canonical data model.

     Could you elaborate for which business needs you would recommend the MDM style approach? I'm not sure why you would want to set up a replication mechanism.

  3. I'm not a big fan of the replication style since it introduces complex synchronization issues, but it might be useful to reduce the overhead of real-time (distributed) service invocation.

     Imagine that 80% of the consumers of the order service require additional product information (name, price, …). If the order service only stores the business key of the associated product, the enrichment pattern will need to retrieve the additional product information at runtime from the distributed product service. This causes overhead…

     In this case, and combined with the fact that the product information doesn't change very often, it might be beneficial to store a simple copy of the products in the order database and use a message-based replication mechanism. This also reduces the coupling of the order service on the product service (the order service is not impacted when the product service is down)…

     Another use case could be the integration of software packages which require a dedicated copy of the data (for example, when integrating Oracle Financials in a service oriented environment, it is not possible to direct Oracle Financials to use the existing customer service. Instead, Oracle Financials requires an internal copy of a customer. And that's again where MDM data replication can be used)…

     Would you take a similar approach to solve these issues?

  4. Hi Stijn,

     As you mention in your case, if 80% of the consumers need that detailed product information, I would say use the enrichment pattern, because the replication mechanism will give you more overhead in terms of maintainability, release management, versioning, …

     If only 20% of the consumers were interested in the detailed information, I wouldn't want the replication mechanism either, because it adds a lot of complexity to the system without a specific need for it.

     You're talking about distributed data sources, in which case customer is persisted in a different data store than product or order, for example. Using the JCA adapters you would retrieve this information from the different sources depending on your need.

     Message-based retrieval could be a solution as well. You could access the product database for the specific business key, put the information on a queue and use that information each time you need to get the product data using the business key.

     But how would you handle your 'copied' data when it isn't valid anymore, when it's out of sync? When your primary driver for the 'replication' mechanism is overhead and performance, I would be thinking of a caching mechanism that stores the data in memory and keeps a 'versioning' mechanism to see if the data has changed. That way the data source is only called when needed. For example, using Coherence and Grid Control to be able to work on real-time data without the need to get the data each time you want to retrieve it.

     Regarding the integration of software packages, we can have a look at the AIA (Application Integration Architecture) patterns being used and how they solved these issues using the Process Integration Packs (PIPs).

     I will surely follow up on this, it's an interesting business case!

  5. Hi Nathalie,

     Thanks for your interest! And thanks for coming up with the caching suggestion. That's very similar to how I was thinking of implementing this ;-)

     Basically, for the order service to be able to use customer data flexibly and to have up-to-date information (without the overhead of invoking the product or customer service every time it needs a small piece of info), I could set up a cache within the order service. This way, the order service will have the customer and product data, but this data is not a first-class citizen within the order service and I can use messaging to improve the quality / independence of the cache.

     On a side note, I would be really interested to hear how you would apply the enrichment pattern (from a practical point of view)… How would this improve the problems with accumulated network latency and runtime inter-service dependencies, which you would encounter when the order service stores the business keys and contacts the customer or product service directly to resolve these business keys?

     Could you maybe point me to some documentation regarding this enrichment pattern? Or maybe you could write an article on how this is best implemented ;-)

     I'll keep you posted on how my practical SOA adventures evolve…

     Thanks!

  6. Hi Stijn,

     Sorry for the late reply. Having different projects and activities going on always takes up too much of the time for the fun stuff ;o)

     The enrichment pattern is actually a pretty simple pattern: you have some key values in your data object, then the data source is called to get the detailed information, and that information is enriched (added) into your data object, which will then give you an enriched SDO.

     I wonder if that already exists: Enriched Service Data Object, ESDO. If not, it should ;o))

     If you have a look at the Enterprise Integration Patterns, the content enricher pattern is defined as follows: "The Content Enricher uses information inside the incoming message (e.g. key fields) to retrieve data from an external source. After the Content Enricher retrieves the required data from the resource, it appends the data to the message. The original information from the incoming message may be carried over into the resulting message or may no longer be needed, depending on the specific needs of the receiving application."

     I've written a post on the blog about enriching messages within an ESB flow, using Enterprise Service Bus (which is now rebranded as Oracle Service Bus).

     Kind regards,
     Nathalie
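
The cache with a 'versioning' mechanism discussed in comments 4 and 5 could be sketched along these lines. This is plain in-memory Python, not the Coherence API, and every name in it (`VersionedCache`, `on_change_event`, the loader returning a `(version, data)` pair) is an assumption made for the example. The idea is that the order service only calls the product service on a miss, and a change message from the product service evicts an entry once a newer version exists.

```python
from typing import Any, Callable, Dict, Tuple


class VersionedCache:
    """Local copy of remote data, refreshed only after a change message."""

    def __init__(self, loader: Callable[[str], Tuple[int, Any]]):
        # loader is a stand-in for the remote call; it returns (version, data)
        self._loader = loader
        self._entries: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Any:
        # Call the data source only when there is no cached entry.
        if key not in self._entries:
            self._entries[key] = self._loader(key)
        return self._entries[key][1]

    def on_change_event(self, key: str, new_version: int) -> None:
        # Invoked when a change message arrives for this business key.
        # Evict only if the announced version is newer than the cached one.
        entry = self._entries.get(key)
        if entry is not None and entry[0] < new_version:
            del self._entries[key]
```

The next `get` after an eviction reloads the data, so reads stay local most of the time, and the cached copy is replaced as soon as a change message has been received and the entry is read again.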


About nathalieroman