
Common issues found in implementations of the HL7 Clinical Document Architecture (CDA)

The contents of this whitepaper are published under the Creative Commons Attribution-Non-commercial license.
See http://www.ringholm.com/docs/03020_en_HL7_CDA_common_issues_error.htm for the latest version of this document.
Authors: Rene Spronk, Ringholm, and Grahame Grieve, Kestral Computing.
Document status: Final, version 1.0 (2008-10-09)
Please send questions and comments to rene.spronk@ringholm.com.


Summary

A large number of XML instances of HL7's CDA R2 standard were examined in detail. Two different tools, based on the CDA R2 XML schema and the CDA R2 MIF, were used to identify issues and errors. The issues identified are mainly related to the intent of the CDA standard and to misinterpretations of the underlying data type standard. The paper contains recommendations as to how these issues could have been detected and avoided. The reader is assumed to be familiar with the CDA R2 standard.

1. Introduction

As part of its work products the HL7 standard development organization created a standard for clinical documents: the Clinical Document Architecture R2 (referred to as "CDA" throughout this paper) [2]. The CDA standard is based on the HL7 version 3 development framework, a framework that is also used to develop messages and services.

Because of the architectural nature of the CDA standard, individual implementations are always associated with an implementation guide, i.e. a document that describes how the CDA standard should be implemented in a specific context. The context can usually be identified by the tuple (document type, organization), where the organization may be a country, a region, a software vendor or a healthcare provider [1].

2. Method

The examples as contained in the most inclusive archive of CDA documents [1] known to the authors were examined. The archive contains CDA examples created in the context of 26 different projects from over 14 different countries. Most of the examples were created as illustrative examples by the authors of the implementation guides. The implementation guides in this archive weren't reviewed given the multitude of languages they were written in.

Three projects were excluded from the review: one (the Canadian eMS) is based on a draft version of CDA R2, and not on CDA R2 itself. For two Italian projects the archive didn't contain any CDA XML instances.

CDA documents may be validated against the W3C XML schema for CDA published by HL7, optionally with the use of a stylesheet to test for document content rules that cannot be expressed using W3C schema.

In addition, conformance to the CDA standard can be tested against the Model Interchange Format (MIF) definition of the CDA model as well as against a derivation thereof: the XML schema. The MIF is an HL7 defined format that describes the full abstract CDA RIM-based object model as created by means of the HL7 Development Framework (HDF).

While the MIF can be transformed into an XML schema, the XML schema language is too weak to express every requirement of the abstract CDA RIM-based model. An XML document that validates against the CDA XML schema is therefore not necessarily a valid CDA document.

Conformance to the CDA specification is best tested using a MIF based validation tool, given that the MIF is a full expression of the abstract CDA model. To provide full validation, such a tool also needs additional knowledge of the CDA specification and of other HL7 specifications, including the RIM, the data types, and the structural vocabulary. The examples were examined using an XML schema validator [4], a MIF based validator [3], and a review of the XML document by means of a generic CDA XML stylesheet.

3. Issues found

The testing process was limited to the requirements as stated in the CDA specification. Requirements as defined in project specific implementation guides weren't tested.

Implementation guides may specify additional requirements or constraints when it comes to the static model (e.g. expressed as templates, or in the form of constrained coding value sets). They may also specify extensions to the original CDA specification (e.g. when the use of a Realm- or project specific terminology is required).

The requirements specified in a project implementation guide are quite often expressed in the form of XML Schematron rules. These Schematron rules are generally made available in conjunction with the implementation guide. Schematron files, where present in the archive of CDA documents, weren't used to validate project specific constraints.

The issues found can be categorized as follows:

3.A Human readability

This category of issues is related to the requirement that all relevant clinical content of a CDA document shall be present in human readable form. The issues in this category have an impact on clinical safety and the integrity of the attested content of the document. The following issues were observed:
  1. The document contains entries without any textual representation. Some documents consist solely of entries. The standard states that all attested information must be present in human readable form.
  2. The stylesheet as used in the project adds information not present in the document (e.g. the contact details of the author). This information could have been sent as part of the CDA document. If a different stylesheet is used, this information is lost. The standard specifies that the attested contents of the CDA document have to be faithfully rendered without adding anything which may lead the human reader to misinterpret its contents.
  3. The stylesheet is not based on the text as present in the CDA document, but on its entries. Entries may not contain the exact same information as is present in the text - using the entries as a source for the stylesheet may lead to misinterpretation of the document contents. The entries are not an integral part of the attested content of the document. A stylesheet may only use the content of CDA document entries when the entries have a DRIV relationship with the text, i.e. if the section-text was fully derived from the entries.
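
Issue 1 above lends itself to a simple automated pre-check before human review. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name sections_without_narrative and the sample document are hypothetical and not part of the CDA standard or of the tools used in this review. It only flags sections that carry entries but no readable text; whether the text is clinically complete still needs a human reader.

```python
import xml.etree.ElementTree as ET

HL7 = "{urn:hl7-org:v3}"  # CDA R2 namespace in ElementTree notation


def sections_without_narrative(cda_xml: str) -> list[str]:
    """Return the title (or '?') of each section that has entries but no readable text."""
    root = ET.fromstring(cda_xml)
    flagged = []
    for section in root.iter(HL7 + "section"):
        has_entries = section.find(HL7 + "entry") is not None
        text = section.find(HL7 + "text")
        # itertext() also collects text nested in markup such as <paragraph> or <list>
        narrative = "".join(text.itertext()).strip() if text is not None else ""
        if has_entries and not narrative:
            flagged.append(section.findtext(HL7 + "title", default="?"))
    return flagged


# Hypothetical sample: the first section consists solely of an entry.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3"><component><structuredBody>
  <component><section><title>Allergies</title>
    <entry><observation classCode="OBS" moodCode="EVN"><code nullFlavor="UNK"/></observation></entry>
  </section></component>
  <component><section><title>History</title><text>No known problems.</text></section></component>
</structuredBody></component></ClinicalDocument>"""

print(sections_without_narrative(SAMPLE))  # ['Allergies']
```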

3.B Text formatting

This category of issues is related to the structure and markup of the textual part of a CDA document. The CDA standard explicitly leaves the rendering of the document up to the receiving application. The standard allows one to structure the text (e.g. using lists and tables) but doesn't contain a real markup language. The following issues were observed:
  1. Full XHTML in section text: see fragment B1. The standard provides its own light-weight markup language, the use of unconstrained XHTML isn't allowed in CDA R2.
  2. Assumption that <text> or <content> imply formatting akin to the HTML <pre> (pre-formatted) tag: see fragment B2. CDA narrative follows the usual conventions for XML document markup: line breaks and runs of blanks carry no formatting meaning and are collapsed into single space characters when the text is rendered. <br/> or <paragraph> should be used to indicate the start of a new line or paragraph.
  3. Text formatting optimized for one specific stylesheet: the text is structured in such a way (e.g. by using specific keywords) that it is only rendered in a proper human readable fashion when a specific stylesheet is being used. If a different stylesheet is used, the textual information is displayed in a less human readable fashion, which may lead to misinterpretation of the document content.
Fragment B1:
<b xmlns="http://www.w3.org/1999/xhtml">
   <font face="Arial-BoldMT" color="#442422">
     <div align="left">Patient N</div>	
   </font>	
</b>
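
Markup such as fragment B1 can be found mechanically, because the XHTML elements live in a namespace other than the CDA one. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name foreign_markup is hypothetical, and the enclosing <text> element in the sample is an assumption added to give the fragment a CDA parent.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"  # CDA R2 namespace


def foreign_markup(section_text_xml: str) -> list[str]:
    """Return the qualified names of all elements that are not in the CDA namespace."""
    found = []
    for el in ET.fromstring(section_text_xml).iter():
        # ElementTree spells qualified names as "{namespace}localname"
        ns = el.tag[1:].split("}")[0] if el.tag.startswith("{") else ""
        if ns != HL7_NS:
            found.append(el.tag)
    return found


# Fragment B1, wrapped in an assumed CDA <text> element.
B1 = """<text xmlns="urn:hl7-org:v3">
  <b xmlns="http://www.w3.org/1999/xhtml">
    <font face="Arial-BoldMT" color="#442422"><div align="left">Patient N</div></font>
  </b>
</text>"""

for tag in foreign_markup(B1):
    print(tag)
```

For fragment B1 the sketch reports the b, font and div elements, all in the XHTML namespace.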

Fragment B2:

            <content>Three values:
Red 100
Blue 130
Green 170
            </content>
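
The effect on fragment B2 can be illustrated with a few lines of Python. This is a sketch only: it assumes a renderer that collapses runs of whitespace into single spaces, as described in issue 2 above, and the function name collapse_whitespace is hypothetical.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse every run of blanks, tabs and line breaks into one space."""
    return " ".join(text.split())


# Character content of fragment B2.
B2 = """Three values:
Red 100
Blue 130
Green 170
"""

print(collapse_whitespace(B2))  # Three values: Red 100 Blue 130 Green 170
```

The three values end up on a single line, which is why explicit <br/> or <paragraph> markup is needed if the line structure matters to the reader.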

3.C Textual parts and corresponding entries

This category of issues is related to the relationship between the textual narrative and the related coded entries. The following issues were observed:
  1. Broken links: unresolved references between entries and the textual content.
  2. Entries used as textual elements: see fragment C1. The observation is effectively used as a textual section with an associated date. Technically this is a valid entry; in the spirit of CDA, however, it is questionable.
  3. Reference within observation.text (ED): see fragment C2. The reference tag, used to reference back to parts of the narrative, isn't allowed in observation.text.

Fragment C1:

<text>January 24th: The patient ...long text snippet...</text>
<entry>
    <observation classCode="OBS" moodCode="EVN">
        <code nullFlavor="UNK"/>
        <text>The patient ...long text snippet...</text>
        <effectiveTime value="20080124"/>
    </observation>
</entry>

Fragment C2:

<text mediaType="text/xml">
   Asthme lors de l'enfance<reference value="antmed-2"/>
</text>
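
Broken links (issue 1 in this category) can be detected by comparing the reference values used in entries with the ID attributes present in the document. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The function name unresolved_references and the sample section are hypothetical, and the sketch assumes that only references starting with "#" point into the same document; other values are ignored.

```python
import xml.etree.ElementTree as ET

HL7 = "{urn:hl7-org:v3}"  # CDA R2 namespace in ElementTree notation


def unresolved_references(cda_xml: str) -> list[str]:
    """Return every local reference ("#...") that has no matching ID attribute."""
    root = ET.fromstring(cda_xml)
    # Narrative elements such as <content ID="..."> carry the link targets.
    targets = {el.get("ID") for el in root.iter() if el.get("ID")}
    broken = []
    for ref in root.iter(HL7 + "reference"):
        value = ref.get("value", "")
        if value.startswith("#") and value[1:] not in targets:
            broken.append(value)
    return broken


# Hypothetical sample: the second entry points at an ID that does not exist.
SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <text><content ID="a1">Asthma during childhood</content></text>
  <entry><observation classCode="OBS" moodCode="EVN">
    <code nullFlavor="OTH"><originalText><reference value="#a1"/></originalText></code>
  </observation></entry>
  <entry><observation classCode="OBS" moodCode="EVN">
    <code nullFlavor="OTH"><originalText><reference value="#a2"/></originalText></code>
  </observation></entry>
</section>"""

print(unresolved_references(SAMPLE))  # ['#a2']
```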


3.D Data types

This category of issues is related to the use of data types in the XML instances. The following issues were observed:
  1. NullFlavor: using a nullFlavor for mandatory attributes, usually in SETs. This is a problem in CDA where e.g. telecom is a SET<TEL>, and nullFlavors are not allowed in sets. This can only be resolved in a subsequent release of CDA.
  2. Identifiers: alphanumeric roots in OIDs, and uncertainty on how to deal with GUIDs (root or extension? Both are allowed by the standard).
  3. Identifiers: improperly formed IIs (e.g. root and extension swapped in one example; extension without root).
  4. Codes: improperly formed CDs (code without codeSystem), CD/CE with codeSystem and codeSystemName swapped, and entire CDA documents without a single codeSystem attribute anywhere.
  5. Illegal URLs: "tel: 123.." contains a space after the colon, which is not valid in a URL. "tel:(555).." contains parentheses, which make it an invalid URL. Or even not using a URL at all: value="somebody@somewhere.org".
  6. TS: Incomplete use of time zone 200809211008-08 (should be: -0800).
  7. TS: day and month cannot be 0: see fragment D2. Fragment D1 is completely incoherent.
  8. General: invalid data type substitution (see fragment D3). The data type substitution rules are poorly understood.
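
Issues 2 and 3 can be caught with a small structural check on the root and extension attributes of an II. The following is a minimal sketch in Python. The function name ii_problems is hypothetical and the two patterns are simplified assumptions: they accept only numeric OIDs and UUIDs as root and deliberately ignore the reserved identifier scheme names that the data types specification also allows.

```python
import re
from typing import Optional

# Simplified patterns (assumptions for this sketch, not the normative definitions).
OID = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")
UUID = re.compile(r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def ii_problems(root: Optional[str], extension: Optional[str]) -> list[str]:
    """Return a list of problems found in the root/extension pair of an II."""
    problems = []
    if root is None:
        problems.append("extension without root" if extension else "root missing")
    elif not (OID.fullmatch(root) or UUID.fullmatch(root)):
        problems.append("root is neither an OID nor a UUID")
        if extension and OID.fullmatch(extension):
            problems.append("root and extension may have been swapped")
    return problems


print(ii_problems("2.16.840.1.113883.19.5", "12345"))   # []
print(ii_problems(None, "12345"))                        # ['extension without root']
print(ii_problems("12345A", "2.16.840.1.113883.19.5"))
```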

Fragment D1:

<effectiveTime value="20031100000000.20040200000000"/>

Fragment D2:

<value="19790000000000"/>
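
The TS issues above (6 and 7) can be checked with a regular expression followed by a range test. The following is a minimal sketch in Python. The function name ts_problems is hypothetical, and the pattern is an assumption based on the literal form YYYY[MM[DD[HH[MM[SS[.S[S[S[S]]]]]]]]][+/-ZZZZ]; it does not check the number of days per month or the hour, minute and second ranges.

```python
import re

# Assumed literal form: YYYY[MM[DD[HH[MM[SS[.SSSS]]]]]] followed by an optional +/-ZZZZ zone.
TS = re.compile(
    r"(?P<year>\d{4})"
    r"(?:(?P<month>\d{2})"
    r"(?:(?P<day>\d{2})"
    r"(?:(?P<hour>\d{2})"
    r"(?:(?P<minute>\d{2})"
    r"(?:(?P<second>\d{2})(?:\.\d{1,4})?)?)?)?)?)?"
    r"(?P<zone>[+-]\d{4})?"
)


def ts_problems(value: str) -> list[str]:
    """Return a list of problems found in a TS literal."""
    match = TS.fullmatch(value)
    if match is None:
        return ["not a well-formed TS literal"]
    problems = []
    parts = match.groupdict()
    if parts["month"] is not None and not 1 <= int(parts["month"]) <= 12:
        problems.append("month out of range")
    if parts["day"] is not None and not 1 <= int(parts["day"]) <= 31:
        problems.append("day out of range")
    return problems


print(ts_problems("200809211008-0800"))              # []
print(ts_problems("200809211008-08"))                # ['not a well-formed TS literal']
print(ts_problems("19790000000000"))                 # ['month out of range', 'day out of range']
print(ts_problems("20031100000000.20040200000000"))  # ['not a well-formed TS literal']
```

A birth year that is known without month and day would be written with reduced precision, e.g. "1979", which passes the check.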

Fragment D3:

<birthTime value="19730220" xsi:type="IVL_TS">
	<width value="33" unit="a"/>
</birthTime>

3.E Coding Issues

This category of issues is related to the use of coded concepts in general, inclusive of the nullFlavor exceptions. The following issues were observed:
  1. SNOMED qualifiers (CD data type): the qualifiers have to be taken from SNOMED if the main concept code is from SNOMED. The use of a different code system for a qualifier isn't allowed.
  2. Invalid UCUM units: e.g. "picog", "md/dl".
  3. Must use allowed code sets. Attributes with CS data types, and attributes that have a CNE binding to a concept domain, use codes that are not part of the defined value set.
  4. NullFlavors: empty XML tags (e.g. <title/>) are not legal: either a value, an explicit nullFlavor, or an xsi:nil should be used.
  5. NullFlavors: code instead of nullFlavor attribute. For example: code="UNK" instead of nullFlavor="UNK".
  6. NullFlavors: originalText. See fragment E1. The use of originalText is only allowed if nullFlavor="OTH".
  7. Qualifier (CD data type): see fragment E2. A code is required for a qualifier.
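
Issues 4 and 5 are easy to detect mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The function name null_flavor_problems and the sample are hypothetical, the list of nullFlavor codes is partial, and treating a code attribute without codeSystem as a misplaced nullFlavor is a heuristic rather than a rule from the standard.

```python
import xml.etree.ElementTree as ET

# Partial list of HL7 nullFlavor codes, enough for this illustration.
NULL_FLAVORS = {"NI", "NA", "UNK", "ASKU", "NAV", "NASK", "OTH", "MSK"}


def null_flavor_problems(xml_text: str) -> list[str]:
    """Flag empty elements and nullFlavor codes that were put into the code attribute."""
    problems = []
    for el in ET.fromstring(xml_text).iter():
        name = el.tag.split("}")[-1]  # strip the "{namespace}" prefix
        # Empty: no attributes, no child elements, no character content.
        if not el.attrib and len(el) == 0 and not (el.text or "").strip():
            problems.append(f"<{name}/> is empty: give a value or a nullFlavor")
        # Heuristic: a nullFlavor code in @code without a codeSystem.
        if el.get("code") in NULL_FLAVORS and el.get("codeSystem") is None:
            problems.append(f"<{name}> carries {el.get('code')} in code, not in nullFlavor")
    return problems


# Hypothetical sample with one empty element and one misplaced nullFlavor.
SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <code code="UNK"/>
  <title/>
  <text>Findings</text>
</section>"""

for problem in null_flavor_problems(SAMPLE):
    print(problem)
```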

Fragment E1:

<consent>
  <code nullFlavor="NAV" codeSystem="1.2.756.5.30.2.1.1.2">
   <originalText>
Muendliche Einwilligung waehrend Konsultation vom 03.10.2007
   </originalText>
  </code>
  <statusCode code="completed"/>
</consent>
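
The rule behind issue 6, as stated above, can be expressed as a short check. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function name original_text_problems is hypothetical. It flags every element that combines an originalText child with a nullFlavor other than OTH.

```python
import xml.etree.ElementTree as ET


def original_text_problems(xml_text: str) -> list[str]:
    """Flag elements that have an originalText child and a nullFlavor other than OTH."""
    problems = []
    for el in ET.fromstring(xml_text).iter():
        null_flavor = el.get("nullFlavor")
        has_original_text = any(
            child.tag.split("}")[-1] == "originalText" for child in el
        )
        if has_original_text and null_flavor not in (None, "OTH"):
            name = el.tag.split("}")[-1]
            problems.append(f"<{name}> has originalText with nullFlavor={null_flavor}")
    return problems


# Fragment E1 from this paper.
E1 = """<consent>
  <code nullFlavor="NAV" codeSystem="1.2.756.5.30.2.1.1.2">
    <originalText>Muendliche Einwilligung waehrend Konsultation vom 03.10.2007</originalText>
  </code>
  <statusCode code="completed"/>
</consent>"""

print(original_text_problems(E1))  # ['<code> has originalText with nullFlavor=NAV']
```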

Fragment E2:

<qualifier>
  <name 
     codeSystemName="Radiologian toimenpiteen tarkenteet">
    <originalText>puolisuus</originalText>
  </name>
</qualifier>

3.F Extensions

This category of issues is related to the use of models that extend the CDA instances beyond that which is specified by the CDA standard. HL7 version 3 specifies an informal extension mechanism to extend models. The use of this mechanism implies the use of a different XML namespace for the extensions.

Informal extensions were used by 5 of the 23 projects (22%). These were related to the use of Digital Signatures, extensions to the current clinical statement model (e.g. by a financial activity) and the identification of the sending and receiving application.

Two projects used the same XML namespace for the extension model - this is not allowed by the CDA standard.

3.G Element sequencing

This category of issues is related to the sequencing of XML elements in instances. The HL7 version 3 standard specifies the exact order in which attributes of a RIM class should be present as XML elements in an instance.

The following issues, probably due to serialization issues of an object structure, were observed:

  1. Wrong XML element order of class attributes (e.g. name before id on a Role class)
  2. Wrong XML element order of associations/ participations.
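
When instances are produced by serializing an object structure, a simple order check before writing the XML can catch this class of errors. The following is a minimal sketch in Python. The function name first_out_of_order is hypothetical, and the expected order used in the example is illustrative only; the real order has to be taken from the CDA schema or the MIF for the class in question.

```python
from typing import Optional


def first_out_of_order(children: list[str], expected_order: list[str]) -> Optional[str]:
    """Return the first child element name that appears too late, or None if the order is fine."""
    rank = {name: index for index, name in enumerate(expected_order)}
    highest_seen = -1
    for name in children:
        position = rank.get(name)
        if position is None:
            continue  # names not listed in expected_order are ignored in this sketch
        if position < highest_seen:
            return name
        highest_seen = position
    return None


# Illustrative order for a Role-like class (an assumption, not copied from the CDA schema).
ROLE_ORDER = ["id", "code", "name", "addr", "telecom"]

print(first_out_of_order(["name", "id"], ROLE_ORDER))          # id
print(first_out_of_order(["id", "name", "addr"], ROLE_ORDER))  # None
```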

3.H Post publication infrastructural extensions

This category of issues is related to the versioning issues introduced by the addition of new infrastructural elements after the original publication of the CDA standard. For example: after the publication of the CDA standard the PUB use code was added to the TEL data type.

This leads to versioning issues:

  1. A valid CDA document, created in 2008, won't be valid when tested using the original CDA XML schema as published in 2005.
  2. All implementations of CDA have to take into account that certain infrastructural elements may be extended at some future point in time. Getting hold of the extensions isn't that easy: at the moment these extensions aren't identified and published in an easily accessible fashion.

4. Findings

The required level of validation is related to the level of coding as present in a CDA document. The use of entries adds a level of complexity above and beyond that of the CDA document header and textual sections. With regard to the level of coding, of the examples created as part of one of the 23 selected projects:
  • Two projects, or 9% of all projects, contained Level 1 CDA examples with a NonXMLBody.
  • Twenty-one projects, or 91% of all projects (21 of 23), contained Level 3 CDA examples.
  • Of the projects that contained Level 3 examples, 48% contained at least some links between the section text and the entries.
With regard to the use of validation tools such as an XML schema validator and a MIF based validator:
  • Six projects, or 26% of all projects, produced at least one example that didn't validate against the published XML schema.
  • Twenty-one projects, or 91% of all projects (21 of 23), produced at least one example with errors as detected by the MIF based validator. The authors of the two projects that didn't contain any errors stated that they hadn't used a MIF based validator during the development of the examples.
There were a small number of false positives, i.e. the tools identified errors in the instances that weren't actually errors in the CDA document instances. The underlying errors were present in the CDA XML schema, the CDA MIF or the validator itself.

The most commonly encountered errors are the use of empty XML elements where nullFlavors should be used, and ill-formed II and coded data types.

5. Recommendations

The issues identified in this paper could have been avoided:
  • The issues related to the human readable part of the document (category A and B) can only be detected by a human reader. The author of a document should make sure the attestable content of the document is present within the document as human readable text. Authors should not assume that a particular stylesheet will be used by the reader of the document.
  • Most of the issues in this paper can be identified by using the available validator tools. Validation using the published CDA XML schema provides a basic level of validation. The issues in categories F and G, and a few of the issues in categories C, D and E, were detected using XML schema based validation. The MIF based validator provides a better level of validation than the XML schema: an astounding 21 of the 23 projects (91%) failed validation at this level. Most of the issues in categories C, D and E were detected by the MIF based validator.
  • The issues defined by category H can probably only be solved if HL7 changes its versioning mechanism.
The MIF based validator wasn't available to the authors of the examples that were reviewed. This, combined with the number of validation failures, suggests that it is difficult to create valid CDA instance documents based solely on the CDA specification.

It seems safe to assume that the more widely available the validation tool, and the better its ability to detect issues, the more it will be used, and the more compliant the CDA instances will be to the standard. This is supported by the fact that 74% of all projects (17 of 23) pass XML schema validation, a technology which is readily available. Compare this with the MIF based validator, which hasn't been made widely available yet: only 2 of the 23 projects (9%) pass.

When it comes to the creation or validation of CDA instances we'd like to recommend the following:

  • Document authors should make sure the attestable content of the document is present in the form of human readable text.
  • Document creators should use the MIF based validator when authoring documents.
  • Use XML Schematron validation to document/test additional rules as specified by the implementation guide.
  • Test documents using online CDA validator tools, e.g. [6].
  • Validator service providers should collaborate in order to re-use each other's services, so as to avoid inconsistencies between the various validators.
  • HL7 should educate the authors of CDA implementation guides, as well as the creators of CDA software applications, as to the functionality offered by the MIF based validator tool. This tool, even though it has been available for quite a while, is virtually unknown in the CDA community. The pass-rate of 9% (2 of 23 projects) indicates that when it comes to this validation tool the communication with the CDA user community will need to be drastically improved.

Acknowledgement

We'd like to thank those that contributed examples to the CDA archive [1] as well as Diego Kaminker and Alexander Henket for their review of this paper.

References

[1] R.J.Spronk (Ed.), CDA around the world: archive of CDA instance examples, http://www.ringholm.com/download/CDA_R2_examples.zip, accessed on July 14th, 2008.
[2] HL7 Inc., HL7 version 3 standard, 2008, http://www.hl7.org
[3] Open Health Tooling, Instance Editor v0.9, see http://hl7book.net/index.php?title=Eclipse_Instance_Editor.
[4] Oxygen 8.1, SyncRO Soft Ltd., http://www.oxygenxml.com.

