xml:tm – a radical new approach for a Translation Platform
- Translating XML documents
XML has become one of the defining technologies helping to reshape the face of both computing and publishing. It is helping to drive down costs and dramatically increase interoperability between diverse computer systems. From the localization point of view, XML offers many advantages:
- A well-defined, rigorous syntax backed by a rich tool set that allows documents to be validated against a DTD or schema.
- A well-defined character encoding system that includes support for Unicode.
- The separation of form and content, which allows multi-target publishing (PDF, PostScript, WAP, HTML, XHTML, online help) from a single source.
Companies that have adopted XML-based publishing have seen significant cost savings compared with proprietary systems. The localization industry has also enthusiastically adopted XML as the basis of exchange standards, such as the ETSI LIS (previously LISA OSCAR) standards and related OASIS and W3C specifications:
- TMX* (Translation Memory eXchange)
- TBX* (TermBase eXchange)
- SRX* (Segmentation Rules eXchange)
- GMX/V* (Global Information Management Metrics eXchange: Volume)
- XLIFF* (XML Localization Interchange File Format)
- TransWS* (Translation Web Services)
- W3C ITS* (Internationalization Tag Set)
Another significant development affecting XML and localization has been the OASIS DITA (Darwin Information Typing Architecture) standard. DITA* provides a comprehensive architecture for the authoring, production and delivery of technical documentation. DITA was originally developed within IBM and then donated to OASIS. The essence of DITA is topic-based authoring, construction and publication, which allows specific sections to be reused as modules. Each section is authored independently, and each publication is then assembled from these section modules. This means that individual sections only need to be authored and translated once, and may be reused many times over in different publications.
A core concept of DITA is reuse at a given level of granularity. Actual publications are produced by means of a ‘map’ that pulls together all of the required constituent components. DITA represents a very intelligent and well-thought-out approach to publishing technical documentation. At the core of DITA is the concept of the ‘topic’. A topic is a unit of information that describes a single task, concept, or reference item. DITA takes an object-oriented approach to topics, encompassing the standard object-oriented characteristics of polymorphism, encapsulation and message passing.
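To make the map mechanism concrete, here is a minimal Python sketch. It uses the DITA `<map>` and `<topicref href="...">` elements, but stands in plain strings for full topic files; the topic file names and the helpers `assemble` and `topics_to_translate` are hypothetical. It shows a publication being assembled from topic references, and a topic shared by two maps being counted only once for translation.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store: topic file name -> topic content (stands in for real DITA topics).
TOPICS = {
    "install.dita": "Installing the product",
    "configure.dita": "Configuring the product",
    "troubleshoot.dita": "Troubleshooting",
}

# Two illustrative maps that reuse the same installation topic.
USER_GUIDE_MAP = """<map>
  <topicref href="install.dita"/>
  <topicref href="configure.dita"/>
  <topicref href="troubleshoot.dita"/>
</map>"""

QUICK_START_MAP = """<map>
  <topicref href="install.dita"/>
</map>"""


def assemble(map_xml, topics):
    """Return topic contents in the order the map references them."""
    root = ET.fromstring(map_xml)
    # iter() also visits nested topicrefs, flattening the hierarchy in document order.
    return [topics[ref.get("href")] for ref in root.iter("topicref")]


def topics_to_translate(*maps):
    """Collect each referenced topic once, however many maps reuse it."""
    hrefs = set()
    for map_xml in maps:
        hrefs.update(ref.get("href") for ref in ET.fromstring(map_xml).iter("topicref"))
    return sorted(hrefs)
```

Here `install.dita` appears in both maps but is translated once; both publications then pick up the same translated topic.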
The main features of DITA are:
- Topic-centric level of granularity
- Substantial reuse of existing assets
- Specialization at the topic and domain level
- Metadata property-based processing
- Leveraging existing popular element names and attributes from XHTML
The basic message behind DITA is reuse: ‘write once, translate once, reuse many times’.
- xml:tm
xml:tm* is a radical approach to the problem of translating XML documents. In essence, it takes the DITA concept of reuse and implements it at the sentence level. It does this by leveraging the power of XML to embed additional information within the XML document itself, principally through the XML namespace syntax. xml:tm also brings further benefits that emanate from its use. Originally developed as a standard under the auspices of LISA OSCAR, xml:tm is now an ETSI LIS standard. xml:tm is a perfect companion to DITA: the two fit together hand in glove in terms of interoperability and localization.
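The following Python sketch illustrates the general idea of embedding sentence-level information in a document through a namespace. It is not the xml:tm vocabulary: the namespace URI `urn:example:tm`, the element names `te` and `tu`, the id scheme, and the helpers `mark_up`, `units` and `reuse` are all hypothetical placeholders. The regular-expression splitter is a naive stand-in for proper (for example SRX-driven) segmentation, and only elements containing plain text (no child elements) are handled.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:example:tm"  # hypothetical namespace URI, NOT the official xml:tm one
ET.register_namespace("tm", TM_NS)

# Naive segmentation: break after '.', '!' or '?' when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def mark_up(doc_xml, tag="p"):
    """Wrap each sentence of every plain-text <tag> in namespaced elements carrying ids."""
    root = ET.fromstring(doc_xml)
    # Materialize the list first so the tree is not modified while being iterated.
    for e_num, elem in enumerate(list(root.iter(tag)), start=1):
        if len(elem):  # skip mixed content; this sketch handles plain text only
            continue
        text = (elem.text or "").strip()
        elem.text = None
        te = ET.SubElement(elem, f"{{{TM_NS}}}te", id=f"e{e_num}")
        sentences = (s for s in SENTENCE_END.split(text) if s)
        for u_num, sentence in enumerate(sentences, start=1):
            tu = ET.SubElement(te, f"{{{TM_NS}}}tu", id=f"u{e_num}.{u_num}")
            tu.text = sentence
    return root


def units(root):
    """Map each sentence-unit id to its text."""
    return {tu.get("id"): tu.text for tu in root.iter(f"{{{TM_NS}}}tu")}


def reuse(old_units, new_units, old_translations):
    """Carry a translation forward only where a unit's id and source text are unchanged."""
    return {
        uid: old_translations[uid]
        for uid, text in new_units.items()
        if old_units.get(uid) == text and uid in old_translations
    }
```

For example, after `root = mark_up("<doc><p>XML is portable. It supports Unicode!</p></doc>")`, serializing with `ET.tostring(root, encoding="unicode")` shows each sentence wrapped in its own `tm:tu` element, and `units(root)` gives a per-sentence index. The ids here are positional, so inserting a sentence would renumber those after it; a production system would need ids that stay stable across revisions.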
At the core of xml:tm is the concept of “text memory”. Text memory comprises two components:
- Author Memory
- Translation Memory