ULI Subcommittee



Note

In January 2018, the ULI-TC was restructured as ULI-SC, a subcommittee of CLDR.

In July 2020, the ULI-SC was disbanded after not having convened for some time.

All site content and source repositories remain online, but in read-only archival state. Thank you for your interest in ULI.

Unicode Localization Interoperability Subcommittee

The Unicode Localization Interoperability CLDR Subcommittee (ULI) works to ensure interoperable data interchange of critical localization-related assets, including:

Translation memory: A translation memory system stores words or phrases that have been translated previously. Using translation memory keeps translated content consistent, speeds up translation, and reduces the cost of repeated translation requests.

Segmentation rules: Segmentation rules define the way to segment text for translation or other text processing. The rules are used in conjunction with translation memory to create memory segments or identify matches within the source content of existing translation memories.

Translation source strings and their translations: Translation source is natural language text, typically with markup, that will be translated into another language. The translated strings are the results of translating the source strings while preserving the markup.

Word count: Best practices for counting words in the context of translation interchange.

Whether a translation request is completed by human or machine, these assets play a vital role in the overall translation process. Interoperable interchange of these assets reduces errors, lowers costs, and improves throughput.

Charter and Scope

Problem Statement

The localization industry has problems with data interchange between service endpoints. ULI is intended to solve the following problems:

Inconsistent application, implementation, and interpretation of standards

Lack of clear requirements for localization data interchange

Localization Interoperability Definition

Ensuring reliable localization data interchange through consistent implementation of localization standards and file formats.

Charter and Objectives

ULI will be the expert group, with representatives from localization service consumers, localization service providers, tools and technology experts, academia, and standards organizations, that advises on interoperable data interchange of critical localization-related assets.

Objectives

Optimize the service time between systems through consistent interpretation and adoption of localization data interchange standards

Mature existing standards and data references by gathering requirements for extensions of localization interoperability standards

Reduce cost through best practice guidelines by providing open reference implementation of the extensions and profiles

Establish reference implementations or extensions to improve the usefulness of localization interoperability standards

Relationship to owners of Industry standards

Existing standards will not be changed; the goal is to extend them if needed.

The TC will engage with standard organizations as needed to influence existing/future standards.

The TC will contribute to existing standards through an open platform.

All TC activities are guided by the Unicode Consortium procedures.

See PDF: ULI Charter

Process

Introduction

This document describes the Unicode Localization Interoperability Technical Committee and its process for specification definition, interchange formats, and examples. The process is designed to be lightweight: in particular, the meetings are frequent, short, and informal. Most of the work is done by email or phone, with a database recording requested changes.

For more information on the formal procedures for the Unicode CLDR Technical Committee, see the Technical Committee Procedures for the Unicode Consortium.

Specifications

Language Segmentation

UAX #29, Unicode Text Segmentation, defines guidelines for determining default segmentation boundaries between certain significant text elements. The interchange specification SRX (Segmentation Rules eXchange) is the exchange format for system-to-system communication of the text segmentation behavior associated with any content. An example of SRX at the language level is available as part of the CLDR project.

See CLDR Process for more information on the vetting and submission of language segmentation input.
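To make the idea of exchangeable segmentation rules concrete, here is a minimal Python sketch of ordered break and no-break rules, loosely in the spirit of SRX. It is not an SRX parser and does not implement UAX #29; the rule list, the abbreviations in it, and the function name `segment` are all assumptions made up for this illustration. Each rule pairs a pattern for the text before a candidate position with a pattern for the text after it, and the first rule that matches at a position decides the outcome.

```python
import re

# Hypothetical rules: (is_break, before_pattern, after_pattern).
# Rules are tried in order; the first match at a position wins.
RULES = [
    (False, r"\b(?:Mr|Dr|etc)\.", r"\s"),  # exception: no break after these abbreviations
    (True, r"[.?!]+", r"\s"),              # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break rule."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match text ending at pos; "after" must match text starting at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides, whether break or no-break
    segments.append(text[start:])
    return segments


print(segment("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', ' He sat down.']
```

In this sketch the whitespace after a break stays at the start of the next segment, so joining the segments reproduces the input exactly. Two systems that share the same ordered rules should produce the same segments, which is the point of exchanging the rules rather than only the segmented output.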

Translation Memory

The ability to interchange memories as static content within a translation request life cycle is defined by TMX (Translation Memory eXchange). The scope of this work is under discussion.

Public Feedback

The public can supply formal feedback into ULI by filing a Bug Report or Feature Request. There is also a public forum for questions at ULI Mailing List (details on archives are found there).

Anyone can also ask to be added to a list that receives notification of new bugs in order to track issues. Anyone can also reply to any bug report to add comments or questions.

There is also a members-only ULI mailing list for members of the ULI Technical Committee.

Meeting Minutes

Minutes are archived here

Profiles of Use

The primary focus of the ULI Technical Committee will be to establish profiles of use for XLIFF, TMX, and SRX. The committee will develop and publish specifications that document specific usage conventions that can be shared for interoperability. This will improve data exchange through more consistent implementations and enhance the usefulness of these three standards.

Extensions to Established Standards

The secondary focus of the ULI Technical Committee will be to gather requirements for future extensions to XLIFF, TMX, and SRX. The ULI committee will develop reference implementations, as necessary, to demonstrate the feasibility of any proposals for future standardization.

TMX: Translation Memory eXchange (TMX) is an XML-based standard for the exchange of translation memory data created by computer-aided translation and localization tools. TMX was developed and maintained by LISA, the Localization Industry Standards Association, until LISA became insolvent in 2011. The format allows easier exchange of translation memory between tools and/or translators with little or no loss of critical data.
See http://en.wikipedia.org/wiki/Translation_Memory_eXchange

SRX: Segmentation Rules eXchange (SRX) is an XML-based standard that was maintained by LISA, the Localization Industry Standards Association. It provides a common way to describe how to segment text for translation and other language-related processes.
See http://en.wikipedia.org/wiki/Segmentation_Rules_eXchange

XLIFF: The XML Localisation Interchange File Format (XLIFF) is maintained by OASIS. XLIFF is the industry standard for exchanging localization data (translation source and translated results) between service users and service providers.
See http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
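As an illustration of the kind of data TMX carries, the following Python sketch reads translation units from a small TMX-style document using only the standard library. The sample content, the tool name in the header, and the function name `load_translation_units` are made up for this example; real TMX files can also contain inline markup and metadata that this sketch flattens or ignores.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample document following the general TMX 1.4 layout.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def load_translation_units(tmx_text):
    """Return one dict per <tu>, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() flattens any inline elements inside <seg> to plain text.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


print(load_translation_units(SAMPLE_TMX))
# [{'en': 'Save the file.', 'fr': 'Enregistrez le fichier.'}]
```

Differences in how tools write and read details such as inline markup and header attributes are the sort of inconsistency that profiles of use are meant to reduce.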

Word Count

One of the challenges of translation interoperability is objectively measuring the difficulty of a particular translation workload. A commonly used metric is the word count. However, methods for counting words vary across systems and languages. Some examples: Thai is written without space characters between words, as are Japanese and Chinese. Should numbers be included or not? Are Mongolian suffixes considered separate words or not?

You may see the past discussion on this GitHub page. (Note that the GitHub repository is now archived.)
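The effect of different counting conventions can be seen in a short Python sketch. Both functions below are hypothetical conventions invented for this illustration, not ULI recommendations: one counts whitespace-separated tokens, the other skips tokens that contain no letters, such as plain numbers.

```python
def count_whitespace_tokens(text):
    """Count tokens separated by whitespace."""
    return len(text.split())


def count_tokens_excluding_numbers(text):
    """Count whitespace-separated tokens that contain at least one letter."""
    return sum(1 for token in text.split() if any(ch.isalpha() for ch in token))


english = "Order 3 copies by 2020"
thai = "ภาษาไทย"  # Thai text written without spaces between words

print(count_whitespace_tokens(english))         # 5
print(count_tokens_excluding_numbers(english))  # 3
print(count_whitespace_tokens(thai))            # 1
```

The same English string yields 5 or 3 words depending on whether numbers count, and whitespace splitting reports a single token for the Thai string regardless of how many words it contains. Counting such text requires language-aware word segmentation, which is why a shared definition matters when word counts are exchanged between systems.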

Publicly Available Specifications

These documents are archived for historical purposes and do not specify a Unicode standard. They are already publicly available online elsewhere and are hosted on the Unicode ULI site only as a convenience.

GMX-V (gmx-v10.xsd)

SRX 2.0

Participation

For information on how to join the ULI and get involved in its work, contact the Unicode Consortium with the contact form and ask about the ULI.

To become a voting participant in the work of the ULI committee, join Unicode in one of the three voting categories of membership: Full, Institutional, or Supporting. Learn about the benefits of joining.

The officers of the ULI will establish the meeting schedule. Meetings are to be conducted by conference call to enable broad participation by members of the industry.

Data Files

ULI Data Files (restricted access)

Officers

The Technical Committee Officers were:

Chair: Steven R Loomis

Vice Chair (Interim): Yoshito Umaoka (IBM)