ULI Segment Exceptions Posted in SVN and Demo Updated

Posted Jan 18, 2013, 11:22 AM by Helena Shih [updated Jan 18, 2013, 2:26 PM by Steven R. Loomis]
The latest ULI segmentation exceptions have been posted in SVN at http://unicode.org/uli/trac/browser/trunk/abbrs. This update includes:
  • References to CLDR date and month symbols, plus other necessary symbols
  • Data in JSON (json-cooked) format. The XLS files contain the input data, including exception type and frequency.
The demo has also been updated to reflect these changes. Try it out yourself at http://demo.icu-project.org/icu-bin/icusegments.

Things to try:
  • Compare the ULI and non-ULI versions of the English sample text. Note the sentence breaks after "Mr." in the non-ULI version.
  • For German ULI, try the string "Im Okt. München war kalt." (roughly, "In October, Munich was cold."). Without ULI, the sentence breaks after the abbreviation "Okt." (for Oktober). With ULI and CLDR data, "Okt." is treated as an exception, so no break occurs there.
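The effect in the two examples above can be illustrated with a toy sentence splitter. This is a minimal Python sketch, not ICU's implementation: it assumes a naive rule (break after ".", "!" or "?" followed by whitespace) instead of the Unicode sentence-boundary rules, and the `EXCEPTIONS` table and `split_sentences` function are hypothetical stand-ins for the ULI/CLDR exception data.

```python
import re

# Hypothetical exception lists for illustration only; the real ULI data
# is published in the SVN "abbrs" directory.
EXCEPTIONS = {
    "en": {"Mr.", "Mrs.", "Dr."},
    "de": {"Okt.", "Nov.", "Dez."},
}

# Naive candidate break: sentence-ending punctuation followed by whitespace.
_CANDIDATE = re.compile(r"[.!?]\s+")


def split_sentences(text, exceptions=frozenset()):
    """Split text into sentences, suppressing breaks after any word in `exceptions`."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        # The word just before the candidate break, including its punctuation.
        words = text[start:match.end()].split()
        last_word = words[-1] if words else ""
        if last_word in exceptions:
            continue  # known abbreviation: not a sentence boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever follows the last accepted break is the final sentence.
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences
```

Without exceptions, `split_sentences("Im Okt. München war kalt.")` returns `["Im Okt.", "München war kalt."]`; passing `EXCEPTIONS["de"]` keeps the whole string as one sentence, mirroring the non-ULI and ULI behavior described above.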