posted Jan 18, 2013, 11:22 AM by Helena Shih [updated Jan 18, 2013, 2:26 PM by Steven R. Loomis]
The latest ULI segmentation exceptions have been posted in SVN: http://unicode.org/uli/trac/browser/trunk/abbrs. This update includes:
- References to CLDR date/month data and other necessary symbols
- Data available in JSON (json-cooked) format

The XLS files contain the input data, including exception type and frequency.
The latest demo has also been updated to reflect the changes above. Try it out yourself at http://demo.icu-project.org/icu-bin/icusegments. Things to try:
- Compare the ULI and non-ULI versions of the English sample text. Note the breaks after "Mr." in the non-ULI version.
- For German ULI, try the string "Im Okt. München war kalt." (roughly, "In October, Munich was cold."). Without ULI, the sentence breaks after the abbreviation "Okt." (for Oktober). With ULI and CLDR data, "Okt." is an exception, so no break occurs there.
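
To make the effect of an exception list concrete, here is a minimal Python sketch. It is not the ICU implementation and does not follow the full Unicode sentence-break rules: it simply breaks after ".", "!" or "?" followed by whitespace, and suppresses the break when the preceding token is in an exception set. The function name `split_sentences` and the tiny `EXCEPTIONS` set are hypothetical stand-ins for the real ULI data.

```python
import re

# Hypothetical, hand-picked exceptions for illustration only;
# the real ULI data is far larger and is organized per locale.
EXCEPTIONS = frozenset({"Mr.", "Okt."})


def split_sentences(text, exceptions=frozenset()):
    """Naive sentence splitter with an abbreviation exception list.

    Breaks after '.', '!' or '?' followed by whitespace, unless the
    token ending at that terminator is listed in `exceptions`.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current sentence start up to and including the terminator.
        candidate = text[start:match.start() + 1]
        tokens = candidate.split()
        last_token = tokens[-1] if tokens else ""
        if last_token in exceptions:
            continue  # suppress the break after a known abbreviation
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


german = "Im Okt. München war kalt."
print(split_sentences(german))              # break after "Okt."
print(split_sentences(german, EXCEPTIONS))  # "Okt." treated as an exception
```

Run on the German sample, the first call splits the text in two after "Okt.", while the second keeps it as a single sentence, which mirrors the difference the demo shows between the non-ULI and ULI settings.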