- Contribute transform engine for SRX rules to CLDR for the Dec 2011 release. --> Kevin (Open)
* CLDR meeting on Nov 30: integrating SRX custom breaks with the existing CLDR rules.
- Describe that ULI has focused first on segmentation.
- Create exception rules and normative behavior: industry tools have adopted SRX rules for producing text.
- Condition before and after the segmentation process.
- Each vendor uses a different set of SRX rules. Unicode regex does not support sentence breaks.
- Need to look at how to allow these declarative exceptions while still benefiting from the normative behavior of UAX#29.
- Use UAX#29 as the basis and insert SRX rules only where exceptions are needed.
- Summary: a tool that takes SRX rules and transforms them for insertion into the UAX#29 base.
- Goal: use the current merged rules to identify features missing from UAX#29, and use the transform engine to take SRX input and feed it as exceptions to the base.
- Schedule: review the contributed SRX file and each of its rules to see which ones are already supported by UAX#29.
* Want as few rules as possible, so that the regexes do not over-accept.
* Would like to add rules that positively affect the behavior of the output.
* Keep a given rule only if it changes the behavior of UAX#29; drop anything that is redundant. For each given locale, create a custom language break iterator.
* Language-based exceptions should be few and at beta level in the beginning.
- Still aiming for CLDR v21 deadline.
- Get everyone to provide input to the SRX file contributed by Rodolfo. (Open)
- Posted validated output on the ULI site. Will post the URL:
- Provide IBM input on additional language supplementary input to the SRX default file --> Helena (Closed)
- David to connect with Helena to solicit input about ULI PR with the Multilingual-Web LT activities. ---> David Filip (Closed)
- New interested parties are: Intel and Adobe.
- Arle still owns the separator character proposal. It needs to be fleshed out more. --> Arle Lommel (Open)
- Liaison to CLDR TC --> Open
* report back next week.
- Default SRX rules: http://uli.unicode.org/home/uli-documents/merged_srx.zip?attredirects=0&d=1.
Feedback so far:
* Good portion of the data is not useful for "usual content". Vetting needed.
* Standardization should also be on the linguistic construct: let's look at the CLDR current data
* Using more than just the "rule" element of the SRX standard
Mati: the abbreviation symbol is not a dot and not the end of a word. To terminate a sentence, a "full stop" is used. The abbreviation symbol is different from that.
Kevin: there is a line break subclass but no sentence break subclass.
- Kevin: software format fields. Software strings are an issue.
- David: In the view of Rodolfo and Yves, it might be good to generate some XLIFF comments. Would advise this committee to deal with the SW strings sooner rather than later. The main focus of XLIFF is the human translator. There is an important criterion on how content should be ordered.
- Kevin will provide some examples.
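
Illustration (not discussed in the meeting): a minimal Python sketch of the approach in the first action item, where a base rule produces candidate sentence breaks and SRX-style "no break" rules are applied only as exceptions. Everything here is an assumption made for illustration: the base regex is a crude stand-in for UAX#29 and not an implementation of it, the SRX fragment is simplified (no namespace, no header), and the names BASE_BREAK, load_no_break_rules and segment are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, hypothetical SRX-style fragment. A rule with break="no"
# suppresses a break where <beforebreak> matches the text ending at the
# candidate position and <afterbreak> matches the text starting there.
SRX_SNIPPET = r"""
<languagerule languagerulename="English">
  <rule break="no">
    <beforebreak>\b(Dr|Mr|Mrs|etc)\.</beforebreak>
    <afterbreak>\s</afterbreak>
  </rule>
</languagerule>
"""

# Crude stand-in for the base behavior (NOT real UAX#29): a candidate break
# sits right after sentence punctuation that is followed by whitespace and
# an uppercase ASCII letter.
BASE_BREAK = re.compile(r"[.!?]+(?=\s+[A-Z])")


def load_no_break_rules(srx_xml):
    """Return (before, after) compiled regex pairs for every break="no" rule."""
    root = ET.fromstring(srx_xml.strip())
    rules = []
    for rule in root.iter("rule"):
        if rule.get("break") == "no":
            before = rule.findtext("beforebreak", default="")
            after = rule.findtext("afterbreak", default="")
            # Anchor the "before" pattern to the end of the preceding text.
            rules.append((re.compile("(?:" + before + r")\Z"),
                          re.compile(after)))
    return rules


def segment(text, rules):
    """Split text at base break positions unless an exception rule applies."""
    segments = []
    start = 0
    for match in BASE_BREAK.finditer(text):
        pos = match.end()
        suppressed = any(before.search(text[:pos]) and after.match(text, pos)
                         for before, after in rules)
        if not suppressed:
            segments.append(text[start:pos].strip())
            start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    sample = "Dr. Smith arrived. He met Mr. Jones."
    print(segment(sample, []))                                # base only
    print(segment(sample, load_no_break_rules(SRX_SNIPPET)))  # with exceptions
```

With no exception rules, the stand-in base also breaks after "Dr." and "Mr."; with the one exception rule loaded, only the break after "arrived." remains. Keeping the exception list separate from the base reflects the aim of having as few rules as possible.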