Factor's Unicode library is in the basis/unicode/ directory, along with the encoding support in core/io/encodings/ and basis/io/encodings/. The library is pretty complete, but a few tasks of varying complexity remain to be implemented. We would expect a student to at least attempt all of them over the summer, but depending on the student's skills, completing only the easier tasks would still be acceptable.
We want a normalized-stream type which wraps an underlying character stream and converts its output to normalization form C or normalization form D. Support for normalization is already in unicode.normalization, but the stream wrapper itself still needs to be written.
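The idea can be sketched outside Factor. The following Python illustration is not part of the Factor library, and the class name `NormalizedStream` and its methods are hypothetical. It shows why such a stream has to buffer: a combining mark may arrive in a later write than the base character it attaches to. As a simplifying assumption, the sketch holds back everything from the last starter (a character with combining class 0) onward. A full implementation would split at proper normalization boundaries, since some starters, such as Hangul jamo, can still compose with preceding text.

```python
import io
import unicodedata


class NormalizedStream:
    """Hypothetical wrapper that normalizes text written to a text stream."""

    def __init__(self, stream, form="NFC"):
        self.stream = stream      # underlying character stream
        self.form = form          # "NFC" or "NFD"
        self.pending = ""         # text held back until it is safe to emit

    def write(self, text):
        self.pending += text
        # Find the index of the last starter (combining class 0).
        # Text from that point on may still change when more marks arrive.
        cut = len(self.pending)
        while cut > 0:
            cut -= 1
            if unicodedata.combining(self.pending[cut]) == 0:
                break
        ready, self.pending = self.pending[:cut], self.pending[cut:]
        if ready:
            self.stream.write(unicodedata.normalize(self.form, ready))

    def flush(self):
        # At flush time nothing more can follow, so emit the remainder.
        if self.pending:
            self.stream.write(unicodedata.normalize(self.form, self.pending))
            self.pending = ""
        self.stream.flush()


out = io.StringIO()
stream = NormalizedStream(out, "NFC")
stream.write("e")         # base letter
stream.write("\u0301")    # combining acute accent, arriving separately
stream.write("x")
stream.flush()
# out.getvalue() is now "\u00e9x": the accent composed with the "e"
```

Buffering only back to the last starter keeps memory use small for ordinary text, at the cost of the boundary cases noted above.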
Unicode defines various code points for digits other than the usual ASCII 0..9. Parsing numbers with these code points would be useful. Perhaps this could even be integrated with the roman library for a high-level number parser.
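As a rough illustration of the digit-parsing task, here is a minimal Python sketch (not Factor code; the function name `parse_unicode_integer` is hypothetical). It relies on the Unicode decimal-digit property, exposed in Python as `unicodedata.decimal`, to map each code point to its value. It assumes non-negative integers only and does not reject strings that mix digits from different scripts, which a real parser might want to do.

```python
import unicodedata


def parse_unicode_integer(text):
    """Parse a non-negative integer written with any Unicode decimal digits."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # decimal() returns the digit value, or the default for non-digits
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError("not a decimal digit: %r" % ch)
        value = value * 10 + digit
    return value


# Arabic-Indic digits four and two
print(parse_unicode_integer("\u0664\u0662"))        # prints 42
# Devanagari digits one, two, three
print(parse_unicode_integer("\u0967\u0968\u0969"))  # prints 123
```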
The encoding API for converting strings to byte arrays and vice versa is mostly done (http://docs.factorcode.org/content/article-io.encodings.html).
The algorithm is detailed in the Unicode 5.1 specification.
Support for bidirectional text
What do we need from here?
Collation tailoring, break tailoring
Performance/cleanup/better data structures
The student gains experience with internationalization and localization.
Factor's Unicode support, which is already more advanced than that of most languages, would be world-class after these changes.
This revision created on Fri, 13 Mar 2009 04:13:31 by slava