Unicode compatibility equivalence in Xojo

I’m looking to find the “compatibility equivalence” of a Unicode string. There are a lot of details about this in UAX #15: Unicode Normalization Forms. Essentially, I want to turn “𝕁𝕖𝕥” into “Jet”, and Unicode has a system for this already. I’m “just” looking to see if anybody has any ideas about where to start.
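
For reference, the target behaviour can be checked outside Xojo. Below is a minimal sketch in Python, assuming the standard-library unicodedata module and assuming NFKC (the compatibility-composed form from UAX #15) is the form wanted here. The helper name compat_fold is made up for illustration. It is not a Xojo solution, only a way to confirm what the output should look like.

```python
import unicodedata


def compat_fold(text: str) -> str:
    """Return the NFKC form of text (compatibility decomposition, then composition).

    compat_fold is a hypothetical helper name used only for this illustration.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Double-struck mathematical letters map to their plain ASCII counterparts.
    print(compat_fold("𝕁𝕖𝕥"))  # prints "Jet"
```

NFKD would give the same result for this particular input; the two forms differ only when the result contains characters that can be recomposed, such as accented letters.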

This looks non-trivial to implement.

If you’re building this for an app or service that will run on Linux, maybe uconv would be an option?

https://linux.die.net/man/1/uconv

btw - I based my “non-trivial” call on browsing the normalization code that is part of OpenJDK:

Interestingly, both the uconv man page and this code reference old IBM copyrights:

Copyright © 2000-2005 IBM, Inc. and other (uconv)
Copyright (C) 2009-2014, International Business Machines (open jdk)

So some original work done by IBM might be the “gold standard” and the basis for most implementations.

NOTE: This is just idle speculation based on some cursory searching of the web. It may be just noise, but it felt relevant.

I thought Xojo used ICU, which has normalisation methods. Accessing them may not be simple, but it sounds better than trying to do it yourself.

Looks like Kem already solved this one. His M_String module includes an M_Norm module to do exactly this job.
