Hi, all!
I'm developing a library online catalog. The catalog will record bibliographic information, subject headings, and links (for digital material) for books, websites, journals, DVDs, etc.
The catalog my library currently uses stores records in UTF-8. The cataloging standards require storing a title in both its original script and its transliterated form, and a library may also translate the title into its patrons' native language. The storage is fine; it's the retrieval that is flummoxing us.
When we catalog a title like the following, I make sure I record all three versions of the title: original, translated (into English), and transliterated.
242 12 a A Japanese-English-Chinese dictionary of computer terms / c Hitachi. Omika Factory. y eng
245 10 6 880-01/$1 a Wa-Ei-Chu taiyaku denshi keisanki yogoshu / c Hitachi. Omika Factory.
880 10 6 245-01 a 和英中対訳電子計算機用語集 / c Hitachi. Omika Factory.
(Ignore the codes. They are library codes analogous after a fashion to XML tags. See MARC cataloging if you're interested or ask me.)
The problem is that we can look up the title by its Romanized forms but not by the kanji.
The library catalog runs on a Linux server, and the catalog itself stores the records in a MySQL database. The first indicator of the 242, 245, and 880 fields tells the catalog to index that version of the title and link it to this record; that's the "1" after the 242, 245, and 880. To get the catalog to retrieve the kanji, we have to install a search engine called Elasticsearch on the Linux box and then set up hooks between Elasticsearch and the catalog. This problem occurs in other writing systems as well, e.g., Cyrillic. If I catalog "Война и мир", I'll include the translation "War and Peace" and the transliteration "Voina i mir" just so that we and our patrons can retrieve the work.
But this got me thinking about Linux in my own little project. If I develop a Xojo application and compile a Linux version, will I have this type of retrieval problem with multilingual writing systems? Do Xojo's database engines automatically handle storage and retrieval of multiple writing systems, or will I have to rely on something like Elasticsearch? I know I'll have to make sure the interface recognizes and processes multilingual writing systems, but will the database be able to retrieve the non-Roman material as long as I store the data in UTF-8?
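To make the question concrete, here is a minimal sketch of the behavior I'm hoping for, with the original-script titles written out. It is in Python rather than Xojo, and it uses an in-memory SQLite database purely as a stand-in; the table name `title_index`, its columns, and the helper `find_records` are all made up for illustration and have nothing to do with how our catalog or Xojo actually does it.

```python
import sqlite3

# Hypothetical schema: one row per title form, all linked to the same record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title_index (record_id INTEGER, form TEXT, title TEXT)")

titles = [
    (1, "translated", "A Japanese-English-Chinese dictionary of computer terms"),
    (1, "transliterated", "Wa-Ei-Chu taiyaku denshi keisanki yogoshu"),
    (1, "original", "和英中対訳電子計算機用語集"),
    (2, "translated", "War and Peace"),
    (2, "transliterated", "Voina i mir"),
    (2, "original", "Война и мир"),
]
conn.executemany("INSERT INTO title_index VALUES (?, ?, ?)", titles)


def find_records(fragment):
    """Return the sorted record ids whose title, in any form, contains fragment."""
    rows = conn.execute(
        "SELECT DISTINCT record_id FROM title_index "
        "WHERE title LIKE '%' || ? || '%'",
        (fragment,),
    )
    return sorted(row[0] for row in rows)


print(find_records("電子計算機"))  # search by part of the kanji title
print(find_records("мир"))         # search by part of the Cyrillic title
print(find_records("Voina"))       # search by the transliteration
```

In this sketch a plain substring match finds the record from any of the three forms. One caveat I'm aware of: by default SQLite's LIKE folds case only for ASCII letters, so a lowercase Cyrillic query may not match a capitalized title. Word segmentation for Japanese is a separate issue that this does not address.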
Thanks for your help, all.