I have an app that searches for words in documents; it works fine with the usual .doc, .txt, etc. documents, but it fails with documents created with the app Pages when the text contains Indic words.
For instance, given the text “house ??? house ???” I can find “house”, but I cannot find any instance of ???.
Opening the Pages document in TextWrangler, I see that the text above appears as:
"!?&&?0=house ?Ƈ懶? v*
Now I wonder, what encoding is that? Since the word is a Bengali one (using a Bangla Unicode font), I even tried DefineEncoding/ConvertEncoding with macBengali, but with no result.
BTW, I tried inserting Greek words and the app finds them; but with Hebrew words I get inconsistent results.
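As an aside on the Hebrew results: one common cause of inconsistent matches in scripts with combining marks is Unicode normalization — the same visible text can be stored as different code-point sequences. This is only a guess at what's happening here; the effect is easy to demonstrate (Python used purely for illustration, with a Latin example):

```python
import unicodedata

# The same accented letter stored two ways: precomposed vs. base + combining mark.
composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                                # False: a naive search misses one form
print(unicodedata.normalize("NFC", decomposed) == composed)  # True after normalizing
```

Normalizing both the document text and the search term to the same form (NFC or NFD, consistently) before comparing avoids this class of mismatch.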
Any idea how to proceed? Suggestions appreciated. Thanks.
Does TextWrangler happen to show you that a Pages document is really a zip file?
Dropping one on BBEdit shows a pile of internal bits.
And the actual data of the document is in a format unknown to me … whatever an IWA file is.
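The "zip in disguise" theory is easy to check: a zip archive starts with the bytes `PK\x03\x04`, and the standard library can list what's inside. A sketch in Python — I'm building a stand-in .pages archive in memory here since I don't have a real one at hand, and the entry names are only illustrative:

```python
import io
import zipfile

# Build a stand-in .pages archive in memory (entry names are illustrative
# of what a Pages bundle contains, not taken from a real file).
fake_pages = io.BytesIO()
with zipfile.ZipFile(fake_pages, "w") as z:
    z.writestr("Index/Document.iwa", b"\x00placeholder")
    z.writestr("Metadata/Properties.plist", b"<plist/>")
data = fake_pages.getvalue()

# A zip archive starts with the local-file-header magic "PK\x03\x04".
print(data[:4] == b"PK\x03\x04")

# List what's inside -- with a real .pages file you'd pass its path instead.
with zipfile.ZipFile(io.BytesIO(data)) as z:
    print(z.namelist())
```

With a real .pages file, the first four bytes and the entry listing would tell you immediately whether you're looking at a zip container rather than text in some exotic encoding.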
BBEdit shows the same pile of internal bits as TextWrangler.
Googling .iwa, I see such files described as compressed files.
Yet my app finds Roman-script words, but fails on Bengali ones.
Thanks for answering.
A small bit more digging now that I’m done with what I was working on … and lo and behold!
@Norman Palardy [quote]https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa [/quote]
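Per that doc, an .iwa file is a series of chunks, each with a 4-byte header (a 0x00 byte followed by a 24-bit little-endian length) and then a raw Snappy-compressed block — no standard Snappy framing, no CRC — whose decompressed contents are Protobuf messages. As a sketch of the "not simple" part, here's a minimal decoder for a raw Snappy block in Python (assumptions: the chunk layout is as the linked doc describes; error handling is omitted):

```python
def snappy_decompress(buf: bytes) -> bytes:
    """Minimal decoder for one raw Snappy block (no framing, no CRC)."""
    # Preamble: uncompressed length as a little-endian varint.
    i, ulen, shift = 0, 0, 0
    while True:
        b = buf[i]; i += 1
        ulen |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    out = bytearray()
    while i < len(buf):
        tag = buf[i]; i += 1
        kind = tag & 3
        if kind == 0:                       # literal run
            length = tag >> 2
            if length >= 60:                # 60..63 mean 1..4 extra length bytes
                n = length - 59
                length = int.from_bytes(buf[i:i + n], "little")
                i += n
            length += 1
            out += buf[i:i + length]
            i += length
        else:                               # back-reference copy
            if kind == 1:                   # 1-byte offset, length 4..11
                length = ((tag >> 2) & 0x7) + 4
                offset = ((tag >> 5) << 8) | buf[i]; i += 1
            elif kind == 2:                 # 2-byte little-endian offset
                length = (tag >> 2) + 1
                offset = int.from_bytes(buf[i:i + 2], "little"); i += 2
            else:                           # 4-byte little-endian offset
                length = (tag >> 2) + 1
                offset = int.from_bytes(buf[i:i + 4], "little"); i += 4
            for _ in range(length):         # byte-by-byte so overlapping copies work
                out.append(out[-offset])
    assert len(out) == ulen
    return bytes(out)

# Hand-crafted block: literal "abcd", then a copy of the previous 4 bytes.
print(snappy_decompress(b"\x08\x0cabcd\x01\x04"))  # b'abcdabcd'
```

Reading an actual Document.iwa would then mean: at each position expect a 0x00 byte, read the next 3 bytes as a little-endian length, decompress that many bytes with the function above, and repeat; the concatenated output is a stream of length-prefixed Protobuf messages — which is where "not simple" really starts.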
Based on that knowledge, is there a way to extract/decipher the text data in Xojo?
I would think so
But it’s not “simple”.
And since I like “simple” (read: I only understand simple things), I’ll leave it alone.