Encodings in Pages app documents

Carlo_Rubini · March 19, 2019, 12:11am

Hello,
I have an app that searches for words in documents. It works fine with the usual .doc, .txt, etc. documents, but it fails with documents created in Apple's Pages app when the text contains Indic words.
For instance, given the text “house ??? house ???”, I can find “house”, but I cannot find any instance of ???.
Opening the Pages document in TextWrangler, I see that the text above appears as:
"!?&&?0=house ?Ƈ懶? v*

Now I wonder: what encoding is that? Since the word is Bengali (set in a Bangla Unicode font), I even tried DefineEncoding/ConvertEncoding with macBengali, but with no results.
BTW, I tried inserting Greek words, and the app finds them; but with Hebrew words I get inconsistent results.

Any idea how to proceed? Suggestions appreciated. Thanks.

Norman_Palardy · March 19, 2019, 12:30am

Does TextWrangler happen to show you that a Pages document is really a zip file?
Dropping one on BBEdit shows a pile of internal bits.
And the actual document data is in a format unknown to me… whatever an IWA file is.
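Because a Pages document is an ordinary zip archive, its parts can be listed without special tools. A minimal sketch in Python using only the standard `zipfile` module; the function name `list_iwa_entries` is hypothetical, and the assumption that the document body lives in `.iwa` entries comes from the listing described above, not from any Apple documentation.

```python
import zipfile


def list_iwa_entries(path):
    """Return the names of all .iwa entries inside a Pages (zip) document.

    `path` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        # Keep only the entries whose names end in .iwa (case-insensitive).
        return [name for name in zf.namelist() if name.lower().endswith(".iwa")]
```

For example, `list_iwa_entries("Letter.pages")` would return the `.iwa` entry names of a file called `Letter.pages` (a made-up name).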

Carlo_Rubini · March 19, 2019, 1:59am

BBEdit shows the same pile of internal bits as TextWrangler.
Googling .iwa, I see that such files are described as compressed.
Yet my app detects words in Roman script, but fails on Bengali.
Thanks for answering.
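One plausible explanation for why “house” is found but the Bengali word is not: according to the iWorkFileFormat notes linked further down in this thread, the .iwa entries are Snappy-compressed. Snappy stores some byte runs as raw literals (so short ASCII words can stay visible in the file) while replacing repeated sequences with back-references to earlier output, so a search over the compressed bytes only succeeds when a word happens to sit intact inside a literal. That would also explain why changing the text encoding did not help: the bytes are compressed, not merely encoded differently. Below is a minimal pure-Python sketch of a raw Snappy block decompressor, written from the public Snappy format description and assuming a single raw block (any .iwa chunk headers already stripped); the name `snappy_decompress` is hypothetical, and the same byte-level logic should port to a Xojo MemoryBlock.

```python
def snappy_decompress(data: bytes) -> bytes:
    """Decompress one raw Snappy block (no framing, no checksums)."""
    pos = 0
    # Preamble: uncompressed length as a little-endian base-128 varint.
    length = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        length |= (b & 0x7F) << shift
        if b < 0x80:
            break
        shift += 7

    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0:  # literal: raw bytes copied straight to the output
            n = tag >> 2
            if n >= 60:
                # 60..63: (length - 1) follows in 1..4 little-endian bytes.
                extra = n - 59
                n = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            n += 1
            out += data[pos:pos + n]
            pos += n
            continue
        if kind == 1:  # copy with 1-byte offset (length 4..11)
            n = 4 + ((tag >> 2) & 0x07)
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        elif kind == 2:  # copy with 2-byte offset (length 1..64)
            n = 1 + (tag >> 2)
            offset = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
        else:  # copy with 4-byte offset (length 1..64)
            n = 1 + (tag >> 2)
            offset = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4
        if offset == 0 or offset > len(out):
            raise ValueError("invalid back-reference")
        start = len(out) - offset
        # Copy byte by byte because source and destination may overlap.
        for i in range(n):
            out.append(out[start + i])
    if len(out) != length:
        raise ValueError("length mismatch")
    return bytes(out)
```

In the first test case below, `b"\x09\x08abc\x09\x03"` holds the literal `abc` plus a back-reference, and it expands to `abcabcabc`: the second and third `abc` never appear as bytes in the compressed data.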

Norman_Palardy · March 19, 2019, 4:17am

A little more digging now that I'm done with what I was working on… and lo and behold!

https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa
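The linked notes describe an .iwa file as a sequence of chunks, each preceded by a 4-byte header: one type byte (0 for a Snappy-compressed chunk) followed by a 24-bit little-endian chunk length, without the stream identifier or checksums that standard Snappy framing uses. Assuming that description is accurate, splitting a file into its compressed chunks could look like this sketch (the name `iwa_chunks` is hypothetical):

```python
def iwa_chunks(data: bytes):
    """Yield the Snappy-compressed payload of each chunk in an .iwa file."""
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated chunk header")
        chunk_type = data[pos]
        # Bytes 1..3 of the header: chunk length, 24-bit little-endian.
        length = int.from_bytes(data[pos + 1:pos + 4], "little")
        pos += 4
        if chunk_type != 0:
            raise ValueError(f"unexpected chunk type {chunk_type}")
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("truncated chunk")
        yield chunk
        pos += length
```

Each chunk would then go through a raw Snappy block decompressor (for example one from the third-party python-snappy package), and the decompressed pieces would be concatenated.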

Carlo_Rubini · March 19, 2019, 10:05am

@Norman Palardy, re: https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#iwa
Based on that knowledge, is there a way to extract/decipher the text data in Xojo?
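A rough outline of what extraction would involve, going by the linked notes: once decompressed, the .iwa data is a sequence of protobuf records, each beginning with a varint giving the length of an ArchiveInfo header message, and protobuf `string` fields are stored as UTF-8. So even before parsing the messages properly, a byte search for the UTF-8 encoding of a word should usually find it, unless the text is split across separate strings. A minimal Python sketch of those two pieces; the function names are hypothetical, and `count_word` assumes its input has already been decompressed.

```python
def read_varint(data: bytes, pos: int):
    """Decode a protobuf base-128 varint at `pos`; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        b = data[pos]
        pos += 1
        # Low 7 bits carry data, least significant group first.
        value |= (b & 0x7F) << shift
        if b < 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def count_word(decompressed: bytes, word: str) -> int:
    """Count non-overlapping occurrences of `word` as UTF-8 bytes."""
    return decompressed.count(word.encode("utf-8"))
```

For example, `read_varint(b"\x96\x01", 0)` returns `(150, 2)`, the standard protobuf varint example.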

Norman_Palardy · March 19, 2019, 2:52pm

I would think so.
But it's not “simple”.

Carlo_Rubini · March 19, 2019, 3:08pm

And since I like “simple” (read: I only understand simple things), I'll leave it alone.
Thanks.