How to read the content of Excel and Word files as text?

Hi all.

Is there a way to read the content of Microsoft Excel and Word documents?

Many thanks!

On Windows I’d expect you can use the Office automation plugins.

I need it for Mac… I’m sorry, I forgot to specify.

Anyway thank you Norman!

On the Mac you’re probably limited to using AppleScript in some manner, since MS has not updated any SDK we could use to build Office plugins on OS X since the PowerPC days.

Ok, Norman… So you are saying that for now it is not possible, right? My app is available on the Mac App Store too… so no PPC code is allowed there…

With the MBS XL Plugin plus a LibXL license you can read/write Excel files without Excel installed.

Maybe LibreOffice or OpenOffice for Mac can help with Word and Excel files.

Not just for now.
Until / unless MS produces a new SDK for OS X, you’re out of luck as far as our plugins go.
There may be others, but I can’t vouch for how well they do or do not work.

Many thanks, Norman, for your clarification… I’ll take a look at Christian’s plugins, as he suggested.
Thanks to Christian too.

MBS XL Plugin works fine! Thanks Christian!

Any suggestions for .doc documents?

If Word is installed on the client, you can use AppleScript to return the text.

tell application "Microsoft Word"
    activate
    open file name "Mac HD:Users:Sander:Documents:fox.docx"
    tell active document
        content of text object
    end tell
end tell

This opens a Word document called "fox.docx" and returns its text content.
I don’t know whether you can start Word hidden, as you can on Windows.

Another solution is to parse the docx file yourself, as it is XML-based.

Isn’t the spec for the Office XML file format thousands of pages long?

6000 or so

Oh god!
But I only need to read all the text inside, and it does not matter if there are any tags or other “codes” in it. I need to know whether a document contains a string or not… So, do you think I can read the whole document as a string? Can it be that easy?

For that you can load the XML content, walk through the tree, and pick all XMLTextNodes to collect the text fragments.
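
To illustrate the idea of walking the tree and collecting text nodes, here is a minimal sketch. It is written in Python rather than Xojo, purely to show the technique; the function name collect_text is made up for this example, and the same approach should carry over to whatever XML classes you use.

```python
import xml.etree.ElementTree as ET


def collect_text(xml_string):
    """Return all text fragments in an XML document, joined in document order.

    collect_text is a hypothetical helper name used only for this sketch.
    """
    root = ET.fromstring(xml_string)
    # itertext() walks the whole tree and yields every text fragment
    # (element text and the text following child elements) in order.
    return "".join(root.itertext())


if __name__ == "__main__":
    sample = "<doc><p>Hello <b>big</b> world</p></doc>"
    print(collect_text(sample))
```

Since the tags are dropped and only the text is kept, a plain substring search on the result is enough to answer "does the document contain this string?".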

That does not look easy. From what I have seen when opening a docx file in a text editor, the text seems to be encrypted :(

.docx files are not encrypted; they are zip files. The XML is inside the zip.
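
A rough sketch of that approach, in Python just for illustration (not Xojo). It assumes the main body text lives in the zip entry word/document.xml, inside w:t elements grouped into w:p paragraphs, which is the usual layout for a .docx. The function names docx_text and docx_contains are made up for this example, and headers, footers, footnotes and the old binary .doc format are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(source):
    """Return the body text of a .docx file (path or file-like object).

    Hypothetical helper for this sketch; reads only word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A word can be split across several runs, so join all the
        # w:t fragments of a paragraph before searching.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def docx_contains(source, needle):
    """Return True if the body text contains the given string."""
    return needle in docx_text(source)
```

Usage would be something like docx_contains("fox.docx", "quick brown"), which returns True or False. The search is case-sensitive as written; lowercasing both sides first would make it case-insensitive.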

Many thanks, Christian and Ian, for the clarification! It seems easier than I thought.