Read and analyze Word document

Did you consider calling pandoc (as an external tool) to do the initial Word-to-HTML conversion?

https://pandoc.org/index.html
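
In case it helps, here is a rough sketch of what calling pandoc from a script could look like. It assumes Python, that pandoc is installed and on your PATH, and that the input is a .docx file. The function names and the file names `report.docx` / `report.html` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path, html_path):
    """Return the argument list for a docx -> html pandoc run.

    Hypothetical helper: -f/-t select the input/output formats,
    -o names the output file.
    """
    return ["pandoc", str(docx_path), "-f", "docx", "-t", "html", "-o", str(html_path)]


def convert_docx_to_html(docx_path, html_path):
    """Run pandoc as an external process; raise if it is missing or fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(build_pandoc_command(docx_path, html_path), check=True)
    return Path(html_path)


if __name__ == "__main__":
    # Assumed file names, replace with your own
    out = convert_docx_to_html("report.docx", "report.html")
    print(f"Wrote {out}")
```

Once you have the HTML, you can analyze it with whatever HTML parser you already use, which is usually easier than working with the Word format directly.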