Does anybody know if there’s a library around for parsing the ePub (electronic book) file format? It’s in fact a zipped file with XML for structure and HTML for text but there is so much inconsistency possible that parsing it by hand is a pain.
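To make the layout described above concrete, here is a minimal sketch in Python rather than Xojo, only to illustrate the structure. It assumes the usual OCF convention that `META-INF/container.xml` inside the archive points to the package (OPF) file. The function name `find_opf_path` is made up for this example and does not come from any library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in the OCF container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_file):
    """Return the archive path of the package (OPF) document.

    `epub_file` may be a file name or a binary file-like object.
    (Hypothetical helper, written for illustration only.)
    """
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> entry")
    return rootfile.get("full-path")


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a real book.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr(
            "META-INF/container.xml",
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
            '<rootfiles><rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/></rootfiles>'
            "</container>",
        )
    demo.seek(0)
    print(find_opf_path(demo))
```

Everything else (metadata, manifest, reading order) is then read from the OPF file this path points to, which is where most of the inconsistencies between books tend to show up.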
I do not know of any library for parsing ePubs. However, I am writing my own ePub creation application, and for my research I am using the book "A hands-on-guide to Epub2 and Epub3". It is written by Jarret W. Buse and is a very good source of information.
The book is intended for people who want to write epubs by hand. However, once you have created a few epubs with its help, you will better understand how to automate the process yourself.
Other books I am using are “EPUB 3 Best Practices” and “Accessible Epub”, published by O’Reilly. They are certainly not easy reads, but they contain a wealth of information. Both O’Reilly books have an epub version; to my knowledge, the first book I mentioned does not.
I plan to use the Monkeybread “compression” plugin and the Bkeeney Formatted Text Control. Everything in between is Xojo code and my own epub libraries (for versions 2 and 3), which contain the possible tags. My software is still at an early planning stage.
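One container detail that matters when creating epubs with a compression library: the OCF specification expects the `mimetype` file to be the first entry in the ZIP archive and to be stored uncompressed. Below is a rough sketch of that step in Python (not Xojo, and not the Monkeybread plugin API). The function name `write_epub_skeleton`, the `OEBPS/content.opf` path and the placeholder package document are assumptions made for this example only.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf_path}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, opf_path="OEBPS/content.opf", opf_xml="<package/>"):
    """Write the bare container of an epub to `target` (file name or file object).

    Hypothetical helper: a real book also needs a complete OPF and content files.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml",
                    CONTAINER_XML.format(opf_path=opf_path))
        zf.writestr(opf_path, opf_xml)


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_epub_skeleton(buffer)
    with zipfile.ZipFile(buffer) as zf:
        print(zf.namelist())
```

Whatever ZIP tool is used, it is worth checking that it allows per-entry control over compression and entry order, since some readers and validators reject archives that get this wrong.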
I agree with you about the inconsistencies in the epub format. They make it very difficult indeed. They also mean that some publications look good on one reader and a mess on others. The question here is, can you anticipate all those inconsistencies? I am sure it is very challenging but even more interesting, at least for me. I am convinced it can be done with Xojo.
I regret I cannot give you more help on this subject. Your knowledge on this matter is probably much greater than mine. I create XHTML pages manually by using TopStyle 5. When creating an epub by hand I also spend most of my time on coding and structuring the correct epub format. Many of these tasks can be automated by Xojo. To give you an idea, I enjoy coding XHTML by hand while most others use visual tools.
I wish you every success and all the best.
Thanks for your elaborate reply. I will take a look at the books you mentioned. Nevertheless, I wonder why the IDPF chose a format that can be so ambiguous.
I have been working my way into ePubs with Xojo, but parsing the metadata (tags) is hard.
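One approach that may make the metadata easier to handle is to match elements by their namespace URI instead of by prefix or position, since books differ in which prefix they use and where they declare it. The sketch below is in Python rather than Xojo and only illustrates the idea. It assumes the metadata of interest is Dublin Core (namespace `http://purl.org/dc/elements/1.1/`), and `read_dc_metadata` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for title, creator, language, etc. in OPF files.
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_dc_metadata(opf_xml):
    """Collect Dublin Core elements from an OPF document into a dict of lists.

    Hypothetical helper. It walks the whole tree, so it does not matter
    whether the elements sit directly under <metadata> or are nested deeper.
    """
    root = ET.fromstring(opf_xml)
    metadata = {}
    for elem in root.iter():
        # ElementTree spells namespaced tags as "{namespace-uri}localname".
        if isinstance(elem.tag, str) and elem.tag.startswith("{" + DC_NS + "}"):
            name = elem.tag.split("}", 1)[1]
            text = (elem.text or "").strip()
            if text:
                # A tag such as creator may appear more than once, so keep a list.
                metadata.setdefault(name, []).append(text)
    return metadata


if __name__ == "__main__":
    sample = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example Book</dc:title>
        <dc:creator>First Author</dc:creator>
        <dc:creator>Second Author</dc:creator>
        <dc:language>en</dc:language>
      </metadata>
    </package>"""
    print(read_dc_metadata(sample))
```

The same idea should carry over to Xojo's XML classes: compare each node's namespace URI and local name rather than its prefixed name, and treat every tag as potentially repeated.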