Cleaning up HTML with TidyMBS

I have a bit of an odd problem when exporting HTML data to FileMaker. My HTML is usually pretty crappy. If there are no <p> tags, the first paragraph shows up in FileMaker with a smaller font.

If I add some <p> tags, the HTML shows up correctly. I tried to clean up the HTML with TidyMBS, but neither OptionOutputXml nor OptionOutputXhtml really improves it:

dim theTidy as new TidyDocumentMBS
theTidy.OptionOutputXml = true
theTidy.OptionForceOutput = true
call theTidy.SetIntegerOption(TidyOptionIdMBS.TidyWrapLen, 0)
theTidy.OptionShowWarnings = False
theTidy.OptionShowErrors = 0
call theTidy.SetInputCharacterEncoding("utf8")
call theTidy.ParseString(Bottorf)
call theTidy.CleanAndRepair
call theTidy.SetOutputCharacterEncoding("utf8")
dim temp as String = theTidy.SaveString

Shouldn’t SetOutputCharacterEncoding result in a UTF-8 encoded string? Instead, the string has nil encoding.

What can I do to improve the HTML?

Here is the original HTML:

@Christian_Schmitz: any idea?

SaveString always returns a string with nil encoding.
Please use DefineEncoding to mark the string as UTF-8.
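A minimal sketch of that fix, reusing the TidyDocumentMBS calls from the question (a few options are left out for brevity). It assumes the MBS Tidy plugin is installed; `html` is a hypothetical input string standing in for the question's `Bottorf` variable. DefineEncoding and Encodings.UTF8 are standard Xojo; the only change from the question's code is the last line.

```xojo
// Sketch: clean up HTML with TidyMBS and mark the result as UTF-8.
// Assumption: the MBS Tidy plugin is installed.
// Hypothetical input; stands in for the question's Bottorf variable.
Dim html As String = "<b>Some messy html"

Dim theTidy As New TidyDocumentMBS
theTidy.OptionOutputXml = True
theTidy.OptionForceOutput = True
Call theTidy.SetInputCharacterEncoding("utf8")
Call theTidy.ParseString(html)
Call theTidy.CleanAndRepair
Call theTidy.SetOutputCharacterEncoding("utf8")

// SaveString hands back the bytes with nil encoding (see reply above).
// DefineEncoding only labels those bytes as UTF-8; it does not convert them,
// which is correct here because Tidy was told to write UTF-8 output.
Dim temp As String = DefineEncoding(theTidy.SaveString, Encodings.UTF8)
```

Note that DefineEncoding is the right call rather than ConvertEncoding: the bytes are already UTF-8, they just lack the label.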

I’ll still try to fix it by having the function look at the encoding you set and apply it to the Xojo string.

And if that’s not possible, maybe just make it UTF-8 by default? 🙂

It’s a bit odd that after calling SetOutputCharacterEncoding("utf8"), the encoding is nil.

But why is the output not valid XML or XHTML after using OptionOutputXml or OptionOutputXHtml?

I found a workaround for FileMaker: putting <p></p> in front of the HTML. Whatever FileMaker is doing here…

As I said, I’ve changed it for the next version.
If you call SetOutputCharacterEncoding with "utf8", the Xojo string will also be marked as UTF-8.