I would like XMLDocument.ToString to encode the String in a specified character set. I believe the default is UTF-8. How can I use ISO 8859-1, for example, so that the horizontal ellipsis (…) is returned as the numeric character reference &#8230;?
I understand XMLDocument has the encoding property, and perhaps this is used to set the encoding type, but the documentation doesn’t specify the valid string values. I tried “ISO 8859-1” but it made no difference to the ToString output.
Any help would be appreciated.
Probably something like:
XMLDocument.ToString.ConvertEncoding (Encodings.xxxxx)
and you’ll need to discover what the correct xxxxx is.
I tried this with several encoding types but couldn’t get it to do what I am looking for.
I think you need to handle such a conversion yourself. With String.Asc you get the Unicode code point of a character.
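A minimal sketch of that do-it-yourself approach, assuming the String coming out of the XMLDocument is UTF-8. ToNumericReferences is a hypothetical helper name, not part of Xojo; it walks the text one character at a time and uses only the classic Len, Mid, Asc, Str and Join functions:

```xojo
Function ToNumericReferences(Contents As String) As String
  // Hypothetical helper (not a Xojo built-in): replaces every
  // non-ASCII character with a decimal numeric character reference.
  Dim parts() As String
  Dim ch As String
  Dim code As Integer
  For i As Integer = 1 To Len(Contents)
    ch = Mid(Contents, i, 1)   // Mid is 1-based and counts characters
    code = Asc(ch)             // code point of this character
    If code > 127 Then
      parts.Append("&#" + Str(code) + ";")
    Else
      parts.Append(ch)
    End If
  Next
  Return Join(parts, "")
End Function
```

It could be called as ToNumericReferences(doc.ToString), where doc stands for your XMLDocument. Because the result contains only ASCII characters, the same bytes are valid whether the document is declared as UTF-8 or ISO 8859-1.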
If I’m understanding correctly, what you are trying to do is force the document to use numeric character references instead of non-ASCII characters.
XML engines don’t do that. They produce UTF-8 or UTF-16, period. To get the numeric references, you’ll need to first get the UTF-8 out of the XML document, then manipulate it with the String functions.
To do this, here’s what I would do (the variable Contents being the String coming out of the XMLDocument):
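Dim Seeker, Changer As RegEx
Dim rxm As RegExMatch
Dim Completed As Boolean
Dim currentChar As String
Seeker = New RegEx
Changer = New RegEx
// Match any single character outside the 7-bit ASCII range (octal 000-177)
Seeker.SearchPattern = "[^\000-\177]"
Changer.Options.CaseSensitive = True
// Replace every occurrence of the found character in one pass
Changer.Options.ReplaceAllMatches = True
Do Until Completed
  rxm = Seeker.Search(Contents)
  If rxm Is Nil Then
    // No non-ASCII characters left
    Completed = True
  Else
    currentChar = rxm.SubExpressionString(0)
    Changer.SearchPattern = currentChar
    // Asc gives the character's code point, e.g. 8230 for the ellipsis
    Changer.ReplacementPattern = "&#" + Str(currentChar.Asc) + ";"
    Contents = Changer.Replace(Contents)
  End If
Loop
Return Contents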