XMLDocument.ToString specify encoding character set

I would like XMLDocument.ToString to encode the String into a specified character set. I believe the default is UTF-8. How can I use ISO 8859-1, for example, such that the horizontal ellipsis (…) is returned as the entity &#8230;?

I understand XMLDocument has the encoding property, and perhaps this is used to set the encoding type, but the documentation doesn’t specify valid string value options. I tried “ISO 8859-1” but it made no difference to the ToString output.

Any help would be appreciated.

Probably something like:

XMLDocument.ToString.ConvertEncoding (Encodings.xxxxx)

and you’ll need to discover what the correct xxxxx is.

I tried this with several encoding types but couldn’t get it to do what I am looking for.

I think you need to handle such a conversion yourself. String.Asc gives you the Unicode code point of a character.
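As a rough sketch of that manual approach: walk the string one character at a time and swap anything above 127 for its numeric reference. EscapeNonASCII is just a name I made up for illustration, and this assumes the string has its encoding defined (UTF-8 here), so that Mid and Asc work on characters rather than bytes.

```
Function EscapeNonASCII(source As String) As String
  // Hypothetical helper, not part of the Xojo framework.
  // Collect the pieces in an array and join once at the end.
  Dim parts() As String
  Dim ch As String
  Dim code As Integer
  For i As Integer = 1 To Len(source)
    ch = Mid(source, i, 1)   // Mid is 1-based
    code = Asc(ch)           // code point of this character
    If code > 127 Then
      // Outside 7-bit ASCII: emit a numeric character reference
      parts.Append("&#" + Str(code) + ";")
    Else
      parts.Append(ch)
    End If
  Next
  Return Join(parts, "")
End Function
```

You would then pass it the document's output, something like EscapeNonASCII(doc.ToString), where doc is your XMLDocument. With a UTF-8 input of "Wait…" it should come back as "Wait&#8230;".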

If I’m understanding correctly, what you are trying to do is force the document to use numeric character references instead of non-ASCII characters.

XML engines don’t do that. They produce UTF-8 or UTF-16, period. To get the numeric references, you’ll need to first get the UTF-8 out of the XML document, then manipulate it with the String functions.

To do this, here’s what I would do (the variable Contents being the String coming out of the XMLDocument):

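Dim Seeker, Changer as RegEx
Dim rxm as RegExMatch
Dim Completed as Boolean
Dim currentChar as String
Seeker = new RegEx
Changer = new RegEx
' Match any single character outside 7-bit ASCII (octal 000 to 177)
Seeker.SearchPattern = "[^\000-\177]"
Changer.Options.CaseSensitive = True
' Replace every occurrence of the found character, not just the first one
Changer.Options.ReplaceAllMatches = True
Do until Completed
   rxm = Seeker.Search(Contents)
   If rxm is nil then
      ' No non-ASCII characters remain
      Completed = True
   Else
      currentChar = rxm.SubExpressionString(0)
      ' Swap the character for its numeric reference, e.g. &#8230;
      Changer.SearchPattern = currentChar
      Changer.ReplacementPattern = "&#"+Str(currentChar.Asc)+";"
      Contents = Changer.Replace(Contents)
   End If
Loop
Return Contents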