Encountered invalid character. - HOW TO INTERCEPT IT?

Hello,

I’m having a hard time figuring out where to correct my code.

I get data from a MySQL database and encode it to UTF8 whenever text is sent to the screen. I keep getting a RuntimeException whose message is "Encountered invalid character. ".

Any hint about what could be happening? Where should I try to intercept the errors? I tried putting Try/Catch blocks in some locations where data is sent to weblistboxes, textfields or webpopupmenus. Nothing. The exception keeps bubbling up, and the first location where I can intercept it is Session.UnhandledException.

Help and ideas are appreciated.

The text from the database is probably not valid UTF-8. Convert it to straight bits to check. In UTF-8, a byte that encodes a code point below 128 starts with the bit 0. For code points 128 and above, the first byte starts with 110, 1110, or 11110, indicating the total number of bytes (two, three, or four) required to store that character. The trailing bytes all start with 10. For example, if the first byte starts with 1110, that means there are three bytes for that character and you would expect the next two bytes to start with 10.
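
The byte rules above can be sketched as a small checker. This is a Python illustration rather than Xojo, and the function names (`expected_length`, `first_invalid_offset`) are made up for the example. It only checks the lead-byte and trailing-byte bit patterns described above; a full validator also rejects things like overlong encodings, which this sketch does not.

```python
def expected_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 if it cannot start a character."""
    if lead >> 7 == 0b0:        # 0xxxxxxx: code point below 128
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx: four-byte sequence
        return 4
    return 0                    # 10xxxxxx (stray trailing byte) or 11111xxx


def first_invalid_offset(data: bytes) -> int:
    """Return the offset of the first byte that breaks the pattern rules, or -1 if none does."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return i
        trailing = data[i + 1:i + n]
        # Every trailing byte must exist and must start with the bits 10.
        if len(trailing) != n - 1 or any(b >> 6 != 0b10 for b in trailing):
            return i
        i += n
    return -1
```

For instance, `first_invalid_offset("São".encode("latin-1"))` points at offset 1, because the Latin-1 byte for "ã" looks like the start of a three-byte sequence that never completes.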

Or use Encodings.UTF8.IsValidData( dbString ) to check if you just want a general idea of whether it will work or not.

In any case, if the bytes of the string do not conform to the rules of the chosen encoding, you will end up with an invalid string.

BTW, I just did a session on this at XDC2019, so you might want to check that video.

Did you tell the database to send UTF-8?
Do you use DefineEncoding?

Or do you use MBS Xojo SQL Plugin to get encodings right automatically?

Thanks for the ideas.

I do DefineEncoding everything and the database itself sends UTF-8.

What is puzzling me is that this issue appeared after a significant update (new features) to the app I have been working on for a long time… The issue appears even when I send an empty webdialog to the screen…

I’m running some tests and will report back as soon as possible.

Thanks again!

Since my data comes from a remote host, it could be anything even though, in principle, the other end tells me what it is. So I read it into a memoryblock, and then do a lossy conversion from data to UTF-8.

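[code]// tmpmmb is assumed to already hold the raw bytes read from the remote host
Dim tmpmmb As Xojo.Core.MutableMemoryBlock, tencch As Xojo.Core.TextEncoding, outp As Text

tencch = Xojo.Core.TextEncoding.FromIANAName("UTF-8")
outp = tencch.ConvertDataToText(tmpmmb, True) // True = allow lossy conversion[/code]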

This at least means that if the data is mostly or almost all UTF-8, the local user will get most of it. In your case, something like the above may help in finding which characters are rubbish.
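
To illustrate how a lossy conversion can also point at the rubbish, here is a minimal Python sketch (not Xojo; `lossy_decode` is a hypothetical helper name). It relies on Python replacing each undecodable byte sequence with U+FFFD and then reports where those replacement characters landed. One caveat: a U+FFFD that was legitimately present in the input would be reported too.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def lossy_decode(data: bytes):
    """Decode as UTF-8, substituting U+FFFD for invalid bytes.

    Returns the decoded text and the character positions of every
    replacement character in it.
    """
    text = data.decode("utf-8", errors="replace")
    bad_positions = [i for i, ch in enumerate(text) if ch == REPLACEMENT]
    return text, bad_positions


# Usage: a mostly valid byte string with one stray Latin-1 byte in it.
text, bad = lossy_decode(b"Jo\xe3o")
print(repr(text), bad)
```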

Thank you all for your help and feedback.

I finished the investigation and the conclusion is rather interesting…

The problem is in the CODE itself, NOT the data coming from the database!

This is how I proceeded to get to this conclusion:

1 - In every part of the webpage code that is supposed to send data to the browser, I placed an exception handler, something like this (the example below is a method in a custom WebListBox that takes a recordset and populates the list with it):

[code]Public Sub Populate(rs As RecordSet)
  Dim i_count As Integer

  If ((rs = Nil) Or (rs.BOF And rs.EOF)) Then
    Self.DeleteAllRows
    Return
  End If

  i_count = rs.FieldCount

  While Not rs.EOF
    Self.AddRow "" // add a new row
    For i As Integer = 1 To i_count
      Self.Cell(Self.LastIndex, i - 1) = rs.IdxField(i).StringValue.DefineEncoding(Encodings.UTF8)
    Next
    rs.MoveNext
  Wend

  Exception err
    MsgBox("ALARM - ERROR IS HERE!")
    Break

End Sub
[/code]

2 - I also placed Break at Session.UnhandledException and App.UnhandledException.

3 - I ran several tests and the exception ALWAYS bubbled up to Session.UnhandledException. The exception handlers scattered through the code where data was being sent to the screen never fired.

After pulling out a lot of hair and losing several hours of sleep, I remembered that the web app WAS working for almost a full year before presenting this “Encountered invalid character” issue. That made me trace back what I had done to the code that could have raised the issue. I had implemented several new features and streamlined code.

Nothing wrong there.

I had also thought about adopting a version control service such as GitHub (I looked into BitBucket, actually). To make it work, I learned that the code should be saved as text files, using the “Xojo Project” format in place of the “Xojo Binary Project” format I’m used to. So I saved it that way and opened one of the resulting files with Notepad++, only to take a peek at it.

As this move to BitBucket is only a plan at this moment, I closed the file in Notepad++ and saved the project back to “Xojo Binary Project” in the IDE. This is how I suspect the code was contaminated with non-UTF8 characters… I might have changed something in Notepad++ before closing it.

To test my new hypothesis I saved the project as “Xojo Project” again, opened it in Notepad++ (again) and started looking for strange characters. Since the app is aimed at Brazilian users, who speak Portuguese, there are lots of opportunities for strange characters to appear, such as [, , , ] and so forth. Unfortunately the code is way too long to inspect… The webpage that first raised the issue is about 8000 lines long in Notepad++. It is not possible to inspect it manually, even using Notepad++’s excellent search features.
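
A scan like this does not have to be done by eye. As a rough sketch, assuming the text-format project files are meant to be UTF-8 (that is an assumption, and the function name `find_invalid_lines` is made up for this example), a few lines of Python can list every line that fails to decode, with the column and the offending byte:

```python
def find_invalid_lines(path):
    """Return (line number, byte offset in line, offending bytes) for each
    line of the file that is not valid UTF-8."""
    hits = []
    # Read in binary mode so nothing is decoded (or silently fixed) for us.
    with open(path, "rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # Only the first bad sequence on each line is reported.
                hits.append((lineno, err.start, raw[err.start:err.end]))
    return hits
```

Running it over each file of the saved project would narrow 8000 lines down to the handful worth opening in an editor. It would not catch characters that are valid UTF-8 but simply wrong.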

I thought about writing a small desktop app using @Tim Streater’s idea, but that was not the fastest approach and I really don’t need computer forensics here (but the idea is cool!), so I decided instead to revert to a previous version of the code, implement the new features again, review it and run it in Debug mode.

And…

It ran smoothly…

So, I can conclude that the “invalid character” was in some label, title or any other part of the code that I might have messed with.

Any thoughts? Is that possible? Should I mark this as closed?

You could use Thomas Tempelmann’s Arbed to look at the project.

http://www.tempel.org/Arbed/

I have found and fixed file corruption with it several times, and in my opinion it should be in every Xojo user’s toolbox.

Hello @Markus Winter! Thanks for the tip. I just downloaded Arbed and took a look at it. It looks tidy and organized, but I couldn’t figure out how to fix files with it. How should I proceed? A line-by-line search?

Got Arbed just now and I’m playing with it.

It is AWESOME! One of those moments when you find yourself thinking “how could I have lived without it until now?”

Thanks again @Markus Winter for the tip and @Thomas Tempelmann for such an awesome tool!

[quote=436836:@Markus Winter]You could use Thomas Tempelmann’s Arbed to look at the project.

http://www.tempel.org/Arbed/

I have found and fixed file corruption with it several times, and in my opinion it should be in every Xojo user’s toolbox.[/quote]
Sadly, Arbed can’t open my 2019 Xojo project files…
I have this “encountered invalid character” error as well; it happened after I added several functions to my web app (exactly like Leonidas). I haven’t added characters any “weirder” than those in the previous, working version of the app. I read files often in this app, so figuring out the problem is a nightmare (can’t debug to a server…).

Hello @Arnaud Nicolet ,

I sympathize with you, as I also felt the harsh frustration this situation brings. Unfortunately the only solution I had was to recover a previous version of the code and rewrite it.

When that happens (not the first or last time I had to rewrite code), I always tell myself that the second time code is written is always better than the first. As a Christian I take comfort in the words of the prophet Haggai’s book, chapter 2, verse 9:

[quote=467358:@Leonidas Brasileiro]Hello @Arnaud Nicolet ,

I sympathize with you, as I also felt the harsh frustration this situation brings. Unfortunately the only solution I had was to recover a previous version of the code and rewrite it.

When that happens (not the first or last time I had to rewrite code), I always tell myself that the second time code is written is always better than the first. As a Christian I take comfort in the words of the prophet Haggai’s book, chapter 2, verse 9:[/quote]
Hello Leonidas,

I agree. Often, when rewriting the same code (or even a whole application) a second time, you know your previous mistakes on the subject and you start by thinking differently (which, in turn, can make an application significantly different).

I’m in the process of resolving the issue the other way around: I start with the whole (most recent) project and remove “a lot” of the things I’ve added since it worked, making sure that (1) what I remove doesn’t crash the program before I can test and (2) the change is significant enough to be worth uploading to my server (even if I had a decent Internet connection, which I don’t, it’s a pain to cycle through that every time). If it works, I uncomment half of what I removed; otherwise, I comment out more, keeping in mind what cannot be the problem.
I’ve spent more than a day and a half on this problem; I now know it happens with ListBox.AddRow File.DisplayName. I’m now going to find which filename inside the folder I’m enumerating is causing trouble… The final fix will depend on the result.
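
For hunting down the one troublesome filename, a quick script outside the app may be faster than cycling uploads. The following is only a Python sketch under assumptions: the helper name `suspicious_names` is invented, the folder is on a POSIX-style file system where names are raw bytes, and "troublesome" is taken to mean "not valid UTF-8".

```python
import os
from typing import Iterable, List


def suspicious_names(raw_names: Iterable[bytes]) -> List[bytes]:
    """Return the file names (as raw bytes) that do not decode as UTF-8."""
    bad = []
    for name in raw_names:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def scan_folder(folder: str) -> List[bytes]:
    """List a folder as raw bytes and report the names that are not UTF-8."""
    # Passing a bytes path makes os.listdir return undecoded bytes names.
    return suspicious_names(os.listdir(os.fsencode(folder)))
```

Calling `scan_folder` on the enumerated folder on the server would print the names worth a closer look. If it comes back empty, the culprit is more likely a valid but unexpected character, or an encoding that was never defined on the string.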