WhAcKy ChArAcTeRs (from a novice)

I have some text I must’ve downloaded somewhere along the way which, when put into a string and displayed in a TextArea, suddenly starts doing ThIs, SoRt Of. It’s not every other letter, it’s more random, but it looks like the point size or font is changing or something.

I’ve tried setting the encoding, like this: s = s.DefineEncoding(Encodings.UTF8), but it doesn’t change anything. I’ve also iterated through the string, building a new string from only the characters with a code value below 255, and that worked, except … I lost the smart quotes, etc.

I don’t really understand character encoding though I’ve grappled with it many times in the past. I’m too dumb I guess, or too old. Or too lazy to look carefully at the details. I just want a way to programmatically clean this whackiness out of this particular string and others like it.

Any ideas?

Why 255? Should be 128 if you want ASCII. And of course you’d lose smart quotes.
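To illustrate the cutoff, here is a minimal Python sketch (Python rather than Xojo; the function names and sample string are made up). Smart quotes and the em dash sit far above 255 (U+2018 to U+201D, U+2014), so either filter drops them. The only extra thing a 255 cutoff keeps is Latin-1 letters such as é.

```python
def keep_ascii(text: str) -> str:
    """Keep only code points below 128 (true 7-bit ASCII)."""
    return "".join(ch for ch in text if ord(ch) < 128)


def keep_below_256(text: str) -> str:
    """Keep code points below 256 (ASCII plus the Latin-1 range)."""
    return "".join(ch for ch in text if ord(ch) < 256)


# Hypothetical sample: an accented letter, an em dash, and smart quotes.
sample = "caf\u00e9: they walk the way\u2014it\u2019s \u201cgrace\u201d"

print(keep_ascii(sample))             # caf: they walk the wayits grace
print(ascii(keep_below_256(sample)))  # 'caf\xe9: they walk the wayits grace'
```

Either way the punctuation is gone, which is why filtering is the wrong tool here.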

You could post an image of what this “whacky” text looks like.

Meanwhile, have you used the debugger to examine the text? Posting the hex of the text would help too.
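If the debugger isn't handy, you can get the hex by encoding the string and printing each byte. Here is a minimal Python sketch (not Xojo; `hex_dump` is a made-up name). Any byte of 80 or above means the text is not plain ASCII.

```python
def hex_dump(text: str, encoding: str = "utf-8") -> str:
    """Return the bytes of text in the given encoding as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# Hypothetical snippet with an em dash in the middle.
print(hex_dump("way\u2014it"))  # 77 61 79 E2 80 94 69 74
```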

DefineEncoding does not change or convert the underlying bytes; it is only a promise. By calling it you’re saying: hey, I’ve got this string here, I know it’s in this encoding, and I want to stamp it with a promise that it is that encoding.

Now if your promise is incorrect, you will get bad results.

This might help.

https://youtu.be/L8uQpu0_sFo
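Python has no direct counterpart to Xojo’s DefineEncoding, but decoding raw bytes involves the same kind of promise. Here is a rough Python analogy (the sample bytes are made up to mirror the em dash in this thread). The bytes never change; only the encoding you claim for them does, and a wrong claim produces garbage.

```python
# The raw bytes are fixed; only the encoding we claim for them differs.
raw = "way\u2014it".encode("utf-8")  # b"way\xe2\x80\x94it": the em dash is E2 80 94

correct = raw.decode("utf-8")  # true promise: the em dash comes back intact
wrong = raw.decode("cp1252")   # false promise: E2 80 94 read as three Windows-1252 chars

print(correct == "way\u2014it")  # True
print(ascii(wrong))              # 'way\xe2\u20ac\u201dit', i.e. "wayâ€”it"
```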

Here’s the whacky text:

And here’s the debugger info:

It’s a hymn which I copied online originally (I think).

I’ll check it out. Thanks.

That appears to be plain and simple ASCII.

There are a couple of smart apostrophes/quotes in there, but yeah, it’s mostly just ASCII. So how come I’m getting that crazy printing? Mystifying. I know I can do some workarounds, but I’d like to solve this puzzle.

I looked closer and it is an every-other-letter pattern: little, big, little, big. Could I be tapping into a crazy font that does this? It begins halfway through the text.

Progress: The whacky characters are triggered by an em dash. If I swap it out for a regular dash, the problem disappears.

Is this a bug?

How?

Also, it looks like ASCII, but it is not ASCII: I noticed E2 80 94 in the hex view (after “they walk the way”, line 7).

Paste the string into the Code Editor, then copy it, run the project, and watch what you get…
It’s probably a UTF-8 encoded string…


Yes:

U+2014 — e2 80 94 EM DASH

So, UTF-8.
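To find every character like this in a longer text, you can list each non-ASCII character with its code point, UTF-8 bytes, and Unicode name. Here is a minimal Python sketch (the function name `non_ascii_report` and the sample text are hypothetical). It prints lines in roughly the format above, leaving out the character itself.

```python
import unicodedata


def non_ascii_report(text: str) -> list:
    """Describe every non-ASCII character: code point, UTF-8 bytes, Unicode name."""
    lines = []
    for ch in text:
        if ord(ch) >= 128:
            utf8 = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
            name = unicodedata.name(ch, "<unnamed>")
            lines.append(f"U+{ord(ch):04X} {utf8} {name}")
    return lines


for line in non_ascii_report("they walk the way\u2014it\u2019s"):
    print(line)
# U+2014 e2 80 94 EM DASH
# U+2019 e2 80 99 RIGHT SINGLE QUOTATION MARK
```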

You may want to switch off AllowStyledText in the textarea. It may solve the issue.


Tried that … no change.

Okay, I did some work with fonts and now the problem has disappeared. I’m not sure why or how, but it seems the System font may have been giving goofy results. Just a guess.

Anyway, I’m happy now … until the WHACKY FONTS rear their ugly heads again.

“Mostly” ASCII or “appears” to be ASCII is not the same as “it is” ASCII.

If you have UTF-8 characters there, why don’t you simply use:

s = s.DefineEncoding(Encodings.UTF8)
???

Once you’ve defined the correct encoding, you can replace the non-ASCII characters with their ASCII counterparts if you want an ASCII string:

s = s.ReplaceAll("’", "'")
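For a whole set of characters, you can keep a replacement table instead of writing one call per character. Here is a rough Python sketch of that idea (the table and `to_ascii` are my own illustrative choices, not from this thread). Mapping the em dash to a plain hyphen mirrors the swap that fixed the display here, and `str.translate` applies all the substitutions in one pass.

```python
# Hypothetical table: common "smart" punctuation mapped to plain ASCII look-alikes.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",    # EN DASH
    "\u2014": "-",    # EM DASH (the character that triggered the problem here)
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
})


def to_ascii(text: str) -> str:
    """Swap known smart punctuation for ASCII; characters not in the table are left unchanged."""
    return text.translate(ASCII_FALLBACKS)


print(to_ascii("they walk the way\u2014it\u2019s \u201cgrace\u201d\u2026"))
# they walk the way-it's "grace"...
```

Anything not in the table passes through untouched, so it is worth checking the result with a hex dump or an `isascii()` test afterwards.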

It is UTF-8; read my previous post or look at the strings as hex…