Clean up text from paste (webapp)

Hi, I have a web app, and when a user pastes text from Outlook (Mac) into my textarea, I have a problem reading it back from the database… So I suspect it has something to do with characters… I get an EOF error all the time… so I have:

“chips?
F, S”

as text, and a hex dump is: …63…68…69…70…73…2028…46…2C…20…53
But look right after 73 (that is the s): there is a 20 for a space and a 28 for an opening parenthesis, but they are not in my text, and it somehow backspaces a step… How can I spot this, and how can I get rid of this error?

I’m out of options now… have searched a lot…

please help.

the whole text dump is:

Meny 3
Forrett
Tartar av økologisk laks med crudité av eple, rødløk og fennikel. Wasabimajones og sesamchips?F, Su,Se,E,G(hvete)
Hovedrett
Indre let med bakte småpoteter, urtestekt sopp og babyspinat, sellerikrem. Rødvinssaus?Melk,Se?
Dessert
Sitronmousse med lemoncurd,
bringebærcoulis, bær og sorbet
E,Melk,G(hvete)

and it contains this error at least twice…

How are you reading/writing to the DB?
Are you specifying encoding both in and out?

Hi Roger, it is my listbox that causes the problems. Not the DB… but I have tried now with encodings to and from DB, but no help.

From DB:
encTxt = session.curs_punkt.Field("Program").GetString()
encTxt = encTxt.ConvertEncoding(Encodings.UTF8)
cLB.lb.Cell(cLB.lb.LastIndex,3)=encTxt

to DB:
encTxt = ta(feltnr).text.ConvertEncoding(Encodings.UTF8)
session.mTbl_Punkt.field("program").SetString(encTxt)

encTxt is a string.

But still no go…

How to strip out NON-characters?

Well, using this:
Dim c as TextConverter
encTxt = session.mTbl_Punkt.field("Program").GetString()
c=GetTextConverter(Encodings.UTF8, GetTextEncoding(0))
encTxt=c.convert(encTxt)

for both the listbox and the textarea when showing the text worked…

So, then I’m good for now… thanks Roger, for pointing me in the encoding direction… (I have never had to use encodings before…)

Are you sure 2028 isn’t the code point U+2028, the Unicode “LINE SEPARATOR”, which could just be replaced with an EndOfLine?

This is most certainly an occurrence of <https://xojo.com/issue/35919>
While Xojo refuses to understand it as a bug, you can easily circumvent it with myString.ReplaceAllB(Encodings.UTF8.Chr(8232), EndOfLine) (old Framework), but you need to do this with each and every value being passed as an AJAX request to the browser.

Hi tobias, I’ll try yours, but why do you use Chr(8232) in the code and not 2028? Or is it something else?

The Unicode code points that don’t need escaping in JSON but have a special meaning in JavaScript (and therefore need escaping, otherwise you see the error) are exactly U+2028 and U+2029. 2028 and 2029 are hexadecimal values; in decimal these code points are 8232 and 8233, which is what Chr is given here, see here.
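
To make the 2028 vs. 8232 relation concrete, here is a minimal sketch in Python (not Xojo, purely for illustration). The helper names `describe` and `find_separators` are made up for this example; the second one is one possible way to spot the characters in pasted text.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return the code point of a single character in several notations."""
    cp = ord(ch)
    return {
        "hex": f"U+{cp:04X}",                       # the notation seen in the hex dump
        "decimal": cp,                              # the value passed to Chr(...)
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": ch.encode("utf-8").hex(" "),        # the bytes actually stored as UTF-8
    }

def find_separators(text: str) -> list:
    """List (index, code point) for every U+2028 / U+2029 found in text."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if c in "\u2028\u2029"]

line_sep = describe("\u2028")
para_sep = describe("\u2029")
print(line_sep)
print(para_sep)
print(find_separators("chips\u2028F, S"))
```

So the “2028” in the dump is a single character, not a space followed by a parenthesis.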

Value = ReplaceAllB(Value, Encodings.UTF8.Chr(8232), "\\u2028")
Value = ReplaceAllB(Value, Encodings.UTF8.Chr(8233), "\\u2029")
It would really help if Xojo would add these two lines where the framework does the other JS escaping stuff already internally.
For all kinds of Unicode analysis, I can recommend the tool ‘Unicode Checker’ for macOS.
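
The same escape-on-output idea, sketched in Python for illustration only (this is not what the Xojo framework does internally, and `escape_for_js` is a hypothetical name). It assumes the value is serialized as JSON and then embedded in JavaScript, where these two characters have traditionally been treated as line terminators.

```python
import json

def escape_for_js(value: str) -> str:
    """Serialize value as JSON, then escape U+2028 and U+2029.

    JSON permits both characters raw inside strings, so json.dumps with
    ensure_ascii=False leaves them untouched; they are replaced afterwards
    with their \\uXXXX escape sequences.
    """
    encoded = json.dumps(value, ensure_ascii=False)
    return encoded.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

payload = escape_for_js("sesamchips\u2028F, Su")
print(payload)

# Decoding the escaped payload gives back the original, unaltered text.
print(json.loads(payload) == "sesamchips\u2028F, Su")
```

The stored value is not changed by this; only what is sent to the browser is escaped.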

So if I use it before storing into the database, I will not need to do it again when displaying in the listbox or textareas?

Correct - if you filter the input before you save it, you don’t need to escape it before displaying. But in that case the original value would be lost. You can replace it with EndOfLine, something similar, or depending on the source even "" may be appropriate.
Be careful if you try to display it directly again without storing it: Something like WebLabel1.Text = WebTextArea1.Text would not be possible unaltered.
Replacing it with \\u2028 gives the valid JSON escape sequence for the unaltered character, but this needs to be done on display.
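
For comparison, the filter-before-saving variant as a small Python sketch (illustrative only; `normalize_separators` is a made-up name). It is lossy by design: once stored, the original separator characters cannot be recovered.

```python
def normalize_separators(text: str, replacement: str = "\n") -> str:
    """Replace U+2028 and U+2029 with `replacement` before storing the text."""
    return text.replace("\u2028", replacement).replace("\u2029", replacement)

# Replace with a line break, similar to using EndOfLine.
cleaned = normalize_separators("Rødvinssaus\u2028Melk,Se\u2028")

# Or drop the character entirely, if that suits the source better.
stripped = normalize_separators("chips\u2028F, S", replacement="")

print(repr(cleaned))
print(repr(stripped))
```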