Encrypting foreign languages

Martin_Fitzgibbons · April 4, 2024, 5:22am

I am trying to encrypt a Dutch text file with the code below, but it fails when it hits certain characters, for example in “AALST : Stad In Midden-België”. I’m guessing ‘ë’ is the character that is not working.
Should I be using a different encoding? I’m using the MBS code from the AESMBS example.

When I look at the text file in TextEdit, the text looks as above, but when I import it into my edit field before encrypting, the ë is replaced with a question mark. Dutch seems to be the only problem among the 6 languages I am encrypting.

d = DefineEncoding(s,Encodings.ASCII)
tout.writeline encryptAES(d, keycode)

Thomas_Roemert · April 4, 2024, 5:25am

Why do you use ASCII instead of UTF8? ASCII is limited to 128 characters and has no accented or other special characters such as ë.
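
To make the limitation concrete, here is a minimal sketch in Python (not Xojo, used only because it is easy to run anywhere). It assumes nothing about the MBS plugin and simply shows that ‘ë’ has no ASCII representation, while UTF-8 stores it as two bytes.

```python
text = "Midden-België"

# ASCII only covers code points 0-127, so 'ë' (U+00EB) cannot be encoded.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode character; 'ë' takes two bytes.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)                    # False
print(len(text), len(utf8_bytes))  # 13 14
```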

Martin_Fitzgibbons · April 4, 2024, 5:30am

First time trying to encrypt and just jumped on the first MBS example.
I noticed that the special ë doesn’t display in the text field when I import the text file, and in that part of my code I use tin.Encoding = Encodings.UTF8

Tim_Hare · April 4, 2024, 5:52am

I don’t know if this is your problem, but when you decrypt a string, it won’t have an encoding, so you have to use DefineEncoding to tell Xojo it’s UTF8.

Martin_Fitzgibbons · April 4, 2024, 6:04am

OK, I will change from ASCII to UTF8 when encrypting and decrypting, but I don’t think the problem is there. I think it happens when I import into the text field and the characters are substituted with the ?

I use

tin = f.openAsTextFile
tin.Encoding= Encodings.UTF8
while tin.eof = false
    ts = Trim(tin.readline)
    tempwc.append ts
Wend
for i = 0 to tempwc.Ubound
  a = a + tempwc(i) + Chr(13)
next i
RawTA.value = a

Emile_Schwarz · April 4, 2024, 7:27am

Where do you set the UTF8 encoding on a? (before assigning a to RawTA)

Martin_Fitzgibbons · April 4, 2024, 7:28am

I don’t

Martin_Fitzgibbons · April 4, 2024, 8:10am

What is the default encoding for any string variable created in Xojo?

AlbertoD · April 4, 2024, 8:27am

UTF-8

TimStreater · April 4, 2024, 8:31am

That’s the initial encoding. If you assign a string with a different, or Nil, encoding to your string, then it will get the encoding of that string.

Martin_Fitzgibbons · April 4, 2024, 8:36am

This is how I import the file and this is what I see in the debugger

Dim f as FolderItem
Dim dlg as OpenDialog
Dim tin as textinputstream
Dim ts, a as String

Redim tempwc(-1)   
dlg=New OpenDialog                          
dlg.initialDirectory =  SpecialFolder.Desktop 
f=dlg.ShowModalwithin(MainWindow1)  

if f <> Nil then
  wcnt = 0
  tin = f.openAsTextFile
  tin.Encoding= Encodings.UTF8
  while tin.eof = false
    ts = Trim(tin.readline)
    sseparator = -1
    if ts.IndexOf(Chr(58)) <> -1 then 'test for a  colon chr(58): 
      
      sseparator = 1
      
      if ts <> ""  then
        tempwc.append ts 
        wcnt = wcnt + 1
      end if
      
    end if
  wend
  tin.close
end if

Screenshot 2024-04-04 at 7.34.09 pm

Confirmed that the import file is UTF8

Screenshot 2024-04-04 at 7.39.51 pm

AlbertoD · April 4, 2024, 8:40am

You are importing a file that you assume is UTF-8. If you get wrong results, then most likely your file is not UTF-8.

There are Text editors that can tell you what encoding the text has. Maybe some Windows encoding?

Just because you see special characters in a file does not mean that the encoding is UTF-8.

Martin_Fitzgibbons · April 4, 2024, 8:49am

Xojo says the original is UTF8, but I will download BBEdit and check… thanks for the suggestion

TimStreater · April 4, 2024, 8:53am

How is Xojo saying that?

Emile_Schwarz · April 4, 2024, 9:08am

If the db does not store the encoding, how can you be sure you will get your non ASCII characters ?

So, apply a UTF8 encoding to a and check what happens. If everything is OK, you have working code.

Martin_Fitzgibbons · April 4, 2024, 11:33am

I had 6 docx files which I loaded into Pages and exported as plain text. I thought they were all UTF8, as I loaded them into TextEdit for some minor editing and saving. Somehow the Dutch translations ended up as Mac OS Roman… not even sure how that was possible, but as @AlbertoD said, the assumption was wrong.
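
For anyone hitting the same symptom, this is roughly what was going on, sketched in Python rather than Xojo. The sample string comes from the first post; the variable names are only illustrative. Bytes written as Mac OS Roman are not valid UTF-8 once they include ‘ë’, so a reader that is told the data is UTF-8 substitutes a replacement character, which a text field may show as a question mark.

```python
original = "Stad In Midden-België"

# Bytes as a Mac OS Roman file would store them (one byte per character).
mac_bytes = original.encode("mac_roman")

# Reading those bytes as UTF-8 fails on the 'ë' byte; with replacement
# enabled it becomes U+FFFD, the Unicode replacement character.
wrong = mac_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file really uses recovers the text.
right = mac_bytes.decode("mac_roman")

print(wrong)
print(right)
```

The Xojo equivalent would presumably be to set the stream's encoding to the file's real encoding (Mac OS Roman here) instead of UTF8, or to re-save the file as UTF-8.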

I might have to start a new thread, but can you search for foreign characters like ë in a RegEx search? I have a dictionary that I have been using for years to search plain text matches A-Z, but if I try a special character it raises a search error.

Current search pattern: rg.SearchPattern = "[^a-zA-Z0-9{}.,*]"

AlbertoD · April 4, 2024, 12:05pm

Yes, it is better to discuss RegEx search patterns in a new thread.

This works for ë on the Regex101 site:

[^\x{00eb}a-zA-Z0-9{}.,*]

Sometimes you see ë but it is in fact two code points combined into one visible character (a different Unicode representation of the same letter). That's the fun with foreign letters.
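
A small sketch of that pitfall, in Python rather than Xojo. It uses the same character class, with ë written as U+00EB, and a hypothetical helper `has_disallowed` that exists only for this illustration. The decomposed form (e followed by a combining diaeresis) is flagged by the negated class, and normalizing to NFC first avoids that.

```python
import re
import unicodedata

# Same character class as above, with ë written as U+00EB.
pattern = re.compile(r"[^\u00eba-zA-Z0-9{}.,*]")

precomposed = "Belgi\u00eb"   # 'ë' as a single code point
decomposed = "Belgie\u0308"   # 'e' followed by combining diaeresis U+0308

def has_disallowed(s):
    """Return True if s contains a character outside the allowed class."""
    return pattern.search(s) is not None

# NFC composes 'e' + U+0308 into the single code point U+00EB.
normalized = unicodedata.normalize("NFC", decomposed)

print(has_disallowed(precomposed))  # False
print(has_disallowed(decomposed))   # True
print(has_disallowed(normalized))   # False
```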

Martin_Fitzgibbons · April 4, 2024, 12:15pm

Thanks, that works great.