Encrypting foreign languages

I am trying to encrypt a Dutch text file with the code below, but it fails when it hits certain characters, e.g. “AALST : Stad In Midden-België”. I'm guessing ‘ë’ is the character that isn't working.
Should I be using a different encoding? I'm using the MBS code from the AESMBS example.

When I look at the text file in TextEdit, the text looks as above, but when I import it into my edit field before encrypting, the ë is replaced with a question mark. Dutch seems to be the only problem among the 6 languages I am encrypting.

d = DefineEncoding(s,Encodings.ASCII)
tout.writeline encryptAES(d, keycode)

Why do you use ASCII instead of UTF-8? ASCII only covers 128 characters and has no accented letters such as ë.
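A minimal Python sketch (Python rather than Xojo, purely for illustration) of why ASCII fails here: UTF-8 stores ë as two bytes, while ASCII has no code for it, so a lossy ASCII conversion substitutes a question mark, much like the symptom in the question.

```python
text = "Midden-België"

# UTF-8 can represent ë; it is stored as the two bytes C3 AB.
utf8_bytes = text.encode("utf-8")

# ASCII only covers code points 0-127, so ë has no representation.
# With errors="replace" the encoder substitutes "?" for it.
ascii_lossy = text.encode("ascii", errors="replace")

print(utf8_bytes)   # b'Midden-Belgi\xc3\xab'
print(ascii_lossy)  # b'Midden-Belgi?'
```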

This is my first time trying to encrypt, and I just jumped on the first MBS example.
I noticed that the ë doesn't display in the text field when I import the text file, and in that part of my code I use tin.Encoding = Encodings.UTF8.

I don't know if this is your problem, but when you decrypt a string, the result won't have an encoding, so you have to use DefineEncoding to tell Xojo it's UTF-8.
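A minimal Python sketch of that point, assuming a cipher that works on raw bytes. `toy_cipher` is a hypothetical XOR stand-in for the MBS encrypt/decrypt calls: it is not real encryption and not the MBS API. The text goes in as UTF-8 bytes, comes out of decryption as plain bytes with no encoding attached, and only becomes text again once you declare the encoding (the role DefineEncoding plays in Xojo).

```python
# Hypothetical stand-in for the AES calls: a toy XOR "cipher".
# NOT secure; it only shows that ciphers operate on bytes, not text.
def toy_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plain_text = "AALST : Stad In Midden-België"
key = b"secret"

cipher_bytes = toy_cipher(plain_text.encode("utf-8"), key)  # text -> UTF-8 bytes -> ciphertext
decrypted = toy_cipher(cipher_bytes, key)                    # XOR is symmetric; result is raw bytes
restored = decrypted.decode("utf-8")                         # declare the encoding to get text back
print(restored)
```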

OK, I will change from ASCII to UTF-8 when encrypting and decrypting, but I don't think the problem is there. It happens when I import into the text field and the characters are replaced with ?.

I use

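tin = f.OpenAsTextFile
tin.Encoding = Encodings.UTF8   ' assumes the file really is UTF-8
while not tin.EOF
  ts = Trim(tin.ReadLine)
  tempwc.Append ts
wend
tin.Close                        ' release the file once it has been read
for i = 0 to tempwc.Ubound
  a = a + tempwc(i) + Chr(13)    ' rebuild the text with CR line endings
next i
RawTA.Value = a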

Where do you set the UTF-8 encoding on a (before assigning a to RawTA)?

I don't. :(

What is the default encoding for any string variable created in Xojo?

UTF-8

That's the initial encoding. If you assign a string with a different (or Nil) encoding to your variable, the variable takes on the encoding of that string.

This is how I import the file, and this is what I see in the debugger:

Dim f As FolderItem
Dim dlg As OpenDialog
Dim tin As TextInputStream
Dim ts, a As String
' tempwc, wcnt and sseparator are assumed to be properties declared elsewhere

Redim tempwc(-1)
dlg = New OpenDialog
dlg.InitialDirectory = SpecialFolder.Desktop
f = dlg.ShowModalWithin(MainWindow1)

if f <> Nil then
  wcnt = 0
  tin = f.OpenAsTextFile
  tin.Encoding = Encodings.UTF8   ' assumes the file really is UTF-8
  while not tin.EOF
    ts = Trim(tin.ReadLine)
    sseparator = -1
    if ts.IndexOf(Chr(58)) <> -1 then ' keep only lines containing a colon, chr(58)
      sseparator = 1
      if ts <> "" then
        tempwc.Append ts
        wcnt = wcnt + 1
      end if
    end if
  wend
  tin.Close
end if

[Screenshot 2024-04-04 at 7.34.09 pm: debugger view]

I confirmed that the import file is UTF-8:

[Screenshot 2024-04-04 at 7.39.51 pm: file encoding check]

You are importing the file assuming it is UTF-8; if you get wrong results, then most likely the file is not UTF-8.

There are text editors that can tell you which encoding a file uses. Maybe it's some Windows encoding?

Just because you see the special characters correctly in a file doesn't mean the encoding is UTF-8.
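A minimal Python sketch of that point: the same bytes mean different things depending on the encoding you assume. In Mac OS Roman, ë is the single byte 0x91, which is not a valid start byte in UTF-8, so reading it as UTF-8 yields a replacement character instead of ë. `is_valid_utf8` is a hypothetical helper; note that passing the check doesn't prove a file is UTF-8 (plain ASCII passes too), but failing it proves it isn't.

```python
raw = "België".encode("mac_roman")   # bytes as a Mac OS Roman editor would save them

print(raw.decode("mac_roman"))                # right encoding: België
print(raw.decode("utf-8", errors="replace"))  # wrong guess: ë becomes U+FFFD

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(raw))                       # False
print(is_valid_utf8("België".encode("utf-8")))  # True
```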


Xojo says the original is UTF-8, but I will download BBEdit and check. Thanks for the suggestion.

How is Xojo saying that?

If the db does not store the encoding, how can you be sure you will get your non-ASCII characters back?

So, apply a UTF-8 encoding to a and check what happens. If everything is OK, you have the right code. ;)


I had 6 .docx files, which I loaded into Pages and exported as plain text. I thought they were all UTF-8, since I opened them in TextEdit for some minor editing and saved them. Somehow the Dutch translations ended up as Mac OS Roman. I'm not even sure how that was possible, but as @AlbertoD said, the assumption was wrong.
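Once the real encoding is known, the file can be converted once instead of being misread on every import. A minimal Python sketch, assuming the source file is entirely Mac OS Roman; `convert_mac_roman_to_utf8` and the file names are hypothetical. (In Xojo, setting the stream's encoding to Encodings.MacRoman before reading would presumably be the equivalent fix.)

```python
import tempfile
from pathlib import Path

def convert_mac_roman_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a Mac OS Roman text file as UTF-8."""
    text = src.read_bytes().decode("mac_roman")  # interpret bytes with the real encoding
    dst.write_bytes(text.encode("utf-8"))        # write the same text back as UTF-8

# Demo: a temporary file stands in for the exported Dutch text.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dutch_macroman.txt"
    dst = Path(tmp) / "dutch_utf8.txt"
    src.write_bytes("AALST : Stad In Midden-België\n".encode("mac_roman"))
    convert_mac_roman_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```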

I might have to start a new thread, but can you search for foreign characters like ë in a RegEx search? I have a dictionary that I have been using for years to search for plain-text matches A-Z, but if I try a special character, it raises a search error.

Current search pattern: rg.SearchPattern = "[^a-zA-Z0-9{}.,*]"

Yes, it is better to discuss RegEx search patterns in a new thread.

This works for ë on the regex101 site:

[^\x{00eb}a-zA-Z0-9{}.,*]
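A minimal Python sketch of why this helps: the pattern is a negated class, so it matches every character that is not in the list, and ë gets flagged until it is added to the list. PCRE (as used on regex101) writes the code point as \x{00eb}; Python's re module spells the same code point \u00eb.

```python
import re

# Original pattern: any character that is NOT A-Z, a-z, 0-9 or one of {}.,*
old_pattern = re.compile(r"[^a-zA-Z0-9{}.,*]")
# Pattern from the reply, with ë (U+00EB) added to the allowed set.
new_pattern = re.compile(r"[^\u00eba-zA-Z0-9{}.,*]")

word = "Midden-België"
print(old_pattern.findall(word))  # ['-', 'ë']
print(new_pattern.findall(word))  # ['-']
```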

Sometimes what you see as ë is in fact two characters (an e followed by a combining diaeresis, a different Unicode sequence). The fun with foreign letters.
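A minimal Python sketch of that case: the precomposed ë (U+00EB) and the pair e + U+0308 render identically but are different strings, so a pattern containing \u00eb won't match the pair. Normalizing the text to NFC first merges the pair into the single code point.

```python
import re
import unicodedata

precomposed = "\u00eb"   # ë as a single code point
decomposed = "e\u0308"   # e followed by COMBINING DIAERESIS

print(precomposed == decomposed)          # False, although both display as ë
print(len(precomposed), len(decomposed))  # 1 2

# NFC normalization composes the pair into U+00EB.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)               # True
print(bool(re.search("\u00eb", decomposed)))   # False before normalizing
print(bool(re.search("\u00eb", normalized)))   # True after normalizing
```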


Thanks, that works great! :)