Hashing/Encoding Problem

Hey gang,

I am using the following code to hash/encrypt a password and store it in a database table:

    Using Xojo.Core
    Using Xojo.Crypto
    
    Dim salt As Text = SOME_SALT_TEXT_I_DONT_SHARE
    Dim saltMB As MemoryBlock
    saltMB = Xojo.Core.TextEncoding.UTF8.ConvertTextToData(salt)
    
    Dim password As Text = UserList.CellTag(row,1).StringValue.ToText
    Dim passwordMB As MemoryBlock
    passwordMB = Xojo.Core.TextEncoding.UTF8.ConvertTextToData(password)
    
    Dim hash As MemoryBlock
    hash = PBKDF2(saltMB, passwordMB, 500, 32, HashAlgorithms.SHA512)
    
    Dim thash as string = EncodeBase64(TextEncoding.UTF16.ConvertDataToText(hash))

This was pretty much what other users recommended. The problem, on Windows at least, is that if I use a password longer than 5 characters, I get a runtime exception on the very last line saying that an invalid character was encountered. It happens in Xojo’s ConvertDataToText method.

THIS ONLY HAPPENS IN WINDOWS! In macOS, I can use the exact same password that causes a crash in Windows and it works fine. So this looks like a Xojo problem perhaps.

What am I doing wrong or is this a Xojo bug?

Using 2017r3.

The real mystery is not why you get the exception in Windows, it’s why you don’t get it everywhere.

The problem is the last line of code that attempts to convert what is, in effect, an arbitrary series of bytes into Text. That simply won’t work (or shouldn’t) so you have to base64-encode the raw bytes instead. Unfortunately the new framework doesn’t make that easy, but since API 2.0 was announced anyway, I suggest you rewrite this around the classic framework and stop using Text.

If you’d rather not do that, convert the hash to a String, then call EncodeBase64 on it.
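To illustrate the point outside Xojo, here is a minimal Python sketch (not the Xojo API). The four bytes in `raw` are a hypothetical stand-in for a hash, picked because they are known to be invalid UTF-16; a real hash may or may not happen to decode, which is why the failure looks intermittent. Encoding the raw bytes directly as Base64 or hex never depends on their content.

```python
import base64
import binascii

# Hypothetical stand-in for a hash: read as little-endian UTF-16, these bytes
# form an unpaired surrogate (0xD800) followed by "A", which is not valid text.
raw = b"\x00\xd8\x41\x00"


def try_decode_utf16(data):
    """Return the decoded text, or None if the bytes are not valid UTF-16."""
    try:
        return data.decode("utf-16-le")
    except UnicodeDecodeError:
        return None


# Interpreting arbitrary bytes as text fails for this input.
decoded = try_decode_utf16(raw)

# Encoding the raw bytes directly always succeeds and is reversible.
as_base64 = base64.b64encode(raw).decode("ascii")
as_hex = binascii.hexlify(raw).decode("ascii")
```

Either `as_base64` or `as_hex` is plain ASCII and can be stored in a text column, then decoded back to the original bytes when the hash needs to be compared.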

BTW, the salt is not meant to be a secret so it shouldn’t matter whether you share it or not. It should also be a truly random series of bytes rather than text, and it should be different for each password you hash.

If it makes a difference, Bcrypt is available in my M_Crypto package and that manages the salt for you.
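As a rough illustration of the per-password random salt described above, here is a Python sketch (again, not Xojo code). The function names and the 16-byte salt length are assumptions made for the example; the iteration count, key length, and SHA-512 are simply copied from the original post and are not a recommendation.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 500   # copied from the original post, not a recommendation
KEY_LENGTH = 32    # bytes of derived key, as in the original post
SALT_LENGTH = 16   # assumption for this sketch


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Derive a key with a fresh random salt; return (salt, digest)."""
    salt = os.urandom(SALT_LENGTH)  # cryptographically secure random bytes
    digest = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LENGTH
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LENGTH
    )
    return hmac.compare_digest(digest, expected)
```

Because the salt is not a secret, it can be stored in the same table row as the digest; it only needs to be available again at verification time.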

Use the code shown in this blog post: https://blog.xojo.com/2015/10/09/tips-dealing-with-the-problem-of-passwords/

I use it and it works perfectly on Windows.

xojo 2016 R3 / windows 10

This bears repeating. The salt exists to add uniqueness to the hash output, in order to prevent identical inputs from producing the same output.

Randomness is not required, however, and can actually be self-defeating if you don’t take steps to ensure that you never generate the same random salt twice.

Finally, random or not, ensure that salts are at least 16 bytes (128 bits) long.

I’m not sure about the length as 8 bytes should be enough if a sufficiently long password is also required, but as for the rest, agreed. Crypto.GenerateRandomBytes will generate cryptographically secure bytes for this purpose.

FYI, 8 bytes produce 1.8E19 possible combinations, reducing the possibility of duplicate salts to practically zero in several lifetimes. :slight_smile:

As I understand it, for an N-bit random salt (or any other data) the risk of collision becomes non-negligible after only 2^(N/2) values are generated, assuming a uniform distribution. For an 8-byte (64-bit) salt that’s just 2^32, or about 4.3 billion. 16 bytes (128 bits) gives you 2^64, or about 1.8E19.

That’s the first time I’ve heard that. Do you have a link? I wasn’t able to find one, but then, my search mojo is nonexistent.

Thanks. Certainly the longer the better and I would never discourage that, even if I don’t understand the reasoning behind that math. But even taken at face value, I think I’m still ok with 8 bytes. :slight_smile:

(Accidentally deleted my last post while trying to quote myself… so here it is again)

I’m basically parroting what I read on this Stack Exchange answer.

(End deleted post)

I found a more formal source in RFC2898#4.1.2:

[I]f the salt is 64 bits long, the chance of "collision" between keys does not become significant until about 2^32 keys have been produced, according to the Birthday Paradox.

That makes sense to me. Restated, if you expect to produce 2^n salts over the life of your system, best to make each one 2n bits or longer to avoid collisions.
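The birthday-bound arithmetic can be checked with a few lines of Python. This is a sketch using the standard approximation p ≈ 1 - exp(-k² / 2^(N+1)) for k uniformly random N-bit values; the function name is made up for the example.

```python
import math


def collision_probability(salt_bits: int, salts_generated: int) -> float:
    """Approximate chance that at least two of the salts are identical."""
    exponent = salts_generated * salts_generated / 2 ** (salt_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-exponent)


# 2^32 salts of 64 bits each: roughly a 39% chance of at least one collision.
p_64 = collision_probability(64, 2 ** 32)

# The same number of 128-bit salts: the chance is vanishingly small.
p_128 = collision_probability(128, 2 ** 32)
```

This matches the RFC's wording: with 64-bit salts a collision becomes likely around 2^32 values, while 128-bit salts push that point out to around 2^64.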

Thanks for all the tips. I am not quite sure what the purpose is of Xojo’s ConvertDataToText function if it does this. I guess you can have data in your memory block that are not valid “text” characters. All I really wanted was a way to get it to a string of some sort that I can store and the blog post gave me that in the easiest way.

Kem - I’ll have to take a look at your M_crypto package.

Thanks to all as always for the education on this subject! The internet was so much more fun before all this security stuff was needed!

ConvertDataToText will take a stream of bytes that represent text and interpret those bytes through the specified text encoding to produce the intended Text.

Look at it this way: say you have a compressed file that you want to open in Word. The assumptions are that 1) you know how it was compressed (zip, gzip, StuffIt, etc.) and 2) it really does contain a Word file. In this analogy, the type of compression represents that text encoding, and the file itself represents the underlying text.

Now suppose you had a JPG of a funny cat and you tried to run that through unzip. You’d get an error, right? But that’s essentially what you’re trying to do by taking the hash (arbitrary bytes) and asking it to be decoded into text as if it were UTF-16.

Does that make sense?

Yes, it does. It’s better to convert it to a hex string as in the blog post. Your analogy is a good one.

The old framework was just more forgiving - a string could hold just about any sort of data. But Text cannot.

Right, they are different types of storage. String is a “bag of bytes” with an attached encoding to tell the system how to convert those bytes to human-readable text. Text is the end result of that, more like an array of Unicode code points.
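Python draws a similar line between `bytes` and `str`, so a short sketch there may help make the distinction concrete. This is an analogy only, not a description of how Xojo implements String or Text.

```python
# One code point: U+00E9, "e" with an acute accent.
text = "\u00e9"

# The same character as a bag of bytes, once an encoding has been chosen.
data = text.encode("utf-8")

code_point_count = len(text)  # counts code points
byte_count = len(data)        # counts bytes

# The same bytes read through a different encoding give different text,
# which is why the bytes alone are not enough: the encoding must be known.
as_latin1 = data.decode("latin-1")
as_utf8 = data.decode("utf-8")
```

Here `code_point_count` is 1 while `byte_count` is 2, and only the UTF-8 interpretation recovers the original character.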