Storing multiple values in an Integer

Rick_Araujo · March 6, 2024, 5:59pm

That’s good enough for me.

But the approach I was thinking of was feeding in bytes indefinitely, plus an end-of-feed instruction.

Are you dealing with the fractional ending needing “=” fillers?

Eric_Williams · March 6, 2024, 6:02pm

Oops, off by a factor of a lot there. Make that 64GB.

Carry on. Your code looks pretty tight already.

Eric_Williams · March 6, 2024, 6:05pm

I’m off to an appointment, but I do think there’s a speedup to be had here by treating your string as bytes instead of characters. You’ll avoid a lot of string encoding overhead. I’ll think about it and get back.

Aaron_Hunt · March 6, 2024, 6:08pm

I didn’t work that in, just characters for numbers. Also, input zero returns an empty string.

Hm, I’d be curious to see that. I guess you must have similar things already made?

Rick_Araujo · March 6, 2024, 6:10pm

Not in Xojo, not here. But tonight I may play around with writing something and can show you.

Eric_Williams · March 6, 2024, 6:37pm

What are the contents of your b64int lookup?

Aaron_Hunt · March 6, 2024, 6:40pm

b64int = new fastCaseSensitiveArray
// same characters as RFC 4648 but starts with the numbers
dim j as integer
for j = 48 to 57
  b64int.append chr(j)
next
for j = 65 to 90
  b64int.append chr(j)
next
for j = 97 to 122
  b64int.append chr(j)
next
b64int.append "+"
b64int.append "/"

EDIT: I guess it could just store ASCII values in a normal array instead, use chr() in the encode and asc() in the decode method, and sidestep the whole case problem.
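
For readers following along, here is a minimal sketch of the digits-first scheme described above. It is written in Python rather than Xojo so it can be run as-is, and the names `ALPHABET`, `encode` and `decode` are made up for illustration. One deliberate difference from the version in the thread: this sketch returns "0" for an input of zero instead of an empty string.

```python
import string

# Digits first, then uppercase, then lowercase, then "+" and "/".
# Same 64 characters as RFC 4648, in a different order.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/"


def encode(n: int) -> str:
    """Encode a non-negative integer as base-64 digits, most significant first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]  # assumption: zero encodes as "0", not ""
    digits = []
    while n > 0:
        n, remainder = divmod(n, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Inverse of encode(). str.index is a linear search, much like IndexOf on an array."""
    value = 0
    for ch in s:
        value = value * 64 + ALPHABET.index(ch)
    return value


print(encode(64))          # 10
print(decode(encode(64)))  # 64
```

Because the table starts with the digits, small values read like ordinary numbers (9 is "9", 10 is "A"), which is the "more like other bases" property mentioned later in the thread.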

Eric_Williams · March 6, 2024, 6:56pm

If I understand your code correctly - I believe you’ve gotten your character ranges a little out of order. I think you need to start with the 65-90 range, then the 97-122 range, then the 48 to 57 range, and then the + and / characters. I’m referring to the table on this page:

Aaron_Hunt · March 6, 2024, 6:58pm

Right, I did that on purpose (noted above in comments). The idea is that it’s more like other bases, starting by counting with numbers, then moving to letters.

Eric_Williams · March 6, 2024, 7:00pm

Oh, so you’re not intending to be interoperable with RFC 4648.

Aaron_Hunt · March 6, 2024, 7:08pm

No, I just took the character set. But it was just an experiment.
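
To make the interoperability point concrete, the sketch below (Python, with hypothetical names `RFC4648`, `DIGITS_FIRST` and `relabel`) compares the two orderings. It only illustrates how the same 6-bit value maps to different characters; it does not claim the integer scheme in this thread otherwise matches how RFC 4648 groups bytes.

```python
import base64
import string

# RFC 4648 order: uppercase, lowercase, digits, "+", "/".
RFC4648 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Order used in this thread: digits, uppercase, lowercase, "+", "/".
DIGITS_FIRST = string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/"

# Character-for-character translation between the two tables.
_TO_DIGITS_FIRST = str.maketrans(RFC4648, DIGITS_FIRST)


def relabel(rfc_text: str) -> str:
    """Rewrite RFC 4648 characters using the digits-first table (padding is left alone)."""
    return rfc_text.translate(_TO_DIGITS_FIRST)


# The 6-bit value 0 is "A" under RFC 4648 but "0" in the digits-first table.
print(RFC4648[0], DIGITS_FIRST[0])  # A 0

# Three zero bytes are four 6-bit zeros, so a standard encoder emits "AAAA".
standard = base64.b64encode(b"\x00\x00\x00").decode("ascii")
print(standard, relabel(standard))  # AAAA 0000
```

A standard decoder fed digits-first output would therefore produce different values without raising any error, since every character is still valid in both tables.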

I just tested for speed using a normal integer array of ASCII values, using asc() and chr() in the encode / decode methods, and it appears to be about twice as fast. So there’s no need for a case-sensitive array.

I would guess there is a hit using String.Split() to decode, but I couldn’t think of any way around it.

Eric_Williams · March 6, 2024, 7:09pm

Yes, there is a hit, but the more concerning problem is that you are left with an array of characters and not an array of bytes. I’m in the middle of mixing up a batch of cupcakes but I’ll come back with more explanation for why this is a problem. Fortunately, the fix for it will speed up your code.

Aaron_Hunt · March 6, 2024, 7:12pm

Hm, is the solution to fill a static memoryblock with the string, then read the byte values? Somehow that sounds like it would be slower, but let’s see …

Oh, wait … I think you mean don’t use strings at all. Use Memoryblocks, and let Xojo convert them to strings? That also sounds expensive.

I better just wait for the cupcakes to bake (I have to grade some students’ work anyway).

Eric_Williams · March 6, 2024, 7:16pm

You got it.

The problem with characters is they can be multi-byte - i.e., “é” may be more than one byte. If you dump the contents of the string into a MemoryBlock, you can then read out the bytes one at a time.

This sidesteps all of the processing that goes with handling strings, like encodings. You’ll find it much faster. Plus, if you read out byte values as integers, you can use a simple array to store your encoded values, instead of a lookup indexed by case-sensitive strings. This will be a huge speedup.
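
The character-versus-byte distinction is easy to check. The lines below are a small Python illustration (not Xojo), assuming UTF-8 as the encoding: the single character "é" occupies two bytes, while every character in the 64-character alphabet is plain ASCII and occupies one.

```python
text = "\u00e9"  # "é" as a single precomposed character
utf8 = text.encode("utf-8")

print(len(text))   # 1 (one character)
print(len(utf8))   # 2 (two bytes in UTF-8)
print(list(utf8))  # [195, 169]

# Characters from the base-64 alphabet are ASCII, so one character is one byte.
digits = "9Az+/".encode("utf-8")
print(list(digits))  # [57, 65, 122, 43, 47]
```

Since the encoded output only ever contains ASCII, reading it byte by byte loses nothing and skips the per-character encoding work.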

Eric_Williams · March 6, 2024, 7:22pm

So it’s something like:

'Build a lookup and keep it for later
dim base64Lookup as MemoryBlock

base64Lookup=new MemoryBlock(255)

dim mIndex as integer

for j as integer=48 to 57
  base64Lookup.Byte(mIndex)=j
  
  mIndex=mIndex+1
next

for j as integer=65 to 90
  base64Lookup.Byte(mIndex)=j
  
  mIndex=mIndex+1
next
 

's is your input string
dim s as String

dim sBytes as MemoryBlock

'You can create a MemoryBlock directly from a string!
sBytes=s

dim base64Value as UInt8

for byteIndex as integer=0 to sBytes.Size-1
  base64Value=base64Lookup.Byte(sBytes.Byte(byteIndex))
next

Whenever you’re encoding binary data in Xojo like this, you generally want to stay away from the encoding-aware functions like Asc and Chr, and deal directly with the string’s bytes themselves. It’s too easy to lose data otherwise.

Aaron_Hunt · March 6, 2024, 9:44pm

That’s good advice, but in this case, since the values are known to be safe ASCII, I don’t see a reason to avoid asc() and chr(). I put together a version using only memoryblocks, and it’s about 1/3 slower than just using integer arrays with ASCII values. So it looks like indexOf on an integer array is faster than accessing specific bytes of a memoryblock.

Eric_Williams · March 6, 2024, 9:49pm

Show me your MemoryBlocks.

Aaron_Hunt · March 6, 2024, 10:09pm

The lookup block is as you suggested. I only tested the first 20 or so iterations and I wasn’t looking at the values, just the microseconds result. Just tried again for higher values and it’s crashing. Looking at the code, it seems when using blocks the logic has to change … looks like the lookup values have to be the opposite of what they were in the array model. Hm. No, sorry, I’m confused. If you got it working, it would be better for me to look at what you have.

Aaron_Hunt · March 6, 2024, 10:17pm

If you’re not using asc(), then how are you getting a value to look up? It’s no longer making sense.

I think your initial table must be wrong. What has to be done is essentially to recreate the ASCII table as a memoryblock. The values are no longer 0-63 as they are in an array. They are at the positions they would be in the ASCII table …

EDIT: I don’t see how you’re doing it with just one lookup block. You need a block to get the values, which is what asc() does, and another block to do a reverse lookup, which is what chr() does. If I’ve understood that correctly, you’re saying it’s faster to make your own tables with blocks than to use asc() and chr()?

Aaron_Hunt · March 6, 2024, 11:32pm

Having a conversation with myself here, but I figured out that that must be what you were saying, because it’s true! Making my own lookup tables to replace asc() and chr() turned out to be between 3 and 4 times faster than using asc() and chr().
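
For anyone landing here later, this is roughly what the two-table approach looks like. It is a Python sketch rather than the Xojo code from the thread, with `bytes` and `bytearray` standing in for MemoryBlocks; the names `ENCODE_TABLE`, `DECODE_TABLE` and `INVALID` are hypothetical, and the sketch says nothing about Xojo timings.

```python
import string

ALPHABET = (string.digits + string.ascii_uppercase + string.ascii_lowercase + "+/").encode("ascii")

# Forward table: 6-bit value (0..63) -> ASCII byte. Stands in for chr().
ENCODE_TABLE = bytes(ALPHABET)

# Reverse table: byte (0..255) -> 6-bit value. Stands in for asc() plus a search.
INVALID = 255  # hypothetical marker for bytes that are not in the alphabet
DECODE_TABLE = bytearray([INVALID] * 256)
for value, byte in enumerate(ENCODE_TABLE):
    DECODE_TABLE[byte] = value


def encode(n: int) -> bytes:
    """Encode a non-negative integer; zero becomes b"0" in this sketch."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        n, remainder = divmod(n, 64)
        out.append(ENCODE_TABLE[remainder])  # direct index, no search
        if n == 0:
            break
    out.reverse()  # digits were produced least significant first
    return bytes(out)


def decode(data: bytes) -> int:
    """Decode bytes produced by encode(), one table lookup per byte."""
    value = 0
    for byte in data:
        digit = DECODE_TABLE[byte]
        if digit == INVALID:
            raise ValueError(f"byte {byte} is not in the alphabet")
        value = value * 64 + digit
    return value


print(encode(64))     # b'10'
print(decode(b"10"))  # 64
```

The forward table needs only 64 entries, but the reverse table is sized at 256 so that any input byte can be used as an index directly. That is the difference from the single-table version: each direction gets its own table, and neither direction has to search.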