Storing multiple values in an Integer

Sometimes I need to look up an object in an array by matching several of its properties. The values might be anything, but let’s say for example I need to match a boolean and two numbers, where I know the values will stay within some limit: say one is always between 0 and 15, and the other is a 32-bit integer.

In the past I’ve used strings to put the values together into something easy to read, like “false,7,234576”, and stored those in an array. Then I build the search string the same way from the values I want to find, use IndexOf on the string array to look up the index, and get the object from that index.
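
Roughly like this, as a schematic sketch (the values and names here are just for illustration):

' one string key per object, built from its properties
dim keys() as string
keys.Append( "False,7,234576" )
keys.Append( "True,3,99" )

' build the search key the same way from the values I want to find
dim wantFlag as boolean = false
dim wantSmall as integer = 7
dim wantBig as integer = 234576
dim search as string = str( wantFlag ) + "," + str( wantSmall ) + "," + str( wantBig )

dim idx as integer = keys.IndexOf( search )   ' -1 if there's no match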

That works, but it’s clumsy. Because it involves string conversion and parsing, it’s ridiculously slow.

So I took a cue from the hardware world (where memory is scarce) and got the idea to treat a 64-bit integer as a tiny memory block and pack it with bits, then use IndexOf on that instead. It isn’t quite as easy to build the search value, but it works faster and saves memory.

I was just wondering if anyone else does this.

// Storing multiple values in an Integer
' ( or Using Integers as Memoryblocks )

' value A: 0-1 = 1 bit
' value B: 0-15 = 4 bits
' value C: 0-4294967295 = 32 bits

// 1+4+32 = 37 bits, so in a 64-bit integer we'll have 27 empty bits

// Example: this should give a result that is all 1's in binary
dim A, B, C, D as Integer
A = 1
B = 15
C = 1023

' pack the values into D
D = A + Bitwise.ShiftLeft( B, 1 ) + Bitwise.ShiftLeft( C, 5 )

' check it
dim iA, iB, iC as integer
dim sA, sB, sC, sD, e as string
e = EndOfLine
iA = Bitwise.BitAnd( D, 1 )
iB = Bitwise.BitAnd( D, 30 ) \ 2
iC = Bitwise.ShiftRight( D, 5 )

sA = str( iA )
sB = str( iB )
sC = str( iC )
sD = D.ToBinary()

msgbox "Encoded values:"+e+sD+e+sA+e+sB+e+sC
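
The lookup itself then becomes an IndexOf over an integer array instead of a string array. Continuing from the snippet above (keys and the want* names are just for illustration):

' one packed key per object, built the same way D was above
dim keys() as integer
keys.Append( D )

' pack the values I want to find into a single search key
dim wantA, wantB, wantC, search as integer
wantA = 1
wantB = 15
wantC = 1023
search = wantA + Bitwise.ShiftLeft( wantB, 1 ) + Bitwise.ShiftLeft( wantC, 5 )

dim idx as integer = keys.IndexOf( search )   ' -1 if there's no match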

If you don’t mind wasting a tiny bit of memory, this might be faster:

Public Function GetKeyForValues(a as integer, b as integer, c as integer) As integer
  ' Assumes a and b are 8-bit values and c is at most 16 bits
  static m as MemoryBlock
  
  if m = nil then
    m = new MemoryBlock(8)
  end if
  
  m.UInt8Value(0) = a
  m.UInt8Value(1) = b
  m.UInt16Value(2) = c
  
  return m.UInt64Value(0)
End Function

Caveat: not likely to be faster if you encapsulate it in a function like this and it gets called a million times. :) The stack overhead from calling the method will eat up any gains. However, you could extract the code and use it wherever needed. Just make sure you aren’t constantly re-allocating the memory block.

A quick-and-dirty test suggests that the MemoryBlock approach is about 14% faster than bit shifting.
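
For context, the test was something along these lines (a rough sketch rather than the exact benchmark; iteration count and results will vary by machine):

const kIterations = 10000000
dim t1, t2, t3 as double
dim d as integer

' bit-shifting version
t1 = Microseconds
for i as integer = 1 to kIterations
  d = 1 + Bitwise.ShiftLeft( 15, 1 ) + Bitwise.ShiftLeft( 1023, 5 )
next
t2 = Microseconds

' MemoryBlock version (allocated once, outside the loop)
dim m as new MemoryBlock(8)
for i as integer = 1 to kIterations
  m.UInt8Value(0) = 1
  m.UInt8Value(1) = 15
  m.UInt16Value(2) = 1023
  d = m.UInt64Value(0)
next
t3 = Microseconds

System.DebugLog( "shifting: " + str( t2 - t1 ) + " µs, MemoryBlock: " + str( t3 - t2 ) + " µs" )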

Hm, I set out to avoid memoryblocks thinking that the overhead of creating them would be slower than working directly with an integer, so I’m surprised by your findings. Theoretically, shouldn’t bit shifting be faster?

… I mean, obviously it doesn’t matter, if one thing is faster than the other then that’s the way it is. I’m just wondering why bit shifting is slower when it’s supposed to be lower level (so should be faster).

EDIT: Ah, now I get it. It’s the Static keyword, as you pointed out. That makes sense!

Yeah, the key is to not re-allocate the MemoryBlock - since it’s a fixed size, and dedicated to this one use, it’s safe to simply reuse it.

After posting my reply, I thought the same and threw together a quick benchmark. I guess a few small memory writes and a read are quicker than two bitshifts and several additions.

Thanks for improving this for me :)

Many languages provide fast access to bit structures; you just need to define them and use them. C and Pascal, for example. I miss advanced structures, and multiple views of the same structure (called unions in C).

This Pascal structure fits in 2 bytes for example:

TSomeBitLevelStructure = bitpacked record
  OneBit: 0..1;
  TwoBits: 0..3;
  FourBits: 0..15;
  EightBits: 0..255
end;

And the compiler sets and reads them for you directly, as fast as it can.

In Xojo you must play with masks, shifts, Ors and Ands yourself.
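
Something like this, to sketch the Xojo equivalent of that record (the field names just mirror the Pascal example above):

' pack: OneBit at bit 0, TwoBits at bit 1, FourBits at bit 3, EightBits at bit 7
dim packed, oneBit, twoBits, fourBits, eightBits as integer
oneBit = 1
twoBits = 3
fourBits = 15
eightBits = 255
packed = oneBit _
  + Bitwise.ShiftLeft( twoBits, 1 ) _
  + Bitwise.ShiftLeft( fourBits, 3 ) _
  + Bitwise.ShiftLeft( eightBits, 7 )

' unpack: shift each field back down and mask off its bits
oneBit = Bitwise.BitAnd( packed, 1 )
twoBits = Bitwise.BitAnd( Bitwise.ShiftRight( packed, 1 ), 3 )
fourBits = Bitwise.BitAnd( Bitwise.ShiftRight( packed, 3 ), 15 )
eightBits = Bitwise.BitAnd( Bitwise.ShiftRight( packed, 7 ), 255 )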

Looks like I wrongly assumed that bit shifting in Xojo would be super-efficient, as it seems to be in other languages. Just curious: does anyone know why it isn’t in Xojo?

The more low-level a language is, the more it is designed to handle low-level matters efficiently, and bit fields are one of those matters. Xojo is a high-level language: it needs a bit more effort from you, and it doesn’t care much about squeezing bits or counting clock cycles to solve problems the way low-level languages do.
So where in some languages an instruction you write may end up as a few bytes of fast inline machine code playing with CPU registers, in Xojo you can end up with parameters going onto the stack, a function call, and results being popped off the stack afterwards.

A few years ago, Xojo introduced Bitwise.BitOr, Bitwise.BitAnd, etc. to replace the Or, And, etc. operators. Guess what: they’re slower, probably due to the stack-frame overhead Rick is alluding to. So I’m always a little cautious when I see bit operations in Xojo and tend to second-guess their performance.
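
That second-guessing is easy to do, something along these lines (a rough sketch; exact numbers will vary by machine and build settings):

dim a, b, r as integer
a = &hF0F0
b = &h0F0F

dim t1 as double = Microseconds
for i as integer = 1 to 10000000
  r = ( a and b ) or 1                              ' operator form
next
dim t2 as double = Microseconds
for i as integer = 1 to 10000000
  r = Bitwise.BitOr( Bitwise.BitAnd( a, b ), 1 )    ' Bitwise module form
next
dim t3 as double = Microseconds

System.DebugLog( "operators: " + str( t2 - t1 ) + " µs, Bitwise: " + str( t3 - t2 ) + " µs" )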

Current Xojo doesn’t need the Bitwise module prefix. Just the function, like ShiftLeft().
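
For example (assuming a Xojo version recent enough to have the global functions):

dim d as integer
' same packing as the earlier example, without the Bitwise prefix
d = 1 + ShiftLeft( 15, 1 ) + ShiftLeft( 1023, 5 )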

I guess that is an acceptance of it as a language feature (a reserved word?) and it may progress towards optimization.

How good is it? You’d need to measure and compare the performance of lots of random manipulations on 64-bit integers against the same manipulations done on strings. If all you need is to pack info, you won’t care if the 64-bit one performs a bit slower anyway.

Current Xojo doesn’t need the Bitwise module prefix. Just the function, like ShiftLeft().

Aha, thanks for the tip. I didn’t know they made those public.

I’ve collaborated on many hardware projects so I got familiar with low-level concerns that way, but am of course spoiled at the high level by Xojo.

Related to all this: just for fun I recently made a Base64 encoder / decoder for integers. It uses bit shifting in groups of 6 bits to turn unsigned 64-bit integers into strings using the RFC 4648 character set (reordered to begin with the numbers). Now I’m thinking, based on the info here, that instead of doing bit shifting in the encode / decode methods it should be faster to use static MemoryBlocks, allocate a little extra space, and work with bytes on the MemoryBlocks instead. Makes sense.

I’d be curious to see your original code. MemoryBlocks only deal with bytes - it’s difficult to work across byte boundaries, as you’d be doing with Base64 encoding.

You feed an encoder bytes, but the encoder just “sees” 8 bits entering the queue. You can have a 64-bit buffer receiving and consuming those bits.
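
As a rough sketch of that idea (the variable names are just illustrative; for long streams you’d also mask off the bits already consumed):

' push bytes in at one end, pull complete 6-bit Base64 digits out the other
dim buffer as UInt64
dim bitCount as integer
dim digits() as integer

dim inputBytes() as integer = array( &h4D, &h61, &h6E )   ' the bytes of "Man", as an example

for each byteIn as integer in inputBytes
  buffer = Bitwise.ShiftLeft( buffer, 8 ) + byteIn   ' 8 new bits onto the low end
  bitCount = bitCount + 8
  while bitCount >= 6                                ' emit every complete 6-bit group
    bitCount = bitCount - 6
    digits.Append( Bitwise.BitAnd( Bitwise.ShiftRight( buffer, bitCount ), 63 ) )
  wend
next
' "Man" yields the digit values 19, 22, 5, 46 ("TWFu" in standard Base64)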

Sure. Since the native IndexOf isn’t case-sensitive, it uses a dictionary-based lookup class that Kem Tekinay helped me with some years ago. Otherwise, with the bit shifting, it’s what you’d expect. With the MemoryBlock bytes I wasn’t thinking of bridging across boundaries, instead just leaving 2 empty bits in each byte … but I see what you mean, because then it’s not possible to read a UInt64 value directly; my thinking was a bit flawed there.

Yeah, that’s where it breaks down. You can’t store a value into a MemoryBlock on anything but an 8-bit boundary - so there’s no way to stack 6 bit values end-to-end without gaps in between the trailing and leading bits.

Sure you can. I’ve written lots of encoders and decoders over the years. I can write something like that without dictionaries and bit gaps.

Public Function ToBase64(Extends i as Integer) As string
  // Encode a 64-bit integer with (up to) 11 6-bit ASCII characters
  ' from the printable set of 64 characters after RFC 4648 (reordered)
  dim j, b, L as integer
  ' get the index of the highest 6-bit group needed (guard i = 0, since Log(0) is undefined)
  if i > 0 then L = ( Log(i) / Log(2.0) ) / 6
  ' encode each 6-bit segment into a character
  dim s as string
  for j = L downto 0
    b = ShiftRight( i, j*6 )
    b = BitAnd( b, 63 )
    s = s + b64int.GetValue( b )
  next
  return s
End Function

Public Function FromBase64(Extends s as string) As integer
  // Decode a 64-bit integer from (up to) 11 6-bit ASCII characters
  ' from the printable set of 64 characters after RFC 4648 (reordered)
  dim j, k, b, i as integer
  ' decode each character as a 6-bit segment
  dim c() as string = s.split("")
  for j = c.LastIndex downto 0
    k = c.LastIndex - j
    i = b64int.IndexOf( c(j) )
    b = b + ShiftLeft( i, k*6 )
  next
  return b
End Function

b64int is a fastCaseSensitiveArray, my subclass of Kem’s CaseSensitiveDictionary that adds an IndexOf, Ubound (now also LastIndex), Append, and a GetValue method. It works like a standard array but looks a little different: instead of writing b64int(0) you have to write b64int.GetValue(0).
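
If anyone wants to try the snippets above without that class, a minimal stand-in could key a Dictionary on character codes, which sidesteps the case-sensitivity issue entirely. This is only a sketch (not Kem’s class), and the alphabet string below is a placeholder for the reordered RFC 4648 set:

' alphabet(i) returns the character for value i; charToValue is keyed on
' Asc(character), so lookups are inherently case-sensitive
dim alphabet() as string
dim charToValue as new Dictionary

dim chars as string = "0123456789..."   ' put the full reordered 64-character set here
for i as integer = 1 to len( chars )
  dim ch as string = mid( chars, i, 1 )
  alphabet.Append( ch )
  charToValue.Value( asc( ch ) ) = i - 1
next

' GetValue(b) then becomes alphabet(b), and IndexOf(c) becomes charToValue.Value( asc(c) )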

My code is likely far from optimal. I’m sure you guys can do better :) (At least I removed Bitwise before posting this ;) )

Mathing it out: a direct lookup table of all possible 3-byte values to Base64 4-byte encoded values is “only” about 65MB (Edit: 64GB, so stop reading here). A very fast approach to encoding might be to simply build the lookup in memory and use 3 bytes at a time to index into the lookup directly, retrieving the 4 byte encoded value.

Decoding would take a much, much larger table and is not recommended. :slight_smile: