Fastest way to decode this string?

I have a string that I need to decode and it’s in the following format:


The two hex digits following each \x give the ASCII code of the character.

I wrote something that does it with regular Xojo string parsing, and it works, but it’s insanely slow when there are megabytes and megabytes of data to process.

Thanks in advance.

Assuming you know nothing about the source other than that it uses ASCII codes, I’d use a MemoryBlock. That should be very fast, especially if you use Ptrs.

Let me see if I can work up the code…

Isn’t this basically percent encoding with \x instead of %?

Dim s As String = encodedstring
s = ReplaceAll(s, "\x", "%")
s = DecodeURLComponent(s)

Note: this is minimally tested, but it takes less than 4 s for about 25 MB once compiled:

Private Function ToValue(hexByte As Integer) As Integer
  const kZero as integer = 48
  const kUpperA as integer = 65
  const kLowerA as integer = 97
  select case hexByte
  case is >= kLowerA
    return hexByte - kLowerA + 10
  case is >= kUpperA
    return hexByte - kUpperA + 10
  case else
    return hexByte - kZero
  end select
End Function

Public Function Decode(source As String) As String
  var sourceMb as MemoryBlock = source
  var destMb as new MemoryBlock( sourceMb.Size )
  var sourcePtr as ptr = sourceMb
  var destPtr as ptr = destMb
  var sourceIndex as integer = 0
  var destIndex as integer = -1
  var goneTooFarIndex as integer = sourceMb.Size - 3
  const kBackslash as integer = 92
  const kX as integer = 120
  while sourceIndex < sourceMb.Size
    var thisByte as integer = sourcePtr.Byte( sourceIndex )
    if thisByte = kBackslash and sourceIndex < goneTooFarIndex then
      var nextByte as integer = sourcePtr.Byte( sourceIndex + 1 )
      if nextByte = kX then
        thisByte = ToValue( sourcePtr.Byte( sourceIndex + 2 ) ) * 16 + _
          ToValue( sourcePtr.Byte( sourceIndex + 3 ) )
        sourceIndex = sourceIndex + 3
      else
        // It's something else
        thisByte = nextByte
        sourceIndex = sourceIndex + 1
      end if
    end if
    destIndex = destIndex + 1
    destPtr.Byte( destIndex ) = thisByte
    sourceIndex = sourceIndex + 1
  wend
  // destIndex is the last index written, so the byte count is destIndex + 1
  return destMb.StringValue(0, destIndex + 1, Encodings.UTF8)
End Function

That works unless the source string might contain \\x. Otherwise, your plan takes 0.5 s here once compiled.

Public Function Decode2(s As String) As String
  return DecodeURLComponent( s.ReplaceAll( "\x", "%" ) )
End Function
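
To illustrate the \\x caveat, here is a minimal sketch. It assumes the source uses \\ for an escaped backslash, that the Decode and Decode2 methods from this thread are in scope, and that Decode treats a backslash followed by any other character as an escape for that character, as its comment suggests. The sample input is made up for illustration.

```xojo
// Hypothetical input: an escaped backslash followed by the literal text "x41".
// Xojo string literals have no backslash escapes, so this is five characters.
Var sample As String = "\\x41"

// Decode2 also replaces the "\x" inside "\\x", so the string
// becomes "\%41" before URL decoding.
Var viaReplace As String = Decode2(sample)

// Decode consumes the first backslash as an escape for the second,
// so the "x41" that follows is meant to stay literal text.
Var viaMemoryBlock As String = Decode(sample)

System.DebugLog("Decode2: " + viaReplace)
System.DebugLog("Decode:  " + viaMemoryBlock)
```

If the source never escapes backslashes this way, the two approaches should agree and the simpler one is fine.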

BUT if I add pragmas, mine takes 0.2 s. So there. :)

Add this to the top of each method in my first post:

#if not DebugBuild
  #pragma BackgroundTasks false
  #pragma BoundsChecking false
  #pragma NilObjectChecking false
  #pragma StackOverflowChecking false
#endif

To avoid improper percent decoding, any percent signs already in the source string need to be encoded first, before the “URL decoding”:

Return DecodeURLComponent( s.ReplaceAll("%","%25").ReplaceAll( "\x", "%" ) )
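
As a rough sketch of why the order matters, the two variants can be compared side by side. The method names DecodeNaive and DecodeSafe and the sample string are hypothetical, chosen only for illustration.

```xojo
// Hypothetical helper: converts "\x" to "%" without protecting literal "%".
Public Function DecodeNaive(s As String) As String
  // A literal "%41" in the source is indistinguishable from a converted "\x41".
  Return DecodeURLComponent(s.ReplaceAll("\x", "%"))
End Function

// Hypothetical helper: escapes literal "%" as "%25" before converting "\x".
Public Function DecodeSafe(s As String) As String
  // "%25" decodes back to "%", so literal percent signs survive the round trip.
  Return DecodeURLComponent(s.ReplaceAll("%", "%25").ReplaceAll("\x", "%"))
End Function

// Usage, e.g. in a button's Pressed event (made-up sample):
Var sample As String = "100%41 \x42"
System.DebugLog(DecodeNaive(sample))
System.DebugLog(DecodeSafe(sample))
```

Tracing the sample by hand: the naive version turns it into "100%41 %42", so the literal "%41" is decoded as if it were an escape. The safe version produces "100%2541 %42" first, where "%25" decodes back to a plain "%" and only the original "\x42" becomes a character.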

@Kem_Tekinay, @Andrew_Lambert and @Rick_Araujo, I can’t thank you enough. These are all great approaches that work orders of magnitude faster than what I was doing. Kem, that is a truly magnificent piece of code, thanks for writing that up. Andrew and Rick, you’re spot on about the simplicity of the problem that was staring me in the face!

Thanks everyone, I really appreciate it.