How to compress part of a Unicode text file

I’ve been using EncodeBase64, and I noticed that the notes state

and I regularly add Unicode text.
Are there compression libraries that will return compressed Unicode text, not a file?
Thomas Tempelmann did write a library, but that will write a file.
I am pretty sure I could use MBS, but I’d rather not. No offense meant to MBS.

I’m a bit confused. Regardless of the notes, EncodeBase64 will work just fine on your UTF-8-encoded string; you just have to define the output as UTF-8 on the other side.

But Base64 is not for compression; it’s for encoding raw bytes into something made up entirely of visible characters in the ASCII range. In fact, it will expand your string into something longer, so what is it that you’re trying to accomplish?
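
To put a number on that expansion, here is a minimal sketch (the variable names are only for illustration). It assumes the classic EncodeBase64 signature, where the second parameter is the line-wrap length and 0 turns wrapping off:

```xojo
// Illustrative only: compare byte counts before and after Base64.
Dim original As String = "language.png"            // 12 ASCII bytes
Dim encoded As String = EncodeBase64(original, 0)  // 0 = no line wrapping

// Every 3 input bytes become 4 output characters, so 12 bytes become 16.
MsgBox(Str(LenB(original)) + " bytes -> " + Str(LenB(encoded)) + " bytes")
```

With the default line wrapping left on, longer strings also pick up a line break every 76 characters, so the overhead ends up slightly higher.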

Kem is right, Base64 encoding increases the size by roughly 33%, since every 3 bytes of input become 4 characters of output.

My zlib wrapper can do in-memory compression/decompression:

  Dim data As String = "foo"
  data = zlib.Deflate(data)
  data = zlib.Inflate(data)
  // or
  data = zlib.GZip(data)
  data = zlib.GUnZip(data)

DecodeBase64 doesn’t expect codes above 127, because EncodeBase64’s raison d’être is to create a file where no such values exist.

Sorry about being obscure about why.
Thanks for the assurance about EncodeBase64.
I use EncodeBase64 to put it into a preferences file that is written as text. That way I can be assured that any end-of-line and tab characters, which are the array delimiters, don’t cause problems.
Now I have run into a problem with file names on Mac that have extensions and characters that are compounded (wrong word, but): “??? language.png” becomes “??? language.png” under certain conditions. It happens with any multimedia file, and since on Apple’s side it isn’t a bug, I need to change.

EncodeBase64 also, I think, will take up a lot more space than the actual name, so I wanted to find an alternative.

Can you save me some time? Can it be a plug-in or a library?

Is this New ??? (no encoding ?)

Sorry but it’s old. Maybe 5 to 10 years. I posted something on the previous forum.
I couldn’t find a post that actually referenced the problem.
My name on there is Art and I searched for art and file and got something from about 2009
I am not sure of the terminology except the characters are decomposed.

Look for “composed” on the forum. You will get better results, e.g. “String comparison works on Windows but not on OSX”.
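
For anyone unsure what composed versus decomposed means here, a minimal sketch. The variable names are made up for illustration, and "é" is just one example character; the file names in this thread may involve others:

```xojo
// Illustrative sketch: the same visible character in two Unicode forms.
Dim composed As String = Encodings.UTF8.Chr(&hE9)           // U+00E9, precomposed "é"
Dim decomposed As String = "e" + Encodings.UTF8.Chr(&h301)  // "e" followed by U+0301, combining acute accent

// Both display as "é", but the UTF-8 bytes differ:
// composed is C3 A9 (2 bytes), decomposed is 65 CC 81 (3 bytes).
If StrComp(composed, decomposed, 0) <> 0 Then  // mode 0 = binary comparison
  MsgBox("Same glyph, different bytes: " + Str(LenB(composed)) + " vs " + Str(LenB(decomposed)))
End If
```

macOS file systems have traditionally handed back file names in decomposed form, which is why a name read from disk can stop matching the same name typed in or stored earlier.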

In an application created with 2009r5, I can add an encoding to Base64, so my “New ?” question is bad. That application is 10 years old!

Short answer: No. It’s written in plain Xojo and AFAIK Xojo plugins are written in C++.

This code works for me

  dim s1 as string = "??? language.png"
  dim s2 as string = EncodeBase64(s1)
  dim s3 as string = DecodeBase64(s2)
  s3 = DefineEncoding(s3, encodings.utf8)

An encoded string is simply a sequence of 8-bit bytes. Base64 takes an arbitrary sequence of bytes and transforms it into a sequence of “safe” byte values that will not interfere with transmitting the string, and then transforms it back into the exact same sequence of byte values. In order to display those bytes in some language, you have to tell it what kind of encoding to use.

Thanks Tim. I remember now in my line with DecodeBase64, I actually have that.

  ts = DecodeBase64(pArrayFile(rowcnt), Encdng)

where Encdng is a property for “encodings”.
I think I’ll stick with EncodeBase64.
I’m not sure what the line “expect codes above ASCII 127” means, and I’ll file a request for clarification.

@Arthur Gabhart — Well Base64 and compression can work together. You can compress data with a few Declares to ZLib and use EncodeBase64 so you can store the result easily.

When you need the data, use DecodeBase64 then decompress the result and you get the original text.

Here is my implementation of compression/decompression with ZLib on macOS:

[code]Function GZip(extends data as string, level as integer = 6) As string
  //# Compresses a string with zlib's compress2 and returns the compressed bytes.
  // zlib's length parameters are pointer-sized on macOS, so UInteger keeps the declare correct in 64-bit builds.
  soft declare function compress2 lib "libz.dylib" (outBuf as Ptr, byref outBufLen as UInteger, inBuf as Ptr, inBufLen as UInteger, level as Int32) as Int32

  dim outBuffer, inBuffer as MemoryBlock
  dim bufferLen as UInteger

  inBuffer = data
  // zlib asks for a destination at least 0.1% larger than the source, plus 12 bytes
  outBuffer = New MemoryBlock( lenB( data ) * 1.001 + 12 )
  bufferLen = outBuffer.Size

  // On return, bufferLen holds the actual compressed size
  dim err as integer = compress2( outBuffer, bufferLen, inBuffer, inBuffer.Size, level )

  if err=0 then
    return outBuffer.StringValue( 0, bufferLen )
  else
    return "" //Compression failed
  end if
End Function[/code]

And for decompressing:

[code]Function GUnzip(extends data as string, expectedMaxSize as integer = 0) As string
  //# Decompresses a string produced by GZip and returns the decompressed string.
  // By default, the max expected size is 10 times the data size
  // zlib's length parameters are pointer-sized on macOS, so UInteger keeps the declare correct in 64-bit builds.
  soft declare function uncompress lib "libz.dylib" (outBuf as Ptr, byref outLen as UInteger, inBuf as Ptr, inLen as UInteger) as Int32

  dim inBuffer, outBuffer as MemoryBlock
  dim err as integer
  dim bufferLength as UInteger
  dim maxSize as integer

  if expectedMaxSize>LenB( data ) then
    maxSize = expectedMaxSize //Use expectedMaxSize if it has a sensible value
  else
    maxSize = 10 * LenB( data ) //By default, we use 10 times the size of the compressed data
  end if

  inBuffer = data
  outBuffer = New MemoryBlock( maxSize ) //Allocate the output buffer with the size chosen above
  bufferLength = outBuffer.Size
  // On return, bufferLength holds the actual decompressed size
  err = uncompress( outBuffer, bufferLength, inBuffer, inBuffer.Size)

  if err=0 then
    return outBuffer.StringValue( 0, bufferLength )
  else
    return "" //Decompression failed, for example because the buffer was too small
  end if
End Function[/code]
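
The two functions above can be combined with EncodeBase64 in the way described earlier. This is only a sketch: it assumes GZip and GUnzip sit in a module so the extends syntax works, and the sample text, the variable names and the 1024-byte size guess are made up for illustration:

```xojo
// Sketch only: assumes the GZip and GUnzip extension methods above live in a module.
Dim original As String = "language.png language.png language.png language.png"

// Writing: compress first, then Base64 so the result is plain ASCII text.
Dim stored As String = EncodeBase64(original.GZip(), 0)  // 0 = no line wrapping

// Reading: undo the steps in reverse order.
Dim packed As String = DecodeBase64(stored)
Dim restored As String = packed.GUnzip(1024)         // 1024 is a guessed upper bound on the original size
restored = DefineEncoding(restored, Encodings.UTF8)  // the bytes come back without an encoding
```

For very short strings such as a single file name, the compressed form can come out larger than the input, so it is worth measuring with LenB before committing to this.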

You MISREAD the description

What it says is that SMTP doesn’t expect codes above 127, which is the reason that Base64 is used.
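
A quick way to convince yourself of that, as a rough sketch (the sample string and variable names are arbitrary): encode some text containing non-ASCII characters and confirm that no byte of the result is above 127.

```xojo
// Sketch: Base64 output stays within 7-bit ASCII even when the input does not.
Dim source As String = Encodings.UTF8.Chr(&hE9) + "t" + Encodings.UTF8.Chr(&hE9)  // "été", contains bytes above 127
Dim encoded As String = EncodeBase64(source, 0)

Dim allSevenBit As Boolean = True
For i As Integer = 1 To LenB(encoded)
  // AscB returns the value of a single byte
  If AscB(MidB(encoded, i, 1)) > 127 Then allSevenBit = False
Next

If allSevenBit Then MsgBox("Every byte of the Base64 text is 127 or below")
```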

Actually, plain C is sufficient to write a Xojo plugin.