Declares against zlib lead to a crash on Linux

I’m seeing a strange effect: some of my declares that call into “libz.so.1” crash the app on Fedora 32 and Debian 10 systems, and I can’t figure out why.

The same declares work on macOS, linked against the same lib, and even on another Debian 9 system. Everything is built for 64 bit.

Are there any known issues with declares on Linux 64 bit, such as that the ABI used for libs is not 100% compatible with how Xojo uses it?

Here are some details:

C prototypes:

const char * zlibVersion (void);
int deflateInit_ (z_streamp strm, int level, const char *version, int stream_size);
int deflateInit2_ (z_streamp strm, int level, int method, int windowBits, int memLevel, int strategy, const char *version, int stream_size);

Declares in Xojo:

[code]soft declare function zlibVersion lib "libz.so.1" () as CString
soft declare function deflateInit_ lib "libz.so.1" (stream as Ptr, level as Integer, version as CString, stream_size as Integer) as Integer
soft declare function deflateInit2_ lib "libz.so.1" (stream as Ptr, level as Integer, method as Integer, windowBits as Integer, memLevel as Integer, strategy as Integer, version as CString, stream_size as Integer) as Integer

const MAX_WBITS = 15
const DEF_MEM_LEVEL = 8
const Z_DEFAULT_STRATEGY = 0
dim err as Integer
dim mStream as new MemoryBlock (112) // size for 64 bit
// works:
dim version as String = zlibVersion()
err = deflateInit_ (mStream, level, version, mStream.Size)
// crashes:
err = deflateInit2_ (mStream, level, Z_DEFLATED, MAX_WBITS, DEF_MEM_LEVEL, Z_DEFAULT_STRATEGY, version, mStream.Size)[/code]

But it’s not just the higher number of arguments. Even calling “deflate”, which only takes two args, crashes.

So, there’s something very odd going on.

Are you sure the lib is a 64-bit lib?

Build a C app to print you the size of the structure and the offsets.
Maybe alignment is different there.
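A minimal sketch of such a check. To stay self-contained it does not include zlib.h; `zs_mirror` is a hypothetical stand-in that copies the member order of `z_stream` as I know it from zlib 1.2.11, assuming `uInt` is `unsigned int` and `uLong` is `unsigned long`. For a real test, include <zlib.h> and use `z_stream` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror of z_stream (member order assumed from zlib 1.2.11).
   Assumption: uInt == unsigned int, uLong == unsigned long. */
typedef struct zs_mirror {
    const unsigned char *next_in;
    unsigned int         avail_in;
    unsigned long        total_in;
    unsigned char       *next_out;
    unsigned int         avail_out;
    unsigned long        total_out;
    const char          *msg;
    void                *state;
    void *(*zalloc)(void *, unsigned int, unsigned int);
    void  (*zfree)(void *, void *);
    void                *opaque;
    int                  data_type;
    unsigned long        adler;
    unsigned long        reserved;
} zs_mirror;

/* Print the total size and the offsets of the members that follow padding. */
static void print_layout(void) {
    /* Sanity check: the first member of a struct is always at offset 0. */
    assert(offsetof(zs_mirror, next_in) == 0);
    printf("sizeof:    %zu\n", sizeof(zs_mirror));
    printf("avail_in:  %zu\n", offsetof(zs_mirror, avail_in));
    printf("total_in:  %zu\n", offsetof(zs_mirror, total_in));
    printf("data_type: %zu\n", offsetof(zs_mirror, data_type));
    printf("adler:     %zu\n", offsetof(zs_mirror, adler));
}
```

Call `print_layout()` from `main` on each target system and compare the numbers with the offsets the Xojo code assumes for its MemoryBlock.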

Alignment has been verified with a C program on the same system. And it’s of course an ELF 64 bit file, or it wouldn’t even load.

Maybe the lib has been created incorrectly. I’ll try to rebuild it next.

You could use my plugin instead…

For the curious. Here’s the C test code:

[code]#include <stdio.h>
#include <zlib.h>

int main (void) {
  // sizeof yields a size_t, so print it with %zu
  printf ("sizeof(z_stream): %zu\n", sizeof(z_stream)); // "112"
  const char *vers = zlibVersion();
  printf ("version: %s\n", vers); // "1.2.11"
  printf ("Z_DEFLATED: %d, MAX_WBITS: %d, Z_DEFAULT_STRATEGY: %d\n", Z_DEFLATED, MAX_WBITS, Z_DEFAULT_STRATEGY);
  z_stream zs = {0};
  int err1 = deflateInit2_ (&zs, -1, 8, 15, 8, 0, "1.2", 112);
  printf ("deflateInit2_: %d\n", err1); // "0"
  return 0;
}[/code]

Build and run with:

gcc test.c -lz -o test.out && ./test.out

Output is exactly the same on macOS and Fedora, including the version number:

sizeof(z_stream): 112
version: 1.2.11
deflateInit2_: 0

And this is the equivalent Xojo code:

#if TargetMacOS
  const libz = "/usr/lib/libz.dylib"
#else
  const libz = "libz.so.1"
#endif
declare function zlibVersion lib libz () as CString
declare function deflateInit2_ lib libz (zs as Ptr, l as Integer, m as Integer, b as Integer, ml as Integer, st as Integer, v as CString, ss as Integer) as Integer
dim version as String = zlibVersion()
print version
dim zs as new MemoryBlock (112)
dim err as Integer = deflateInit2_ (zs, -1, 8, 15, 8, 0, "1.2", zs.Size)
print str(err) // should be 0
break

And this works on macOS but crashes on Fedora at the deflateInit2_ call.

This is pretty straightforward. That’s why I suspect an issue with Xojo here, not on my end.

Maybe try 88 bytes?

zlib is checking the mStream.Size to ensure it matches what the installed solib is expecting. I’ve observed three sizes: 56 bytes for 32-bit; 88 bytes for a 64-bit version that uses 32-bit integer members; and 112 for 64 bit versions that use 64 bit integer members. And as far as I can tell the only way to know which one is installed is to see which one doesn’t fail.
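The three sizes can be reproduced with a bit of layout arithmetic. The sketch below is only an illustration: `zstream_size` is a hypothetical helper, and it assumes the zlib 1.2.11 member order, a 4-byte `int`/`uInt`, and natural alignment of every member. Under those assumptions the size depends only on the width of a pointer and of `uLong`.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align) {
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

/* Place one naturally aligned member of `size` bytes after `end`
   and return the new end offset. */
static size_t add_member(size_t end, size_t size) {
    return align_up(end, size) + size;
}

/* Expected sizeof(z_stream) for a given pointer width and uLong width
   (hypothetical helper; 4-byte int/uInt assumed). */
static size_t zstream_size(size_t ptr, size_t ulong_sz) {
    size_t max_align = ptr > ulong_sz ? ptr : ulong_sz;
    size_t end = 0;
    end = add_member(end, ptr);      /* next_in   */
    end = add_member(end, 4);        /* avail_in  */
    end = add_member(end, ulong_sz); /* total_in  */
    end = add_member(end, ptr);      /* next_out  */
    end = add_member(end, 4);        /* avail_out */
    end = add_member(end, ulong_sz); /* total_out */
    end = add_member(end, ptr);      /* msg       */
    end = add_member(end, ptr);      /* state     */
    end = add_member(end, ptr);      /* zalloc    */
    end = add_member(end, ptr);      /* zfree     */
    end = add_member(end, ptr);      /* opaque    */
    end = add_member(end, 4);        /* data_type */
    end = add_member(end, ulong_sz); /* adler     */
    end = add_member(end, ulong_sz); /* reserved  */
    return align_up(end, max_align); /* tail padding */
}
```

With these assumptions, `zstream_size(4, 4)` gives 56, `zstream_size(8, 4)` gives 88 and `zstream_size(8, 8)` gives 112. The 88-byte case would correspond to a 64-bit build where `long` is still 4 bytes, which is not the usual case on 64-bit Linux or macOS.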

Andrew, as you wrote yourself, the size is checked by the libz functions. So, if it were wrong, it would return an error (-6). You can easily try this yourself. No, I do everything right, rest assured. This is an issue with Xojo, not in my declares.
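For reference, here is roughly what that entry check looks like, re-created from my memory of zlib's deflate.c rather than copied from it. `check_caller` and the `SKETCH_*` names are hypothetical; `lib_version` and `lib_size` stand in for the library's own ZLIB_VERSION and sizeof(z_stream).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_OK              0
#define SKETCH_VERSION_ERROR (-6)  /* same value as zlib's Z_VERSION_ERROR */

/* Hypothetical re-creation of the check done on entry to deflateInit2_:
   only the first character of the version string and the struct size
   are compared against the library's own values. */
static int check_caller(const char *version, int stream_size,
                        const char *lib_version, int lib_size) {
    assert(lib_version != NULL && lib_size > 0);
    if (version == NULL ||
        version[0] != lib_version[0] ||
        stream_size != lib_size) {
        return SKETCH_VERSION_ERROR;
    }
    return SKETCH_OK;
}
```

This is why passing "1.2" works against a 1.2.11 library, and why a wrong size should produce a clean -6 return value and not a crash.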

I’ve disassembled the generated code, both from gcc and from Xojo, and I can now see that Xojo puts the arguments into the wrong registers, just as I had suspected.

And since there’s no one at Xojo who can fix this, I’m left with writing a stub for this. Damn.

Thomas, report your findings for them to fix as soon as possible.

If you are interested in a plugin based workaround, please let me know.

I could not believe that the ABI would be different between macOS and Linux (and I looked up the docs, which said they’d be equal), so I double checked with more care (pen & paper method)

Turns out that I made a mistake analysing the code in Hopper. One sometimes sees what one wants to see, and not what’s actually there :)

So no error there: the registers appear to be set up correctly (though Xojo’s generated code is awfully convoluted, hence my mistake; I misread the flow of the many superfluous assignments).

So the next step is to use gdb to figure out what’s going wrong. Oh, the fun.

What version of xojo?

I tried with 2016, 2019r1.1 and 2019r3.1. All the same.

Stepping through the Linux code with gdb now, which is really tedious, as I have to re-enter “disass” and “stepi” commands repeatedly. Is there some easier way? Are there easy-to-install GUIs for gdb, like the one Xcode provides on macOS?

Ah: “gdb -tui”, then read https://sourceware.org/gdb/current/onlinedocs/gdb/TUI.html

Alright. I’ve figured it out.

I’m invoking functions I declared against the external libz.so.1 in /usr/lib64, and they do get called. But these functions then look up further symbols of their own lib, and at that point, for reasons I don’t understand yet, the symbols exported by Xojo’s framework (e.g. XojoConsoleFramework64.so) are found and called instead. The Xojo framework defines the same symbols, apparently built from an older version of the zlib source code, and they’re incompatible. Hence the crashes.

Now, why does this happen? I’d think that when a dynamically loaded lib looks up symbols, it would find them in its own domain first, but that doesn’t seem to be the case on Linux.
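As far as I understand ELF dynamic linking, that is the default behaviour: a shared object's references to exported functions, including its own, are resolved through the global lookup scope, where the executable and the libraries loaded at startup come first. A library only prefers its own definitions if it was linked with something like -Bsymbolic or loaded with glibc's RTLD_DEEPBIND. macOS, in contrast, records which library each symbol is expected to come from. A small diagnostic sketch, assuming glibc: `provider_of` is a hypothetical helper that reports which loaded object the global scope resolves a name to.

```c
#define _GNU_SOURCE   /* exposes dladdr() and RTLD_DEFAULT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the loaded object that provides `symbol` in the
   global lookup scope, or NULL if the symbol cannot be resolved. */
static const char *provider_of(const char *symbol) {
    assert(symbol != NULL);
    void *addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == NULL) {
        return NULL;            /* not visible in the global scope */
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0) {
        return NULL;            /* address not inside a known object */
    }
    return info.dli_fname;      /* path of the object that won the lookup */
}
```

Calling `provider_of("deflate")` from code running inside the Xojo process (for example from a small plugin or stub) should then name either the framework or /usr/lib64/libz.so.1. On older glibc versions this needs to be linked with -ldl.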

Whose fault is this? I can’t say until I know the rules of how such things are to be prevented. Clearly, we have a case of duplicate name use here, in two libs. What are the rules for Linux? Does someone know?

At least, though, Xojo is violating some easy-to-understand rules here:

  1. The framework uses its own copy of the zlib code even though it should instead just link to the one the system provides.
  2. If it uses its own, it should declare the functions as private to the framework or rename them if they need to be exported in order to avoid conflicts with the existing system libs.
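The second point can be sketched with GCC/Clang visibility attributes. The function names are made up for illustration; the idea is that a bundled copy of a library is compiled so its functions are hidden, and only the framework's real API is exported.

```c
#include <assert.h>
#include <string.h>

/* Internal helper: hidden visibility keeps it out of the shared object's
   dynamic symbol table, so it can neither clash with nor replace a
   same-named function in another library. (Hypothetical name.) */
__attribute__((visibility("hidden")))
int fw_private_checksum(const char *s) {
    assert(s != NULL);
    int sum = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++) {
        sum += (unsigned char)s[i];
    }
    return sum;
}

/* Public entry point: exported with default visibility. (Hypothetical name.) */
__attribute__((visibility("default")))
int fw_public_api(const char *s) {
    return fw_private_checksum(s) % 256;
}
```

In practice one would more likely compile the bundled sources with -fvisibility=hidden, or hide the symbols of a static archive at link time with GNU ld's --exclude-libs option, than annotate every function by hand.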

Will this get fixed? I highly doubt it, seeing the overall state of low-level issues that haven’t been addressed for years now. I even doubt that there’s an engineer who understands how to fix this. I hope to be proven wrong, though.

(This also explains why it didn’t crash in Debian 9 - that OS probably used an earlier version of the zlib, which was compatible with the code included in the Xojo framework).

Currently, the only work-around I can come up with would be to build my own libz from the public source, and rename all its functions, so that they do not clash with the ones inside the Xojo framework. Sigh.
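The renaming does not have to be done by hand. If I remember correctly, zlib's zconf.h already has a Z_PREFIX option (configure --zprefix) that maps every public name to a z_-prefixed one with macros. The sketch below shows the same technique on a made-up two-function library; `MYZ_PREFIX`, `codec_init` and `codec_level` are hypothetical names, not zlib's.

```c
#include <assert.h>
#include <stddef.h>

/* When MYZ_PREFIX is defined, every public name is rewritten by the
   preprocessor, so the object file only contains myz_* symbols. */
#define MYZ_PREFIX 1
#ifdef MYZ_PREFIX
#  define codec_init   myz_codec_init
#  define codec_level  myz_codec_level
#endif

typedef struct { int level; } codec;

/* Source code keeps using the short names; the linker sees myz_codec_init. */
int codec_init(codec *c, int level) {
    assert(c != NULL);
    if (level < -1 || level > 9) {
        return -2;              /* reject out-of-range levels */
    }
    c->level = level;
    return 0;
}

int codec_level(const codec *c) {
    return c->level;
}
```

The declares on the Xojo side would then have to name the prefixed symbols in the privately built library.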

BTW, I tried to report this with the Feedback.app (from 2018), but it doesn’t let me choose “Bug” - it always goes to “Beta Bug”, which this isn’t, and it seems to think I have a 2020r1 beta of Xojo installed, which I don’t think I have. Is that a known issue? I’m not going to debug that one.

Well… Report it as a “beta bug” and explain details in the text. They will set the correct labels.

Well, we ran into that issue with our plugins years ago.
One of the changes we made was to first link each plugin with its libraries, make all symbols from those libs non-public, and then link the shared library. Annoying, but that is how the library loader works.

Christian, do you mean you ran into the issue because of public symbols in Xojo’s framework, like in this case?

And do you agree that it’s Xojo’s mistake to export these symbol names that clash with those in a system-provided library?