Hard crash with no exception - on a bug hunt!

Hamish_Symington · August 26, 2014, 4:01pm

Hi -
I’ve got a user with a machine that hard crashes (no exception), and I’m having trouble finding where it’s happening. The app was built with Real Studio 2012 r2.1.
The crash stack is as follows:

Thread 2 Crashed:
0 libsystem_kernel.dylib 0x943bc952 __pthread_kill + 10
1 libsystem_pthread.dylib 0x9abcf167 pthread_kill + 101
2 libsystem_c.dylib 0x90154368 __abort + 187
3 libsystem_c.dylib 0x901542ad abort + 172
4 MBS Real Studio GraphicsMagick Plugin.rbx_0.dylib 0x06c57d4a MagickPanicSignalHandler + 58
5 libsystem_platform.dylib 0x917e0deb _sigtramp + 43
6 ??? 0xffffffff 0 + 4294967295
7 libsystem_c.dylib 0x9015429c abort + 155
8 libc++abi.dylib 0x97c0d6c9 abort_message + 169
9 libc++abi.dylib 0x97c2e47d default_terminate_handler() + 264
10 libc++abi.dylib 0x97c2bc30 std::__terminate(void (*)()) + 14
11 libc++abi.dylib 0x97c2b64b __cxa_throw + 116
12 libc++abi.dylib 0x97c2be74 operator new(unsigned long) + 100
13 rbframework.dylib 0x02b58a38 0x2b10000 + 297528
14 rbframework.dylib 0x02b54867 0x2b10000 + 280679
15 com.lightbluesoftware.lightblue 0x01f5fa7f externalProgramsUpdateThread.publishToGoogleCalendar%%o<externalProgramsUpdateThread> + 7647
16 com.lightbluesoftware.lightblue 0x01eeee6a externalProgramsUpdateThread.Event_Run%%o<externalProgramsUpdateThread> + 4290
17 com.lightbluesoftware.lightblue 0x0106a88f ourThread.Event_Run%%o<ourThread> + 102
18 rbframework.dylib 0x02b365a1 0x2b10000 + 157089
19 libsystem_pthread.dylib 0x9abce5fb _pthread_body + 144
20 libsystem_pthread.dylib 0x9abce485 _pthread_start + 130
21 libsystem_pthread.dylib 0x9abd3cf2 thread_start + 34

And the console output around that time is this:

26/08/2014 11:26:10.827 GoogleSoftwareUpdateDaemon[6469]: -[KSUpdateEngine(PrivateMethods) updateFinish] KSUpdateEngine update processing complete.
26/08/2014 11:26:13.136 GoogleSoftwareUpdateDaemon[6469]: -[KeystoneDaemon logServiceState] GoogleSoftwareUpdate daemon (1.1.0.3659) vending: com.google.Keystone.Daemon.UpdateEngine: 1 connection(s) com.google.Keystone.Daemon.Administration: 0 connection(s)
26/08/2014 11:26:38.395 GoogleSoftwareUpdateDaemon[6469]: -[KeystoneDaemon main] GoogleSoftwareUpdateDaemon inactive, shutdown.
26/08/2014 11:33:58.131 WindowServer[90]: device_generate_desktop_screenshot: authw 0x0(0), shield 0x7fe2f3d67ad0(2001)
26/08/2014 11:33:58.181 WindowServer[90]: device_generate_lock_screen_screenshot: authw 0x0(0), shield 0x7fe2f3d67ad0(2001)
26/08/2014 11:38:57.074 nbagent[2678]: XPC Activity invoked with state=2
26/08/2014 11:38:59.392 nbagent[2678]: XPC Activity invoked with state=2
26/08/2014 11:43:57.977 Light Blue[6171]: Light Blue(6171,0xb6c27000) malloc: *** mach_vm_map(size=28835840) failed (error code=3) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug
26/08/2014 11:43:59.984 com.apple.launchd.peruser.501[171]: (com.lightbluesoftware.lightblue.27664[6171]) Job appears to have crashed: Abort trap: 6
26/08/2014 11:44:03.115 ReportCrash[6493]: Saved crash report for Light Blue[6171] version 5.1 b5 (5.1.0.2.5) to /Users/Sandra/Library/Logs/DiagnosticReports/Light Blue_2014-08-26-114403_24.crash
26/08/2014 11:44:03.395 gkbisd[151]: Unable to collect cdhash for /System/Library/CoreServices/Problem Reporter.app (error code 100024)
26/08/2014 11:44:04.220 gkbisd[151]: Unable to collect cdhash for /Applications/Microsoft Office 2011/Office/Office365Service.app (error code 100024)
26/08/2014 11:51:03.311 nbagent[2678]: XPC Activity invoked with state=2
26/08/2014 11:51:15.071 xpcproxy[6521]: assertion failed: 13E28: xpcproxy + 3438 [D559FC96-E6B1-363A-B850-C7AC9734F210]: 0x2
26/08/2014 11:55:04.370 nbagent[2678]: XPC Activity invoked with state=2

Can anyone tell me what sort of thing would cause a malloc problem like this? If not, I’m going to have to put vast amounts of logging into the method to work out exactly where it’s happening. Tedious, but it may prove useful.

100 Internets will be awarded to anyone who can help…

Thanks,

Hamish

Tim_Parnell · August 26, 2014, 4:42pm

It may be the Graphics plugin.
Are your images nearing 3.5GB by chance?

Hamish_Symington · August 26, 2014, 4:44pm

It’s very unlikely; we’re not doing much stuff with images. The user’s also reported that they start the app, wait for a while (indeterminate amount of time) and then it happens - so it’s not like we’re loading in gigs of tifs.
Christian said this by email:

[quote]externalProgramsUpdateThread.publishToGoogleCalendar calls some runtime function which allocates memory. That allocation fails and causes an abort, which is caught by the GraphicsMagick library’s signal handler.
GraphicsMagick catches those signals to delete temp files just before the app crashes.[/quote]
You are hereby awarded one internet for being the first to reply…

Dirk_Cleenwerck · August 27, 2014, 8:55am

I got something similar this week and am still narrowing it down.
Are you using a class from a thread? Your crash report says thread 2 crashed.
I had the problem when I was using a class from a thread.
The thread was running in a window; the class was external to the window. When I closed the window, the framework seemed (from the crash report) to try to kill the thread, at which point I got the hard crash.
I moved all the elements from my class to inside the window and the crash disappeared.
I kept a copy of the version that crashed in the hope that I might narrow it down and file a feedback report.
Maybe you have a similar problem. It’s definitely not exactly the same.

Hamish_Symington · August 27, 2014, 9:43am

I’m not sure that classes are going to be the problem, Dirk - there’s no closing of windows going on at all here, as the user was saying that the app was just sitting there not doing anything, then quit. All rather mysterious.

Dirk_Cleenwerck · August 27, 2014, 9:51am

I’m not necessarily sure that closing the window was the problem. I think the problem happens when the thread runs into an error and, instead of handling it by raising an exception, hard crashes. I think my error happened when the thread received a kill signal (from closing the window, in my case) in the middle of processing something that may or may not still have existed at that point. I have exception handlers in place and even tried try/catch, but they don’t catch anything; I only get a hard crash.
Is there anything in your thread that might go out of scope?
Anyways, I’m not sure at all if our problem is similar, but just trying to get you to double check to eliminate the possibility that it might be something like what I had.
PS: My thread gets started from a timer once per minute. Are you starting your thread from a timer, by any chance? That could explain the app sitting there and then quitting.

Eli_Ott · August 27, 2014, 9:53am

[code]11 libc++abi.dylib 0x97c2b64b __cxa_throw + 116
12 libc++abi.dylib 0x97c2be74 operator new(unsigned long) + 100[/code]
I would say that the unsigned long argument is nil, which makes operator new throw the C++ exception via __cxa_throw (frame 11).

Christian_Schmitz · August 27, 2014, 10:01am

GraphicsMagick caught the signal only to close temp files.

Christian_Schmitz · August 27, 2014, 10:02am

@Eli: The unsigned long is right there.

You can see the message “mach_vm_map(size=28835840)” in the log, so it tries to allocate about 28 MB. There is no free space on the heap for that, so operator new raises an exception via the C++ runtime function __cxa_throw.