Display wakes, app crashes

I’ve been trying to debug a seemingly random, infrequent crash for quite some time. The app runs 24/7, has been running for almost 3 years, and crashes roughly once every 4-8 weeks, usually with an unhandled XML exception. It currently runs on a Mac Mini. What it comes down to is this: the display is set to sleep after about 15 minutes. The app keeps running while the display sleeps, but when the display is woken up, the app crashes. It is always an UnhandledXMLException.

While chasing down the XML issue I made a lot of changes, mostly adding progress logging and defensive programming. Code that had run well enough for 3 years, with widely spaced crashes, now couldn’t run for 2 hours. Then it occurred to me that the code may not be the problem (well, it probably is, in a way I don’t understand) and that this is an interaction between the hardware, the OS, and the code.

The app now works fine as long as the display never has to be woken up; the display can be powered off and on without a problem. The app also needs to be the frontmost application: if I move it to the background, it crashes.

This is an early app for me, and I know there’s a lot of room for improvement in its design. I had a lot to learn about handling XML and other things, so I just wanted to get something working.

I’d be happy to hear any suggestions. Thanks!

What’s in the Activate() event?

Nothing. I don’t have one in the app.

Debugging an app that crashes only every 4-8 weeks is lovely.

Can you make the app crash faster?
Do you have logging in place so that you can see what xml makes the app crash?
Why don’t you have a generic handler for the unhandled exception so that you get at least a stack trace?

Yes, it crashes under the conditions described above: when it’s not frontmost, when the display sleeps and wakes, and when I log in with remote access software.

Yes, I have extensive logging in place. I know exactly where it crashes.

I do have a generic handler for unhandled exceptions. I also have pretty good defensive programming in place.

I’m convinced this has to do with the app and its status or place in the OS and hardware operations.

I’m trying to learn more about the Activate and Deactivate app events, but I don’t see them used in any of the example projects I’ve looked at.

What is the app doing before the crash?

Mostly it is processing XML. I’ve handled most of these issues. There was a “thread already running” error, but I think I fixed that, as it hasn’t appeared in a week or so. There were also issues connecting to one of the remote servers, but those have gone away; that was never a problem in the past 2 years. I looped until I got a connection and logged the number of attempts. Sometimes 4 attempts were required.
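The connect-and-retry loop described above can be sketched as follows. This is a language-neutral illustration in Python, not Xojo; `connect_with_retry` and the `connect` callable are hypothetical stand-ins for whatever socket or HTTP call the app actually makes. Unlike the original loop, which retried until it succeeded, this sketch gives up after a fixed number of attempts (an assumption), so a dead server cannot hang the app forever.

```python
import time
from typing import Callable, Optional


def connect_with_retry(connect: Callable[[], bool],
                       max_attempts: int = 10,
                       delay_seconds: float = 1.0,
                       log: Callable[[str], None] = print) -> Optional[int]:
    """Call `connect` until it returns True or `max_attempts` is reached.

    Returns the number of attempts it took, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if connect():
            log(f"Connected after {attempt} attempt(s)")
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before the next try
    log(f"Giving up after {max_attempts} attempts")
    return None
```

Logging the attempt count on success, as the original code did, makes it easy to spot a server that is slowly getting worse before it fails outright.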

Maddeningly it hasn’t crashed in about 36 hours.

I seem to recall having a similar issue some years ago, although not with XML processing: when the machine woke, the app would crash. All I can recall is that I haven’t seen it for a very long time. What version of Xojo are you using?

I am using 2016R3. The latest.

I did not have the UnhandledException event set up correctly, so I was not getting the stack trace. Now I am.

Is it possible that, after the exception is thrown but before I see it on the display and click OK to shut down the app (which could be hours later), the timer kicks off the thread again and it continues to run?
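One way to rule out that scenario is to have the timer check two things before it starts the worker: whether a fatal error has already happened, and whether the previous run is still going. The Python sketch below only illustrates the idea; `Worker`, `fatal_error`, and `on_timer` are hypothetical names. In Xojo the equivalent checks would sit at the top of the Timer's Action event, before the thread is started.

```python
import threading


class Worker:
    """Hypothetical stand-in for the XML-processing thread and its timer."""

    def __init__(self, job):
        self.job = job
        self.fatal_error = False  # set once an unhandled exception occurs
        self.thread = None
        self.starts = 0           # how many times the timer launched the job

    def on_timer(self) -> bool:
        """Called on every timer tick. Returns True if a new run was started."""
        if self.fatal_error:
            return False          # app is shutting down: never restart work
        if self.thread is not None and self.thread.is_alive():
            return False          # avoids a "thread already running" error
        self.thread = threading.Thread(target=self._run)
        self.starts += 1
        self.thread.start()
        return True

    def _run(self):
        try:
            self.job()
        except Exception:
            # Real code would log the exception here as well.
            self.fatal_error = True  # stop the timer from relaunching the job
```

With both guards in place, a dialog left on screen for hours cannot coincide with the job quietly running again in the background.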

Did you know that you can return True from the UnhandledException event to allow the app to keep running?

I just wrote a logging mechanism in the App.UnhandledException event. It writes the error and stack trace to an error log. In my case I want the app to close, because some errors can’t be recovered from. As @Greg O’Lone mentioned, you can return True to prevent the app from quitting. I have a second program running that is connected via a TCP socket; when the ‘server’ app quits, the secondary app restarts it.
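The log-then-decide pattern described above might look like this as a Python sketch. It is not Xojo code: `handle_unhandled`, the `RECOVERABLE` tuple, and the log path are hypothetical. The boolean return mirrors the convention mentioned above, where returning True from App.UnhandledException keeps the app running and returning False lets it quit.

```python
import traceback
from datetime import datetime

# Hypothetical: exception types we believe the app can survive.
RECOVERABLE = (TimeoutError, ConnectionError)


def handle_unhandled(exc: BaseException, log_path: str) -> bool:
    """Append the exception type and stack trace to `log_path`.

    Returns True to keep running (recoverable) or False to let the app quit,
    mirroring the Return True / Return False choice in App.UnhandledException.
    """
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{datetime.now().isoformat()} Unhandled: {type(exc).__name__}\n")
        log.write(stack + "\n")
    return isinstance(exc, RECOVERABLE)
```

Writing the log before deciding means the stack trace survives even when the answer is to quit.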

Code in app.UnhandledException Event:

[code]if AppQuit or main.AppQuit then Return false

If error <> Nil Then
  Dim type As String = Introspection.GetType(error).Name
  main.AppQuit = True
  AppQuit = True
  
  if Main.MainDb <> nil then
    SQLPS.BindAll 2
    SQLPS.SQLExecute("Fatal Error: " + type + EndOfLine + EndOfLine + Join(error.Stack, EndOfLine),System.EnvironmentVariable("COMPUTERNAME"))
  End If
  
  'only show detail error information if debugging
  #if DebugBuild then MsgBox(type + EndOfLine + EndOfLine + Join(error.Stack, EndOfLine))
  return false
  
End If[/code]

Code to restart server (in timer.action event):

[code]'If the server got disconnected, check to see if there is anything we can do about it
if not Connected then
  QBUpdateServer.Port = ServerPort
  QBUpdateServer.Address = ServerIP
  QBUpdateServer.Connect
  ServerError = ServerError + 1
  if ServerError = 20 or ServerError = 40 or ServerError = 60 then 'give it three tries before giving up
    Dim count As Integer = System.NetworkInterfaceCount
    For i As Integer = 0 To count-1
      if System.GetNetworkInterface(i).IPAddress = ServerIP Then
        'try restarting the server since it's been down for 2 seconds, and we are on the same computer as the server
        Dim sh As New Shell
        dim p as String = Setting("MainDBPath").StringValue
        p = ReplaceAll(p,"main.db","QB Update Server.exe")
        'don't restart the server if we are debugging
        #if not DebugBuild then sh.Execute(p)
        msgbox "Attempted to restart the server."
        exit For
      end if
    Next
  end if
end if[/code]
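The timer logic above boils down to a small counter-driven state machine. Here it is as a Python sketch (not Xojo) so the decision can be tested in isolation; `WatchdogState`, `watchdog_tick`, and the `reconnect`/`restart` callables are hypothetical, while the thresholds 20, 40, and 60 come from the code. Resetting the counter once the connection is back is an assumption, since the snippet does not show where ServerError is reset.

```python
from dataclasses import dataclass
from typing import Callable

RESTART_AT = (20, 40, 60)  # same thresholds as the Xojo timer: three restart attempts


@dataclass
class WatchdogState:
    failures: int = 0
    restarts: int = 0


def watchdog_tick(state: WatchdogState,
                  connected: bool,
                  reconnect: Callable[[], None],
                  restart: Callable[[], None]) -> None:
    """Run once per timer tick."""
    if connected:
        state.failures = 0  # assumption: reset once the link is healthy again
        return
    reconnect()             # always try a plain reconnect first
    state.failures += 1
    if state.failures in RESTART_AT:
        restart()           # e.g. launch the server executable
        state.restarts += 1
```

Keeping the decision separate from the Shell call makes it easy to confirm that the watchdog tries exactly three restarts and then stops.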

No, I did not know that, even though I have it in a new reworking of this app. Originally I had what I must have copied out of the docs back in 2012.

Interestingly, though, the current docs recommend against this. My last two lines of code are:

Quit
Return True
Wouldn’t the Quit cancel the Return True?

Very interesting. I don’t have a server running, just an app. The app reaches out to one server, gets new or changed records, and then updates another server. It’s the app that needs to be restarted. I’ll look into that, as it would help.

The error handling is interesting. Why do you return False? To shut the app down?

[quote=299944:@Duane Mitchell]Very interesting. I don’t have a server running but just an app…
…It’s the app that needs to be restarted.[/quote]
My server is just a desktop app with a TCPSocket. When the second desktop app loses its connection, it first tries to reconnect. If it can’t, it tries to launch the ‘server’ desktop app.

[quote]Dim sh As New Shell
dim p as String = Setting("MainDBPath").StringValue
p = ReplaceAll(p,"main.db","QB Update Server.exe")
'don't restart the server if we are debugging
#if not DebugBuild then sh.Execute(p)[/quote]

Correct. In my case the crash occurs at random and comes from the QuickBooks SDK. It’s an error that is not easily recovered from. I return False to close the app and rely on my second app to restart the first one.

Note that I have an AppQuit flag, which I set to True. This prevents the event from being entered more than once. I had a serious recursion problem when an error occurred in a timer.
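The AppQuit guard can be illustrated with a small Python sketch (not Xojo). `CrashGuard` and the `write_log` callable are hypothetical; the point is that if logging the first error raises a second one and the handler gets called again, the second call returns immediately instead of recursing.

```python
class CrashGuard:
    """Hypothetical analogue of an UnhandledException handler with an AppQuit flag."""

    def __init__(self, write_log):
        self.app_quit = False  # mirrors the AppQuit flag
        self.write_log = write_log
        self.handled = []      # exceptions the handler actually processed

    def handle(self, exc: BaseException) -> bool:
        if self.app_quit:
            return False       # already shutting down: do not re-enter
        self.app_quit = True
        self.handled.append(exc)
        try:
            self.write_log(exc)
        except Exception as nested:
            # Simulate the runtime calling the handler again for the new error;
            # the flag makes this second call return at once.
            self.handle(nested)
        return False           # False: let the app quit
```

Setting the flag before doing any work (logging, database writes) is what breaks the loop; setting it afterwards would leave a window in which a failing log call re-enters the handler.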

I believe so. In my example I wanted the app to quit and added the Quit statement ‘just in case’. I’m not sure I would have needed the Return False line.

Just noticed a flaw in this method. If the app crashes without throwing an error (which is possible but undesired), on some computers a Windows message pops up saying the program has encountered a problem. It seems to wait for user input before completing the shutdown, and the socket is not disconnected until the shutdown is complete. I suppose I could ping over the socket and request replies from the ‘server’, which I assume it would be unable to send if it’s in the process of dying. However, I’m not sure it would have released the mutex (allowing it to be restarted). Does anyone else have a solution to this scenario?
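The ping idea can be sketched as a heartbeat with a timeout: the watchdog records when the ‘server’ last answered and treats it as dead once it has been silent too long, even if the socket is still technically open. This is a Python sketch with a hypothetical `Heartbeat` class and an arbitrary default timeout; it does not address the mutex question.

```python
import time
from typing import Optional


class Heartbeat:
    """Tracks replies from the monitored app; a silent app counts as dead."""

    def __init__(self, timeout_seconds: float = 10.0):
        self.timeout = timeout_seconds
        self.last_reply: Optional[float] = None

    def reply_received(self, now: Optional[float] = None) -> None:
        """Record a ping reply; `now` can be passed explicitly for testing."""
        self.last_reply = time.monotonic() if now is None else now

    def is_alive(self, now: Optional[float] = None) -> bool:
        if self.last_reply is None:
            return False  # never heard from it
        now = time.monotonic() if now is None else now
        return now - self.last_reply <= self.timeout
```

Using a monotonic clock keeps the check correct even if the system time changes, for example after the machine wakes and resyncs its clock.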

Is there a way to test this by causing this type of crash (with no error raised and the error handler not called)? Perhaps using Windows declares?

I found the solution to this problem here: How to Disable Windows Error Reporting

Circling back on this…

The client organizes educational events every 4-8 weeks or so. The “random” error was occurring whenever event-related data activity was heavy. I never knew their schedule, so I never looked at this. Thanks to the suggestions for detailed logging and defensive programming, I was able to narrow my focus. I haven’t fixed it yet, but at least I know where to look.