I’ve been trying to debug a seemingly random and infrequent crash for quite some time. The app runs 24/7, has been running for almost 3 years, and crashes randomly, usually with an unhandled XML exception, roughly once every 4-8 weeks. It is currently running on a Mac Mini. What it comes down to is that the display is set to sleep after about 15 minutes. The app will continue to run, but when the display is woken up the app crashes, always with an UnhandledXMLException.
In trying to chase down the XML issue I made a lot of changes, mostly to logging progress and defensive programming. The code that ran well enough for 3 years with widely dispersed crashes now couldn’t run for 2 hours. Then it occurred to me that the code may not be the problem (well, it probably is, in a way I don’t understand) and that it’s a hardware/OS/code issue.
The app works fine now as long as the display does not have to be woken up; the display can be powered off and on. The app needs to be the frontmost application. If I move it to the background it crashes.
This is an early app for me and I know there’s a lot of room for improvement in its design. I had a lot to learn about handling XML and other things so I just wanted to get something done.
Debugging an app that crashes only every 4-8 weeks is lovely.
Can you make the app crash faster?
Do you have logging in place so that you can see what xml makes the app crash?
Why don’t you have a generic handler for the unhandled exception so that you get at least a stack trace?
Mostly it is processing XML. I’ve handled almost all of these issues. There was a “thread already running” error but I think I fixed that, as it hasn’t shown up in a week or so. There were issues with connecting to one of the remote servers but those have gone away. That was never a problem in the past 2 years. I looped until I got a connection and logged the number of attempts. Sometimes 4 attempts were required.
I seem to recall having a similar issue some years ago, although not with XML processing. When the machine woke, the app would crash. All I can recall is that I haven’t seen it for a very long time. What version of Xojo are you using?
I did not have the UnhandledException event set up correctly, so I was not getting the stack report. Now I do.
After the exception is thrown, it could be hours before I see the message on the display and click the OK button to shut down the app. Is it possible that during that time the timer kicks off the thread again and it continues to run?
I just wrote a logging mechanism in the app.UnhandledException event. It writes the error and stack trace to an error log. In my case I want the app to close because I have some errors that can’t be recovered from. As @Greg O’Lone mentioned, to prevent the app from quitting you can return true. I have a second program running that is connected via a TCP socket. When the ‘Server’ app quits, the secondary app restarts the server app.
Code in app.UnhandledException Event:
[code]
' Already shutting down: ignore any further exceptions (prevents recursion)
If AppQuit Or main.AppQuit Then Return False
If error <> Nil Then
  Dim type As String = Introspection.GetType(error).Name
  main.AppQuit = True
  AppQuit = True
  If Main.MainDb <> Nil Then
    ' Log the exception type and stack trace to the ErrorLog table
    SQLPS = Main.MainDb.Prepare("INSERT INTO ErrorLog(Value,User,Time) VALUES(?,?,DATETIME(CURRENT_TIMESTAMP,'LOCALTIME'))")
    SQLPS.BindAll 2
    SQLPS.SQLExecute("Fatal Error: " + type + EndOfLine + EndOfLine + Join(error.Stack, EndOfLine),System.EnvironmentVariable("COMPUTERNAME"))
  End If
  'only show detail error information if debugging
  #if DebugBuild then MsgBox(type + EndOfLine + EndOfLine + Join(error.Stack, EndOfLine))
  Quit
  Return False
End If
[/code]
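For an app with no database to log to (as in the original poster’s case), a file-based variant of the same idea might look like the sketch below. It is meant as the body of App.UnhandledException. The log file name and its location in the Documents folder are assumptions for illustration, and the `handling` flag is a hypothetical stand-in for the AppQuit flag above.

```xojo
' Body of App.UnhandledException(error As RuntimeException) As Boolean
' Sketch only: "MyAppErrors.log" and its folder are assumed names.
Static handling As Boolean = False   ' guards against re-entry, like AppQuit above
If handling Then Return False
handling = True

If error Is Nil Then Return False

Dim typeName As String = Introspection.GetType(error).Name
Dim f As FolderItem = SpecialFolder.Documents.Child("MyAppErrors.log")

Try
  Dim out As TextOutputStream
  If f.Exists Then
    out = TextOutputStream.Append(f)   ' add to the existing log
  Else
    out = TextOutputStream.Create(f)   ' first entry: create the file
  End If

  Dim stamp As New Date
  out.WriteLine(stamp.SQLDateTime + " " + typeName + ": " + error.Message)
  out.WriteLine(Join(error.Stack, EndOfLine))
  out.WriteLine("")
  out.Close
Catch ioErr As IOException
  ' If the log itself cannot be written there is nothing sensible left to do.
End Try

Return False   ' let the app terminate; Return True would keep it running
```

Writing the timestamp with each entry makes it easier to line crashes up against other events, such as the display waking or a burst of incoming data.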
Code to restart server (in timer.action event):
[code]
'If the server got disconnected, check to see if there is anything we can do about it
If Not Connected Then
  QBUpdateServer.Port = ServerPort
  QBUpdateServer.Address = ServerIP
  QBUpdateServer.Connect
  ServerError = ServerError + 1
  If ServerError = 20 Or ServerError = 40 Or ServerError = 60 Then
    'give it three tries before giving up
    Dim count As Integer = System.NetworkInterfaceCount
    For i As Integer = 0 To count-1
      If System.GetNetworkInterface(i).IPAddress = ServerIP Then
        'try restarting the server since it's been down for 2 seconds, and we are on the same computer as the server
        Dim sh As New Shell
        Dim p As String = Setting("MainDBPath").StringValue
        p = ReplaceAll(p,"main.db","QB Update Server.exe")
        'don't restart the server if we are debugging
        #if not DebugBuild then sh.Execute(p)
        MsgBox "Attempted to restart the server."
        Exit For
      End If
    Next
  End If
End If
[/code]
Very interesting. I don’t have a server running but just an app. The app reaches out to one server and gets new or changed records and then updates another server. It’s the app that needs to be restarted. I’ll look into that as it would be a help.
The error handling is interesting. Why do you “return false”? To shut the app down?
[quote=299944:@Duane Mitchell]Very interesting. I don’t have a server running but just an app…
…It’s the app that needs to be restarted.
[/quote]
My server is just a desktop app with a TCPSocket. When the second desktop app loses its connection it first tries to reconnect. If it can’t, it tries to launch the ‘server’ desktop app.
[quote] Dim sh As New Shell
dim p as String = Setting("MainDBPath").StringValue
p = ReplaceAll(p,"main.db","QB Update Server.exe")
'don't restart the server if we are debugging
#if not DebugBuild then sh.Execute(p)
[/quote]
Correct. In my case the crash is caused at random and comes from QuickBooks SDK. It’s an error that is not easily recovered from. I return false to close the app, and lean on my second app to restart the first app.
Note I have an AppQuit flag which I set to true. This prevents recursion into more than one instance of the event. I had a serious recursion problem when an error occurred in a timer.
I believe so. In my example I wanted the app to quit and added the Quit statement ‘just in case’. I’m not sure I would have needed the ‘return false’ line.
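To keep a timer from starting more work once the handler has fired, the same flag can be checked at the top of the timer’s Action event. This is only a sketch: it assumes AppQuit is a public property on App, and `Worker` is a hypothetical Thread instance standing in for whatever thread the timer starts.

```xojo
' Body of the Timer.Action event that starts the worker thread.
' App.AppQuit and Worker are assumed names, not taken from the real project.
If App.AppQuit Then
  Me.Mode = Timer.ModeOff   ' stop firing while the app is shutting down
  Return
End If

' Calling Run on a thread that is still running raises
' ThreadAlreadyRunningException, so only start it when it is idle.
If Worker.State = Thread.NotRunning Then
  Worker.Run
End If
```

The State check also covers the “thread already running” case, where the timer fires again before the previous pass has finished.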
Just noticed a flaw in this method. If the app crashes without throwing an error (which is possible but undesired), on some computers a Windows message pops up saying the program encountered a problem. It seems that Windows waits for user input before completing the shutdown of the program, and the socket is not disconnected until the shutdown is complete. I suppose I could ping the socket and request replies from the ‘server’, which I assume it would be unable to send if it’s in the process of dying. Although I’m not sure it would have released the mutex (allowing it to be restarted). Anyone else have a solution to this scenario?
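The ping idea could be sketched roughly as follows. Everything here beyond the TCPSocket calls is an assumption: `LastPong` (a Double property), `HeartbeatTimer`, the "PING"/"PONG" strings and the 15 second timeout are all made up for illustration, and `Connected` is assumed to be the settable flag that the reconnect timer checks. It does not answer the mutex question; it only detects a server that has stopped responding while the socket still looks connected.

```xojo
' --- Client: QBUpdateServer.Connected event ---
LastPong = Microseconds   ' start the clock when the connection is made

' --- Client: QBUpdateServer.DataAvailable event ---
Dim data As String = Me.ReadAll
If InStr(data, "PONG") > 0 Then LastPong = Microseconds

' --- Client: HeartbeatTimer.Action event (hypothetical Timer, Period = 5000) ---
Const kTimeoutMicroseconds = 15000000   ' 15 s without a reply counts as hung
If QBUpdateServer.IsConnected Then
  If Microseconds - LastPong > kTimeoutMicroseconds Then
    QBUpdateServer.Close   ' give up on this connection
    Connected = False      ' let the existing reconnect/restart timer take over
  Else
    QBUpdateServer.Write("PING" + EndOfLine)
  End If
End If

' --- Server: DataAvailable event of the connected socket ---
Dim request As String = Me.ReadAll
If InStr(request, "PING") > 0 Then Me.Write("PONG" + EndOfLine)
```

Since ReadAll empties the socket buffer, in a real app the PING/PONG check would need to be folded into whatever message parsing the two apps already do on that socket.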
Is there a way to test this by causing this type of crash (with no errors/error handler not called)? Perhaps using windows declares?
The client organizes educational events every 4-8 weeks or so. The “random” error was occurring whenever event related data activity was heavy. I never knew what their schedule was so never looked at this. Thanks to the suggestions for detailed logging and defensive programming I was able to narrow my focus. Haven’t fixed it yet but at least I know where to look.