App Thread Continuous Loop or Timer or ?

I have a number of standalone web apps running on Windows servers that do various things in the background; the Web UI is used primarily to display a transaction log and an exceptions log. Depending on what the app needs to do, I have used either a Thread that gets created at application launch and runs a continuous Do loop with a sleep period, or a Timer that launches a new thread at a specific interval, does the processing and then goes out of scope. In both cases the sleep period or timer is usually set to 10 seconds. The apps generally work great and do what they were created to do. However, every once in a while, anywhere from weeks to months apart, one of these apps will quit processing even though the UI is still responsive. The only way to get it going again is to stop and restart the service.

In these apps, the timers and threads run at the app level, so they are not web timers or web threads. I do use web timers and web threads where appropriate for Web Session UI functions, but not for app-level functions that process files and folders or POSTs coming through HandleSpecialURL.

I would like to make these services more resilient and I’m looking for ideas I might try, like auto-restart at intervals, or some way to detect that a thread or timer has died and notify support. Someone may have solved a similar problem, so I’m just fishing the pool of brilliant minds here.

I think the place to start is to look at the statefulness of the applications concerned. If an application is stateless, a periodic refresh of the execution environment may be the easiest “fix”. Containerisation can help here. On the other hand, if the application is complex and stateful, it might be better to componentise it by transferring tasks to “worker” console apps, which can be discarded when done while the results (state) are managed centrally. That makes the system more robust and helps isolate problems. Otherwise, stateful long-running processes can be monitored using OS monitoring tools (shell out to get the answers) to see whether memory is leaking, CPU is skyrocketing, etc., and you can go from there (e.g. alarms, graceful shutdowns). But if all of that looks fine, anomalous input data (e.g. unexpected or undefined string encodings) can sometimes trigger an intermittent issue that only shows up for certain data values. (I test for this last, since isolating it often requires eliminating the other causes first.) If that kind of thing is the real issue - and it may well be masked by subtle interactions between variables in your code - then auto-restart won’t help, but simplifying the application with worker console apps may well help expose what’s really going on.
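To make the “worker” idea a bit more concrete, here is a rough sketch. It is in Python rather than Xojo purely for illustration, and everything in it (the `run_worker` name, the JSON task format, the inline worker body) is made up for the example, not taken from your apps. The main app hands one task to a throwaway process, waits with a timeout, and keeps only the result.

```python
import json
import subprocess
import sys

# Hypothetical worker body: reads one JSON task on stdin and writes one
# JSON result on stdout. A real worker would be a separate console app.
WORKER_SOURCE = """
import json, sys
task = json.load(sys.stdin)
print(json.dumps({"name": task["name"], "length": len(task["payload"])}))
"""

def run_worker(task, timeout_seconds=30.0):
    """Run one task in a disposable process; return (ok, result_or_error)."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE],
            input=json.dumps(task),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # a hung worker is killed, not waited on forever
        )
    except subprocess.TimeoutExpired:
        return False, "worker timed out and was killed"
    if done.returncode != 0:
        # The worker crashed; its stderr says why, and the main app carries on.
        return False, done.stderr.strip()
    return True, json.loads(done.stdout)
```

The point of the design is that a bad input can only take down (or hang) the worker for that one task, and the failure comes back with the task name attached, which is often what exposes the data-dependent cases.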

Good luck!

Thanks for the reply, Eric! For the most part these apps are relatively stateless and not terribly complex. Some monitor folders and process XML files, updating a database. Some wait for POSTs, process the XML, and then query other systems before updating a database or moving files around. I’ll consider whether it makes sense to transfer any tasks to shell apps. I have done that before so that multiple compute-intensive tasks could each get their own CPU, but these apps didn’t really seem like candidates for that, since their tasks usually take milliseconds to complete. Mostly they sit idle for long periods, with bursts of activity at various times of day or night. I’m more inclined to think that there are some subtle variable interactions that occasionally arise, as you suggest, or possibly some processing condition I have not anticipated that the app keeps trying to resolve, which I have not yet discovered and so cannot trap for.
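One way to surface that kind of untrapped condition might be a loop that catches and logs whatever a pass throws, and records when it last completed a pass so something else can notice if it goes quiet. A minimal sketch, in Python rather than Xojo just to show the shape; `GuardedLoop` and its members are hypothetical names, and the 10-second default only mirrors the interval mentioned earlier.

```python
import logging
import threading
import time

class GuardedLoop:
    """Runs work() on an interval, survives exceptions, and tracks liveness."""

    def __init__(self, work, interval_seconds=10.0):
        self.work = work
        self.interval_seconds = interval_seconds
        self.last_pass = None      # monotonic time of the last finished pass
        self.failures = 0
        self._stop = threading.Event()

    def run_once(self):
        try:
            self.work()
        except Exception:
            # Log the full traceback instead of letting the thread die silently.
            self.failures += 1
            logging.exception("processing pass failed; loop continues")
        self.last_pass = time.monotonic()

    def run_forever(self):
        while not self._stop.is_set():
            self.run_once()
            self._stop.wait(self.interval_seconds)  # interruptible sleep

    def stop(self):
        self._stop.set()

    def is_stalled(self, max_silence_seconds):
        """True if a pass has finished before but none has for too long."""
        if self.last_pass is None:
            return False
        return time.monotonic() - self.last_pass > max_silence_seconds
```

A UI timer could call `is_stalled(60)` and show or send an alert. Note this only catches a pass that throws; a pass that blocks forever shows up as a stall, which is exactly the case worth being told about.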

I’m still pondering if there is a good way for me to reset the app state when idle to help eliminate it from ceasing to process and require a restart. I can easily have our local manufacturing NOC schedule a regular restart but that is seeing the problem as a nail and getting out the hammer to just bash it as opposed to tweezing out the real problem.

If recycling the app ‘fixes’ the problem - i.e. you’re absolutely sure there’s no data loss since processing of XML files always continues normally upon restart - a couple of things come to mind…

You might consider creating a periodic heartbeat in your processing app, such as writing to a RAM drive or an IP socket. Then have a console manager app launch your processing app, monitor its health, and recycle it when appropriate. Another way to go is for the manager app to monitor data throughput, such as network activity on your processing app’s port or the deletion of XML files after processing. If that activity doesn’t happen within a reasonable time, the manager app recycles the processing app.
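A minimal sketch of the heartbeat variant, assuming the heartbeat is simply a file the processing app rewrites after each successful pass. It is Python rather than Xojo for illustration only; the function names, the 60-second threshold, and the idea that the processing app is a plain child process (not a Windows service) are all assumptions of the example.

```python
import os
import subprocess
import time

def write_heartbeat(path):
    """Called by the processing app at the end of every successful pass."""
    with open(path, "w") as f:
        f.write(str(time.time()))

def heartbeat_age(path, now=None):
    """Seconds since the last heartbeat, or None if there is no heartbeat file."""
    try:
        last = os.path.getmtime(path)
    except OSError:
        return None
    return (time.time() if now is None else now) - last

def needs_recycle(path, max_age_seconds=60.0, now=None):
    """True when the heartbeat is missing or older than the allowed age."""
    age = heartbeat_age(path, now)
    return age is None or age > max_age_seconds

def supervise(command, path, max_age_seconds=60.0, poll_seconds=10.0):
    """Launch the processing app; relaunch it whenever it exits or goes quiet."""
    while True:
        write_heartbeat(path)              # start the clock at launch
        proc = subprocess.Popen(command)
        while proc.poll() is None and not needs_recycle(path, max_age_seconds):
            time.sleep(poll_seconds)
        if proc.poll() is None:            # still running but silent: stop it
            proc.kill()
        proc.wait()
```

The throughput variant has the same shape: swap `needs_recycle` for a check such as “the oldest XML file in the inbox is more than N seconds old”. Either way the threshold should sit comfortably above the normal idle gap between passes so a quiet night doesn’t look like a dead app.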

The reason I would go this way with stateless short-run processes is that you may find there’s a very subtle issue in the interaction between your customer’s data, your code, any third-party libraries and/or the Xojo framework itself. That could take a very long time to find, and the workaround may turn out to be more expensive or harder to implement anyway.

Therefore stateless / short-run processes are often considered more like cattle than pets. Within reason, DevOps practice will tend to recycle rather than nurse a sick app, provided there’s no risk of data loss when doing so.