shell.isRunning sometimes is incorrectly false?

James_Sentman · May 22, 2018, 1:34pm

Has anyone else noticed this? I have a Shell subclass that checks isRunning to know whether it is OK to send messages to the helper process. For no reason that I can find, isRunning sometimes returns false when the shell is running just fine, as I'm happily receiving packets from the shell program.

Once this happens, even closing and restarting the shell process via a new execute command does not make it return true again. Then, just as mysteriously, some hours later it will show true correctly.

I've temporarily replaced my call to isRunning with a regular property that I set to true when I execute the command and set to false in the Completed event.

This is an interactive shell but is otherwise not doing anything all that unusual. I don't see anything in Feedback or here about it being wrong. I thought for a while it only happened when I had to kill the subprocess manually during development, but I am also seeing it on test machines whose shells are never manually killed, and I was not able to duplicate the problem in a test app by killing the subprocess in any conceivable way. So I'm not sure what causes it. It can happen a few moments after the call to execute, or hours later.

Tim_Jones · May 22, 2018, 2:09pm

Which platform? If Windows, have you set the Shell.Timeout to -1?

James_Sentman · May 22, 2018, 2:18pm

Sorry, typed too fast as I had to run a kid to school. This is on a Mac, tested on 3 different machines all running 10.13.4. I did not change any timeouts, but I think that only affects Windows, is that right?

Tim_Jones · May 22, 2018, 2:26pm

Yes - the Timeout is a Windows thing.

This is very odd, as I use Shells and the IsRunning flag in hundreds of locations in multiple apps and this is one issue that I’ve never run into.

One thing to check is what macOS is doing with your helper app. Check it in a Terminal with the ps tool using “ps auxh”. Look for your helper and then check the STAT column. I have seen my helpers zombied (Z stat) if they’ve sat idle for too long, but the Shell’s IsRunning flag is still correct.
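For anyone who wants to script the STAT check described above, here is a minimal sketch in Python. It is only an illustration: the helper name `myhelper` and the sample output are made up, and the `ps` invocation (`-A` with separate `-o pid=`, `-o stat=`, `-o comm=` columns) is one assumed way to get header-free output rather than the command quoted in the post.

```python
import subprocess


def find_zombies(ps_output, name):
    """Return (pid, stat) pairs for matching processes whose STAT starts with Z."""
    zombies = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, stat, rest of line is the command
        if len(parts) < 3:
            continue
        pid, stat, command = parts
        if name in command and stat.startswith("Z"):
            zombies.append((int(pid), stat))
    return zombies


def list_processes():
    # Each "-o column=" requests one column and suppresses its header.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "stat=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample output; "myhelper" is a placeholder name.
sample = """\
  501 S    /usr/local/bin/myhelper
  502 Z    (myhelper)
  777 Ss   /sbin/launchd
"""
print(find_zombies(sample, "myhelper"))  # [(502, 'Z')]
```

On a real machine, `find_zombies(list_processes(), "myhelper")` would run the same check against live data.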

James_Sentman · May 23, 2018, 5:10pm

I can see that the isRunning boolean is false even while I'm receiving information from the child process, so I know without question that it's actually running. But I will look for the zombied info and see what else I can find.

James_Sentman · May 26, 2018, 12:09pm

I thought for a while that it had something to do with killing the shell and then starting another process in it in the same block of code. Perhaps the Completed event fires after my code is complete, and it thinks that it has shut down because of that even though it has actually started a new one. I can't duplicate that though, and I have verified that this isn't what's happening on my test machines, as there are no log entries about that happening. So this remains frustrating and odd. It definitely did not do this until the last major update or two, and it doesn't always do it even now. The fact that the Completed event fires after I've started another process means that setting my hack local variable did not work in those cases.

The helper process I'm running communicates with the host program via a socket, so I can check whether the socket is there and check its connected property to know whether the shell is running and I can send data to it. I'm trying that now. It's not an ideal solution, as there is a delay between when the shell is started and when the helper makes the connection, and errors can be logged in that time, but at least things are staying working for longer periods of time now.
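The ordering problem described above (a Completed event from the old process arriving after a new one has been started) can be sketched with a toy model in Python. This is not the Xojo Shell class: `TrackedShell`, `execute`, and `completed` are hypothetical names, and tagging each run with an id is just one possible way to ignore a late completion, not something the thread confirms was used.

```python
class TrackedShell:
    """Toy stand-in for a shell wrapper; not the Xojo Shell class."""

    def __init__(self):
        self.naive_running = False  # set on execute, cleared on any completion
        self.current_run = 0        # id of the most recent execute call
        self.finished_run = 0       # highest run id known to have completed

    def execute(self):
        self.current_run += 1
        self.naive_running = True
        return self.current_run

    def completed(self, run_id):
        # The naive flag is cleared even by a late event from an old run.
        self.naive_running = False
        self.finished_run = max(self.finished_run, run_id)

    @property
    def running(self):
        # Running only if the newest run has not reported completion yet.
        return self.current_run > self.finished_run


shell = TrackedShell()
first = shell.execute()
second = shell.execute()    # restarted before the first completion arrives
shell.completed(first)      # late completion from the old process
print(shell.naive_running)  # False, although the second run is still alive
print(shell.running)        # True
```

The plain boolean is wrong after the late event, while the run-id comparison still reports the second process as running.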

Greg_O_Lone · May 26, 2018, 1:20pm

Just to clarify, what is Shell.mode set to?

James_Sentman · May 27, 2018, 2:20pm

This is an interactive shell. shell mode 2.

Greg_O_Lone · May 27, 2018, 3:02pm

When you start the second call, are you reusing the same instance or are you creating a new one and replacing the one that you have?

James_Sentman · May 27, 2018, 7:36pm

Re-using the same instance. But I don't think that's the problem, because it gets set to false even when it's still running the same command, just some time later. There is no evidence that the command failed or that another one was run in the same shell before it starts to be false. I THINK, but have not proven, that it drops the PID value as well. I replaced the isRunning check with one that just returned PID <> 0, but that also returned false. I am still futzing with it to try to understand fully.

Greg_O_Lone · May 27, 2018, 8:57pm

What I'm thinking, though, is that if a command is already running, it doesn't get destroyed until it actually reaches some sort of conclusion. So even if you reinitialize the property, it may be that the Xojo framework is holding a reference until the command actually finishes. That would explain why the Completed event is firing at strange times for you.

James_Sentman · June 6, 2018, 7:56pm

And this turns out to be totally my fault. Or someone's fault. Or the fault of the cat walking across the keyboard.

Some months ago I had refactored a routine that saved all the changed data from an edit window into the object, and commented out all the old code at the end. At some point a single line of code far down in the commented section had been uncommented. I know it was commented originally because the problem didn't start happening right away. The line that was left uncommented was one that wouldn't have caused any traditional compile or runtime errors; it was just creating a new object based on the dictionary of data that I had created and already saved into the existing object.

So it was creating a brand new object and inserting it into all my indices, but some other objects that had a direct reference to the shell were still calling into the one it had orphaned. When I started it up, the one in the index was started, but the objects that were trying to send messages to the shell still had a reference to the old one, which had been mysteriously replaced and was honestly not running.

I realized something like this must be happening when I added some debug code to output all the stats of the object and it was completely running: isRunning was true, it had a valid PID and everything, yet some but not all of my objects were insisting that they couldn't send a message because their shell wasn't running. The only difference was that the ones that were working were looking up the object in the indexes by its name or ID, and the ones that weren't working were using a saved reference.

I found that single line of uncommented code 2 pages down into a section of commented-out old code where nothing should have been running at all.

Sorry for the false alarm. I fat-fingered that line back into life at some point and introduced a random problem that would only happen after you had edited the shell's settings, and only when accessing it from certain objects. Very frustrating!!
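For readers who land here with the same symptom, the root cause can be reproduced in a few lines. This is a toy sketch in Python rather than Xojo, with made-up names (`Helper`, `index`, `saved_reference`); it only illustrates how a saved reference and a lookup by name can end up pointing at two different objects.

```python
class Helper:
    """Toy stand-in for the object that owns the shell; not a real Xojo class."""

    def __init__(self, name):
        self.name = name
        self.running = False


index = {}                          # lookup table keyed by name
index["helper"] = Helper("helper")
saved_reference = index["helper"]   # a client caches a direct reference

# The stray line: builds a brand new object and swaps it into the index.
index["helper"] = Helper("helper")
index["helper"].running = True      # startup goes through the index

print(index["helper"].running)            # True: lookup finds the live object
print(saved_reference.running)            # False: cached reference is orphaned
print(saved_reference is index["helper"])  # False: two different objects
```

Clients that look the object up each time see the running one, while clients holding the cached reference keep asking the orphan, which matches the "some but not all" behaviour described above.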