shell.isRunning sometimes is incorrectly false?

Has anyone else noticed this? I have a shell subclass that checks isRunning to know whether it is OK to send messages to the helper process. For no reason that I can find, isRunning sometimes returns false when the shell is running just fine, as I’m happily receiving packets from the shell program.

Once this happens, even closing and restarting the shell process via a new execute command does not make it return true again. Then, just as mysteriously, some hours later it will show true correctly.

I’ve temporarily replaced my call to isRunning with a regular property that I set to true when I execute the command and set to false in the Completed event.

This is an interactive shell but is otherwise not doing anything all that unusual. I don’t see anything in feedback or here about it being wrong. I thought for a while it only happened when I had to kill the subprocess manually during development, but I am also seeing it on test machines whose shells are never manually killed, and I was not able to duplicate the problem in a test app by killing the subprocess in any conceivable way. So I’m not sure what causes it. It can happen a few moments after the call to execute, or hours later.

Which platform? If Windows, have you set the Shell.Timeout to -1?

Sorry, typed too fast as I had to run a kid to school. This is on a Mac, tested on 3 different machines all running 10.13.4. I did not change any timeouts, but I think that only affects Windows, is that right?

Yes - the Timeout is a Windows thing.

This is very odd, as I use Shells and the IsRunning flag in hundreds of locations in multiple apps and this is one issue that I’ve never run into.

One thing to check is what macOS is doing with your helper app: check it in a Terminal with the ps tool using “ps auxh”. Look for your helper and then check the STAT column. I have seen my helpers zombied (Z stat) if they’ve sat idle for too long, but the Shell’s IsRunning flag is still correct.
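For anyone who wants to script that check rather than eyeball the ps output, here is a minimal sketch in Python (not Xojo). The function names are made up for illustration, and it assumes a ps that accepts `-o stat= -p <pid>`, which is a different invocation from the “ps auxh” suggested above.

```python
import subprocess


def is_zombie_stat(stat):
    """Return True if a ps STAT value marks a zombie (it starts with 'Z')."""
    return stat.strip().startswith("Z")


def process_stat(pid):
    """Return the STAT column for pid as reported by ps, or None if ps prints nothing.

    Assumption: 'ps -o stat= -p <pid>' prints only the state, with no header.
    """
    result = subprocess.run(
        ["ps", "-o", "stat=", "-p", str(pid)],
        capture_output=True,
        text=True,
    )
    stat = result.stdout.strip()
    return stat or None
```

Calling `process_stat(pid)` with the helper’s PID and passing the result to `is_zombie_stat` gives the same answer as reading the STAT column by hand. It says nothing about what the Shell object itself believes.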

I can see that the isRunning boolean is false even while I’m receiving information from the child process. So I know without question that it’s actually running. But I will look for the Zombied info and see what else I can find.

I thought for a while that it had something to do with killing the shell and then starting another process in it in the same block of code. Perhaps the Completed event fires after my code is complete and it thinks that it has shut down from that event, even though it has actually started a new one. I can’t duplicate that, though, and I have verified that this isn’t what’s happening on my test machines, as there are no log entries about that happening. So this remains frustrating and odd. It definitely did not do this until the last major update or two, and it doesn’t always do it even now. The fact that the Completed event fires after I’ve restarted another process means that setting my hack local variable did not work in those cases. The helper process I’m running communicates with the host program via a socket, so I can check whether the socket is there and check its connected property to know whether the shell is running and I can send data to it. I’m trying that now. It’s not an ideal solution, as there is a delay between when the shell is started and when the helper makes the connection, and errors can be logged in that time, but at least things are staying working for longer periods of time now.
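To illustrate why a hand-set flag breaks when a late Completed event arrives after a restart, here is a minimal sketch in Python (not Xojo). `HelperShell`, `execute` and `on_completed` are hypothetical stand-ins, not the Xojo Shell API. The run counter is one possible guard; it is not something the framework is known to provide.

```python
class HelperShell:
    """Hypothetical wrapper that tracks its own running flag."""

    def __init__(self):
        self.running = False   # hand-maintained replacement for isRunning
        self._run_id = 0       # incremented on every execute()

    def execute(self, command):
        """Start a new run and return a token identifying it."""
        self._run_id += 1
        self.running = True
        return self._run_id

    def on_completed(self, token):
        """Completion handler; ignores completions from an earlier run."""
        if token != self._run_id:
            return  # stale event from a run that was already replaced
        self.running = False
```

Without the token comparison, a completion from the first run would clear the flag while the second run is still alive, which is the failure described above.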

Just to clarify, what is Shell.mode set to?

This is an interactive shell. Shell mode 2.

When you start the second call, are you reusing the same instance or are you creating a new one and replacing the one that you have?

Re-using the same instance. But I don’t think that’s the problem, because it gets set to false even when it’s still running the same command, just some time later. There is no evidence that the command failed or that another one was run in the same shell before it starts to be false. I THINK, but have not proven, that it drops the PID value as well. I replaced the isRunning check with one that just returned PID <> 0, but that also returned false. I am still futzing with it to try to understand fully.

What I’m thinking though is that if a command is already running, it doesn’t get destroyed until it actually reaches some sort of conclusion. So even if you reinitialize the property, it may be that the Xojo Framework is holding a reference until the command actually finishes. That would explain why the Completed event is firing at strange times for you.

And this turns out to be totally my fault. Or someone’s fault :wink: Or the fault of the cat walking across the keyboard :wink: Some months ago I had refactored a routine that saved all the changed data from an edit window into the object, and commented out all the old code at the end. At some point a single line of code far down in the commented section had been uncommented. I know it was commented originally because the problem didn’t start happening right away. The line that was left uncommented was one that wouldn’t have caused any traditional compile or runtime errors; it was just creating a new object based on the dictionary of data that I had created and already saved into the existing object. So it was creating a brand new object and inserting it into all my indices, but some other objects that had a direct reference to the shell were still calling into the one it had orphaned. When I started it up, the one in the index was started, but the objects that were trying to send messages to the shell still had a reference to the old one that had been mysteriously replaced and was honestly not running. I realized something similar must be happening when I added some debug code to output all the stats of the object: it was completely running, isRunning was true, it had a valid PID and everything, yet some but not all of my objects were insisting that they couldn’t send a message because their shell wasn’t running. The only difference was that the ones that were working were looking up the object in the indexes by its name or ID, and the ones that weren’t working were using a saved reference.
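The stale-reference situation can be reproduced in a few lines. This is a minimal sketch in Python (not Xojo), and every name in it (`HelperShell`, `registry`, `save_settings`, `CachedSender`, `LookupSender`) is hypothetical and only mirrors the description above.

```python
class HelperShell:
    """Hypothetical stand-in for the shell subclass."""

    def __init__(self, name):
        self.name = name
        self.is_running = False

    def execute(self):
        self.is_running = True


registry = {}  # index of shells by name


def save_settings(name):
    """Mimics the stray uncommented line: builds a NEW object and re-indexes it."""
    registry[name] = HelperShell(name)


class CachedSender:
    """Holds a saved reference to the shell object."""

    def __init__(self, shell):
        self.shell = shell

    def can_send(self):
        return self.shell.is_running


class LookupSender:
    """Looks the shell up in the index on every call."""

    def __init__(self, name):
        self.name = name

    def can_send(self):
        return registry[self.name].is_running


def demo():
    registry["helper"] = HelperShell("helper")
    cached = CachedSender(registry["helper"])
    lookup = LookupSender("helper")
    save_settings("helper")        # silently replaces the indexed object
    registry["helper"].execute()   # only the replacement is started
    same_object = cached.shell is registry["helper"]
    return cached.can_send(), lookup.can_send(), same_object
```

Here `demo()` returns `(False, True, False)`: the sender with the saved reference sees a shell that really is not running, while the one that looks it up by name sees the running replacement. Comparing object identity, as the last value does, is a quick way to spot this kind of orphaned instance.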

Found that single line of uncommented code two pages down in a section of commented-out old code where nothing should have been running at all :wink:

Sorry for the false alarm :wink: I fat-fingered that line back into life at some point and introduced a random problem that would only happen after you had edited the shell’s settings and only when accessing it from certain objects. Very frustrating!!