Hi all,
While playing with Shell and huge amounts of data, I’m wondering what the size of the Shell buffer is.
Does anybody have the answer? And better, does anybody know how to increase this size (if possible)?
TIA for any idea or suggestion.
If you control the shell process, you can return the data in chunks. If you don’t, you can save/read from a temp file.
Otherwise, I don’t think you have any control.
The original question remains, though. Is the buffer’s size fixed among all systems (e.g. set by the framework) or is it dependent on the amount of installed RAM? Can it be calculated, if it’s not fixed?
In other words, what sets this limit?
You’re right. But what is the best way to control the shell process?
A Timer with ReadAll and Poll and, in that case, what period for the Timer?
Or DataAvailable and Completed events?
I’m talking about Interactive Shell and 1 million, and even 1 billion characters of text to get from it.
Is the process running in the Shell something you created and can change?
A lot of interesting questions! Any answers in the area?
I too am interested in the size of the buffer, and it’s frustrating that the only response is that of trying to change directions.
Does anyone have actual knowledge about this limit?
@Kem_Tekinay No, Kem. Actually, the process is the Pari/GP tool compiled for Macintosh. And the huge data is the result of, e.g., 1 million decimals of Pi or the several million digits of 100,000!.
Running tests between my “FrontEnd” interface using Xojo Shell and the native Mac Shell, it seems that my code is two times slower than the native Shell for these results.
With fewer digits (let’s say 10,000 or 100,000), it’s only a few seconds slower.
I’m filling a MemoryBlock of 100,000,000 bytes with a Timer of 1 ms period, until the last 2 characters are either "? " (end of the command) or "> " (an error happened). Then I display the MemoryBlock as a String in a DesktopTextArea.
A bit of googling finds you that the size of the pipe on Linux is 65536 bytes while macOS only uses 16384 bytes (small) or 65536 bytes (Big).
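For anyone who wants to check these numbers on their own machine, here is a rough sketch in Python (not Xojo) that fills a non-blocking pipe until the kernel refuses more data. The function name and the 1024-byte chunk size are my own choices, it only applies to Unix-like systems, and whether Xojo's Shell sits on the same kind of pipe is an assumption.

```python
import os


def pipe_capacity(chunk_size: int = 1024) -> int:
    """Return how many bytes an OS pipe accepts before a write would block.

    Hypothetical helper for illustration; it measures a plain os.pipe(),
    not the pipe that Xojo's Shell class creates internally.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # make a full pipe raise instead of hanging
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)  # may be a partial write
            except BlockingIOError:
                break  # the pipe is full: nothing more fits
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("pipe capacity in bytes:", pipe_capacity())
```

The value printed depends on the OS and possibly on the write size, so treat it as a measurement of your own system rather than a constant.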
I’d love to have a property to increase it in our ShellMBS class, but I can’t find a way to change it.
I downloaded Pari/GP via MacPorts, then wrote a simple desktop app. The window has an embedded interactive Shell and, on Open, calls GPShell.Execute( "/path/to/gp" ). I have a button that sends the contents of a TextField to the Shell via WriteLine, and implemented the Shell’s DataAvailable to append to a TextArea via me.ReadAll.
The upshot is that the code didn’t seem to have any trouble keeping up with 100000! without any calls to Poll or using a Timer.
(My result is 456,574 digits, which I confirmed against the result from GP via Terminal.)
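The digit count can also be cross-checked without generating the number at all. This is a small sketch in Python (not Xojo) that uses the log-gamma function, since the number of decimal digits of n! is floor(log10(n!)) + 1. The function name is made up for the example, and the floating-point result is only trustworthy while log10(n!) is not extremely close to an integer.

```python
import math


def factorial_digit_count(n: int) -> int:
    """Number of decimal digits of n!, computed from lgamma(n + 1) = ln(n!).

    Hypothetical helper for illustration; relies on double precision,
    which is ample for the sizes discussed in this thread.
    """
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    return math.floor(log10_factorial) + 1


if __name__ == "__main__":
    print(factorial_digit_count(100_000))  # 456574
```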
@Christian_Schmitz Good to know, thanks for this information, Christian.
Because my tests show that I’m getting from 2,000 to 3,000 bytes with each ReadAll, I now know that this code is highly inefficient and can (must) be enhanced.
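To see how much data arrives per read outside of Xojo, one could log the size of each chunk read from a child process's pipe. The sketch below is in Python (not Xojo). The helper name, the 65536-byte read size and the toy child command are all assumptions for the example, with the child standing in for gp.

```python
import subprocess
import sys


def read_in_chunks(command, max_chunk=65536):
    """Run command and return (all stdout bytes, list of per-read sizes).

    Hypothetical helper: bufsize=0 gives an unbuffered pipe object, so each
    read() maps to one OS-level read and returns whatever is available,
    up to max_chunk bytes.
    """
    parts = []
    sizes = []
    with subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=0) as proc:
        while True:
            chunk = proc.stdout.read(max_chunk)
            if not chunk:  # empty bytes means the child closed its output
                break
            parts.append(chunk)
            sizes.append(len(chunk))
    return b"".join(parts), sizes


if __name__ == "__main__":
    # Toy child process that prints one million digits.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('7' * 1_000_000)"]
    data, sizes = read_in_chunks(child)
    print(len(data), "bytes in", len(sizes), "reads; largest read:", max(sizes))
```

If the reads come back much smaller than the pipe size, the reader is probably draining the pipe faster than the child fills it, which is not necessarily a problem in itself.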
What do you mean? That the size ranges from a minimum of 16384 up to 65536 bytes, or that the system selects either the small or the big pipe? And in that case, based on what?
@Kem_Tekinay Thanks, Kem, for spending some time coding that. It seems that DataAvailable is more accurate than a Timer-based solution. I will re-code this part of my app and let the community know what happens.
@Kem_Tekinay Well, Kem, the problem is not the value of the result (it’s good and the same as in the Terminal), nor the calculation time (about the same), but the time to display the result!
Here are my results on a 2020 iMac, macOS 13.5, 64 GB RAM:
        Terminal       Timer code   DataAvailable code
100K!   Unmeasurable   2s           9s
1M!     2s             10s          105s
Do you get the same results, at least with the DataAvailable code?
Yes, but I wasn’t trying to optimize it. I would keep adding chunks to an array until the ? comes in, then set the text in one shot.
I wonder if setting that much text into a TextArea is going to be a bottleneck. Also, make sure you DefineEncoding on the text as you receive it.
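The "collect chunks, then set the text once" idea can be sketched like this. It is written in Python (not Xojo) purely to show the logic. The class name is invented, the prompt strings are the "? " and "> " endings mentioned earlier in the thread, and UTF-8 is an assumed encoding.

```python
class PromptCollector:
    """Accumulate raw chunks and report when the output ends with a prompt.

    Hypothetical helper for illustration. Chunks are kept in a list and
    joined once at the end, and the encoding is applied once to the whole
    result rather than to every chunk.
    """

    def __init__(self, prompts=(b"? ", b"> ")):
        self.prompts = tuple(prompts)
        self.chunks = []
        self.tail = b""  # last few bytes seen, in case a prompt spans two chunks
        self._keep = max(len(p) for p in self.prompts)

    def feed(self, chunk: bytes) -> bool:
        """Store one chunk; return True if the output now ends with a prompt."""
        if chunk:
            self.chunks.append(chunk)
            self.tail = (self.tail + chunk)[-self._keep:]
        return self.tail.endswith(self.prompts)

    def text(self, encoding: str = "utf-8") -> str:
        """Join everything and decode a single time."""
        return b"".join(self.chunks).decode(encoding)


if __name__ == "__main__":
    collector = PromptCollector()
    for piece in (b"3.14159", b"26535\n?", b" "):
        done = collector.feed(piece)
    print(done, repr(collector.text()))
```

Decoding once at the end also avoids splitting a multi-byte character across two chunks, which matters for accented text.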
Getting back 1,000,000 digits, without displaying them or even keeping them, takes about 18s here.
I will have to try writing/reading from a temp file instead, probably tomorrow.
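For reference, the temp-file route looks roughly like this outside of Xojo. The sketch is in Python, the helper name is made up, and the child command is a stand-in for gp. The point is only that the child writes to a file instead of a pipe, so no pipe buffer is involved and the result is read back in one go.

```python
import subprocess
import sys
import tempfile


def run_via_temp_file(command) -> bytes:
    """Run command with stdout redirected to a temp file, then read it back.

    Hypothetical helper for illustration; the temporary file is deleted
    automatically when the with-block ends.
    """
    with tempfile.TemporaryFile() as out:
        subprocess.run(command, stdout=out, check=True)
        out.seek(0)  # rewind before reading what the child wrote
        return out.read()


if __name__ == "__main__":
    child = [sys.executable, "-c", "print('9' * 100000)"]
    data = run_via_temp_file(child)
    print(len(data.strip()), "digits read back")
```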
The documentation seems to suggest that it uses the small pipe if you have small writes and the big one for big chunks. Either way, there is no way to control it.
@Tim_Hare Thanks Tim, already done and with Latin language (I’m French), it’s a nightmare… Accented and special characters are coded differently based on the Text Encoding, and Text files may be UTF-8 (ideally), Mac OS Roman, Windows ANSI, …
I discovered last night, while trying to increase the speed of my code, that managing Text Encoding in the DataAvailable version of the code slows it down by almost 10x: 105 s for 1M! against 11 s without it.
In the Timer version, however, the time is 10 s with or without Text Encoding.
My opinion is that the DataAvailable event runs in the thread (or something like that) of the Shell, leading to slow execution, whereas the Timer runs in the main thread without impacting the Shell “thread”.
Hi all,
After getting:
        Terminal       Timer code   DataAvailable code
100K!   Unmeasurable   2s           9s
1M!     2s             10s          105s
I tried a lot of possibilities and finally got these results with a MemoryBlock of 10 million bytes (instead of 100 million):
        # Digits       Terminal       Timer code   DataAvailable code
100K!   456,574        Unmeasurable   1s           1s
1M!     5,565,709      2s             10s          10s
10M!    65,657,060     46s            136s         135s
100M!   756,570,557    12min 45s      30min 30s    "Failed to resize String Handle" error after about 1 hour
Times are the same with a MemoryBlock of only 1 million bytes. Please note that the “DataAvailable” code uses a StringHandleMBS class, sized to 10 million bytes. The speed of StringHandleMBS vs MemoryBlock is the same.
What is very interesting is the time ratio between Terminal and the Xojo code: it was 5x for 1M! and more than 8x for 10M! in the first tests, whereas it’s only about 3x for 1M! and 10M!, and even less (about 2.4x) for 100M!, in the last tests.
My conclusion is that a Timer is a more efficient way to manage a huge amount of data coming from a Shell in Interactive mode, at least in the case of my app. But I think it’s likely true in all cases.
In addition, there is no benefit in using too large a MemoryBlock; 1 or 10 million bytes is enough. Of course, one has to resize the MemoryBlock if needed.
Thanks to everyone who replied. Don’t hesitate to reply if you disagree with my conclusion or have more information to share.