Options for URLConnection timeout when querying LLM

What options do I have when I'm hitting URLConnection timeouts? I'm querying a local LLM and getting timeouts with larger models, large context sizes, and complex queries. Queries may take 5+ minutes to resolve.

My understanding is that my maximum timeout setting is 60 seconds (or 75 on Mac). If I ask for a streaming response from the LLM, the first tokens come back more quickly, but still not quickly enough in some cases. The use case involves sending 128k of data (really, as much as the context window will take) to the LLM and asking it to perform multi-stage prompts that take a long time.
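Right now the request looks roughly like this (a sketch, not my exact code; the endpoint and JSON assume Ollama's /api/generate, and conn is a URLConnection property so it survives the method returning):

' Build the request; "stream": true asks Ollama to send tokens as they're generated
conn = New URLConnection
conn.SetRequestContent("{""model"": ""gemma3:27b"", ""prompt"": ""..."", ""stream"": true}", "application/json")

' Streamed chunks arrive in ReceiveProgressed, the full body in ContentReceived
AddHandler conn.ReceiveProgressed, AddressOf StreamChunkReceived
AddHandler conn.ContentReceived, AddressOf ResponseFinished

' Third argument is the timeout in seconds – this is the value that
' doesn't seem to help beyond 60 (75 on Mac) in my testing
conn.Send("POST", "http://localhost:11434/api/generate", 600)

Sub StreamChunkReceived(sender As URLConnection, bytesReceived As Int64, totalBytes As Int64, newData As String)
  ' newData holds just the latest chunk
  OutputArea.AddText(newData) ' OutputArea is a placeholder TextArea
End Sub

Sub ResponseFinished(sender As URLConnection, URL As String, status As Integer, content As String)
  ' Full response once the request completes
End Sub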

My Python app takes 5–10 minutes on some queries to Gemma 3 27B in Ollama, running on a 32 GB Mac Studio with an M2 Max.

I thought Xojo would provide a simple way to build a single executable with a nice UI for the project, and so far that's true. For most queries on smaller models it's working nicely, but I know some users will have larger models and will push the limits. A 70B model on a 128 GB Mac notebook is going to be even slower.

To sum up, what are my options for long-running queries to web services?

Looks like my answer is the same as is so often the case in the FileMaker world: the MBS Plugin seems to be the answer. I've just tested it on several queries taking a couple of minutes each, and it works as expected.
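For anyone finding this later, the shape of it with the MBS CURL classes is roughly the following. This is a sketch from memory, so check the CURLSMBS names against the current MBS docs:

Var curl As New CURLSMBS
curl.OptionURL = "http://localhost:11434/api/generate"
curl.SetOptionHTTPHeader(Array("Content-Type: application/json"))
curl.OptionPostFields = requestJSON ' your request body
curl.OptionTimeOut = 600 ' seconds; no 60-second ceiling here
curl.CollectOutputData = True

' Perform blocks, so call it from a Thread (or use PerformMT) to keep the UI alive
Var err As Integer = curl.Perform
If err = 0 Then
  Var response As String = curl.OutputData
End If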

With the 27B model, even the simplest queries take 30 seconds or so.

The Send method has a default Timeout argument of 60 seconds.
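So you can pass a bigger value rather than relying on the default, e.g.:

conn.Send("POST", url, 600) ' timeout is the optional third argument, in seconds

Whether macOS honors very large values end to end is another question, but the limit isn't coming from the argument's default.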

Are you using llama.cpp? If so, I think the Shell class might be a better fit.
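Something along these lines, running the llama.cpp CLI directly (an untested sketch; the binary name and flags follow llama.cpp's llama-cli and may differ by version):

' mShell is a Shell property so it outlives the method
mShell = New Shell
mShell.ExecuteMode = Shell.ExecuteModes.Asynchronous ' no HTTP timeout involved
AddHandler mShell.DataAvailable, AddressOf TokensArrived
mShell.Execute("llama-cli -m /path/to/model.gguf -p ""your prompt here""")

Sub TokensArrived(sender As Shell)
  ' Fires as output streams in; ReadAll returns whatever has arrived so far
  OutputArea.AddText(sender.ReadAll) ' OutputArea is a placeholder TextArea
End Sub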

At the moment I'm using Ollama to test whether this will work in Xojo at all. I agree that using llama.cpp directly would probably be better.