What options do I have when I’m getting URLConnection timeouts? I’m querying a local LLM and hitting timeouts when using larger models with large context sizes and complex queries. Queries may take 5+ minutes to resolve.
My understanding is that the maximum timeout setting is 60 seconds (or 75 on Mac). If I ask for a streaming response from the LLM, the first tokens come back more quickly, but still not quickly enough in some cases. The use case involves sending 128k of data (really, as much as the context window will take) to the LLM and asking it to perform multi-stage prompts that take a long time.
My Python app is taking 5-10 minutes on some queries to Gemma3 27b in Ollama, running on a 32GB Mac Studio with an M2 Max.
I thought Xojo would provide a simple way to build a single executable with nice UI for the project, and so far that’s true. For most queries on smaller models it’s working nicely, but I know some users will have larger models and push the limits. A 70b model on a 128GB Mac notebook is going to be even slower.
To sum up, what are my options for long-running queries to web services?