Let’s say I want to download a model and run it locally. How the heck do I even get started with that in Xojo? I am not talking about generative AI.
Although I haven’t used them for a bit now, a couple of good places to start would be…
Just plan on having plenty of free disk space: the language models IIRC are generally 7 GB and up, sometimes tens of GB each, so you can quickly and easily plow through a hundred GB or more. Of course, a fast Internet connection will help too.
This is all old news (6+ months), which in current AI timelines is ancient; others might have more recent developments on this front.
You can try with GitHub - ggml-org/llama.cpp: LLM inference in C/C++
LM Studio: you set up a local server using any of the models available in LM Studio,
then it’s an HTMLViewer or a TCPSocket on the Xojo side.
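To make the suggestion above concrete: LM Studio's local server speaks the OpenAI-compatible chat API (by default at http://localhost:1234/v1/chat/completions, but check your setup). Here is a hedged sketch, in Python for illustration, of the JSON body a Xojo client would POST to it via URLConnection or a TCPSocket speaking HTTP; the model name is a placeholder for whatever you have loaded.

```python
import json

def build_chat_request(model, prompt):
    """Return the JSON body for an OpenAI-style chat-completion request,
    as expected by LM Studio's local server."""
    return json.dumps({
        "model": model,          # placeholder: whatever model is loaded in LM Studio
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7,
        "stream": False,         # one complete response, easier to parse in Xojo
    })

body = build_chat_request("local-model", "Hello from Xojo!")
print(body)
```

The Xojo side would build the same string (e.g. with a Dictionary and GenerateJSON) and send it with Content-Type application/json.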
Create your server, call it from anywhere in your lab using the Ollama API.
You will “just” need to create the Xojo client.
Oh duh. I guess that’s my 3am brain not connecting the dots. I was thinking “run locally” was “in process” when “on prem” is more practical.
There are several ways to do it; running Ollama locally and accessing its web API works, although there is a lot of fiddling around to get the format correct (spaces seem to be an issue, though I think Ollama may have fixed this in the latest release).
For the main online ones, there are detailed, documented API calls.
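The formatting trouble mentioned above usually comes from hand-building the JSON body by string concatenation. Spaces are legal inside JSON strings, but unescaped quotes and newlines are not, and a real JSON encoder handles all of that for you. A quick illustration in Python (in Xojo, GenerateJSON on a Dictionary does the same job):

```python
import json

# A prompt full of characters that break naive string concatenation:
prompt = 'Line one\nLine "two" with   spaces'

# Let the encoder do the escaping instead of gluing strings together.
body = json.dumps({"model": "llama3", "prompt": prompt})

# The newline and quotes are escaped in the body, and it round-trips cleanly:
assert json.loads(body)["prompt"] == prompt
print(body)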
I’ve got Microsoft’s Phi-4 model running in Xojo with declares, using the open source llama.cpp binaries.
Also got Whisper.cpp running to transcribe audio voice recordings to text. This is the same engine used by ChatGPT in their mobile app to convert voice prompts into text.
Got these models up and running on Windows, but it shouldn’t be too difficult to get them running on OSX. You just need to download the OSX binaries and change the library names in the declares.
Here are the links to these projects for those who are interested…