Running a local LLM directly from Xojo is now easier than ever thanks to the MBS Xojo Plugins and their integration with llama.cpp. With just a few lines of code, you can load a model, create a context, and generate text—all on-device, with optional GPU acceleration.
In this article, we’ll walk through the basics of setting up llama.cpp with Xojo and the MBS Plugin, and then examine a complete example that loads a model and asks it simple questions.
What Is llama.cpp?
llama.cpp is a high-performance C/C++ implementation for running LLaMA-family language models locally, optimized for CPUs and GPUs (Metal, CUDA, etc.). It is lightweight, fast, and ideal for on-device inference with small to medium-sized models.
The MBS Xojo Plugins provide a direct bridge between Xojo and llama.cpp, exposing model loading, context creation, sampling, and inference capabilities through the LlamaMBS, LlamaModelMBS, LlamaContextMBS, and related classes.
Requirements
To follow along, you will need:
- Xojo 2006r4 or newer
- Latest MBS Xojo Tools Plugin with llama.cpp support
- A compiled llama.cpp library:
  - libllama.dylib on macOS
  - libllama.dll on Windows
  - libllama.so on Linux
- A GGUF model file (.gguf format)
For many platforms, you can find prebuilt downloads on the llama.cpp releases page.
Installation with Homebrew
On macOS, you can install Homebrew from its website and then use brew to install the llama.cpp package:
brew install llama.cpp
This places libllama.dylib inside your Homebrew Cellar, for example at:
/opt/homebrew/Cellar/llama.cpp/6710/lib/libllama.dylib
If you have a newer version installed, the versioned path will differ; fortunately, you can use the stable path in Homebrew's lib folder instead:
/opt/homebrew/lib/libllama.dylib
Step 1 — Loading the llama.cpp Library
Before interacting with any model, you must load the llama.cpp dynamic library:
If Not LlamaMBS.LoadLibrary("/opt/homebrew/Cellar/llama.cpp/6710/lib/libllama.dylib") Then
  System.DebugLog LlamaMBS.LoadErrorMessage
  Return 2
End If
On macOS, please pass the full path to the dylib. On Linux, you may pass just the file name if the package manager installed the library properly; otherwise, pass the full path. On Windows, you pass the name of the DLL. You may want to use the SetDllDirectoryMBS function to set the folder containing the DLL, so Windows can find all the related DLL files.
If the path is wrong or dependencies are missing, you'll get a detailed error message from LoadErrorMessage. On Windows you may see error 193 if the architecture of the DLL doesn't match the application, or error 126 if the path to the DLL is invalid or some dependency cannot be found.
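On Windows, the loading step could look like the following sketch. The folder path is a placeholder for wherever you unpacked the llama.cpp build on your machine:

    // Windows: register the folder containing libllama.dll and its
    // companion DLLs, so the loader can resolve all dependencies.
    // "C:\llama\bin" is a placeholder path for your own setup.
    #If TargetWindows Then
      Call SetDllDirectoryMBS("C:\llama\bin")
      If Not LlamaMBS.LoadLibrary("libllama.dll") Then
        System.DebugLog LlamaMBS.LoadErrorMessage
        Return 2
      End If
    #EndIf

Setting the DLL directory first avoids error 126 when libllama.dll itself is found but one of its helper DLLs is not.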
Step 2 — Initialize the Backend
llama.cpp supports multiple compute backends: CPU, Metal (macOS/iOS), CUDA for NVIDIA GPUs, ROCm/HIP for AMD GPUs, and Vulkan.
The MBS plugin can load all available ones:
// load dynamic backends
LlamaMBS.BackendLoadAll
This ensures GPU-accelerated layers are enabled if available.
Step 3 — Load the Model
You specify the path to your .gguf model file and configure parameters such as the number of GPU layers:
// path to the model gguf file
Var modelPath As String = "/Users/cs/Temp/test.gguf"
// number of layers to offload to the GPU
Var ngl As Integer = 99
// initialize the model
Var ModelParams As New LlamaModelParametersMBS
ModelParams.n_gpu_layers = ngl
Var model As New LlamaModelMBS(modelPath, ModelParams)
If your GPU supports it, offloading 20–100 layers can dramatically speed up inference. Otherwise, just set it to 0 for CPU-only execution.
Step 4 — Create the Context
The context manages the state of a conversation and the token buffer.
// initialize the context
Var contextParams As New LlamaContextParametersMBS
Var context As New LlamaContextMBS(model, contextParams)
If context.Handle = 0 Then
  System.DebugLog "Failed to create context."
  Return 3
End If
Each context is independent, so you can have multiple simultaneous sessions with the same model.
You may set properties on the LlamaContextParametersMBS object before passing it to the LlamaContextMBS constructor. For example, n_ctx defines the context size in tokens.
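For instance, to request a larger context window, set n_ctx before constructing the context. The value 4096 below is an arbitrary example; check what your model supports:

    // configure the context before creating it
    Var contextParams As New LlamaContextParametersMBS
    contextParams.n_ctx = 4096 // context size in tokens (example value)
    Var context As New LlamaContextMBS(model, contextParams)

Larger contexts allow longer conversations but use more memory.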
Step 5 — Set Up a Sampler
In llama.cpp, samplers determine how tokens are selected.
For simple deterministic output, we can use a Greedy sampler:
// initialize the sampler
Var SampleParameters As New LlamaSamplerChainParametersMBS
SampleParameters.no_perf = True
Var smpl As New LlamaSamplerMBS(SampleParameters)
smpl.AddToChain( LlamaSamplerMBS.InitGreedy )
You could also add temperature sampling, top-p sampling, or multiple samplers chained together.
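Such a chain might look like the sketch below. Note that the method names InitTopK, InitTopP, InitTemperature and InitDist are assumptions modeled on llama.cpp's sampler constructors (llama_sampler_init_top_k, etc.); please check the MBS plugin documentation for the exact names and signatures:

    // sketch of a stochastic sampler chain; the Init* names used here
    // are assumptions based on llama.cpp's C API, not verified MBS names
    Var p As New LlamaSamplerChainParametersMBS
    Var chain As New LlamaSamplerMBS(p)
    chain.AddToChain(LlamaSamplerMBS.InitTopK(40))         // keep the 40 most likely tokens
    chain.AddToChain(LlamaSamplerMBS.InitTopP(0.95, 1))    // nucleus (top-p) sampling
    chain.AddToChain(LlamaSamplerMBS.InitTemperature(0.8)) // soften the distribution
    chain.AddToChain(LlamaSamplerMBS.InitDist(1234))       // final random pick with a seed

Unlike the greedy sampler, a chain like this produces varied output from the same prompt.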
Step 6 — Ask the Model a Question
Once everything is initialized, generating text is as simple as:
System.DebugLog context.Ask(smpl, "Can you add 5 and 3 together?")
System.DebugLog context.Ask(smpl, "And now double?")
Each call feeds your prompt into the model, runs inference, and returns the generated completion as a string. The output depends on the sampler settings applied above and on what the model was trained on.
You may also use the LlamaSamplerMBS class directly to implement the logic of the Ask method yourself. We include that as an alternative in the example project.
Complete sample code
Here is the complete, ready-to-run example:
// path to the model gguf file
Var modelPath As String = "/Users/cs/Temp/test.gguf"
// number of layers to offload to the GPU
Var ngl As Integer = 99
If Not LlamaMBS.LoadLibrary("/opt/homebrew/Cellar/llama.cpp/6710/lib/libllama.dylib") Then
  System.DebugLog LlamaMBS.LoadErrorMessage
  Return 2
End If
// load dynamic backends
LlamaMBS.BackendLoadAll
// initialize the model
Var ModelParams As New LlamaModelParametersMBS
ModelParams.n_gpu_layers = ngl
Var model As New LlamaModelMBS(modelPath, ModelParams)
// initialize the context
Var contextParams As New LlamaContextParametersMBS
Var context As New LlamaContextMBS(model, contextParams)
If context.Handle = 0 Then
  System.DebugLog "Failed to create context."
  Return 3
End If
// initialize the sampler
Var SampleParameters As New LlamaSamplerChainParametersMBS
SampleParameters.no_perf = True
Var smpl As New LlamaSamplerMBS(SampleParameters)
smpl.AddToChain( LlamaSamplerMBS.InitGreedy )
System.DebugLog context.Ask(smpl, "Can you add 5 and 3 together?")
System.DebugLog context.Ask(smpl, "And now double?")
Conclusion
With only a handful of API calls, the MBS Xojo Plugins let you load llama.cpp models, run inference, and build fully local AI features directly into your Xojo applications. Whether you’re building chatbots, reasoning tools, or creative assistants, this integration gives you full control and zero cloud dependency.
Please try it and let us know how well it works for you.
Example projects: Llama.zip