Abstract
Ollama is a popular tool for running large language models on your own machine or infrastructure. Since its first release in Summer 2023, which supported just two models, it has grown into a widely used tool supporting a range of state-of-the-art open-weights models. With its standardized API, it can serve as a testing ground for applications or as an alternative to other LLM service providers.
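As a taste of that standardized API, here is a minimal sketch of sending a prompt to a locally running Ollama server via its REST endpoint. The endpoint and payload shape follow Ollama's documented API; the model name is an assumption and should be any model you have already pulled.

```python
import json
import urllib.request

# Minimal sketch: send a prompt to a locally running Ollama server.
# The model name "llama3.2" is an assumption -- substitute any model
# you have pulled with `ollama pull`.
payload = {
    "model": "llama3.2",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```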
In this workshop we trace in detail what happens when you type a prompt into Ollama. We look at the different layers, the preprocessing stages, customization and integration options, how Ollama wraps llama.cpp, and how both plain text and structured output are generated, illustrating parts of the architecture with code examples along the way.
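To preview the structured-output part: Ollama lets a request constrain the model's reply via the `format` field. The sketch below uses the long-standing `"json"` value (newer Ollama versions also accept a full JSON schema here); as above, the model name is an assumption.

```python
import json
import urllib.request

# Minimal sketch of structured output: "format": "json" asks Ollama to
# constrain generation to valid JSON. The model name is an assumption.
payload = {
    "model": "llama3.2",
    "prompt": "List three primary colors as a JSON object "
              "with a single key 'colors' holding an array.",
    "format": "json",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The "response" field is itself a JSON string; parse it again.
    print(json.loads(json.loads(resp.read())["response"]))
```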