Ollama is a command-line tool for running LLMs locally with a simple interface. It provides easy model management and serving with an OpenAI-compatible API.
Use Ollama for quick local model serving with a simple CLI or Docker-based deployment.
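Because Ollama exposes an OpenAI-compatible API (on port 11434 by default, under the /v1 path), any OpenAI-style client can talk to a locally served model. A minimal sketch using only the Python standard library; the model name "lfm2" is a placeholder for whatever `ollama list` reports on your machine:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible endpoint on the default port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

# "lfm2" is a placeholder model name; substitute your local model tag.
payload = {
    "model": "lfm2",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.1,
}

def chat(url: str = OLLAMA_URL) -> dict:
    """POST the payload to a running Ollama server and return the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `chat()` requires `ollama serve` (or the desktop app) to be running; the request body is the same shape an OpenAI client would send.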
The official Ollama v0.17.0 (the latest stable release from ollama.com) fails with a missing tensor 'output_norm.weight' error on the lfm2moe architecture. This affects all LFM MoE models (e.g., LFM2-24B-A2B, LFM2-8B-A1B). To run any LFM MoE model, you need v0.17.1-rc0 or later.
Replace {quantization} with your preferred quantization level (e.g., q4_k_m, q8_0). Then run the local model:
ollama run /path/to/model.gguf
Custom Setup with Modelfile
For custom configurations (specific quantization, chat template, or parameters), create a Modelfile: a plain text file named Modelfile (no extension) with the following content:
FROM /path/to/model.gguf

TEMPLATE """<|startoftext|><|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0.1
PARAMETER top_k 50
PARAMETER repeat_penalty 1.05
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
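With the Modelfile in place, register the model with `ollama create` and start it with `ollama run`. A small Python sketch that wraps those two CLI calls; the tag `lfm2-custom` is an arbitrary local name I chose for illustration, not an official one:

```python
import shutil
import subprocess

# "lfm2-custom" is an arbitrary local tag (an assumption, not an official name).
MODEL_TAG = "lfm2-custom"

def create_cmd(tag: str, modelfile: str = "Modelfile") -> list:
    # `ollama create` registers a local model built from the Modelfile.
    return ["ollama", "create", tag, "-f", modelfile]

def run_cmd(tag: str) -> list:
    # `ollama run` opens an interactive session with the registered model.
    return ["ollama", "run", tag]

def register_and_run(tag: str = MODEL_TAG) -> None:
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama CLI not found on PATH")
    subprocess.run(create_cmd(tag), check=True)
    subprocess.run(run_cmd(tag), check=True)
```

Equivalently, from a shell: `ollama create lfm2-custom -f Modelfile` followed by `ollama run lfm2-custom`.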