How to Run Llama Locally with Ollama (Step by Step)
By LocalLLMGear Editorial · Editorial Team · Updated 2026-06-28
We test hardware hands-on and may use AI tools in research — every guide is human-reviewed. Editorial policy.
We may earn a commission from links in this article, at no extra cost to you. Disclosure.
Running a capable LLM on your own machine is easier than most people expect. With Ollama you can go from nothing to chatting with Llama in a couple of minutes — no cloud, no API keys, fully private. Here’s the whole process.
The 30-second answer: Install Ollama, run
ollama run llama3, and you’re chatting locally. That’s genuinely it for an 8B model on a decent GPU or a 16 GB+ Mac.
What you need
A GPU with enough VRAM (or an Apple Silicon Mac). If you’re not sure your hardware is up to it, check Best GPU for local LLMs or, for Macs, Apple Silicon for local LLMs.
Step 1 — Install Ollama
Download the installer from ollama.com for macOS, Windows or Linux
(Linux: curl -fsSL https://ollama.com/install.sh | sh). It runs as a small background
service.
Step 2 — Pull and run a model
One command downloads the model and starts a chat:
ollama run llama3
Swap llama3 for mistral, qwen, gemma or others. Quantized versions download by
default so they fit common GPUs.
Step 3 — Use it (CLI, API, or a UI)
- CLI: just type in the terminal session Ollama opens.
- API: Ollama serves a local REST API at
http://localhost:11434— point your apps at it. - UI: install a front-end like Open WebUI for a ChatGPT-style interface.
Going further
To really understand prompting, fine-tuning and building on top of local models, structured courses help — these cover the fundamentals:
Ready to upgrade your hardware for bigger models? See Build a local LLM rig under $2,000.