LocalLLMGear

How to Run Llama Locally with Ollama (Step by Step)

By LocalLLMGear Editorial · Editorial Team · Updated 2026-06-28

We test hardware hands-on and may use AI tools in research — every guide is human-reviewed. Editorial policy.

We may earn a commission from links in this article, at no extra cost to you. Disclosure.

Running a capable LLM on your own machine is easier than most people expect. With Ollama you can go from nothing to chatting with Llama in a few minutes (plus however long the model takes to download), with no cloud, no API keys, and nothing leaving your machine. Here’s the whole process.

The 30-second answer: Install Ollama, run the command ollama run llama3, and you’re chatting locally. That’s genuinely it for an 8B model on a GPU with about 8 GB of VRAM or an Apple Silicon Mac with 16 GB+.

What you need

A GPU with enough VRAM for the model you want (about 8 GB for an 8B model), or an Apple Silicon Mac with 16 GB+ of memory. If you’re not sure your hardware is up to it, check Best GPU for local LLMs or, for Macs, Apple Silicon for local LLMs.

Step 1 — Install Ollama

Download the installer from ollama.com for macOS, Windows or Linux (Linux: curl -fsSL https://ollama.com/install.sh | sh). It runs as a small background service.
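To confirm the background service is actually running after the install, one option is to ask it for its version over HTTP. This is a minimal sketch using only the Python standard library; it assumes the default address http://localhost:11434 and Ollama's /api/version endpoint, and the helper name ollama_version is our own, not part of Ollama.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def ollama_version(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            # The endpoint replies with a small JSON object containing "version".
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None
```

Calling ollama_version() returns a version string when the service is up and None otherwise, which is usually enough to tell an install problem apart from a model problem.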

Step 2 — Pull and run a model

One command downloads the model and starts a chat:

ollama run llama3

Swap llama3 for mistral, qwen, gemma or others. Quantized versions download by default so they fit common GPUs.
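If you want to see which models you have pulled and which quantization each one uses, the local API can list them. The sketch below assumes the /api/tags endpoint and the response fields name, size, details.parameter_size and details.quantization_level; the function names are hypothetical helpers written for this guide.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def summarize_models(tags: dict) -> list:
    """Turn an /api/tags response into one readable line per model."""
    lines = []
    for model in tags.get("models", []):
        details = model.get("details", {})
        name = model.get("name", "?")
        size_gb = model.get("size", 0) / 1e9  # size is reported in bytes
        params = details.get("parameter_size", "?")
        quant = details.get("quantization_level", "?")
        lines.append(f"{name}: {size_gb:.1f} GB, {params}, {quant}")
    return lines
```

With the service running, summarize_models(fetch_tags()) gives one line per installed model, so you can check the download size and quantization level before deciding whether a larger variant would fit your GPU.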

Step 3 — Use it (CLI, API, or a UI)

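  • CLI: just type your prompt in the terminal session Ollama opens; enter /bye to exit.
  • API: Ollama serves a local REST API at http://localhost:11434 (for example, POST /api/generate for one-shot prompts and POST /api/chat for conversations), so you can point your own apps at it.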
  • UI: install a front-end like Open WebUI for a ChatGPT-style interface.
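As an illustration of the API option, here is a minimal sketch of a one-shot prompt sent from Python with only the standard library. It assumes the default address, the /api/generate endpoint with "stream" set to false so the reply arrives as a single JSON object, and that the llama3 model has already been pulled. The function names are our own.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the POST request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With streaming disabled, the generated text is in the "response" field.
        return json.load(resp)["response"]
```

With the service running, generate("Explain quantization in one sentence.") returns the model's answer as a string. The generous timeout is deliberate: the first request after startup also has to load the model into memory.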

Going further

To really understand prompting, fine-tuning and building on top of local models, a structured course that covers the fundamentals can help.

Ready to upgrade your hardware for bigger models? See Build a local LLM rig under $2,000.

Frequently asked questions

What hardware do I need to run Llama locally?

An 8B model runs on any GPU with ~8 GB VRAM, or on an Apple Silicon Mac with 16 GB+. Bigger models need more VRAM — see our GPU and rig guides.
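The arithmetic behind that rule of thumb can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: it assumes about 4 bits per weight for a default quantized download and a flat 20% overhead for the context cache and runtime, both of which are our own illustrative assumptions and vary by model and settings.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.0,
                     overhead: float = 1.2) -> float:
    """Rough memory estimate in GB for running a quantized model.

    params_billion  -- model size in billions of parameters (8 for an 8B model)
    bits_per_weight -- assumed quantization level (4.0 is an assumption)
    overhead        -- assumed multiplier for context cache and runtime buffers
    """
    # One billion parameters at 8 bits per weight is about 1 GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead
```

Under these assumptions, estimate_vram_gb(8) comes out at roughly 4.8 GB, which is why an 8 GB card is comfortable for an 8B model, while estimate_vram_gb(70) lands around 42 GB and is out of reach for a single consumer GPU.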

Is Ollama free?

Yes, Ollama is free and open source. You only pay for the hardware (or cloud GPU) you run it on.

Can I run Llama without a GPU?

Yes, on CPU — but it's slow for anything beyond small models. A modern GPU or an Apple Silicon Mac makes it genuinely usable.
