Official launch date: April 2, 2026 // terminal ops runbook

Choose the right Gemma 4 model, runtime, and hardware path before you pull weights.

This homepage works like a local deployment dashboard, not a launch recap. Start from your device, your VRAM budget, and your workflow, then jump straight to the runtime, download source, and command path that make sense for them.

Apache 2.0 | 140+ languages | 128K / 256K context | Text + image | Audio on E2B / E4B | Ollama / LM Studio

Runtime selector

Choose a deployment path before you download anything.

Start from hardware and outcome. The recommendation updates instantly and points to the most sensible first runtime, not the loudest model name.

What are you running on? Device profile

What matters most? Priority

Recommended first setup

Gemma 4 E4B + LM Studio

Start with E4B when you want the easiest credible local run before stepping into 26B A4B or 31B.

Use E4B for laptop-friendly local testing.
Move to 26B A4B when you want a stronger default reasoning model.
Use LM Studio or Ollama for the fastest first run.

Ops default

26B A4B

The official Get started guide names Gemma 4 26B A4B as a good place to start for many tasks.

Fastest local

E4B

E4B is the easiest local recommendation when you want a capable first run on laptop-class hardware.

Top-end quality

31B

31B is the dense quality-first model for stronger coding, reasoning, and long-context work.
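The three defaults above can be written down as a small lookup. This is a minimal sketch, not the page's actual selector logic: the priority keys and the `recommend` function are hypothetical names, and the fallback to E4B is an assumption based on the "Recommended first setup" card.

```python
# Illustrative only: the three defaults come from the cards above.
# The priority keys and the function name are hypothetical.
DEFAULTS = {
    "ops_default": "26B A4B",
    "fastest_local": "E4B",
    "top_end_quality": "31B",
}


def recommend(priority: str) -> str:
    """Return the default model for a priority.

    Unknown priorities fall back to E4B, the recommended first setup
    (an assumption, not documented selector behavior).
    """
    return DEFAULTS.get(priority, "E4B")
```

Calling `recommend("top_end_quality")` returns `"31B"`; any key outside the three cards returns `"E4B"`.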

Model cards

Gemma 4 models by fit, not hype

These four cards are the real homepage. They tell visitors who each model is for, what memory range to expect, and which runtime path makes the most sense first.

Memory ranges below use the official Q4_0 to BF16 numbers from the Google AI for Developers model overview.

E2B

Edge-first and smallest footprint

Best for phones, browsers, Raspberry Pi-style experiments, and cases where getting Gemma 4 onto the device matters more than maximum depth.

3.2 GB to 9.6 GB | 128K context | Audio support

E4B

The safest first local choice

Best for laptop users who want the smoothest first run in LM Studio or Ollama without dropping all the way to the smallest model.

5 GB to 15 GB | 128K context | Audio support

26B A4B

The official default for many tasks

Best for consumer GPUs and builders who want a real step up in reasoning without going straight to the heaviest dense model.

15.6 GB to 48 GB | 256K context | MoE

31B

Flagship quality when hardware allows

Best for workstations, deeper coding workflows, document-heavy reasoning, and anyone explicitly optimizing for stronger output over lighter setup.

17.4 GB to 58.3 GB | 256K context | Dense

Download hub

Open the right Gemma 4 path the first time

Users searching “Gemma 4 download” usually do not want a long explanation. They want to know whether they should open weights, a GUI runtime, a CLI runtime, or a hosted sandbox.

Official sources first, then runtime shortcuts.

Official weights

Hugging Face

Best when you want the canonical weights, model cards, and the broadest handoff into the open-model tooling ecosystem.

Open Hugging Face

Official distribution

Kaggle

Best when you want the official Google distribution path and a direct bridge into notebooks, experiments, and model assets.

Open Kaggle

Fastest CLI

Ollama

Best when you want the shortest local command-line path. This is often the first stop for self-hosted users.

Open Ollama

Fastest GUI

LM Studio

Best when you want a desktop app, model browser, and the easiest non-terminal first run on a laptop or workstation.

Open LM Studio

Hosted test

Google AI Studio

Best when you want to test 26B A4B or 31B before investing in local setup or deciding how much model quality you really need.

Open AI Studio

Hosted build path

Gemini API

Best when your real goal is application development with a hosted endpoint, not local inference on your own machine.

Open Gemini API docs

Memory planner

Check the official memory budget before you choose a runtime

This table is the page’s most practical anchor because it answers the question that blocks real adoption: can your machine carry the model you want, at the precision you want, with enough headroom left for context and runtime overhead?

The official docs say these numbers cover the static weights only. KV cache and long prompts add VRAM on top of them.

Model	BF16	SFP8	Q4_0	Use this as
E2B	9.6 GB	4.6 GB	3.2 GB	Smallest path for phones, edge, and browser-oriented experiments.
E4B	15 GB	7.5 GB	5 GB	Best laptop-class starting point when you want credible local performance fast.
26B A4B	48 GB	25 GB	15.6 GB	The officially recommended starting point for many tasks, when you can afford the extra memory.
31B	58.3 GB	30.4 GB	17.4 GB	Flagship dense model for quality-first reasoning and coding.

Fast hardware rule of thumb

Under 6 GB points you toward E2B (SFP8 or Q4_0) or E4B at Q4_0. Around 16 GB makes 26B A4B realistic only at Q4_0 (15.6 GB of weights), with very little headroom. Around 18 GB and above opens the door to a Q4_0 31B test (17.4 GB of weights). Beyond that, your decision shifts from “can it fit?” to “which quality tier is worth the cost?”
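The rule of thumb follows directly from the memory table, so you can check it against your own budget. The sketch below copies the table's static weight sizes and lists every model and precision that fits. `fitting_options` and its `headroom_gb` parameter are hypothetical names; the table does not say how much headroom to reserve, so the default is zero.

```python
# Static weight sizes in GB, copied from the memory table above.
WEIGHTS_GB = {
    "E2B": {"BF16": 9.6, "SFP8": 4.6, "Q4_0": 3.2},
    "E4B": {"BF16": 15.0, "SFP8": 7.5, "Q4_0": 5.0},
    "26B A4B": {"BF16": 48.0, "SFP8": 25.0, "Q4_0": 15.6},
    "31B": {"BF16": 58.3, "SFP8": 30.4, "Q4_0": 17.4},
}


def fitting_options(budget_gb: float, headroom_gb: float = 0.0) -> list[tuple[str, str]]:
    """List (model, precision) pairs whose static weights fit the budget.

    headroom_gb is a hypothetical reserve for KV cache and runtime
    overhead; pick a value that matches your own workload.
    """
    usable = budget_gb - headroom_gb
    return [
        (model, precision)
        for model, sizes in WEIGHTS_GB.items()
        for precision, size in sizes.items()
        if size <= usable
    ]
```

With a 6 GB budget and no headroom, the result is E2B at SFP8 or Q4_0 and E4B at Q4_0. Passing a nonzero `headroom_gb` shows how quickly options near the limit drop out.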

What changes the real answer

Context length, KV cache, serving stack, and prompt size all move actual usage. If you are near the limit, test the runtime early instead of trusting a base-weight number as a production guarantee.
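To get a feel for how much context length can add, here is a rough KV cache estimate using the standard formula for full attention. It is only a sketch: the layer count, KV head count, and head dimension in the usage note are placeholders, not Gemma 4's published architecture, and architectures that use sliding-window attention or share KV across layers need less than this upper bound.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Upper-bound KV cache size in GB, assuming full attention on every layer.

    Each layer stores two tensors (keys and values), and each tensor holds
    kv_heads * head_dim values per token. bytes_per_value=2 corresponds to
    a 16-bit cache.
    """
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9
```

With placeholder values of 32 layers, 8 KV heads, and a head dimension of 128, a 131,072-token context at 2 bytes per value comes to about 17.2 GB, which is on the same order as the Q4_0 weights of the larger models. Substitute the real values from the model config before relying on the number.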

Quick commands

Copy the shortest Gemma 4 starting points

This section is intentionally biased toward action. If you already know which model you want, copy a command and run it.

Check the current runtime catalog before production use. Launch support changes quickly.

Ollama

Run locally with Ollama

Best for the shortest local CLI path across the full family.

ollama run gemma4:e2b
ollama run gemma4:e4b
ollama run gemma4:26b
ollama run gemma4:31b
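Once a model is pulled, you can also call it from a script through Ollama's local HTTP API. The sketch below assumes the server is running on Ollama's default port and uses the `gemma4:e4b` tag from the commands above. The function names are hypothetical, and the tag should be checked against the current Ollama catalog.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_ollama_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ollama_generate("gemma4:e4b", "Roses are red...")` returns the completion as a string once the model is available locally.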

Python API

Test hosted access with Python

Best when you want to try Gemma 4 quickly through the Gemini API before self-hosting.

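from google import genai  # pip install google-genai

# Replace the placeholder with your Gemini API key before running.
client = genai.Client(api_key="YOUR_API_KEY")

# Send a single text prompt to the hosted Gemma 4 31B model.
response = client.models.generate_content(
    model="gemma-4-31b-it",
    contents="Roses are red..."
)
print(response.text)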

REST API

Call Gemma 4 over REST

Best when you want the raw hosted request path without installing an SDK first.

curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-31b-it:generateContent?key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{
      "parts": [{"text": "Roses are red..."}]
    }]
  }'
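The same REST call can be made from Python with only the standard library, which is handy when you want the raw request path inside a script. This is a sketch under a few assumptions: the key is sent in the `x-goog-api-key` header instead of the query string, it is read from a `GEMINI_API_KEY` environment variable (a naming choice for this example), and the function names are hypothetical.

```python
import json
import os
import urllib.request

API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_gemini_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a generateContent request with the same body as the curl call."""
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/{model}:generateContent",
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def gemini_generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the first candidate's text."""
    api_key = os.environ["GEMINI_API_KEY"]  # assumed variable name for this sketch
    request = build_gemini_request(model, prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        data = json.loads(resp.read())
    return data["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `gemini_generate("gemma-4-31b-it", "Roses are red...")` mirrors the curl example. Keeping the key in a header keeps it out of URLs that may end up in logs.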

Next after quick start

Once Gemma 4 is running, your next question is usually not “what is Gemma 4?” anymore. It becomes “should I move up a model size, stay local, or shift to a hosted API?” That is why the two most important follow-up pages are model sizes and pricing/license.

Where to go deeper