The official Get started guide names Gemma 4 26B A4B as a good place to start for many tasks.
Choose the right Gemma 4 model, runtime, and hardware path before you pull weights.
This homepage is organized like a local deployment dashboard, not a launch recap. Start from your device, your VRAM budget, and your workflow, then jump straight to the runtime, download source, and command path that make sense.
E4B is the easiest local recommendation when you want a capable first run on laptop-class hardware.
31B is the dense quality-first model for stronger coding, reasoning, and long-context work.
Gemma 4 models by fit, not hype
These four cards are the real homepage. They tell visitors who each model is for, what memory range to expect, and which runtime path makes the most sense first.
Memory ranges below use the official Q4_0 to BF16 numbers from the Google AI for Developers model overview.
E2B: Edge-first and smallest footprint
Best for phones, browsers, Raspberry Pi style experiments, and the cases where getting Gemma 4 onto the device matters more than maximum depth.
E4B: The safest first local choice
Best for laptop users who want the smoothest first run in LM Studio or Ollama without dropping all the way to the smallest model.
26B A4B: The official default for many tasks
Best for consumer GPUs and builders who want a real step up in reasoning without going straight to the heaviest dense model.
31B: Flagship quality when hardware allows
Best for workstations, deeper coding workflows, document-heavy reasoning, and anyone explicitly optimizing for stronger output over lighter setup.
Open the right Gemma 4 path the first time
Users searching “Gemma 4 download” usually do not want a long explanation. They want to know whether to go to the raw weights, a GUI runtime, a CLI runtime, or a hosted sandbox.
Official sources first, then runtime shortcuts.
Hugging Face
Best when you want the canonical weights, model cards, and the broadest handoff into the open-model tooling ecosystem.
Open Hugging Face
Kaggle
Best when you want the official Google distribution path and a direct bridge into notebooks, experiments, and model assets.
Open Kaggle
Ollama
Best when you want the shortest local command line path. This is the most common first stop for self-hosted users.
Open Ollama
LM Studio
Best when you want a desktop app, model browser, and the easiest non-terminal first run on a laptop or workstation.
Open LM Studio
Google AI Studio
Best when you want to test 26B A4B or 31B before investing in local setup or deciding how much model quality you really need.
Open AI Studio
Gemini API
Best when your real goal is application development with a hosted endpoint, not local inference on your own machine.
Open Gemini API docs
Check the official memory budget before you choose a runtime
This table is the page’s most practical anchor because it answers the question that blocks real adoption: can your machine carry the model you want, at the precision you want, with enough headroom left for context and runtime overhead?
The official docs say these numbers cover the static weights only. KV cache and long prompts still add more VRAM.
| Model | BF16 | SFP8 | Q4_0 | Use this as |
|---|---|---|---|---|
| E2B | 9.6 GB | 4.6 GB | 3.2 GB | Smallest path for phones, edge, and browser-oriented experiments. |
| E4B | 15 GB | 7.5 GB | 5 GB | Best laptop-class starting point when you want credible local performance fast. |
| 31B | 58.3 GB | 30.4 GB | 17.4 GB | Flagship dense model for quality-first reasoning and coding. |
| 26B A4B | 48 GB | 25 GB | 15.6 GB | Officially recommended starting point for many tasks when you can afford the extra memory. |
Fast hardware rule of thumb
Under 6 GB points you toward E2B or E4B at lighter precision. Around 16 GB makes 26B A4B realistic only at aggressive quantization. Around 18 GB and above opens the door to a Q4 31B test. Beyond that, your decision shifts from “can it fit?” to “which quality tier is worth the cost?”
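The rule of thumb can be turned into a small lookup. The sketch below copies the static-weight numbers from the table and lists every model and precision whose weights fit a given budget. The function name `fitting_options` and the `headroom_gb` parameter are illustrative assumptions, not part of any official tool, and the result says nothing about KV cache or runtime overhead unless you reserve headroom yourself.

```python
# Static weight footprints in GB, copied from the memory table above.
WEIGHTS_GB = {
    "E2B": {"BF16": 9.6, "SFP8": 4.6, "Q4_0": 3.2},
    "E4B": {"BF16": 15.0, "SFP8": 7.5, "Q4_0": 5.0},
    "26B A4B": {"BF16": 48.0, "SFP8": 25.0, "Q4_0": 15.6},
    "31B": {"BF16": 58.3, "SFP8": 30.4, "Q4_0": 17.4},
}


def fitting_options(budget_gb, headroom_gb=0.0):
    """Return (model, precision, weights_gb) tuples whose weights fit the budget.

    headroom_gb is a hypothetical allowance you reserve for KV cache and
    runtime overhead; the table itself covers static weights only.
    """
    usable = budget_gb - headroom_gb
    options = [
        (model, precision, size)
        for model, sizes in WEIGHTS_GB.items()
        for precision, size in sizes.items()
        if size <= usable
    ]
    # Largest footprint first.
    return sorted(options, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for model, precision, size in fitting_options(16.0):
        print(f"{model:8s} {precision:5s} {size:5.1f} GB")
```

With a 16 GB budget and no headroom, 26B A4B at Q4_0 is the largest entry that fits. Reserving even 1 GB of headroom drops it from the list, which is why that tier is described as realistic only at aggressive quantization.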
What changes the real answer
Context length, KV cache, serving stack, and prompt size all move actual usage. If you are near the limit, test the runtime early instead of trusting a base-weight number as a production guarantee.
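To get a feel for how context length moves the number, here is a rough estimator for KV cache size under plain full attention. It is a generic back-of-the-envelope formula, not a Gemma 4 specification: the layer count, KV head count, and head dimension are placeholders you would need to take from the model card, and architectures that use sliding-window or shared KV layers will typically use less than this estimate.

```python
def kv_cache_gb(num_layers, num_kv_heads, head_dim, context_tokens,
                bytes_per_value=2, batch_size=1):
    """Estimate KV cache size in GB (10^9 bytes) for plain full attention.

    Keys and values are both cached, so each layer stores
    2 * num_kv_heads * head_dim values per token.
    All architecture arguments are placeholders supplied by the caller.
    """
    values = (2 * num_layers * num_kv_heads * head_dim
              * context_tokens * batch_size)
    return values * bytes_per_value / 1e9


if __name__ == "__main__":
    # Hypothetical architecture numbers, chosen only to show the scaling.
    for tokens in (8_192, 32_768, 131_072):
        print(tokens, round(kv_cache_gb(32, 8, 128, tokens), 2), "GB")
```

The estimate grows linearly with context length and batch size, so a model that fits comfortably with short prompts can run out of memory at long context even though the weight footprint has not changed.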
Copy the shortest Gemma 4 starting points
This section is intentionally biased toward action. If a visitor already knows the model they want, the fastest way to keep them on the page is to let them copy something useful immediately.
Check the current runtime catalog before production use. Launch support changes quickly.
Run locally with Ollama
Best for the shortest local CLI path across the full family.
ollama run gemma4:e2b
ollama run gemma4:e4b
ollama run gemma4:26b
ollama run gemma4:31b
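Once a model is pulled, the same local server can be called from a script. The sketch below posts a prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is running on its default address (`localhost:11434`) and that the `gemma4:e4b` tag from the commands above is available in your catalog; the helper names are illustrative.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed unchanged).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=120):
    """Send the prompt and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(resp.read())["response"]
```

With the server running, `generate("gemma4:e4b", "Roses are red...")` should return the completion as a string. Swapping the tag is enough to compare model sizes on the same prompt.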
Test hosted access with Python
Best when you want to try Gemma 4 quickly through the Gemini API before self-hosting.
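from google import genai

# Create a client with your Gemini API key.
client = genai.Client(api_key="YOUR_API_KEY")

# Send a single prompt to the hosted 31B instruction-tuned model.
response = client.models.generate_content(
    model="gemma-4-31b-it",
    contents="Roses are red...",
)
print(response.text)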
Call Gemma 4 over REST
Best when you want the raw hosted request path without installing an SDK first.
curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-31b-it:generateContent?key=YOUR_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [{
      "parts": [{"text": "Roses are red..."}]
    }]
  }'
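The same REST call can be made from Python without an SDK. This sketch mirrors the curl request above with the standard library, but sends the key in the `x-goog-api-key` header instead of the URL so it is less likely to end up in logs. Reading the key from a `GEMINI_API_KEY` environment variable is an assumption made for illustration, and the helper names are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt, model="gemma-4-31b-it", api_key=None):
    """Build a generateContent request equivalent to the curl example."""
    # GEMINI_API_KEY is an assumed variable name; use whatever your setup defines.
    api_key = api_key or os.environ.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("No API key provided")
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def first_text(response_json):
    """Pull the text of the first candidate out of a generateContent response."""
    return response_json["candidates"][0]["content"]["parts"][0]["text"]


def generate(prompt, timeout=60, **kwargs):
    """Send the request and return the first candidate's text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=timeout) as resp:
        return first_text(json.loads(resp.read()))
```

Calling `generate("Roses are red...")` with a valid key set should return the model's reply. Responses that are blocked or empty will not have the shape `first_text` expects, so production code would need to handle that case.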
Next after quick start
Once Gemma 4 is running, your next question is usually not “what is Gemma 4?” anymore. It becomes “should I move up a model size, stay local, or shift to a hosted API?” That is why the two most important follow-up pages are model sizes and pricing/license.