Key Takeaways
- Google DeepMind has launched Gemma 4, a new family of open models designed to run directly on user hardware, including phones, desktops, and IoT boards.
- The models are released under an Apache 2.0 license, offering full commercial freedom. They target advanced reasoning and multi-step agentic workflows in over 140 languages.
- Gemma 4 supports up to a 128K context window on-device via LiteRT‑LM, and can process about 4,000 input tokens across two skills in under 3 seconds on optimized GPUs.
- Edge performance targets include running a Gemma 4 E2B model in under 1.5 GB of memory, reaching up to 3,700 prefill / 31 decode tokens per second on Qualcomm Dragonwing IQ8 NPUs.
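As a rough sanity check on the quoted figures, a minimal sketch below estimates end-to-end latency for a 4,000-token prompt from the stated prefill/decode rates. The throughput numbers come from the announcement; the latency model itself (prefill time plus decode time, ignoring runtime overhead) is an illustrative simplification, not a published benchmark methodology.

```python
# Naive latency estimate from the quoted Dragonwing IQ8 NPU figures.
# Real on-device latency also includes scheduling and I/O overhead.

PREFILL_TPS = 3700   # prefill tokens per second (quoted)
DECODE_TPS = 31      # decode tokens per second (quoted)

def estimate_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Time to ingest the prompt plus time to generate the output."""
    return prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS

# A 4,000-token prompt with a short 50-token reply:
latency = estimate_latency(4000, 50)
print(f"{latency:.2f} s")  # ~2.69 s, consistent with the sub-3-second claim
```

Notably, at these rates prefill accounts for barely a second; the sub-3-second budget is dominated by how many tokens the model must generate.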
Quick Recap
Google DeepMind has officially announced Gemma 4, its latest family of open AI models purpose-built for advanced reasoning and agentic workflows on user-owned hardware. The launch post on the Google Developers blog confirms that Gemma 4 is available under the Apache 2.0 license and runs across mobile, desktop, web, and edge devices, enabling multi-step planning, offline code generation, and audio-visual processing directly on-device. The announcement, amplified via DeepMind's social channels, positions Gemma 4 as a state-of-the-art open alternative to closed frontier systems.
Built for Edge Agents, Not Just Chatbots
Gemma 4 is framed less as a generic chatbot model and more as an agentic runtime that can plan, reason, and act using tools across a wide range of devices. The family includes small "edge" variants like E2B and E4B, which can run in under 1.5 GB of memory using LiteRT‑LM's 2‑bit and 4‑bit quantization, alongside larger models with extended 128K-token context windows for complex multi-step workflows. Native function calling, structured JSON output, and system-instruction support mean Gemma 4 is optimized for building agents that chain skills—such as querying Wikipedia, generating visualizations, or orchestrating other media models—without leaving the device. Google's AI Edge stack ties this together with Android's AICore, iOS and desktop runtimes, Raspberry Pi 5, and Qualcomm Dragonwing IQ8 NPUs. On that basis, the company promises sub‑3‑second processing for 4,000‑token multi-skill prompts and multi-thousand token-per-second throughput on NPUs.
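The tool-chaining loop described above can be sketched as follows. Since the announcement does not document LiteRT‑LM's programming interface, the model call here is a stub (`run_model`), and the `wiki_lookup` skill is a hypothetical placeholder; only the shape of the loop—model emits a structured JSON tool call, the agent parses and executes it—reflects the capabilities the post describes.

```python
import json

# Hedged sketch of an on-device tool-calling step. A real Gemma 4
# runtime would replace run_model(); names here are illustrative.

TOOLS = {
    "wiki_lookup": lambda q: f"Summary for {q!r}",  # placeholder skill
}

def run_model(prompt: str) -> str:
    # Stub: stands in for a model that, given system instructions
    # declaring the available tools, emits a structured JSON tool call.
    return json.dumps({"tool": "wiki_lookup", "args": {"q": "Raspberry Pi 5"}})

def agent_step(prompt: str) -> str:
    """One plan-act step: parse the model's JSON tool call and execute it."""
    call = json.loads(run_model(prompt))
    skill = TOOLS[call["tool"]]
    return skill(**call["args"])

print(agent_step("What board does Gemma 4 run on?"))
```

In a full agent, this step would run in a loop, feeding each tool result back into the model until it emits a final answer instead of another tool call.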
Why This Open Release Matters Now
Gemma 4 arrives as demand surges for on-device, privacy-preserving AI that still approaches cloud-scale intelligence. By shipping an Apache 2.0–licensed family with advanced reasoning and agentic capabilities, Google is pushing directly into the territory occupied by Meta's open ecosystem around Llama, as well as proprietary mid-tier models that charge per-token for similar workflows. The positioning is strategic: developers get a fully open, commercially usable stack that can run everything from local coding assistants to multimodal mobile agents, while Google reinforces Android, WebGPU, and its Edge tooling as the default rails for this new class of AI workloads.
Competitive Landscape & Comparison Table
For this launch, the most relevant peers are Meta’s Llama 3.1 8B (a strong open model used on-device and in self-hosted setups) and Mistral’s Mixtral 8x7B (a popular open Mixture-of-Experts model optimized for efficiency and reasoning).
| Feature/Metric | Gemma 4 (Subject) | Llama 3.1 8B (Competitor A) | Mixtral 8x7B (Competitor B) |
| --- | --- | --- | --- |
| Context Window | Up to 128K tokens via LiteRT‑LM on-device. | Typically around 128K tokens in latest releases. | Around 32K–64K tokens in common deployments. |
| Pricing per 1M Tokens | Open, Apache 2.0; self-hosted infra cost only. | Open, Apache-style; infra cost only. | Open, Apache-style; infra cost only. |
| Multimodal Support | Built-in audio-visual processing across all sizes. | Primarily text; multimodal requires separate add-ons. | Primarily text; multimodal via external models. |
| Agentic Capabilities | Native tools, structured JSON, multi-skill agents on-device. | Tool use supported via frameworks, not built-in edge stack. | Strong reasoning; agentic features via third-party orchestration. |
From a strategic standpoint, Gemma 4 appears to “win” on out-of-the-box agentic capabilities and multimodal support, especially for edge devices. Its LiteRT‑LM optimizations and Android AICore integration provide a highly integrated path. Llama and Mixtral remain strong choices for general self-hosted text workloads and have broader existing community ecosystems. However, Gemma 4 narrows that gap by combining open licensing with a vertically integrated edge stack.
Sci-Tech Today’s Takeaway
I think this is a big deal because Gemma 4 finally makes serious agentic AI feel native to your own hardware, not just something you rent from a cloud API. In my experience, open models only hit escape velocity when they combine permissive licensing with a clean developer path. Gemma 4 checks both boxes by pairing Apache 2.0 freedom with a tightly engineered Android, WebGPU, and LiteRT‑LM toolchain. I generally prefer setups where I control the infrastructure and data plane. So the fact that you can get multi‑step planning, tool use, and multimodal reasoning running locally—down to Raspberry Pi and mobile NPUs—looks decidedly bullish for user adoption, edge AI startups, and enterprises trying to escape pure SaaS lock‑in.
