Google’s Latest AI Breakthrough Might Redefine Mobile Performance — But Is It the Game-Changer Everyone Expects?
Google has unveiled a major leap in mobile AI performance with a new LiteRT accelerator built on Qualcomm AI Engine Direct (QNN). Designed specifically for Snapdragon-powered Android devices, QNN promises striking efficiency — up to 100 times faster than CPUs and 10 times faster than GPUs when handling on-device artificial intelligence workloads. And here’s where things get interesting: this could spark a shift in how developers and users think about mobile AI performance.
Until now, mobile GPUs have been the go-to hardware for running AI tasks on Android devices. However, Google’s senior software engineers Lu Wang, Weiyi Wang, and Andrew Wang point out a surprising limitation: GPUs, despite their power, can become a bottleneck in real-world applications. Imagine running a text-to-image model while your camera simultaneously uses machine learning for real-time segmentation — even top-tier GPUs can struggle, causing lag, frame drops, and jittery visuals. That’s where NPUs come in.
What’s the NPU advantage? Neural processing units, now common in many smartphones, are purpose-built for AI workloads. They can handle complex neural network calculations with far greater speed and efficiency than GPUs — and they do it while consuming much less power. QNN taps into that capability by serving as a next-generation delegate for LiteRT, replacing the old TFLite QNN delegate with a more cohesive, powerful architecture.
Developed through a deep collaboration between Google and Qualcomm, QNN simplifies the developer workflow by wrapping various SoC compilers and runtimes into a unified framework, accessible through a single streamlined API. The new framework currently supports 90 LiteRT operations, laying the foundation for full model delegation — the key to maximizing both performance and efficiency. It includes optimized kernels designed specifically for on-device language and vision-language models such as Gemma and FastVLM, boosting their speed and responsiveness.
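Why does full delegation matter? A model can run entirely on the NPU only when every operation in its graph is in the accelerator’s supported set; a single unsupported op forces the runtime to partition the graph and fall back to the CPU or GPU for that piece, paying synchronization costs at each boundary. Here is a minimal conceptual sketch of that decision — the op names and supported set are illustrative, not LiteRT’s actual registry:

```python
# Illustrative sketch of delegation planning: the op names and the
# supported set below are hypothetical, not LiteRT's real op list.

SUPPORTED_NPU_OPS = {"CONV_2D", "FULLY_CONNECTED", "SOFTMAX", "ADD", "RESHAPE"}

def delegation_plan(model_ops):
    """Split a model's ops into NPU-delegated and CPU-fallback groups."""
    delegated = [op for op in model_ops if op in SUPPORTED_NPU_OPS]
    fallback = [op for op in model_ops if op not in SUPPORTED_NPU_OPS]
    fully_delegated = not fallback  # True only if nothing falls back
    return fully_delegated, delegated, fallback

# A graph made entirely of supported ops can run fully on the NPU,
# while one unsupported op forces a partition with CPU fallback.
full, _, _ = delegation_plan(["CONV_2D", "ADD", "SOFTMAX"])
partial, _, missing = delegation_plan(["CONV_2D", "TOP_K"])
```

In this toy model, `full` comes out `True` and `partial` comes out `False`, with `TOP_K` listed as the op that would break full delegation — which is exactly why broad op coverage (90 operations so far) is the foundation Google emphasizes.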
In performance testing, Google benchmarked QNN across 72 machine learning models, and 64 of them achieved complete NPU delegation. The improvement is dramatic: up to 100x faster than CPUs and 10x faster than GPUs in execution. On Qualcomm’s flagship Snapdragon 8 Elite Gen 5, 56 models ran in under 5 milliseconds when powered by the NPU — compared to just 13 models reaching that mark on the CPU. This leap opens the door to real-time AI experiences once thought impossible on mobile platforms.
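Benchmark claims like these boil down to two summary statistics: how many models clear a real-time latency budget, and the typical CPU-to-NPU speedup. The sketch below shows how such a summary is computed — the latency figures are invented for illustration and are not Google’s measurements:

```python
# Illustrative benchmark aggregation. The per-model latencies (ms) are
# made-up placeholder values, not Google's published results.
from statistics import median

cpu_ms = {"model_a": 480.0, "model_b": 35.0, "model_c": 4.2, "model_d": 120.0}
npu_ms = {"model_a": 4.8, "model_b": 1.1, "model_c": 0.9, "model_d": 3.0}

BUDGET_MS = 5.0  # the sub-5-millisecond budget cited in the article

# Models that meet the real-time budget on the NPU.
under_budget = [m for m, t in npu_ms.items() if t < BUDGET_MS]

# Per-model CPU -> NPU speedup ratios, summarized by the median.
speedups = [cpu_ms[m] / npu_ms[m] for m in cpu_ms]
median_speedup = round(median(speedups), 1)
```

With these placeholder numbers, all four models clear the 5 ms budget on the NPU and the median speedup works out to about 36x — the same kind of aggregation behind the article’s 56-of-72 and 100x headline figures.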
To showcase what this means in practice, Google’s engineers built a concept app using a fine-tuned version of Apple’s FastVLM-0.5B vision-language model. The result? Astonishing performance. The app can interpret live camera scenes in real time, achieving a time-to-first-token (TTFT) of just 0.12 seconds on 1024×1024 images, generating over 11,000 tokens per second during prefill, and more than 100 tokens per second during decoding. The secret behind this lightning-fast performance lies in precision optimization — the model uses int8 weight quantization and int16 activation quantization, maximizing the NPU’s advanced int16 kernel capabilities.
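Int8 weight quantization of the kind mentioned here maps floating-point weights onto 8-bit integers through a scale factor, trading a tiny amount of precision for much smaller, faster integer math that NPU kernels are built for. A minimal per-tensor symmetric quantization sketch — this shows the general technique, not LiteRT’s actual converter code:

```python
# Minimal symmetric int8 quantization sketch: the general technique,
# not LiteRT's actual converter implementation.

def quantize_int8(weights):
    """Map float weights to int8 values with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0  # largest value maps to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)  # close to the originals, within one scale step
```

Activations get the same treatment at int16, which keeps roughly 256 times finer resolution than int8 — that extra activation precision is what lets the NPU’s int16 kernels hold quality while the weights stay compact.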
There’s a catch, though. QNN support currently extends only to a limited range of devices, mainly those powered by Snapdragon 8 and Snapdragon 8+ SoCs. Developers eager to experiment can visit Google’s NPU acceleration guide and download LiteRT directly from GitHub.
So, is QNN the long-awaited step toward full-scale mobile AI independence — or just another proprietary ecosystem piece that favors high-end hardware? Google’s claims are impressive, but skeptics might question whether NPUs will become the new standard across all Android devices or remain a premium feature.
What’s your take — will NPUs truly revolutionize the mobile AI experience, or is the hype getting ahead of the hardware? Let’s hear your thoughts in the comments.