Gemma 4 VLA Demo on Jetson Orin Nano Super

Gemma 4 VLA brings vision-language-action AI to ultra-low-power edge hardware: Google's model runs on NVIDIA's 8GB Jetson Orin Nano Super with autonomous webcam control and voice I/O, fully reproducible from code on GitHub.

Wednesday, April 22, 2026, 12:00 PM UTC · 2 min read · Source: Hugging Face

Google demonstrates Gemma 4 as a vision-language-action (VLA) model running locally on NVIDIA's Jetson Orin Nano Super, an edge device with 8GB of RAM. The model decides on its own when to use its webcam to answer a user's question, with no keyword triggers, and pairs speech-to-text input for the user's voice with text-to-speech output for its replies. Full code is available on GitHub for developers to reproduce the implementation.
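The article doesn't reproduce the pipeline code, but the behavior it describes maps onto a simple perceive-decide-act loop. Below is a minimal Python sketch of that loop under stated assumptions: the `transcribe`, `speak`, and `generate` helpers and the `<capture_image>` sentinel are placeholders of my own (the actual repo may use a runtime's native tool calling instead), and only the OpenCV webcam calls are real APIs.

```python
# Minimal sketch of a voice-in, voice-out VLA loop with model-driven
# camera use. Helper bodies are stand-ins; wire in your own local
# STT/TTS engines and Gemma runtime.
import cv2

# Assumed sentinel the model emits when it decides it needs to look.
CAPTURE_TOKEN = "<capture_image>"

def transcribe() -> str:
    # Console stand-in for a local speech-to-text engine.
    return input("You: ")

def speak(text: str) -> None:
    # Console stand-in for a local text-to-speech engine.
    print("Assistant:", text)

def generate(prompt: str, image=None) -> str:
    # Plug in local Gemma inference here (e.g. a llama.cpp or Ollama
    # call); `image` is an optional webcam frame for the vision pass.
    raise NotImplementedError("wire up your local VLM runtime")

def capture_frame():
    # Grab one frame from the default webcam with OpenCV.
    cap = cv2.VideoCapture(0)
    try:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("webcam capture failed")
        return frame
    finally:
        cap.release()

def answer(question: str) -> str:
    # First pass is text-only: the model itself decides whether it
    # needs to see, rather than the app matching keywords.
    reply = generate(question)
    if CAPTURE_TOKEN in reply:
        # Model asked to look: re-run the question with a fresh frame.
        reply = generate(question, image=capture_frame())
    return reply

if __name__ == "__main__":
    while True:
        speak(answer(transcribe()))
```

Keeping the first pass text-only means the camera and the heavier vision forward pass are engaged only when the model requests them, which matters on an 8GB board; if your runtime supports structured tool calling, that is a more robust trigger than scanning the reply for a sentinel string.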
Tags
models