HONOR debuts on-device large speech model in Magic V5

HONOR has announced what it describes as the industry’s first on-device large speech model, which debuts on the international versions of the HONOR Magic V5 ahead of the phone’s launch later this week.

According to HONOR, the technology addresses key technical challenges in on-device multilingual speech recognition and translation, namely low-latency streaming speech recognition and the efficient deployment of large-scale models.

Two research papers related to this technological advancement have been recognized at INTERSPEECH 2025, the world’s largest and most comprehensive conference on the science and technology of spoken language processing.

Addressing Privacy and Performance Concerns

Current mainstream translation solutions rely heavily on cloud infrastructure, which raises privacy concerns, especially for sensitive conversations like phone calls.

While some on-device solutions exist, they often compromise on performance, including speed, accuracy, and memory footprint, due to the inherent limitations of mobile devices. HONOR’s new technology aims to overcome these limitations by providing a cloud-comparable experience directly on the device, ensuring both privacy and performance.

On-Device Communication Benefits

HONOR’s solutions are designed to offer several consumer benefits. The company says the technology reduces the memory footprint for six language packs (Chinese, English, German, French, Spanish, and Italian) from 3-4GB to 800MB. A single model replaces six separate 500MB downloads, which HONOR says saves approximately 2.78GB of storage.
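The storage figures can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1GB = 1000MB), which the announcement does not specify, and all variable names are illustrative. Six 500MB packs alone would imply a saving of 2200MB, so the quoted 2.78GB appears to be measured against a baseline of roughly 3.58GB, which sits inside the stated 3-4GB range.

```python
# All sizes in MB; decimal units (1 GB = 1000 MB) are an assumption.
UNIFIED_MODEL_MB = 800        # single multilingual model, per the article
PACK_MB = 500                 # one separate language pack, per the article
NUM_LANGUAGES = 6
QUOTED_SAVING_MB = 2780       # "approximately 2.78GB", per the article


def saving_mb(baseline_mb: int, unified_mb: int) -> int:
    """Storage saved when a baseline is replaced by the unified model."""
    return baseline_mb - unified_mb


# Saving if the baseline is exactly six separate 500MB downloads.
six_packs_mb = NUM_LANGUAGES * PACK_MB
saving_vs_six_packs_mb = saving_mb(six_packs_mb, UNIFIED_MODEL_MB)

# Baseline that the quoted 2.78GB saving would imply.
implied_baseline_mb = UNIFIED_MODEL_MB + QUOTED_SAVING_MB

print(f"Six packs: {six_packs_mb} MB, saving {saving_vs_six_packs_mb} MB")
print(f"Baseline implied by the quoted saving: {implied_baseline_mb} MB")
```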

The technology enables real-time, “speak-as-you-go” translation, in which the translation is produced while the user is still speaking rather than after a full sentence has been completed. According to HONOR, this results in a 38% increase in inference speed and a 16% increase in translation accuracy.

INTERSPEECH 2025 Validates Research

Two papers submitted to INTERSPEECH 2025 provide further details on the technology. The first paper, “MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition,” addresses the challenge of achieving low-latency and high-accuracy streaming speech recognition on devices. The paper highlights HONOR’s integration of a Continuous Integrate-and-Fire (CIF)-based predictor with the Wait-k strategy, which helps adapt a low-latency approach from the text domain to the continuous nature of speech.
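To illustrate how the two ideas named in the paper can fit together, here is a minimal Python sketch. It is not the paper’s implementation: the function names, the firing threshold of 1.0, and the way the look-ahead is mapped onto frames are all assumptions made for illustration. A CIF-style predictor accumulates a per-frame weight and “fires” a token boundary whenever the running total reaches the threshold; a Wait-k style schedule then lets the i-th output token see audio only up to the boundary of token i + k - 1.

```python
from typing import List


def cif_boundaries(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame indices at which a CIF-style accumulator fires.

    Assumes each weight is in [0, threshold], so at most one boundary
    fires per frame; the remainder carries over to the next token.
    """
    boundaries = []
    acc = 0.0
    for t, alpha in enumerate(alphas):
        acc += alpha
        if acc >= threshold:
            boundaries.append(t)
            acc -= threshold
    return boundaries


def wait_k_visible_frames(boundaries: List[int], num_frames: int, k: int) -> List[int]:
    """For each output token, how many leading frames it may attend to.

    Token i (0-indexed) sees frames up to the boundary of token i + k - 1,
    mirroring the text-domain Wait-k rule; past the last boundary it sees
    the whole utterance.
    """
    visible = []
    for i in range(len(boundaries)):
        j = i + k - 1
        if j < len(boundaries):
            visible.append(boundaries[j] + 1)
        else:
            visible.append(num_frames)
    return visible


# Hypothetical per-frame weights for a 7-frame utterance.
alphas = [0.5, 0.5, 0.25, 0.75, 1.0, 0.5, 0.5]
bounds = cif_boundaries(alphas)
print("token boundaries (frame index):", bounds)
print("visible frames with k=2:", wait_k_visible_frames(bounds, len(alphas), k=2))
```

A larger k gives each token more future context, which tends to help accuracy, at the cost of waiting longer before emitting it.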

The second paper, “Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation,” was developed in collaboration with Shanghai Jiao Tong University.

This paper introduces a parasitic dual-scale speculative sampling acceleration strategy designed to overcome the limitations of real-time inference for large speech models on resource-constrained devices. The paper states that the strategy achieves a 38% increase in inference speed without compromising model performance.
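The article does not describe how the parasitic dual-scale variant works, so the sketch below shows only the generic idea behind speculative decoding, in its simplest greedy form, with toy stand-in models. Every name here is hypothetical. A small draft model proposes several tokens cheaply, the large target model checks them, and the longest matching prefix is kept along with one token from the target. The output is identical to what the target model would produce on its own, which is why such schemes can speed up inference without changing results.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def speculative_greedy_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    draft_len: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes draft_len tokens autoregressively.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft_next(tokens + proposal))

        # 2. The target model verifies them. A real system would score all
        #    positions in one batched forward pass; this toy calls it per token.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong guess
                break
        else:
            # Every draft token matched: take one extra token from the target.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], rounds


def toy_target(prefix: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy 'small' model: agrees with the target except after a 2."""
    return 9 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


def plain_greedy(next_token: NextToken, prompt: List[int], n: int) -> List[int]:
    """Reference decoder that calls the model once per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_token(tokens))
    return tokens


output, rounds = speculative_greedy_decode(toy_target, toy_draft, [0], max_new_tokens=12)
print("same as target-only decoding:", output == plain_greedy(toy_target, [0], 12))
print("verification rounds:", rounds, "for 12 new tokens")
```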

HONOR states that this technology is part of its ongoing commitment to on-device AI and is intended to lead to more intelligent, private, and seamless human-device interactions.


Srivatsan Sridhar is a mobile technology enthusiast who is passionate about mobile phones and mobile apps. He uses the phones he reviews as his main phone.