Shengwang R2 Conversational AI Robot Development Kit

Product Description

The R2 Conversational AI Robot Development Kit is Shengwang's integrated solution for desktop robots and emotional companion robots. Building on the real-time AI voice interaction capabilities of the R1 series (full-duplex conversation, background noise reduction, and intelligent interruption), the R2 kit adds local visual recognition and multi-degree-of-freedom motion control, taking robots from "hearing and speaking" to "seeing and moving".

Product Introduction: The R2 kit integrates a powerful NPU and ISP, providing a complete edge-side multimodal AI solution. It supports complex visual functions such as face tracking, gesture recognition, and object following, and combines them with multi-degree-of-freedom motion control so that robots can perform emotionally engaging physical interactions like "walking up to greet users" and "turning to look at the speaker".

Key Features:
Multimodal Interaction: Integrates voice, vision, and motion control capabilities.
Emotional Design: Establishes emotional connections through visual gaze and body movements.
All-Scenario Adaptation: A single base serves diverse scenarios, including educational companionship, office collaboration, home interaction, and wearable recording.
Rapid Development: Provides a turnkey solution that significantly shortens the path to a finished product.

Main Application Scenarios: Desktop emotional robots, intelligent learning assistants, meeting assistants, home visual control centers, lightweight AI recorders, and similar devices.

Product Specifications

The R2 All-Scenario AI Robot Development Kit advances technical performance in several areas.

Voice interaction: The kit inherits the full real-time AI voice interaction capabilities of the R1 series, including full-duplex conversation, background noise reduction, and smooth interruption. Conversation latency can be as low as 650 ms and interruption response time as low as 340 ms, giving conversations a near-human response speed and rhythm. In complex environments, it can block 95% of ambient human voices and noise interference, so that the conversational speech is recognized precisely.

Visual capabilities: Using the integrated NPU and ISP of the BK7259 chip, R2 adds local visual recognition and processing, supporting functions such as face tracking, gesture recognition, and object following. Visual processing latency is kept at the millisecond level, enabling real-time recognition of and response to visual commands.

Motion control: The kit supports precise multi-degree-of-freedom control, which combines with the visual and voice functions to produce emotionally engaging physical interactions like "walking up to greet users" and "turning to look at the speaker".

Power and languages: The kit uses a low-power design that supports ultra-long standby time and eases concerns about device battery life. It also supports 47 languages, with overseas-deployed servers providing low-latency responses, real-time multilingual conversion, and content output.

Development efficiency: Running a demo takes only 1 hour, and completing a product prototype sample takes 1 day, which significantly shortens the product development cycle.

Company Profile

Shanghai Shengwang Technology Co., Ltd. (上海声网科技有限公司)

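Founded in 2014, Shengwang is a pioneer of real-time audio and video cloud services worldwide, delivering the best experience for multimodal real-time interaction between people, between people and agents, and between agents. By simply calling Shengwang APIs, developers can build real-time interactive scenarios such as conversational AI, audio and video calls, and live streaming into their applications. Shengwang APIs power more than 200 scenarios across over 20 industries, including AI, social live streaming, education, gaming, IoT, finance, healthcare, and enterprise collaboration.

On June 26, 2020, Shengwang's parent company, Agora, Inc., listed on Nasdaq under the ticker symbol "API". As of December 31, 2025, more than 1 million applications had been registered with Shengwang worldwide, and the platform served more than 1 trillion minutes over the full year 2025.

Shengwang launched the world's first conversational AI engine, which enables developers to build real-time voice conversation experiences on top of any large language model. It also created the world's first and, to date, largest real-time audio and video network: the Software-Defined Real-Time Network, SD-RTN™.

Shengwang's technical services cover more than 200 countries and regions. Its customers include industry giants, unicorns, and startups such as Xiaomi, Momo, Douyu, Bilibili, Xiaohongshu, and Yalla. Its technology has also been adopted by well-known companies around the world, including HTC VIVE, The Meet Group, and Bunch.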