Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots

Abstract

Rapid advances in machine learning (ML) have led to significant achievements across a range of robotic tasks. Deploying these ML approaches on real-world robots requires fast and energy-efficient inference of their deep neural network (DNN) models. Distributed inference, which spreads inference across multiple powerful GPU devices, has emerged as a promising optimization for improving inference performance in modern data centers. However, when deployed on real-world robots, existing parallel methods cannot simultaneously meet the robots’ latency and energy requirements, and they raise significant challenges. This paper reveals and evaluates the problems that hinder the application of these parallel methods in robotic IoT: the failure of data parallelism, the unacceptable communication overhead of tensor parallelism, and the significant transmission bottlenecks of pipeline parallelism. By raising awareness of these new problems, we aim to stimulate research toward a new parallel method that achieves fast and energy-efficient distributed inference in robotic IoT.
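To illustrate why tensor parallelism incurs heavy communication, here is a minimal NumPy sketch (our own illustration, not code from the paper): each device holds a column shard of a layer's weight matrix and computes a partial output, and the shards must then be gathered over the network after every single layer. On a data-center GPU interconnect this gather is cheap; over a robot's wireless link it happens once per layer per inference and quickly dominates latency and energy.

```python
import numpy as np

def tensor_parallel_linear(x, weight, n_devices):
    """Simulate a column-sharded linear layer: each 'device' computes a
    partial output, then an all-gather reassembles the full activation.
    The concatenate step is where per-layer network traffic occurs."""
    shards = np.array_split(weight, n_devices, axis=1)  # one weight shard per device
    partials = [x @ w for w in shards]                  # local matmuls, no communication
    return np.concatenate(partials, axis=-1)            # all-gather over the network

def gathered_bytes_per_layer(batch, d_out, dtype_bytes=4):
    """Activation volume that must cross the network for one layer."""
    return batch * d_out * dtype_bytes

# Sharded computation matches the single-device result.
x = np.random.randn(2, 8)
w = np.random.randn(8, 16)
assert np.allclose(tensor_parallel_linear(x, w, n_devices=4), x @ w)
```

Note that the per-layer gather volume (`gathered_bytes_per_layer`) is independent of the device count, so adding more robot-side devices does not reduce the wireless traffic each inference must pay.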

Publication
In the Advanced tools, programming languages, and PLatforms for Implementing and Evaluating algorithms for Distributed systems
Junming Wang
MPhil Student

My research interests focus on robotic vision and distributed robotic systems.