Human-robot collaboration (HRC) is a growing topic of interest. Rather than replacing humans, collaboration enables humans and robots to work together, eliminating the need to keep robots fenced off from the workforce for safety reasons. The advantages include higher efficiency, productivity, and flexibility on production lines, as well as an inclusive role: assisting workers with physical disabilities, or older workers, to stay productive. Achieving this goal requires effective communication between humans and robots to create a trusted work environment that maintains, or even increases, individual work performance. For humans and robots to collaborate well, each must adapt to the other's capabilities. Computer vision-based detection and gesture recognition have long been a practical channel for human-robot communication, given the high cost and resource requirements of wearable solutions.

The SMARTHANDLE HRC system is composed of three modules: Detection & Tracking, Speed Separation, and Gesture Recognition:

The first module uses images from 2D RGB cameras and machine learning algorithms to build a layered perception system that identifies one or more workers in a robot's vicinity and extracts data on their movements and limb positions. Our implementation achieved a frame rate of 10 frames per second. The ML algorithm is based on YOLOv8, which estimates, for each detected person, the pixel locations of 17 key points on the human body, corresponding mainly to body joints. This process involves no personal information: there is no facial recognition or any other identification method, so the workers' privacy is guaranteed. This module provides the input for the following ones.
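The 17 key points mentioned above match the COCO skeleton convention used by YOLOv8 pose models. As a minimal sketch of the post-processing step, the helper below maps a raw (17, 3) keypoint array of (x, y, confidence) rows to named joints, discarding low-confidence detections; the joint names and the 0.5 threshold are illustrative assumptions, not the exact values used in SMARTHANDLE.

```python
# COCO-style 17-keypoint order, as produced by YOLOv8 pose models.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def extract_joints(keypoints, min_conf=0.5):
    """Map a sequence of 17 (x, y, confidence) rows to a dict of named
    joint positions, keeping only detections above the threshold."""
    joints = {}
    for name, (x, y, conf) in zip(COCO_KEYPOINTS, keypoints):
        if conf >= min_conf:
            joints[name] = (float(x), float(y))
    return joints
```

Downstream modules can then work with named joints (e.g. `joints["left_wrist"]`) instead of raw array indices, and simply skip joints that were occluded in the current frame.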

The Speed Separation Module creates solid objects in real time representing the body's position, together with the robot's digital twin, to compute the distance between them. The robot adjusts its operating speed based on this distance and previously defined thresholds. This module enables safe collaboration between workers and robots.
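The distance-to-speed mapping described above can be sketched as a simple threshold lookup. The function below returns a fraction of the robot's nominal speed for a given human-robot distance; the specific distances and speed fractions are invented for illustration and are not the thresholds used in SMARTHANDLE.

```python
def speed_for_distance(distance_m, thresholds=((0.5, 0.0), (1.0, 0.25), (2.0, 0.6))):
    """Return a speed scaling factor (0.0–1.0) for the current human-robot
    distance in metres. `thresholds` is a tuple of (max_distance, speed)
    pairs ordered nearest first; beyond the last band the robot runs at
    full speed. Example values only.
    """
    for max_dist, speed in thresholds:
        if distance_m < max_dist:
            return speed
    return 1.0  # no worker within any safety band
```

With these example bands, a worker closer than 0.5 m stops the robot entirely, while one beyond 2 m allows full-speed operation; re-evaluating this every frame lets the robot slow down smoothly as a worker approaches.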

The gesture detection algorithm computes the angles between key points and identifies a gesture by matching the resulting orientation (a quaternion) against a lookup table. This gives workers a direct way to command the robot, which can alter its actions by proceeding to the next step in the sequence, transitioning to a different task, ceasing movement, or canceling the current task.
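The angle computation at the heart of this step can be sketched with plain 2D geometry: the angle at a middle joint (e.g. the elbow, between shoulder and wrist) is the angle between the two limb vectors. The gesture bands and command names below are hypothetical placeholders for the lookup table the text describes, not the actual SMARTHANDLE mapping.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c.
    Each argument is an (x, y) pixel coordinate from the keypoint detector."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Hypothetical lookup table: elbow-angle band -> robot command.
GESTURE_TABLE = [
    ((0, 60), "stop"),
    ((60, 120), "next_step"),
    ((120, 180), "cancel_task"),
]

def classify(angle):
    """Return the command for the first band containing the angle."""
    for (lo, hi), command in GESTURE_TABLE:
        if lo <= angle <= hi:
            return command
    return None
```

A real system would smooth the angle over several frames before committing to a command, so a momentary detection glitch cannot stop or cancel a running task.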

This system was tested at AIMEN’s lab and at other facilities owned by our partners and clients. Limitations we found with this approach include occlusions and people wearing clothes that blend into the background. Future work will focus on improving the frame rate and scalability, and on addressing these limitations. Given their advantages, we expect this type of solution to become increasingly common in industrial installations across different sectors.


The author of this piece is AIMEN.