
Apple has trained AI to recognize hand gestures from sensor data


In a new study, Apple taught an AI model to recognize hand gestures that weren’t part of its original training dataset. Here are the details.

What is EMG?

Apple has published new research on its Machine Learning Research blog, called EMBridge: Enhancing Gesture Generalization from EMG Signals through Cross-Modal Representation Learning. The research will be presented at the ICLR 2026 Conference in April.

In it, the researchers describe how they trained an AI model to recognize hand gestures, even if those specific gestures were not part of its original dataset.

To accomplish this, they developed EMBridge, “a cross-modal representation learning framework that bridges the gap between EMG and pose.”

EMG, or electromyography, measures the electrical activity produced by muscles during contraction. Its practical applications range from medical diagnosis and physical therapy to prosthetic control.

More recently (although this is not a new area), it has been widely explored in wearables and AR/VR applications.

Meta’s Ray-Ban Display glasses, for example, use EMG technology in what Meta calls the Neural Band, a wrist-worn device that “interprets your muscle signals to navigate Meta Ray-Ban Display features,” according to the company description.

In Apple’s research, the EMG signals used for training were not collected by Apple hardware. Instead, the researchers relied on two existing datasets:

  • emg2pose: “[…] a large open-source EMG dataset containing 370 hours of sEMG and synchronized hand pose data from 193 consenting users, across 29 different behavioral groups covering a variety of distinct and continuous movements such as making a fist or counting to five. Hand pose labels are generated using a high-resolution motion capture system. The full dataset contains more than 80 million pose labels and is comparable in scale to large computer vision datasets. Each user completed four recording sessions for each gesture category, each with a different EMG-band placement. Each session lasted 45–120 seconds, during which users repeatedly performed a combination of 3–5 identical gestures or unrestricted free movements. We use non-overlapping 2-second windows as input sequences. The EMG is band-pass filtered (2–250 Hz) and notch filtered at 60 Hz.”
  • NinaPro DB2: “We use two NinaPro EMG datasets for detailed analysis of EMBridge. Specifically, NinaPro DB2 is used for pre-training, and includes paired EMG-pose data from 40 subjects. It contains 49 hand gestures (including basic finger flexion and functional grasps) recorded from healthy subjects. 12 electrodes are placed on the forearm with a 2 kHz sampling rate, and hand kinematics are captured by a data glove. For downstream gesture classification, we use NinaPro DB7, which contains data from 20 intact subjects collected with the same EMG device and gesture set as DB2.”
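The preprocessing described in the excerpts (band-pass filtering at 2–250 Hz, a 60 Hz notch filter, and non-overlapping 2-second windows) can be sketched roughly as follows. This is an illustrative sketch, not Apple’s code; the 2 kHz sampling rate and the filter orders are assumptions, since the excerpts don’t fully specify them:

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

def preprocess_emg(emg, fs=2000, band=(2.0, 250.0), notch_hz=60.0, window_s=2.0):
    """Band-pass, notch-filter, and window a multi-channel EMG recording.

    emg: array of shape (n_samples, n_channels). fs (sampling rate) and the
    4th-order Butterworth design are assumptions; the excerpts only state
    the filter passband and notch frequency.
    """
    # Band-pass filter, 2-250 Hz (as described in the dataset excerpt)
    b, a = butter(4, band, btype="bandpass", fs=fs)
    emg = filtfilt(b, a, emg, axis=0)
    # Notch filter at 60 Hz to remove mains interference
    bn, an = iirnotch(notch_hz, Q=30.0, fs=fs)
    emg = filtfilt(bn, an, emg, axis=0)
    # Split into non-overlapping 2-second windows (trailing remainder dropped)
    win = int(window_s * fs)
    n_windows = emg.shape[0] // win
    return emg[: n_windows * win].reshape(n_windows, win, emg.shape[1])

# Example: 10 s of synthetic 12-channel EMG at 2 kHz -> five 2 s windows
windows = preprocess_emg(np.random.randn(20_000, 12))
print(windows.shape)  # (5, 4000, 12)
```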

With all that said, it’s easy to see how Apple’s EMBridge could pave the way for future Apple Watch models (or other wearables) to control devices like the Apple Vision Pro, Macs, iPhones, and perhaps even Apple’s rumored smart glasses.

In practice, from new methods of communication to improved accessibility, the possibilities can be significant.

Granted, the research itself doesn’t appear to mention any upcoming Apple products or apps, but it does mention the following:

A potential application of our framework is in Human-Computer Interaction. In scenarios such as VR/AR and prosthetic control, a wrist-worn device must continuously generate hand poses from EMG to drive a virtual avatar or robotic hand.

What is EMBridge?

EMBridge was the researchers’ way of bridging the gap between real EMG muscle signals and structured hand posture data.

Trained using a cross-modal framework, the model was first pre-trained with EMG and hand posture data separately.

Then, the researchers aligned the two representations so that the EMG encoder could learn from the pose encoder. This allowed EMBridge to learn to recognize gesture patterns in EMG signals.
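The alignment step can be illustrated with a standard contrastive (InfoNCE-style) objective, where matched EMG/pose pairs from the same moment are pulled together and mismatched pairs are pushed apart. This is a generic sketch, not the paper’s actual loss; the function name, symmetric formulation, and temperature value are assumptions:

```python
import numpy as np

def info_nce(emg_emb, pose_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning EMG embeddings with pose embeddings.

    emg_emb, pose_emb: arrays of shape (batch, dim), where row i of each
    comes from the same recorded moment. Matched pairs sit on the diagonal
    of the similarity matrix; all other pairs act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    e = emg_emb / np.linalg.norm(emg_emb, axis=1, keepdims=True)
    p = pose_emb / np.linalg.norm(pose_emb, axis=1, keepdims=True)
    logits = e @ p.T / temperature  # (batch, batch)

    # Cross-entropy with the diagonal (matched pairs) as the target,
    # computed in both directions (EMG->pose and pose->EMG)
    log_sm_ep = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_pe = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_e2p = -np.mean(np.diag(log_sm_ep))
    loss_p2e = -np.mean(np.diag(log_sm_pe))
    return (loss_e2p + loss_p2e) / 2
```

A perfectly aligned batch (identical embeddings for both modalities) yields a much lower loss than a mismatched one, which is what drives the EMG encoder toward the pose encoder’s representation.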

Once that was done, they trained the system using masked pose reconstruction, hiding parts of the pose data and asking the model to reconstruct it using only the information extracted from the EMG signals.
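A minimal sketch of that masking idea, assuming random frame-level masking and a mean-squared reconstruction error computed only on hidden frames (the paper’s exact masking scheme and loss details may differ):

```python
import numpy as np

def mask_poses(pose_seq, mask_ratio=0.5, rng=None):
    """Randomly hide a fraction of pose frames; return masked input + mask.

    pose_seq: array of shape (n_frames, n_joints). The model would be asked
    to reconstruct the hidden frames using features from the EMG signal.
    """
    rng = rng or np.random.default_rng(0)
    mask = rng.random(pose_seq.shape[0]) < mask_ratio  # True = hidden frame
    masked = pose_seq.copy()
    masked[mask] = 0.0  # zero out the frames the model must not see
    return masked, mask

def masked_reconstruction_loss(predicted, target, mask):
    """MSE computed only on the frames that were hidden from the model."""
    return np.mean((predicted[mask] - target[mask]) ** 2)
```

In training, a decoder conditioned on EMG features would produce `predicted`; only the masked positions contribute to the loss, forcing the model to infer pose from the muscle signal rather than copy visible frames.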

The result, as described by the researchers:

“To the best of our knowledge, EMBridge is the first cross-modal representation learning framework to achieve zero-shot gesture classification from wearable EMG signals, demonstrating the potential of real-world gesture recognition in wearable devices.”

To reduce training errors caused by similar gestures being treated as negatives, the researchers taught the model to recognize when postures represent similar hand configurations, allowing it to generate soft targets for those postures instead of treating them as completely unrelated.
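One generic way to build such soft targets is to turn pairwise pose similarity into a probability distribution, so near-identical postures receive some target mass instead of being pushed apart as hard negatives. This is an illustrative sketch, not the paper’s exact formulation; the temperature and the cosine-similarity choice are assumptions:

```python
import numpy as np

def soft_targets(pose_emb, temperature=0.5):
    """Turn pairwise pose similarity into soft contrastive targets.

    pose_emb: array of shape (batch, dim). Instead of a one-hot target
    (only the matched pair counts as positive), each row becomes a softmax
    over pose-to-pose cosine similarities, so postures describing similar
    hand configurations are not treated as completely unrelated.
    """
    p = pose_emb / np.linalg.norm(pose_emb, axis=1, keepdims=True)
    sim = p @ p.T / temperature
    # Row-wise softmax: each row is a distribution that peaks at the true
    # match (self-similarity = 1) but spreads mass to similar postures
    targets = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return targets
```

These soft rows would replace the one-hot labels in the contrastive objective, which is what helps organize the representation space for unseen gestures.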

This helped organize the model’s representation space, improving its ability to adapt to gestures it had never seen before.

The authors tested EMBridge on two benchmarks, emg2pose and NinaPro, and found that it outperformed all existing methods, especially at detecting zero-shot (never-before-seen) gestures. Importantly, it did so with only 40% of the training data.

One important limitation noted in the paper is that the model relies on a dataset containing both EMG signals and synchronized hand position data. This means that its training still depends on special datasets that can be difficult to collect.

Nevertheless, the research is interesting, especially at a time when EMG-based device control seems to be on the rise.

For full technical details on EMBridge, including its Q-Former, MPRL, and CASLe components, follow this link.



