
ViTHL: Vision Transformer-Based Hybrid Localization for Humanoid Robots

A hybrid localization approach combining vision transformers with traditional geometric methods for robust self-localization of humanoid robots in RoboCup competition environments.

robotics · vision-transformers · localization · humanoid · robocup

Venue

RoboCup Symposium 2025

Authors

S. Khatibi, A. Rahmani, A. Azadfar, V. P. da Fonseca, T. E. A. Oliveira

Abstract

Robust self-localization is a critical capability for autonomous humanoid robots competing in dynamic environments. We present ViTHL, a hybrid localization architecture that combines the representational power of vision transformers with geometric constraints for accurate and reliable robot pose estimation. Our approach leverages pre-trained ViT models for feature extraction from onboard camera feeds, fused with traditional odometry signals through a probabilistic filter. Evaluated on humanoid robots in RoboCup competition scenarios, ViTHL demonstrates improved localization accuracy over baseline methods under challenging lighting and field conditions.
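To make the fusion step concrete, the sketch below combines a body-frame odometry increment with an absolute pose measurement (standing in for a pose estimate derived from ViT features) through a small extended Kalman filter. This is a minimal illustration of the kind of probabilistic filter the abstract describes, not the paper's implementation; the planar state layout, the noise values, and the `HybridPoseFilter` name are assumptions made for this example.

```python
# Minimal sketch of vision/odometry fusion for a planar pose (x, y, theta).
# The ViT pipeline is treated as a black box that returns an absolute pose
# measurement with a covariance. All names and noise values are illustrative.
import numpy as np

def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

class HybridPoseFilter:
    def __init__(self, x0, P0):
        self.x = np.asarray(x0, dtype=float)  # state: [x, y, theta]
        self.P = np.asarray(P0, dtype=float)  # 3x3 covariance

    def predict(self, odom, Q):
        """Propagate the pose with a body-frame odometry increment
        (dx, dy, dtheta) and inflate covariance by process noise Q."""
        dx, dy, dth = odom
        th = self.x[2]
        c, s = np.cos(th), np.sin(th)
        self.x += np.array([c * dx - s * dy, s * dx + c * dy, dth])
        self.x[2] = wrap_angle(self.x[2])
        # Jacobian of the motion model with respect to the state
        F = np.array([[1.0, 0.0, -s * dx - c * dy],
                      [0.0, 1.0,  c * dx - s * dy],
                      [0.0, 0.0,  1.0]])
        self.P = F @ self.P @ F.T + Q

    def update(self, z, R):
        """Fuse an absolute pose measurement z = [x, y, theta]
        (e.g., regressed from ViT features) with covariance R."""
        y = z - self.x                    # innovation (measurement model H = I)
        y[2] = wrap_angle(y[2])
        S = self.P + R
        K = self.P @ np.linalg.inv(S)     # Kalman gain
        self.x += K @ y
        self.x[2] = wrap_angle(self.x[2])
        self.P = (np.eye(3) - K) @ self.P

if __name__ == "__main__":
    f = HybridPoseFilter([0.0, 0.0, 0.0], np.eye(3) * 0.1)
    f.predict(odom=(0.05, 0.0, 0.01), Q=np.eye(3) * 1e-3)
    f.update(z=np.array([0.06, 0.01, 0.012]), R=np.eye(3) * 5e-3)
    print("fused pose:", f.x)
```

Wrapping the heading innovation keeps the correction well-behaved when the vision estimate and the odometry-propagated pose disagree across the ±π boundary, which matters on a field where the robot frequently turns through half-circles.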

ViTHL

ViTHL is the first hybrid research entry in the repository.

The research record owns canonical metadata such as title, venue, and authors. This writeup provides the narrative context needed for a readable public page.
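For illustration, such a record might look like the following. The dictionary layout and field names are hypothetical; the values are the canonical metadata shown on this page.

```python
# Hypothetical shape of the research record described above; the structure
# is an assumption, while the values come from this page's metadata.
VITHL_RECORD = {
    "title": ("ViTHL: Vision Transformer-Based Hybrid Localization "
              "for Humanoid Robots"),
    "venue": "RoboCup Symposium 2025",
    "authors": ["S. Khatibi", "A. Rahmani", "A. Azadfar",
                "V. P. da Fonseca", "T. E. A. Oliveira"],
    "tags": ["robotics", "vision-transformers", "localization",
             "humanoid", "robocup"],
    "year": 2025,
}
```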