AFF-ttention! - Affordances and Attention models for Short-Term Object Interaction Anticipation

Short-Term object-interaction Anticipation (STA) consists of detecting the location of the next-active objects, predicting the noun and verb categories of the interaction, and estimating the time to contact, all from the observation of egocentric video. This ability is fundamental for wearable assistants and human-robot interaction systems that need to understand the user’s goals, but current methods still leave room for improvement in precision and reliability. In this work, we improve the performance of STA predictions with two contributions: 1) We propose STAformer, a novel attention-based architecture that integrates frame-guided temporal pooling, dual image-video attention, and multiscale feature fusion to support STA predictions from an input image-video pair; 2) We introduce two novel modules that ground STA predictions in human behavior by modeling affordances. First, we integrate an environment affordance model, which acts as a persistent memory of the interactions that can take place in a given physical scene. Second, we predict interaction hotspots from the observation of hand and object trajectories, increasing confidence in STA predictions localized around the hotspot. Our results show significant relative Overall Top-5 mAP improvements of up to +45% on Ego4D and +42% on a novel set of curated EPIC-Kitchens STA labels. We release the code, annotations, and pre-extracted affordances on Ego4D and EPIC-Kitchens to encourage future research in this area.
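The idea of raising the confidence of STA predictions that fall near a predicted interaction hotspot can be illustrated with a small sketch. This is not the released implementation: the function names `mean_hotspot_in_box` and `rescore_with_hotspot`, the weight `alpha`, and the update rule itself are assumptions chosen for illustration. The sketch assumes the hotspot is given as a 2D grid of probabilities in [0, 1] and that each detection is an axis-aligned box with a confidence score in [0, 1].

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) in pixel coordinates; x2 and y2 are exclusive.
Box = Tuple[int, int, int, int]


def mean_hotspot_in_box(hotspot: Sequence[Sequence[float]], box: Box) -> float:
    """Average hotspot probability over the pixels covered by the box."""
    height, width = len(hotspot), len(hotspot[0])
    x1, y1, x2, y2 = box
    # Clip the box to the map so boxes partly outside the frame do not raise.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    if x2 <= x1 or y2 <= y1:
        return 0.0  # box lies entirely outside the map
    total = sum(hotspot[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return total / ((x2 - x1) * (y2 - y1))


def rescore_with_hotspot(
    boxes: Sequence[Box],
    scores: Sequence[float],
    hotspot: Sequence[Sequence[float]],
    alpha: float = 0.5,  # hypothetical weight of the hotspot evidence
) -> List[float]:
    """Move each score toward 1 in proportion to the hotspot support of its box.

    Assumed rule (illustrative only): s' = s + alpha * support * (1 - s).
    With alpha and support in [0, 1], s' stays in [0, 1] and never drops below s.
    """
    rescored = []
    for box, score in zip(boxes, scores):
        support = mean_hotspot_in_box(hotspot, box)
        rescored.append(score + alpha * support * (1.0 - score))
    return rescored


if __name__ == "__main__":
    # Toy 4x4 hotspot map: all the mass sits in the top-left 2x2 corner.
    toy_hotspot = [
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    toy_boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]
    print(rescore_with_hotspot(toy_boxes, [0.5, 0.5], toy_hotspot))  # [0.75, 0.5]
```

In this toy case the box that overlaps the hotspot gains confidence while the box far from it keeps its original score, which is the qualitative behavior the abstract describes. The additive-toward-one rule was picked only because it keeps scores bounded; other fusion rules would serve equally well as an illustration.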

Additional Info
Field: Value
Accessibility: OnLine
AccessibilityMode: Download
Associate Project: FAIR
Basic rights: Download
CreationDate: 2025-03-24
Creator: Farinella, Giovanni, giovanni.farinella@unict.it, orcid.org/0000-0002-6034-0432
Field/Scope of use: Any use
Group: Others
Owner: Farinella, Giovanni, giovanni.farinella@unict.it, orcid.org/0000-0002-6034-0432
Programming Language: Python
SoBigData Node: SoBigData IT
Sublicense rights: No
Territory of use: World Wide
Thematic Cluster: Other
system:type: Method
Management Info
Field: Value
Author: Farinella Giovanni Maria
Maintainer: Farinella Giovanni Maria
Version: 1
Last Updated: 22 June 2025, 01:08 (CEST)
Created: 22 June 2025, 01:08 (CEST)