
The 1st International Workshop on Human-Autonomous Vehicle Interaction at WACV 2025 will provide a platform for researchers focused on the human aspect of autonomous vehicles. We aim to encourage discussions on innovative solutions and cross-disciplinary research. Specifically, the workshop topics will include (but are not limited to):
- Human perception (e.g., face, hand, and gaze) for autonomous vehicles.
- Human-centric autonomous driving.
- In-vehicle human interaction.
- Driver assistance and monitoring systems.
- Pedestrian detection, re-identification, and trajectory prediction.
- Simulation and generation for autonomous vehicles.
- Large Language Models (LLMs) for autonomous vehicles.
- New datasets, benchmarks, and evaluation metrics for autonomous vehicles.
- Analysis of drivers, passengers, pedestrians, and all individuals related to autonomous vehicles.
Call for Contributions
Full Workshop Papers
We invite authors to submit unpublished papers to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. Accepted papers will be published in the official WACV Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.
Submission (CMT): All contributions must be submitted (along with supplementary materials, if any) at this CMT link.
Author guidelines: papers are limited to 8 pages and must follow the WACV main conference format.
Templates: Overleaf template; .zip template.
Important Dates
Event | Date |
---|---|
Paper Submission Deadline | 6 December 2024 (23:59 Pacific Time) |
Paper Reviews Deadline | 20 December 2024 |
Notification to Authors | 27 December 2024 |
Camera-Ready Deadline | 10 January 2025 |
Workshop Day | 1:00 PM - 5:00 PM, 28 February 2025 |
Workshop Schedule
Time in MST | Start Time in your time zone* | Item |
---|---|---|
1:00pm - 1:10pm | 28 Feb 2025 20:00:00 UTC | Opening Remarks |
1:10pm - 1:50pm | 28 Feb 2025 20:10:00 UTC | Keynote Speaker Xiatian Zhu |
1:50pm - 2:05pm | 28 Feb 2025 20:50:00 UTC | AAT-DA: Accident Anticipation Transformer with Driver Attention. |
2:05pm - 2:20pm | 28 Feb 2025 21:05:00 UTC | Snapshot: Towards Application-centered Models for Pedestrian Trajectory Prediction in Urban Traffic Environments. |
2:20pm - 2:35pm | 28 Feb 2025 21:20:00 UTC | What's Happening- A Human-centered Multimodal Interpreter Explaining the Actions of Autonomous Vehicles. |
2:35pm - 2:50pm | 28 Feb 2025 21:35:00 UTC | Deep Learning-based rPPG Models towards Automotive Applications: A Benchmark Study. |
3:00pm - 3:45pm | 28 Feb 2025 22:00:00 UTC | Poster Session. |
3:45pm - 4:25pm | 28 Feb 2025 22:45:00 UTC | Keynote Speaker Jingbo Wang. |
4:25pm - 4:35pm | 28 Feb 2025 23:25:00 UTC | Awards & Closing Remarks |
*For example, attendees in Los Angeles will see times in UTC-8 (Pacific Standard Time), while those in Beijing will see UTC+8. Please double-check the listed times against your actual time zone.

Xiatian Zhu, University of Surrey, U.K.
Safer Autonomous Systems with Predictive Intelligence & Generative Simulation
Abstract
Safety in autonomous driving relies on accurately predicting the motion of surrounding agents and generating realistic driving environments for robust simulation and testing. In this talk, I present two advancements that enhance these capabilities. First, I introduce RealMotion, a motion forecasting framework designed for continuous driving. Unlike traditional models that process scenes independently, RealMotion captures evolving situational and contextual relationships across time, improving forecasting accuracy and real-world efficiency for safer decision-making. Next, I explore DriveX, a driving scene synthesis approach that enables free-form trajectory simulation. While existing methods struggle with novel trajectories due to limited video perspectives, DriveX leverages video generative priors to optimize a 3D scene model across diverse paths, allowing for scalable, high-fidelity simulations that support safer and more adaptable autonomous systems. By bridging predictive intelligence with generative simulation, this talk highlights new pathways toward safer, more reliable autonomous driving.
Dr. Xiatian Zhu is a Senior Lecturer at the Surrey Institute of People-Centred AI and the Centre for Vision, Speech, and Signal Processing (CVSSP) at the University of Surrey in Guildford, UK. He leads the Universal Perception (UP) lab, which focuses on advancing multimodal generative AI for real-world applications and business. Dr. Zhu earned his PhD from Queen Mary University of London and received the 2016 Sullivan Doctoral Thesis Prize from the British Machine Vision Association, an honour recognizing excellence in AI technologies within computer vision. His contributions include the development and commercialization of multi-camera object association systems for industry. During his time as a research scientist at the Samsung AI Centre in Cambridge, Dr. Zhu pioneered sustainable AI algorithms for understanding visual content in images and videos. His work has garnered several best paper awards, and he has been recognized as one of the UK's and the world's best rising stars in science. Dr. Zhu's extensive research output includes over 120 articles in top-tier conferences and journals, with more than 17,000 citations and an H-index of 54. He actively contributes to the academic community through workshop organization, serving as a senior program committee member and area chair, and participating in panel debates on emerging trends in AI. Additionally, Dr. Zhu holds five US patents in the fields of AI and computer vision.

Jingbo Wang, Shanghai AI Lab, China
Capture, Generation, and Interaction: Towards Generalizable Pedestrian Simulation in Driving Scenarios
Abstract
TBC
Dr. Jingbo Wang obtained his Ph.D. from The Chinese University of Hong Kong (MMLab), supervised by Prof. Dahua Lin. Before that, he received his Master's degree from Peking University in 2019, supervised by Prof. Gang Zeng, and his Bachelor's degree from Beijing Institute of Technology in July 2016. He is interested in computer vision, deep learning, generative AI, character animation, and embodied AI. Most of his research focuses on generating realistic character animations that move like humans in the real world. He has also worked on scene understanding with efficient models (i.e., BiSeNet V1/V2) and multi-modal inputs.
- University of Birmingham
- University of Birmingham
- University of Birmingham
- Durham University
- Imperial College London
- University of Birmingham
- University of Birmingham
- Sun Yat-sen University
- University of Birmingham
- Korea Electronics Technology Institute (KETI)
- Samsung Electronics
- University of Michigan
- The University of Tokyo
Contact: Boeun Kim (b.e.kim@bham.ac.uk); Zhongqun Zhang (zxz064@student.bham.ac.uk)