Workshop on Emerging Topics in Multimedia 2025

Schedule

☕ Coffee & Pastry

Welcome refreshments at 9:00 AM

10:00 AM - 11:00 AM

Possibilities and Limitations of Vision-Language Pretrained Models
Shin'ichi Satoh (NII, Japan)

11:00 AM - 12:00 PM

Learnings from A Decade of Lifelog Access & Retrieval
Cathal Gurrin (Dublin City University, Ireland)

12:00 PM - 1:00 PM

Knowledge Graphs and Multimedia
Things we can do, things we can't, and how to change the latter
Luca Rossetto (Dublin City University, Ireland)

🍕 Lunch Break

Pizza will be served at 1:00 PM

Speakers & Presentations

Possibilities and Limitations of Vision-Language Pretrained Models

Shin'ichi Satoh

National Institute of Informatics (NII), Japan

Vision-language pretrained models, such as CLIP and its variants, are widely used by the computer vision, natural language processing, and multimedia communities, among others. They are very powerful, especially for semantic analysis of visual content. On the other hand, recent research has pointed out a couple of drawbacks. This talk first briefly explains vision-language pretrained models and then discusses these drawbacks, in particular issues related to fine-grained image recognition and image similarity retrieval.

Speaker Bio:

Shin'ichi Satoh received his BE degree in Electronics Engineering in 1987 and his ME and PhD degrees in Information Engineering in 1989 and 1992, respectively, at the University of Tokyo. He joined the National Center for Science Information Systems (NACSIS), Tokyo, in 1992. He has been a full professor at the National Institute of Informatics (NII), Tokyo, since 2004. He was a visiting scientist at the Robotics Institute, Carnegie Mellon University, from 1995 to 1997. His research interests include image processing, video content analysis, and multimedia databases. He currently leads the video processing project at NII, addressing video analysis, indexing, retrieval, and mining for broadcast video archives.

Learnings from A Decade of Lifelog Access & Retrieval

Cathal Gurrin

Dublin City University, Ireland

Over the past decade, lifelogging has evolved from a niche research topic into a vibrant interdisciplinary field at the intersection of computer vision, information retrieval, and human-computer interaction. In this talk, I will reflect on ten years of research into interactive lifelog retrieval, drawing insights from the ACM Lifelog Search Challenge (LSC) and other major initiatives that have shaped the community. I will highlight the key milestones in the development of multimodal lifelog datasets, advances in semantic indexing and search, and the emergence of novel user interfaces for interactive retrieval. By examining how the field has addressed the challenge of turning vast personal archives into searchable, meaningful content, I will offer a forward-looking perspective on the opportunities and open questions that lie ahead in building intelligent, user-centric lifelog systems.

Speaker Bio:

Professor Cathal Gurrin is a researcher and academic at Dublin City University (DCU) in Ireland. He is also the Deputy Director of the national ADAPT Centre for digital content technologies. Gurrin's research interests focus primarily on personal analytics and lifelogging, which involves creating extensive personal databases of lifelog images and other sensor data to capture and analyse daily activities and experiences. He has been a pioneer in this field, amassing a continuous personal archive since 2006 that includes over 15 million wearable camera images and hundreds of millions of other sensor readings. His work aims to develop assistive technologies that use wearable sensors and data analytics to infer knowledge about real-world activities and enhance individual performance and health. Gurrin is heavily involved in organising conferences for his community: he has been general co-chair of many leading conferences in his field, such as ECIR'11, MM'14, ICMR'20, MM'22/23 and ICMR'24, and will be general co-chair of ACM MM'25 and ACM Web'27 in Dublin.

Knowledge Graphs and Multimedia

Luca Rossetto

Dublin City University, Ireland

Knowledge Graphs are an effective mechanism for representing knowledge as a structure of interconnected facts. They work especially well for information atoms that can be easily captured using a short textual label or a literal value. For information best represented in other modalities, such as visual or aural, these graphs have several limitations. Multimodal Knowledge Graphs commonly incorporate multimodal information by linking to external documents, which are opaque to a query engine and hence of only limited use in complex graph queries. This talk will present an overview of the current state of multimodal knowledge graphs before introducing the MediaGraph concept, which aims to make multimodal information a first-class citizen in knowledge graphs.

Speaker Bio:

Luca Rossetto is an Assistant Professor at the School of Computing at Dublin City University. His research focuses on managing, analyzing, and retrieving multimodal data. He is one of the core developers of the open-source multimedia retrieval engine 'vitrivr' and co-creator of the 'Distributed Retrieval Evaluation Server' used for interactive multimedia evaluations in different areas. More recently, Luca's research has focused on the intersection between Knowledge Graphs and Multimedia Data, with the aim of seamlessly integrating multimodal information into graph structures.

New Trends in Multimedia
