Cees Snoek
Cees G.M. Snoek is a full professor in computer science at the University of Amsterdam, where he leads the Video & Image Sense Lab. Additionally, he serves as the director of three public-private AI research labs: the QUVA Lab in collaboration with Qualcomm, the Atlas Lab with TomTom, and the AIM Lab with Core42. He is also Chief Scientific Officer at Kepler Vision Technologies, a University of Amsterdam spin-off. Professor Snoek further directs the ELLIS Amsterdam Unit and is the scientific director of Amsterdam AI, a collaboration between government, academic, medical and other organisations in Amsterdam to study, develop and deploy responsible AI.
He received his M.Sc. degree in business information systems (2000) and his Ph.D. degree in computer science (2005), both from the University of Amsterdam, The Netherlands. Previously, he was an assistant and associate professor at the University of Amsterdam, a visiting scientist at Carnegie Mellon University in the U.S., and a Fulbright Junior Scholar at UC Berkeley. He also headed R&D at university spin-off Euvision Technologies and worked as a Managing Principal Engineer at Qualcomm Research Europe.
Professor Snoek's research centers on understanding video and image content. He has published over 250 peer-reviewed book chapters, journal articles, and conference papers, and frequently serves as an area chair at leading conferences in computer vision, machine learning, and multimedia. He is currently an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence.
More information
LinkedIn: http://nl.linkedin.com/in/cgmsnoek
Visit Cees Snoek's website: https://www.ceessnoek.info/
Session
Multimodal foundation models are a revolutionary class of AI models that can generate multimedia content from interactive prompts in a seemingly creative manner. These foundation models are typically self-supervised, transformer-based models pre-trained on large volumes of data, usually collected from the web. They already form the basis of state-of-the-art systems in computer vision and natural language processing across a wide range of tasks and have shown impressive transfer-learning abilities. Despite their immense potential, these foundation models struggle with fundamental perception tasks such as spatial grounding and temporal reasoning, have difficulty operating in low-resource scenarios, and neglect the human alignment required for ethical, legal, and societal acceptance.
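To illustrate the transfer-learning ability mentioned above, the minimal sketch below (not part of the session material; it assumes the Hugging Face transformers library and the public openai/clip-vit-base-patch32 checkpoint as an example of a web-scale pre-trained multimodal model) classifies an image against free-form text prompts with no task-specific fine-tuning.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint: a CLIP model pre-trained on web-scale image-text pairs.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Zero-shot transfer: score an image against free-form text prompts,
# without any fine-tuning on the target task.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)
prompts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, converted to probabilities over the prompts.
probs = outputs.logits_per_image.softmax(dim=-1)
for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{prompt}: {p:.3f}")
```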