In this tutorial, we give an overview of cutting-edge research on spatial and temporal language understanding and its applications. This includes how spatial and temporal semantics are represented, the existing datasets and annotations, the connection between information extraction models, qualitative reasoning based on spatial and temporal language, and end-to-end deep learning models. We review recent large language models used for spatial and temporal language comprehension, their evaluation, and related limitations and challenges. We clarify the role of spatial and temporal language in downstream applications, highlighting applications such as grounding language in the visual world for navigation and wayfinding agents, as well as human-machine interaction and situated dialogue systems. We also review research on events' temporal relationships and the extraction of event timelines.
Michigan State University, kordjams@msu.edu
Parisa Kordjamshidi is an Assistant Professor in the Department of Computer Science at Michigan State University. Her research interests are in Natural Language Processing and Machine Learning. She has been working on spatial semantics extraction and annotation schemes, mapping language to formal spatial representations, spatial ontologies, structured output prediction models for information extraction, and combining vision and language for spatial language understanding. She was awarded an NSF CAREER award in Feb 2019 to work on combining learning and reasoning for spatial language understanding. She is working on neuro-symbolic modeling and the integration of domain knowledge in neural models. She is the PI of an ONR project on the integration of domain knowledge in learning as well as compositional generalization in combining vision and language. Further related to the topic of this tutorial, she has organized or co-organized the shared tasks on spatial role labeling, SpRL-2012 and SpRL-2013, and the Space Evaluation workshop, SpaceEval-2015, in the SemEval series, as well as the multimodal spatial role labeling workshop mSpRL at CLEF-2017, which aimed to consider both vision and language media for spatial information extraction. She also organized SpLU-2018, Robonlp-SpLU-2019, and SpLU-2020, co-located with NAACL-2018, NAACL-2019, and EMNLP-2020, respectively.
AWS, qiangning.01@gmail.com
Qiang Ning is an applied scientist at AWS (2022-present) leading the human alignment team for Titan LLMs. Prior to that, Qiang was an applied scientist at Alexa (2020-2022) and a research scientist at the Allen Institute for AI (2019-2020). Qiang received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2019. Qiang's research interests span information extraction, question answering, and the application of weak supervision methods to these NLP problems, in both theoretical and practical aspects.
Brandeis University, jamesp@cs.brandeis.edu
James Pustejovsky is the TJX Feldberg Chair in Computer Science at Brandeis University, where he is also Chair of the Linguistics Program, Chair of the Computational Linguistics MA Program, and Director of the Lab for Linguistics and Computation. He received his B.S. from MIT and his Ph.D. from UMass Amherst. He has worked on computational and lexical semantics for 25 years and is chief developer of Generative Lexicon Theory. Since 2002, he has been working on the development of a platform for temporal reasoning in language, called TARSQI (www.tarsqi.org). Pustejovsky is chief architect of TimeML and ISO-TimeML, a recently adopted ISO standard for temporal information in language, as well as the recently adopted standard ISO-Space, a specification for spatial information in language. He has developed a modeling framework for representing linguistic expressions and interactions as multimodal simulations. This platform, VoxML, enables real-time communication between humans and computers or robots for joint tasks, utilizing speech, gesture, gaze, and action. He is currently working with robotics researchers in HRI to allow the VoxML platform to act as both a dialogue management system and a simulation environment that reveals the real-time epistemic state and perceptual input to a computational agent. His areas of interest include computational semantics, temporal and spatial reasoning, language annotation for machine.
KU Leuven, sien.moens@cs.kuleuven.be
Marie-Francine Moens is a Full Professor at the Department of Computer Science, KU Leuven. She has a special interest in machine learning for natural language understanding and in grounding language in a visual context. She holds the prestigious ERC Advanced Grant CALCULUS (2018-2023), granted by the European Research Council on the topic of language understanding. She is currently an associate editor of the journal IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). In 2011 and 2012 she was appointed chair of the European Chapter of the Association for Computational Linguistics (EACL) and was a member of the executive board of the Association for Computational Linguistics (ACL). From 2014 until 2018 she was the scientific manager of the EU COST action iVL Net (The European Network on Integrating Vision and Language).