Multimodal Learning and Reasoning is a research group whose goal is to develop models capable of jointly learning from and reasoning over multiple data modalities, such as images, audio, and text. The group studies methods for effectively integrating heterogeneous data sources and for building rich multimodal representations, with particular attention to reasoning enabled by Large Language Models and to agent-based paradigms for interaction and decision-making. The main research areas include audio-visual learning, video understanding, and cross-modal learning across text, audio, images, and video. An additional focus is placed on interpretability and privacy preservation, with the aim of making models more transparent, reliable, and understandable, while ensuring the responsible use of data.