October 02-04 - Devoxx Morocco 2024 - 🇲🇦 Palm Plaza hotel - Marrakech 🌞🌴

Talk details

The world is inherently multimodal, consisting of sights, sounds, and other sensory data. To achieve a human-like understanding of this complex world, AI models require multimodal data for analysis. This talk delves into multimodal AI and why it is essential for achieving superior AI performance. We'll cover how multimodal learning approaches work, how generative AI models like Gemini can supercharge these capabilities, and best practices for maximizing the value of your multimodal AI projects. Because building and running these models can be resource-intensive, we'll also discuss strategies to optimize their use. Finally, we'll showcase practical examples of multimodal AI models using platforms like Google AI Studio (text-to-image, image-to-text, video-to-text) and tools like PaliGemma and Idefics2 (https://huggingface.co/docs/transformers/main/en/model_doc/idefics2).
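As a taste of the kind of demo the talk covers, here is a minimal image-to-text sketch using the Gemini API from Python, assuming the google-generativeai package is installed and an API key is available; the model name, image path, and prompt are illustrative placeholders, not part of the talk materials.

```python
# Minimal image-to-text sketch with the Gemini API (google-generativeai).
# Assumes GOOGLE_API_KEY is set in the environment and "photo.jpg" exists.
import os

import google.generativeai as genai
import PIL.Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# A multimodal Gemini model; any vision-capable variant would work here.
model = genai.GenerativeModel("gemini-1.5-flash")

image = PIL.Image.open("photo.jpg")  # placeholder local image

# Pass text and image together in one request: that is the multimodal part.
response = model.generate_content(["Describe this image in one sentence.", image])
print(response.text)
```

The same image-to-text task can also be run locally with open models such as PaliGemma or Idefics2, loaded through the Hugging Face transformers library linked above.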
Sara El-Ateif
Anajia
Sara El-Ateif, Co-Founder of Anajia and AI Wonder Girls, Google Developer Expert in Machine Learning, Google PhD Fellow, NVIDIA DLI Instructor and University Ambassador, and Mindvalley Certified Business Coach, is on a mission to demystify AI for value creation and to empower individuals with the tools and mindset required to build solutions that matter to their communities and to humanity.