Scaling Uber's Batch and Streaming Data Infrastructure - Past, Present, and Future
Conference (INTERMEDIATE level)
At Uber, we run a large fleet of data infrastructure spanning both the batch and streaming data stacks. We operate a Hadoop data lake with tens of thousands of nodes storing exabytes of data, serving transactional data that brings traditional relational database functionality as well as data warehouse capabilities. We run a compute fleet that powers hundreds of thousands of Spark, Flink, and ETL jobs daily for both offline and real-time use cases. We manage Kafka clusters that serve tens of trillions of messages daily across business messaging and streaming use cases as well as application logs. In this talk, we will share lessons learned over the past few years of operating and scaling our data infrastructure fleet, along with best practices in containerization, auto-scaling, security, efficiency, and operational excellence. We will also cover our innovations in big data storage, compute, messaging, and lakehouse technologies, and envision the future roadmap and how these systems will evolve to meet new business needs and challenges.
Mingmin is a Director of Engineering and head of the Data Infrastructure engineering team at Uber. He has led the team to build and operate a Hadoop data lake storing multiple exabytes of data, Kafka infrastructure powering tens of trillions of messages per day, and compute infrastructure powering hundreds of thousands of compute jobs per day. His team builds highly scalable, highly reliable, and efficient data infrastructure with innovative ideas while leveraging many open-source technologies such as Hadoop (HDFS/YARN), Hudi, Kafka, Spark, Flink, ZooKeeper, etc. He received his PhD in Computer Science from UC Davis.