Filmed at https://2017.dotscale.io on April 24th in Paris. More talks on https://dotconferences.com/talks
What happens if you take everything that is happening in your company—every click, every database change, every application log—and make it all available as a real-time stream of well-structured data?
Neha discusses the experience at LinkedIn and elsewhere of moving from batch-oriented ETL to real-time streams using Apache Kafka. She covers how Kafka's design and implementation were driven by the goal of serving as a real-time platform for event data, as well as the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn while supporting thousands of engineers, applications, and data systems in a self-service fashion.
She describes how real-time streams can become the source of ETL into Hadoop or a relational data warehouse, how real-time data can supplement batch-oriented analytics in Hadoop or a traditional data warehouse, and how applications and stream processing systems such as Storm, Spark, or Samza can use these feeds for sophisticated real-time processing as events occur.
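The pattern described above rests on Kafka's core abstraction: an append-only, partitioned commit log that producers write to and that many independent consumers read from at their own offsets, so an ETL job and a stream processor can each replay the same events. The sketch below is a toy in-memory illustration of that idea only, not Kafka's API; all class and field names are invented for this example.

```python
# Toy sketch of the commit-log abstraction Kafka is built on:
# producers append ordered events, and each consumer tracks its
# own offset, so many readers consume the same stream independently.
# Names (Log, Consumer, poll) are illustrative, not Kafka's API.

class Log:
    """An append-only, ordered sequence of events (one 'partition')."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset assigned to this event

class Consumer:
    """Reads the log from its own offset, independent of other consumers."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.events[self.offset:]
        self.offset = len(self.log.events)
        return batch

log = Log()
log.append({"type": "click", "page": "/home"})
log.append({"type": "db_change", "table": "users"})

etl = Consumer(log)        # e.g. a job loading into Hadoop or a warehouse
realtime = Consumer(log)   # e.g. a Storm/Spark/Samza processor

print(etl.poll())          # both consumers see the full stream
print(realtime.poll())
log.append({"type": "app_log", "app": "frontend"})
print(realtime.poll())     # a later poll returns only the new event
```

Because consumers own their offsets rather than removing messages from a queue, batch ETL and real-time processing can coexist on the same feed, which is the shift from batch-oriented ETL the talk describes.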