Florian has worked in consulting for more than 8 years and is co-founder and CEO of StreamThoughts. Over the course of his career, he has worked on various projects involving the implementation of data integration and processing platforms across the Hadoop and Spark ecosystems. Passionate about distributed systems, he specializes in event-streaming technologies such as Apache Kafka and Apache Pulsar. Today, he helps companies transition to event-streaming architectures. Florian is certified as a Confluent Administrator & Developer for Apache Kafka. He was named a “Confluent Community Catalyst” two years in a row (2019 and 2020) for his contributions to the Apache Kafka Streams project and his involvement in the open-source community. He is one of the organizers of the Paris Apache Kafka Meetup.
Kafka for Administrators Training
Skills for operating and optimizing an Apache Kafka cluster
For more information about this training course, please feel free to contact:
training@streamthoughts.io
Description
This 3-day course provides participants with the skills to configure, administer, and optimize an Apache Kafka cluster to ensure reliability and performance in a production environment.
Course Objectives
This course enables participants to acquire the following skills:
- Understanding the uses of the Apache Kafka solution.
- Understanding the fundamental concepts of the Apache Kafka architecture.
- Understanding how Kafka's Storage Layer works.
- Understanding how Producers and Consumers work.
- Use tools to administer an Apache Kafka platform.
- Set up a data replication solution.
- Secure a Kafka cluster and applications.
- Configure and optimize a Kafka Broker.
- Monitor a cluster.
Pedagogy
50% theory, 50% practice
Who Should Attend?
This course is intended for the following attendees: Developers, Architects, Data Engineers, System Administrators and DevOps.
Course Duration
3 Days
Course Prerequisites
Attendees should have a good knowledge of Linux/Unix and a basic knowledge of TCP/IP networks. No previous knowledge of Apache Kafka is required.
Course Content
Module 1: Introduction to Apache Kafka
- Event Streaming: the motivations
- What is Apache Kafka?
- The Apache Kafka project
- The key benefits of Kafka
- What is it used for?
- The alternative solutions
- The Confluent Streaming Platform
Module 2: Kafka Fundamentals
- Broker, Message, Topic & Partitions
- Producers Basics
- Consumers & Consumer Groups
- Replication & Fault-tolerance
- Data retention and compression
- Understanding ZooKeeper’s roles
- Understanding Kafka’s performance
Module 3: Replication, Fault Tolerance and Data Reliability
- Understanding Data Replication
- Understanding Replica Placement
- Managing Rack-Awareness
- Broker Controller
- Broker Recovery Process
- Producer’s Delivery Acknowledgment
- Idempotent Producer & Transaction (Exactly Once Semantics)
Module 4: Kafka’s Storage Layer
- Partitions and Log Segment Files
- Managing Page Cache
- Log Retention and Cleanup Policies
- Zookeeper
Module 5: Managing Kafka Consumers
- Consumer Groups
- Managing Offsets
- Understanding Consumer Rebalancing
- Monitoring Consumer Lag
Module 6: Installing & Administering a Kafka Cluster
- Installing and Running Kafka
- Managing Cluster Configurations
- Managing Topic Configurations
- Upgrading a Kafka Cluster
- Kafka Cluster Elasticity
- Capacity Planning
- Hardware & Deployment Considerations
Module 7: Deploying Kafka across Multiple Data Centers
- Multiple Data Centers Deployment Strategies
- Managing Cross Data Center Replication (MirrorMaker 2)
Module 8: Optimizing a Kafka Cluster for Performance
- Tuning Kafka Producers and Consumers
- Tuning Kafka Brokers Write Path
- Tuning Kafka Brokers Read Path
- Managing Kafka Shutdown and Restart
- Testing a Kafka Cluster
Module 9: Security