Best Books to Learn Apache Kafka

1. Apache Kafka (2013)

Apache Kafka is a platform that handles real-time data feeds with high throughput, and this book is all you need to harness its power quickly and painlessly. It is a step-by-step tutorial with a practical approach.

Overview

  • Write custom producers and consumers with message partition techniques
  • Integrate Kafka with Apache Hadoop and Storm for use cases such as processing streaming data
  • Provide an overview of Kafka tools and other contributions that work with Kafka in areas such as logging, packaging, and so on

In Detail

Message publishing is a mechanism of connecting heterogeneous applications together with messages that are routed between them, for example by using a message broker like Apache Kafka. Such solutions deal with real-time volumes of information and route it to multiple consumers without letting information producers know who the final consumers are.

Apache Kafka is a practical, hands-on guide providing you with a series of step-by-step practical implementations, which will help you take advantage of the real power behind Kafka, and give you a strong grounding for using it in your publisher-subscriber based architectures.

Apache Kafka takes you through a number of clear, practical implementations that will help you to take advantage of the power of Apache Kafka, quickly and painlessly. You will learn everything you need to know for setting up Kafka clusters. This book explains how Kafka basic blocks like producers, brokers, and consumers actually work and fit together. You will then explore additional settings and configuration changes to achieve ever more complex goals. Finally you will learn how Kafka works with other tools like Hadoop, Storm, and so on.

You will learn everything you need to know to work with Apache Kafka in the right format, as well as how to leverage its power of handling hundreds of megabytes of messages per second from multiple clients.

What you will learn from this book

  • Download and build Kafka
  • Set up single as well as multi-node Kafka clusters and send messages
  • Learn Kafka design internals and message compression
  • Understand how replication works in Kafka
  • Write Kafka message producers and consumers using the Kafka producer API
  • Get an overview of consumer configurations
  • Integrate Kafka with Apache Hadoop and Storm
  • Use Kafka administration tools

Approach

The book will follow a step-by-step tutorial approach which will show the readers how to use Apache Kafka for messaging from scratch.

Who this book is written for

Apache Kafka is for readers with software development experience, but no prior exposure to Apache Kafka or similar technologies is assumed. This book is also for enterprise application developers and big data enthusiasts who have worked with other publisher-subscriber based systems and now want to explore Apache Kafka as a futuristic scalable solution.

Author(s): Nishant Garg

2. Apache Kafka 1.0 Cookbook: Over 100 practical recipes on using distributed enterprise messaging to handle real-time data (2017)

Simplify real-time data processing by leveraging the power of Apache Kafka 1.0

Key Features

  • Use Kafka 1.0 features such as Confluent platforms and Kafka streams to build efficient streaming data applications to handle and process your data
  • Integrate Kafka with other Big Data tools such as Apache Hadoop, Apache Spark, and more
  • Hands-on recipes to help you design, operate, maintain, and secure your Apache Kafka cluster with ease

Book Description

Apache Kafka provides a unified, high-throughput, low-latency platform to handle real-time data feeds. This book will show you how to use Kafka efficiently, and contains practical solutions to the common problems that developers and administrators usually face while working with it.

This practical guide contains easy-to-follow recipes to help you set up, configure, and use Apache Kafka in the best possible manner. You will use Apache Kafka Consumers and Producers to build effective real-time streaming applications. The book covers the recently released Kafka version 1.0, the Confluent Platform, and Kafka Streams. The programming aspect covered in the book will teach you how to perform important tasks such as message validation, enrichment, and composition. Recipes that focus on optimizing the performance of your Kafka cluster and on integrating Kafka with a variety of third-party tools such as Apache Hadoop, Apache Spark, and Elasticsearch will greatly ease your day-to-day work with Kafka. Finally, we cover tasks related to monitoring and securing your Apache Kafka cluster using tools such as Ganglia and Graphite.

If you’re looking to become the go-to person in your organization when it comes to working with Apache Kafka, this book is the only resource you need to have.

What you will learn

  • Install and configure Apache Kafka 1.0 to get optimal performance
  • Create and configure Kafka Producers and Consumers
  • Operate your Kafka clusters efficiently by implementing the mirroring technique
  • Work with the new Confluent platform and Kafka streams, and achieve high availability with Kafka
  • Monitor Kafka using tools such as Graphite and Ganglia
  • Integrate Kafka with third-party tools such as Elasticsearch, Logstash, Apache Hadoop, Apache Spark, and more

Who This Book Is For

This book is for developers and Kafka administrators who are looking for quick, practical solutions to problems encountered while operating, managing or monitoring Apache Kafka. If you are a developer, some knowledge of Scala or Java will help, while for administrators, some working knowledge of Kafka will be useful.

Table of Contents

  1. Configuring Kafka
  2. Kafka Clusters
  3. Message Validation
  4. Message Enrichment
  5. The Confluent Platform
  6. Kafka Streams
  7. Managing Kafka
  8. Operating Kafka
  9. Monitoring and Security
  10. Third Party Tools integration

Author(s): Raúl Estrada

3. Streaming Architecture: New Designs Using Apache Kafka and MapR Streams (2016)

More and more data-driven companies are looking to adopt stream processing and streaming analytics. With this concise ebook, you’ll learn best practices for designing a reliable architecture that supports this emerging big-data paradigm.

Authors Ted Dunning and Ellen Friedman (Real World Hadoop) help you explore some of the best technologies to handle stream processing and analytics, with a focus on the upstream queuing or message-passing layer. To illustrate the effectiveness of these technologies, this book also includes specific use cases.

Ideal for developers and non-technical people alike, this book describes:

  • Key elements in good design for streaming analytics, focusing on the essential characteristics of the messaging layer
  • New messaging technologies, including Apache Kafka and MapR Streams, with links to sample code
  • Technology choices for streaming analytics: Apache Spark Streaming, Apache Flink, Apache Storm, and Apache Apex
  • How stream-based architectures are helpful to support microservices
  • Specific use cases such as fraud detection and geo-distributed data streams

Ted Dunning is Chief Applications Architect at MapR Technologies, and active in the open source community. He currently serves as VP for Incubator at the Apache Foundation, as a champion and mentor for a large number of projects, and as committer and PMC member of the Apache ZooKeeper and Drill projects. Ted is on Twitter as @ted_dunning.

Ellen Friedman, a committer for the Apache Drill and Apache Mahout projects, is a solutions consultant and well-known speaker and author, currently writing mainly about big data topics. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics. Ellen is on Twitter as @Ellen_Friedman.

Author(s): Ted Dunning, Ellen Friedman

4. Building Data Streaming Applications with Apache Kafka: Design, develop and streamline applications using Apache Kafka, Storm, Heron and Spark (2017)

Design and administer fast, reliable enterprise messaging systems with Apache Kafka

About This Book

  • Build efficient real-time streaming applications in Apache Kafka to process streams of data
  • Master the core Kafka APIs to set up Apache Kafka clusters and start writing message producers and consumers
  • A comprehensive guide to help you get a solid grasp of Apache Kafka concepts with practical examples

Who This Book Is For

If you want to learn how to use Apache Kafka and the different tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.

What You Will Learn

  • Learn the basics of Apache Kafka from scratch
  • Use the basic building blocks of a streaming application
  • Design effective streaming applications with Kafka using Spark, Storm, and Heron
  • Understand the importance of a low-latency, high-throughput, and fault-tolerant messaging system
  • Plan capacity effectively when deploying your Kafka application
  • Understand and implement the best security practices

In Detail

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur.

This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security.

By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.

Style and approach

A step-by-step, comprehensive guide filled with practical, real-world examples

Author(s): Manish Kumar, Chanchal Singh

5. Learning Apache Kafka, Second Edition (2015)

Start from scratch and learn how to administer Apache Kafka effectively for messaging

About This Book

  • Quickly set up Apache Kafka clusters and start writing message producers and consumers
  • Write custom producers and consumers with message partition techniques
  • Integrate Kafka with Apache Hadoop and Storm for use cases such as processing streaming data

Who This Book Is For

This book is for readers who want to know more about Apache Kafka at a hands-on level; the key audience is those with software development experience but no prior exposure to Apache Kafka or similar technologies. It is also useful for enterprise application developers and big data enthusiasts who have worked with other publisher-subscriber-based systems and want to explore Apache Kafka as a futuristic solution.

What You Will Learn

  • Set up both single- and multi-node Kafka clusters and start sending messages
  • Understand the internals of Kafka’s design and learn about message compression and replication in Kafka
  • Explore additional settings and configuration changes to achieve ever more complex goals
  • Write Kafka message producers and custom consumers using the Kafka API
  • Integrate Kafka with Apache Hadoop and Storm
  • Integrate Kafka with other tools for logging, packaging, and so on
  • Administer Kafka effectively and consistently with cluster management tools

In Detail

Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper.

Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per second from multiple clients. This book teaches you everything you need to know, right from setting up Kafka clusters to understanding basic blocks like producer, broker, and consumer blocks. Once you are all set up, you will then explore additional settings and configuration changes to achieve ever more complex goals. You will also learn how Kafka is designed internally and what configurations make it more effective. Finally, you will learn how Kafka works with other tools such as Hadoop, Storm, and so on.

Author(s): Nishant Garg

6. Apache Kafka Cookbook (2015)

Over 50 hands-on recipes to efficiently administer, maintain, and use your Apache Kafka installation

About This Book

  • Quickly configure and manage your Kafka cluster
  • Learn how to use the Apache Kafka cluster and connect it with tools for big data processing
  • A practical guide to monitor your Apache Kafka installation

Who This Book Is For

If you are a programmer or big data engineer using or planning to use Apache Kafka, then this book is for you. This book has several recipes that will teach you how to use Apache Kafka effectively. You need to have some basic knowledge of Java. If you don’t know big data tools, this book can be your stepping stone for learning how to consume the data in these kinds of systems.

What You Will Learn

  • Learn how to configure Kafka brokers for better efficiency
  • Explore how to configure producers and consumers for optimal performance
  • Set up tools for maintaining and operating Apache Kafka
  • Create producers and consumers for Apache Kafka in Java
  • Understand how Apache Kafka can be used by several third-party systems for big data processing, such as Apache Storm, Apache Spark, Hadoop, and more
  • Monitor Apache Kafka using tools like Graphite and Ganglia

In Detail

This book will give you details about how to manage and administer your Apache Kafka Cluster.

We will cover topics like how to configure your broker, producer, and consumer for maximum efficiency for your situation. Also, you will learn how to maintain and administer your cluster for fault tolerance. We will also explore tools provided with Apache Kafka to do regular maintenance operations. We shall also look at how to easily integrate Apache Kafka with big data tools like Hadoop, Apache Spark, Apache Storm, and Elasticsearch.

Style and approach

Easy-to-follow, step-by-step recipes explaining from start to finish how to accomplish real-world tasks.

Author(s): Saurabh Minni

7. Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka (2016)

Learn how to integrate full-stack open source big data architecture and to choose the correct technology―Scala/Spark, Mesos, Akka, Cassandra, and Kafka―in every layer. 

Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses.

Big Data SMACK explains each of the full-stack technologies and, more importantly, how to best integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. This book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by every technology. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce every layer:

  • The language: Scala
  • The engine: Spark (SQL, MLlib, Streaming, GraphX)
  • The container: Mesos, Docker
  • The view: Akka
  • The storage: Cassandra
  • The message broker: Kafka

What You Will Learn:

    • Make big data architecture without using complex Greek letter architectures
    • Build a cheap but effective cluster infrastructure
    • Make queries, reports, and graphs that business demands
    • Manage and exploit unstructured and No-SQL data sources
    • Use tools to monitor the performance of your architecture
    • Integrate all technologies and decide which ones replace and which ones reinforce

    Who This Book Is For:

    Developers, data architects, and data scientists looking to integrate the most successful big data open stack architecture and to choose the correct technology in every layer

    Author(s): Raul Estrada, Isaac Ruiz

    8. Mastering Apache Storm: Real-time big data streaming using Kafka, Hbase and Redis (2017)

    Key Features

    • Exploit the various real-time processing functionalities offered by Apache Storm such as parallelism, data partitioning, and more
    • Integrate Storm with other Big Data technologies like Hadoop, HBase, and Apache Kafka
    • An easy-to-understand guide to effortlessly create distributed applications with Storm

    Book Description

    Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm allows you to scale your data as it grows, making it an excellent platform to solve your big data problems. This extensive guide will help you understand right from the basics to the advanced topics of Storm.

    The book begins with a detailed introduction to real-time processing and where Storm fits in to solve these problems. You’ll get an understanding of deploying Storm on clusters by writing a basic Storm Hello World example. Next we’ll introduce you to Trident and you’ll get a clear understanding of how you can develop and deploy a Trident topology. We cover topics such as monitoring, Storm parallelism, the scheduler, and log processing in a very easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm.

    With real-world examples and clear explanations, this book will ensure you will have a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications to cater to your business needs.

    What you will learn

    • Understand the core concepts of Apache Storm and real-time processing
    • Follow the steps

    Author(s): Ankit Jain

    9. Event Streams in Action: Unified log processing with Kafka and Kinesis (2018)

    Event Streams in Action is a foundational book introducing the ULP paradigm and presenting techniques to use it effectively in data-rich environments. The book begins with an architectural overview, illustrating how ULP addresses the thorny issues associated with processing data from multiple sources. It then guides the reader through examples using the unified log technologies Apache Kafka and Amazon Kinesis and a variety of stream processing frameworks and analytics databases.

    Readers learn to aggregate events from multiple sources, store them in a unified log, and build data processing applications on the resulting event streams. As readers progress through the book, they learn how to validate, filter, enrich, and store event streams, master key stream processing approaches, and explore important patterns like the lambda architecture, stream aggregation, and event re-processing. The book also dives into the methods and tools usable for event modelling and event analytics, along with scaling, resiliency, and advanced stream patterns.

    Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

    Author(s): Alexander Dean

    10. Complete Guide to Open Source Big Data Stack (2018)

    See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together.

    In the Complete Guide to Open Source Big Data Stack, the author begins by creating a private cloud and then installs and examines Apache Brooklyn. After that, he uses each chapter to introduce one piece of the big data stack―sharing how to source the software and how to install it. You learn by simple example, step by step and chapter by chapter, as a real big data stack is created. The book concentrates on Apache-based systems and shares detailed examples of cloud storage, release management, resource management, processing, queuing, frameworks, data visualization, and more.

    What You’ll Learn

    • Install a private cloud onto the local cluster using Apache CloudStack
    • Source, install, and configure Apache: Brooklyn, Mesos, Kafka, and Zeppelin
    • See how Brooklyn can be used to install Mule ESB on a cluster and Cassandra in the cloud
    • Install and use DCOS for big data processing
    • Use Apache Spark for big data stack data processing

    Who This Book Is For

    Developers, architects, IT project managers, database administrators, and others charged with developing or supporting a big data system. It is also for anyone interested in Hadoop or big data, and those experiencing problems with data size.

    Author(s): Michael Frampton

    11. Professional Hadoop (2016)

    The professional’s one-stop guide to this open-source, Java-based big data framework

    Professional Hadoop is the complete reference and resource for experienced developers looking to employ Apache Hadoop in real-world settings. Written by an expert team of certified Hadoop developers, committers, and Summit speakers, this book details every key aspect of Hadoop technology to enable optimal processing of large data sets. Designed expressly for the professional developer, this book skips over the basics of database development to get you acquainted with the framework’s processes and capabilities right away. The discussion covers each key Hadoop component individually, culminating in a sample application that brings all of the pieces together to illustrate the cooperation and interplay that make Hadoop a major big data solution. Coverage includes everything from storage and security to computing and user experience, with expert guidance on integrating other software and more.

    Hadoop is quickly reaching significant market usage, and more and more developers are being called upon to develop big data solutions using the Hadoop framework. This book covers the process from beginning to end, providing a crash course for professionals needing to learn and apply Hadoop quickly.

    • Configure storage, UE, and in-memory computing
    • Integrate Hadoop with other programs including Kafka and Storm
    • Master the fundamentals of Apache Bigtop and Ignite
    • Build robust data security with expert tips and advice

    Hadoop’s popularity is largely due to its accessibility. Open-source and written in Java, the framework offers almost no barrier to entry for experienced database developers already familiar with the skills and requirements real-world programming entails. Professional Hadoop gives you the practical information and framework-specific skills you need quickly.

    Author(s): Benoy Antony, Konstantin Boudnik