Best Cassandra Books that You Should Have on Your Bookshelf

1. Cassandra: The Definitive Guide (2010)

What could you do with data if scalability wasn’t a problem? With this hands-on guide, you’ll learn how Apache Cassandra handles hundreds of terabytes of data while remaining highly available across multiple data centers — capabilities that have attracted Facebook, Twitter, and other data-intensive companies. Cassandra: The Definitive Guide provides the technical details and practical examples you need to assess this database management system and put it to work in a production environment.

Author Eben Hewitt demonstrates the advantages of Cassandra’s nonrelational design, and pays special attention to data modeling. If you’re a developer, DBA, application architect, or manager looking to solve a database scaling issue or future-proof your application, this guide shows you how to harness Cassandra’s speed and flexibility.

  • Understand the tenets of Cassandra’s column-oriented structure
  • Learn how to write, update, and read Cassandra data
  • Discover how to add or remove nodes from the cluster as your application requires
  • Examine a working application that translates from a relational model to Cassandra’s data model
  • Use examples for writing clients in Java, Python, and C#
  • Use the JMX interface to monitor a cluster’s usage, memory patterns, and more
  • Tune memory settings, data storage, and caching for better performance

Author(s): Eben Hewitt

2. Learn Cassandra in 1 Day: Definitive Guide to Learn Cassandra for Beginners (2017)

This book is a step-by-step beginner's guide to learning Cassandra. It uses plenty of charts, graphs, images, and code to aid your learning.

The book gives a detailed introduction to Cassandra, then provides step-by-step instructions for installing it. Cassandra's architecture and replication strategies are explained clearly, and data modelling, keyspaces, and CQL are also described in detail.

The book will teach you enough to get started with Cassandra.

Here is what is included:

Chapter 1: Introduction

Cassandra History

NoSQL Cassandra Database

NoSQL Cassandra Database vs. Relational Databases

Apache Cassandra Features

Cassandra Use Cases

Chapter 2: Download and Install

Prerequisite for Apache Cassandra Installation

How to Download and Install Cassandra

Chapter 3: Architecture

Components of Cassandra

Data Replication

Write Operation

Read Operation

Chapter 4: Data Model and Rules

Cassandra Data Model Rules

Model Your Data in Cassandra

Handling One to One Relationship

Handling One to Many Relationship

Handling Many to Many Relationship

Chapter 5: Cassandra CQL

Create, Alter & Drop Keyspace

Cassandra Table: Create, Alter, Drop & Truncate

Cassandra Query Language (CQL): Insert, Update, Delete, Read Data

Create & Drop INDEX

Data Types & Expiration

SET, LIST & MAP

Chapter 6: Cassandra Cluster

Prerequisites for Cassandra Cluster

Enterprise Edition Installation

Starting Cassandra Node

Chapter 7: DevCenter & OpsCenter Installation

DevCenter Installation

OpsCenter Installation

Chapter 8: Security

What is Internal Authentication and Authorization

Configure Authentication and Authorization

Logging in

Create New User

Authorization

Configuring Firewall

Enabling JMX Authentication

Author(s): Krishna Rungta

3. NoSQL for Mere Mortals (2015)

The Easy, Common-Sense Guide to Solving Real Problems with NoSQL

The Mere Mortals® tutorials have earned worldwide praise as the clearest, simplest way to master essential database technologies. Now, there’s one for today’s exciting new NoSQL databases. NoSQL for Mere Mortals guides you through solving real problems with NoSQL and achieving unprecedented scalability, cost efficiency, flexibility, and availability.

Drawing on 20+ years of cutting-edge database experience, Dan Sullivan explains the advantages, use cases, and terminology associated with all four main categories of NoSQL databases: key-value, document, column family, and graph databases. For each, he introduces pragmatic best practices for building high-value applications. Through step-by-step examples, you’ll discover how to choose the right database for each task, and use it the right way.

Coverage includes

  • Getting started: What NoSQL databases are, how they differ from relational databases, when to use them, and when not to
  • Data management principles and design criteria: Essential knowledge for creating any database solution, NoSQL or relational
  • Key-value databases: Gaining more utility from data structures
  • Document databases: Schemaless databases, normalization and denormalization, mutable documents, indexing, and design patterns
  • Column family databases: Google’s BigTable design, table design, indexing, partitioning, and Big Data
  • Graph databases: Graph/network modeling, design tips, query methods, and traps to avoid

Whether you’re a database developer, data modeler, database user, or student, learning NoSQL can open up immense new opportunities. As thousands of database professionals already know, For Mere Mortals is the fastest, easiest route to mastery.

Author(s): Dan Sullivan

4. Big Data: Principles and best practices of scalable realtime data systems (2015)

Summary

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they’re built.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size, or speed. Fortunately, scale and simplicity are not mutually exclusive.

Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You’ll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you’ll learn specific technologies like Hadoop, Storm, and NoSQL databases.

This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

What’s Inside

  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

About the Authors

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Table of Contents

  1. A new paradigm for Big Data

PART 1 BATCH LAYER

  2. Data model for Big Data
  3. Data model for Big Data: Illustration
  4. Data storage on the batch layer
  5. Data storage on the batch layer: Illustration
  6. Batch layer
  7. Batch layer: Illustration
  8. An example batch layer: Architecture and algorithms
  9. An example batch layer: Implementation

PART 2 SERVING LAYER

  10. Serving layer
  11. Serving layer: Illustration

PART 3 SPEED LAYER

  12. Realtime views
  13. Realtime views: Illustration
  14. Queuing and stream processing
  15. Queuing and stream processing: Illustration
  16. Micro-batch stream processing
  17. Micro-batch stream processing: Illustration
  18. Lambda Architecture in depth
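
To make the three layers in the table of contents more concrete, here is a minimal Python sketch of the Lambda Architecture idea. It is an illustration only, not code from the book: the function names and the page-view counting example are assumptions. The batch layer recomputes a view from the full dataset, the speed layer folds in events the last batch run has not yet seen, and a query merges the two.

```python
from collections import Counter
from typing import Iterable


def build_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from the full, immutable dataset."""
    return Counter(master_dataset)


def update_realtime_view(view: Counter, event: str) -> None:
    """Speed layer: incrementally add one event the batch run has not seen."""
    view[event] += 1


def query(batch_view: Counter, realtime_view: Counter, key: str) -> int:
    """Query time: merge the batch view and the realtime view."""
    return batch_view[key] + realtime_view[key]


if __name__ == "__main__":
    # Hypothetical page-view events; the URLs are made up for illustration.
    batch_view = build_batch_view(["/home", "/home", "/about"])
    realtime_view = Counter()
    update_realtime_view(realtime_view, "/home")
    print(query(batch_view, realtime_view, "/home"))
```

In a real deployment the batch view would typically be produced by a framework such as Hadoop and the realtime view kept in a store such as Cassandra; the in-memory counters here only stand in for those components.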

Author(s): Nathan Marz, James Warren

5. Programming Pig: Dataflow Scripting with Hadoop (2011)

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets.

Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage on key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

  • Delve into Pig’s data model, including scalar and complex data types
  • Write Pig Latin scripts to sort, group, join, project, and filter your data
  • Use Grunt to work with the Hadoop Distributed File System (HDFS)
  • Build complex data processing pipelines with Pig’s macros and modularity features
  • Embed Pig Latin in Python for iterative processing and other advanced tasks
  • Create your own load and store functions to handle data formats and storage mechanisms
  • Get performance tips for running scripts on Hadoop clusters in less time

Author(s): Alan Gates

6. Expert Apache Cassandra Administration (2017)

Follow this handbook to build, configure, tune, and secure Apache Cassandra databases. Start with the installation of Cassandra and move on to the creation of a single instance, and then a cluster of Cassandra databases.

Cassandra is increasingly a key player in many big data environments, and this book shows you how to use Cassandra with Apache Spark, a popular big data processing framework. Also covered are day-to-day topics of importance such as the backup and recovery of Cassandra databases, using the right compression and compaction strategies, and loading and unloading data.

Expert Apache Cassandra Administration provides numerous step-by-step examples starting with the basics of a Cassandra database, and going all the way through backup and recovery, performance optimization, and monitoring and securing the data. The book serves as an authoritative and comprehensive guide to the building and management of simple to complex Cassandra databases. The book:

  • Takes you through building a Cassandra database from installation of the software and creation of a single database, through to complex clusters and data centers
  • Provides numerous examples of actual commands in a real-life Cassandra environment that show how to confidently configure, manage, troubleshoot, and tune Cassandra databases
  • Shows how to use the Cassandra configuration properties to build a highly stable, available, and secure Cassandra database that always operates at peak efficiency

What You’ll Learn

  • Install the Cassandra software and create your first database
  • Understand the Cassandra data model, and the internal architecture of a Cassandra database
  • Create your own Cassandra cluster, step-by-step
  • Run a Cassandra cluster on Docker
  • Work with Apache Spark by connecting to a Cassandra database
  • Deploy Cassandra clusters in your data center, or on Amazon EC2 instances
  • Back up and restore mission-critical Cassandra databases
  • Monitor, troubleshoot, and tune production Cassandra databases, and cut your spending on resources such as memory, servers, and storage

Who This Book Is For

Database administrators, developers, and architects who are looking for an authoritative and comprehensive single volume for all their Cassandra administration needs. Also for administrators who are tasked with setting up and maintaining highly reliable and high-performing Cassandra databases. An excellent choice for big data administrators, database administrators, architects, and developers who use Cassandra as their key data store, to support high volume online transactions, or as a decentralized, elastic data store.

Author(s): Sam R. Alapati

7. Learning Apache Cassandra – Second Edition (2017)

Key Features

  • Install Cassandra and set up multi-node clusters
  • Design rich schemas that capture the relationships between different data types
  • Master the advanced features available in Cassandra 3.x through a step-by-step tutorial and build a scalable, high performance database layer

Book Description

Cassandra is a distributed database that stands out thanks to its robust feature set and intuitive interface, while providing high availability and scalability of a distributed data store. This book will introduce you to the rich feature set offered by Cassandra, and empower you to create and manage a highly scalable, performant and fault-tolerant database layer.

The book starts by explaining the new features implemented in Cassandra 3.x and gets you set up with Cassandra. Then you’ll walk through data modeling in Cassandra and the rich feature set available to design a flexible schema. Next you’ll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods to avoid denormalization of data. You will then proceed to create user-defined functions and aggregates in Cassandra. Then, you will set up a multi-node cluster and see how the dynamics of Cassandra change with it. Finally, you will implement some application-level optimizations using a Java client.

By the end of this book, you’ll be fully equipped to build powerful, scalable Cassandra database layers for your applications.

What you will learn

  • Install Cassandra
  • Create keyspaces and tables with multiple clustering columns to organize related data
  • Use secondary indexes and materialized views to avoid denormalization of data

Author(s): Sandeep Yarabarla

8. Mastering Apache Cassandra – Second Edition (2015)

Build, manage, and configure a high-performing, reliable NoSQL database for your application with Cassandra

About This Book

  • Develop applications for modelling data with Cassandra 2
  • Manage large amounts of structured, semi-structured, and unstructured data with Cassandra
  • Explore a wide range of Cassandra components and how they interact to create a robust, distributed system

Who This Book Is For

The book is aimed at intermediate developers with an understanding of core database concepts who want to become a master at implementing Cassandra for their application.

What You Will Learn

  • Write programs using Cassandra’s features more efficiently
  • Get the most out of a given infrastructure, improve performance, and tweak JVM
  • Use CQL3 in your application, which makes working with Cassandra simpler
  • Configure Cassandra and fine-tune its parameters depending on your needs
  • Set up a cluster and learn how to scale it
  • Monitor a Cassandra cluster in different ways
  • Use Hadoop and other big data processing tools with Cassandra

In Detail

With ever increasing rates of data creation comes the demand to store data as fast and reliably as possible, a demand met by modern databases such as Cassandra. Apache Cassandra is the perfect choice for building fault tolerant and scalable databases. Through this practical guide, you will program pragmatically and understand completely the power of Cassandra. Starting with a brief recap of the basics to get everyone up and running, you will move on to deploy and monitor a production setup, dive under the hood, and optimize and integrate it with other software.

You will explore the integration and interaction of Cassandra components, and explore great new features such as CQL3, vnodes, lightweight transactions, and triggers. Finally, by learning Hadoop and Pig, you will be able to analyze your big data.

Author(s): Nishant Neeraj

9. Beginning Apache Cassandra Development (2014)

Beginning Apache Cassandra Development introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a distributed, wide-column NoSQL database. It is specifically designed to manage large amounts of data across many commodity servers without there being any single point of failure. This design approach makes Apache Cassandra a robust and easy-to-implement platform when high availability is needed.

Apache Cassandra can be used by developers in Java, PHP, Python, and JavaScript—the primary and most commonly used languages. In Beginning Apache Cassandra Development, author and Cassandra expert Vivek Mishra takes you through using Apache Cassandra from each of these primary languages. Mishra also covers the Cassandra Query Language (CQL), the Apache Cassandra analog to SQL. You’ll learn to develop applications sourcing data from Cassandra, query that data, and deliver it at speed to your application’s users.

Cassandra is one of the leading NoSQL databases, meaning you get unparalleled throughput and performance without the sort of processing overhead that comes with traditional proprietary databases. Beginning Apache Cassandra Development will therefore help you create applications that generate search results quickly, stand up to high levels of demand, scale as your user base grows, ensure operational simplicity, and—not least—provide delightful user experiences.

Author(s): Vivek Mishra

10. Cassandra High Availability (2014)

Apache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data. This book offers readers a practical insight into building highly available, real-world applications using Apache Cassandra. 

The book starts with the fundamentals, helping you to understand how the architecture of Apache Cassandra allows it to achieve 100 percent uptime when other systems struggle to do so. You’ll have an excellent understanding of data distribution, replication, and Cassandra’s highly tunable consistency model. This is followed by an in-depth look at Cassandra’s robust support for multiple data centers, and how to scale out a cluster. Next, the book explores the domain of application design, with chapters discussing the native driver and data modeling. Lastly, you’ll find out how to steer clear of common antipatterns and take advantage of Cassandra’s ability to fail gracefully.

What you will learn:

  • Understand how the core architecture of Cassandra enables highly available applications
  • Use replication and tunable consistency levels to balance consistency, availability, and performance
  • Set up multiple data centers to enable failover, load balancing, and geographic distribution
  • Add capacity to your cluster with zero down time
  • Take advantage of high availability features in the native driver
  • Create data models that scale well and maximize availability
  • Understand common anti-patterns so you can avoid them
  • Keep your system working well even during failure scenarios
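
The trade-off between consistency and availability mentioned in the list above comes down to simple arithmetic. The Python sketch below is an illustration, not material from the book, and its function names are hypothetical. It relies on two widely documented rules: a quorum is floor(RF / 2) + 1 replicas, and a read is expected to see the latest acknowledged write when the read and write replica counts together exceed the replication factor (R + W > RF).

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def is_strongly_consistent(reads: int, writes: int, replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return reads + writes > replication_factor


def tolerated_failures(required: int, replication_factor: int) -> int:
    """Replicas that can be down while `required` replicas can still answer."""
    return replication_factor - required


if __name__ == "__main__":
    rf = 3  # assumed replication factor for this example
    q = quorum(rf)
    print(q, is_strongly_consistent(q, q, rf), tolerated_failures(q, rf))
```

With a replication factor of 3, reading and writing at quorum keeps reads and writes overlapping while still tolerating one unavailable replica. Reading and writing at a single replica (R = W = 1) favors availability and latency but gives up that overlap.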

Author(s): Robbie Strickland

11. Big Data SMACK: A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka (2016)

Learn how to integrate full-stack open source big data architecture and to choose the correct technology―Scala/Spark, Mesos, Akka, Cassandra, and Kafka―in every layer. 

Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases now, organizations need more than one paradigm to perform efficient analyses.

Big Data SMACK explains each of the full-stack technologies and, more importantly, how to best integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples in every situation. This book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by every technology. It covers the six main concepts of big data architecture and how to integrate, replace, and reinforce every layer:

  • The language: Scala
  • The engine: Spark (SQL, MLlib, Streaming, GraphX)
  • The container: Mesos, Docker
  • The view: Akka
  • The storage: Cassandra
  • The message broker: Kafka

What You Will Learn:

  • Make big data architecture without using complex Greek letter architectures
  • Build a cheap but effective cluster infrastructure
  • Make queries, reports, and graphs that business demands
  • Manage and exploit unstructured and No-SQL data sources
  • Use tools to monitor the performance of your architecture
  • Integrate all technologies and decide which ones replace and which ones reinforce

Who This Book Is For:

Developers, data architects, and data scientists looking to integrate the most successful big data open stack architecture and to choose the correct technology in every layer

Author(s): Raul Estrada, Isaac Ruiz