Top 10 CUDA Books You Should Read

1. Professional CUDA C Programming (2014)

Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide

Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU development, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes workable examples that demonstrate the development process, allowing readers to explore both the “hard” and “soft” aspects of GPU programming.

Computing architectures are experiencing a fundamental shift toward scalable parallel computing motivated by application requirements in industry and science. This book demonstrates the challenges of efficiently utilizing compute resources at peak performance and presents modern techniques for tackling these challenges, while increasing accessibility for professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:

  • CUDA Programming Model
  • GPU Execution Model
  • GPU Memory Model
  • Streams, Events, and Concurrency
  • Multi-GPU Programming
  • CUDA Domain-Specific Libraries
  • Profiling and Performance Tuning

The book makes complex CUDA concepts easy to understand for anyone with basic software development knowledge, and its exercises are designed to be both readable and high-performance. For the professional seeking entrance to parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.

Author(s): John Cheng, Max Grossman, Ty McKercher

2. CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (Applications of GPU Computing) (2012)

If you need to learn CUDA but don’t have experience with parallel computing, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delves into CUDA installation. Chapters on core concepts including threads, blocks, grids, and memory focus on both parallel and CUDA-specific issues. Later, the book demonstrates CUDA in practice for optimizing applications, adjusting to new hardware, and solving common problems.

  • Comprehensive introduction to parallel programming with CUDA, for readers new to both
  • Detailed instructions help readers optimize the CUDA software development kit
  • Practical techniques illustrate working with memory, threads, algorithms, resources, and more
  • Covers CUDA on multiple hardware platforms: Mac, Linux and Windows with several NVIDIA chipsets
  • Each chapter includes exercises to test reader knowledge

Author(s): Shane Cook

3. CUDA by Example: An Introduction to General-Purpose GPU Programming (2010)

“This book is required reading for anyone working with accelerator-based computing systems.”

–From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required–just the ability to program in a modestly extended version of C.


CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You’ll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance.


Major topics covered include

  • Parallel programming
  • Thread cooperation
  • Constant memory and events
  • Texture memory
  • Graphics interoperability
  • Atomics
  • Streams
  • CUDA C on multiple GPUs
  • Advanced atomics
  • Additional CUDA resources

All the CUDA software tools you’ll need are freely available for download from NVIDIA.


Author(s): Jason Sanders, Edward Kandrot

4. Programming Massively Parallel Processors, Third Edition: A Hands-on Approach (2016)

Programming Massively Parallel Processors: A Hands-on Approach, Third Edition shows students and professionals alike the basic concepts of parallel programming and GPU architecture, exploring in detail various techniques for constructing parallel programs.

Case studies demonstrate the development process, detailing computational thinking and ending with effective and efficient parallel programs. Topics of performance, floating-point format, parallel patterns, and dynamic parallelism are covered in-depth.

For this new edition, the authors have updated their coverage of CUDA, including newer libraries such as cuDNN; moved content that has become less important to appendices; added two new chapters on parallel patterns; and updated case studies to reflect current industry practices.

  • Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing
  • Utilizes CUDA version 7.5, NVIDIA’s software development tool created specifically for massively parallel environments
  • Contains new and updated case studies
  • Includes coverage of newer libraries, such as cuDNN for Deep Learning

Author(s): David B. Kirk, Wen-mei W. Hwu

5. The CUDA Handbook: A Comprehensive Guide to GPU Programming (2013)


The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization.


The accompanying open source code–more than 25,000 lines of it, freely available at www.cudahandbook.com–is specifically intended to be reused and repurposed by developers.


Designed to be both a comprehensive reference and a practical cookbook, the text is divided into the following three parts:

Part I, Overview, gives high-level descriptions of the hardware and software that make CUDA possible.

Part II, Details, provides thorough descriptions of every aspect of CUDA, including

  • Memory
  • Streams and events
  • Models of execution, including the dynamic parallelism feature, new with CUDA 5.0 and SM 3.5
  • The streaming multiprocessors, including descriptions of all features through SM 3.5
  • Programming multiple GPUs
  • Texturing

The source code accompanying Part II is presented as reusable microbenchmarks and microdemos, designed to expose specific hardware characteristics or highlight specific use cases.

Part III, Select Applications, details specific families of CUDA applications and key parallel algorithms, including

  • Streaming workloads
  • Reduction
  • Parallel prefix sum (Scan)
  • N-body
  • Image processing

These algorithms cover the full range of potential CUDA applications.


Author(s): Nicholas Wilt

6. CUDA for Engineers: An Introduction to High-Performance Parallel Computing (2015)

CUDA for Engineers

Author(s): Duane Storti, Mete Yurtoglu

7. The CUDA Handbook: A Comprehensive Guide to GPU Programming (2nd Edition) (2018)

The CUDA Handbook is the only comprehensive reference to CUDA that exists. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization.


The accompanying open source code, more than 30,000 lines of it and freely available from GitHub, is specifically intended to be reused and repurposed by developers.

Author(s): Nicholas Wilt

8. CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming (2013)

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran.

To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison.

  • Leverage the power of GPU computing with PGI’s CUDA Fortran compiler
  • Gain insights from members of the CUDA Fortran language development team
  • Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and message passing interface (MPI) approaches
  • Includes full source code for all the examples and several case studies
  • Download source code and slides from the book’s companion website

Author(s): Gregory Ruetsch, Massimiliano Fatica

9. CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (2014)

Author(s): Shane Cook

10. GPU Programming in MATLAB (2016)

GPU Programming in MATLAB is intended for scientists, engineers, or students who develop or maintain applications in MATLAB and would like to accelerate their code using GPU programming without losing the many benefits of MATLAB. The book starts with coverage of the Parallel Computing Toolbox and other MATLAB toolboxes for GPU computing, which allow applications to be ported straightforwardly onto GPUs without extensive knowledge of GPU programming. The next part covers built-in, GPU-enabled features of MATLAB, including options to leverage GPUs across multicore or different computer systems. Finally, advanced material includes CUDA code in MATLAB and optimizing existing GPU applications. Throughout the book, examples and source code illustrate every concept so that readers can immediately apply them to their own development.

  • Provides in-depth, comprehensive coverage of GPUs with MATLAB, including the parallel computing toolbox and built-in features for other MATLAB toolboxes
  • Explains how to accelerate computationally heavy applications in MATLAB without the need to re-write them in another language
  • Presents case studies illustrating key concepts across multiple fields
  • Includes source code, sample datasets, and lecture slides

Author(s): Nikolaos Ploskas, Nikolaos Samaras

11. Parallel Programming with OpenACC (2016)

Parallel Programming with OpenACC is a modern, practical guide to implementing dependable computing systems. The book explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas. The OpenACC directive-based programming model is designed to provide a simple yet powerful approach to accelerators without significant programming effort.

Author Rob Farber, working with a team of expert contributors, demonstrates how to turn existing applications into portable GPU-accelerated programs that deliver immediate speedups. The book also helps users get the most from the latest NVIDIA and AMD GPUs plus multicore CPU architectures (and soon for Intel® Xeon Phi™ as well). Downloadable example codes provide hands-on OpenACC experience for common problems in scientific, commercial, big-data, and real-time systems.

Topics include writing reusable code, asynchronous capabilities, using libraries, multicore clusters, and much more. Each chapter explains how a specific aspect of OpenACC technology fits, how it works, and the pitfalls to avoid. Throughout, the book demonstrates the use of simple working examples that can be adapted to solve application needs.

  • Presents the simplest way to leverage GPUs to achieve application speedups
  • Shows how OpenACC works, including working examples that can be adapted for application needs
  • Allows readers to download source code and slides from the book’s companion web page

Author(s): Rob Farber

12. GPU Parallel Computing for Machine Learning in Python: How to Build a Parallel Computer (2017)

This book illustrates how to build a GPU parallel computer. If you would rather not spend time building one, you can buy a desktop or laptop with a built-in GPU; all you then need to do is install GPU-enabled software for parallel computing.

We are in the midst of a parallel computing era, and the GPU parallel computer is well suited to machine learning and deep (neural network) learning. For example, the GeForce GTX 1080 Ti is a GPU board with 3584 CUDA cores, and its performance is roughly 20 times that of an Intel i7 quad-core CPU. We benchmarked the MNIST handwritten digit recognition problem (60,000 images of handwritten digits from 0 to 9): a single GeForce GTX 1080 Ti board takes less than 48 seconds, while the Intel i7 quad-core CPU requires 15 minutes and 42 seconds.

A CUDA core most commonly refers to a single-precision floating-point unit in an SM (streaming multiprocessor), and it can initiate one single-precision floating-point instruction per clock cycle. CUDA is a parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers and engineers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing. The GPU parallel computer is based on SIMD (single instruction, multiple data) computing. The first use of a GPU for neural networks was by Kyoung-Su Oh et al., for image processing, published in 2004 (1).

A minimal GPU parallel computer is composed of a CPU board and a GPU board. This book addresses the important question of which CPU and GPU boards you should buy, and illustrates how to integrate them in a single box while accounting for heat. The GPU consumes so much power that the temperature and heat from the GPU board inside the box need careful attention. Our goal is a faster parallel computer with lower power dissipation.

Software installation is another critical issue for machine learning in Python. Two operating systems, Ubuntu 16.04 and Windows 10, are used as examples. This book shows how to install CUDA and cudnnlib on both, and introduces three machine learning frameworks that run on CUDA and cudnnlib: PyTorch, Keras, and Chainer. Compatibility problems among the operating system (Ubuntu, Windows 10), library (CUDA, cudnnlib), and machine learning framework (PyTorch, Keras, Chainer) are discussed. The eLetter “‘GPU’ and ‘open source software’ play a key role for advancing deep learning” was published in Science (eLetter, July 20, 2017): http://science.sciencemag.org/content/357/6346/16/tab-e-letters
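
The "roughly 20 times" figure in this description can be checked against the two MNIST timings it quotes. The short Python sketch below is only an arithmetic check, not code from the book; the variable and function names are illustrative, and the GPU time is treated as exactly 48 seconds even though the blurb says "less than 48 seconds", so the result is a lower bound on the speedup.

```python
# Timings quoted in the blurb for the MNIST benchmark.
cpu_seconds = 15 * 60 + 42   # Intel i7 quad-core CPU: 15 minutes 42 seconds
gpu_seconds = 48             # GeForce GTX 1080 Ti: "less than 48 seconds" (upper bound)


def speedup(baseline_s, accelerated_s):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


ratio = speedup(cpu_seconds, gpu_seconds)
print(f"CPU: {cpu_seconds} s, GPU: {gpu_seconds} s, speedup: {ratio:.1f}x")
# prints "CPU: 942 s, GPU: 48 s, speedup: 19.6x"
```

A ratio of about 19.6 is consistent with the rounded claim of roughly 20 times.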

Author(s): Yoshiyasu Takefuji