1. Professional CUDA C Programming (2014)
Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide
Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU software development, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes workable examples that demonstrate the development process, allowing readers to explore both the “hard” and “soft” aspects of GPU programming.
Computing architectures are experiencing a fundamental shift toward scalable parallel computing motivated by application requirements in industry and science. This book demonstrates the challenges of efficiently utilizing compute resources at peak performance and presents modern techniques for tackling these challenges, while increasing accessibility for professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:
- CUDA Programming Model
- GPU Execution Model
- GPU Memory Model
- Streams, Events, and Concurrency
- Multi-GPU Programming
- CUDA Domain-Specific Libraries
- Profiling and Performance Tuning
The book makes complex CUDA concepts easy to understand for anyone with knowledge of basic software development with exercises designed to be both readable and high-performance. For the professional seeking entrance to parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.
Author(s): John Cheng, Max Grossman, Ty McKercher
2. CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs (Applications of GPU Computing) (2012)
If you need to learn CUDA but don’t have experience with parallel computing, CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delves into CUDA installation. Chapters on core concepts including threads, blocks, grids, and memory focus on both parallel and CUDA-specific issues. Later, the book demonstrates CUDA in practice for optimizing applications, adjusting to new hardware, and solving common problems.
- Comprehensive introduction to parallel programming with CUDA, for readers new to both
- Detailed instructions help readers optimize the CUDA software development kit
- Practical techniques illustrate working with memory, threads, algorithms, resources, and more
- Covers CUDA on multiple hardware platforms: Mac, Linux and Windows with several NVIDIA chipsets
- Each chapter includes exercises to test reader knowledge
Author(s): Shane Cook
3. CUDA by Example
“This book is required reading for anyone working with accelerator-based computing systems.”
–From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory
CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required–just the ability to program in a modestly extended version of C.
CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You’ll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance.
Major topics covered include
- Parallel programming
- Thread cooperation
- Constant memory and events
- Texture memory
- Graphics interoperability
- CUDA C on multiple GPUs
- Advanced atomics
- Additional CUDA resources
All the CUDA software tools you’ll need are freely available for download from NVIDIA: http://developer.nvidia.com/object/cuda-by-example.html
Author(s): Jason Sanders, Edward Kandrot
4. Programming Massively Parallel Processors: A Hands-on Approach, Third Edition
Programming Massively Parallel Processors: A Hands-on Approach, Third Edition shows both student and professional alike the basic concepts of parallel programming and GPU architecture, exploring, in detail, various techniques for constructing parallel programs.
Case studies demonstrate the development process, detailing computational thinking and ending with effective and efficient parallel programs. Topics of performance, floating-point format, parallel patterns, and dynamic parallelism are covered in-depth.
For this new edition, the authors have updated their coverage of CUDA, added coverage of newer libraries such as cuDNN, moved content that has become less important to appendices, added two new chapters on parallel patterns, and updated case studies to reflect current industry practices.
- Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing
- Utilizes CUDA version 7.5, NVIDIA’s software development tool created specifically for massively parallel environments
- Contains new and updated case studies
- Includes coverage of newer libraries, such as cuDNN for Deep Learning
Author(s): David B. Kirk, Wen-mei W. Hwu
5. The CUDA Handbook
The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization.
The accompanying open source code–more than 25,000 lines of it, freely available at www.cudahandbook.com–is specifically intended to be reused and repurposed by developers.
Designed to be both a comprehensive reference and a practical cookbook, the text is divided into the following three parts:
Part I, Overview, gives high-level descriptions of the hardware and software that make CUDA possible.
Part II, Details, provides thorough descriptions of every aspect of CUDA, including
- Streams and events
- Models of execution, including the dynamic parallelism feature, new with CUDA 5.0 and SM 3.5
- The streaming multiprocessors, including descriptions of all features through SM 3.5
- Programming multiple GPUs
The source code accompanying Part II is presented as reusable microbenchmarks and microdemos, designed to expose specific hardware characteristics or highlight specific use cases.
Part III, Select Applications, details specific families of CUDA applications and key parallel algorithms, including
- Streaming workloads
- Parallel prefix sum (Scan)
- Image Processing
These algorithms cover the full range of potential CUDA applications.
Author(s): Nicholas Wilt
Author(s): Duane Storti, Mete Yurtoglu
The CUDA Handbook is the only comprehensive reference to CUDA that exists. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes commands and how the driver checks progress; more experienced CUDA developers will appreciate the expert coverage of topics such as the driver API and context migration, as well as the guidance on how best to structure CPU/GPU data interchange and synchronization.
The accompanying open source code (more than 30,000 lines of it, freely available from GitHub) is specifically intended to be reused and repurposed by developers.
Author(s): Nicholas Wilt
8. CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming (2013)
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran.
To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison.
- Leverage the power of GPU computing with PGI’s CUDA Fortran compiler
- Gain insights from members of the CUDA Fortran language development team
- Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and Message Passing Interface (MPI) approaches
- Includes full source code for all the examples and several case studies
- Download source code and slides from the book’s companion website
Author(s): Gregory Ruetsch, Massimiliano Fatica
10. GPU Programming in MATLAB (2016)
GPU Programming in MATLAB is intended for scientists, engineers, or students who develop or maintain applications in MATLAB and would like to accelerate their code using GPU programming without losing the many benefits of MATLAB. The book starts with coverage of the Parallel Computing Toolbox and other MATLAB toolboxes for GPU computing, which allow applications to be ported straightforwardly onto GPUs without extensive knowledge of GPU programming. The next part covers built-in, GPU-enabled features of MATLAB, including options to leverage GPUs across multicore or different computer systems. Finally, advanced material includes CUDA code in MATLAB and optimizing existing GPU applications. Throughout the book, examples and source code illustrate every concept so that readers can immediately apply them to their own development.
- Provides in-depth, comprehensive coverage of GPUs with MATLAB, including the Parallel Computing Toolbox and built-in features for other MATLAB toolboxes
- Explains how to accelerate computationally heavy applications in MATLAB without the need to re-write them in another language
- Presents case studies illustrating key concepts across multiple fields
- Includes source code, sample datasets, and lecture slides
Author(s): Nikolaos Ploskas, Nikolaos Samaras
11. Parallel Programming with OpenACC (2016)
Parallel Programming with OpenACC is a modern, practical guide to implementing dependable computing systems. The book explains how anyone can use OpenACC to quickly ramp up application performance using high-level code directives called pragmas. The OpenACC directive-based programming model is designed to provide a simple yet powerful approach to accelerators without significant programming effort.
Author Rob Farber, working with a team of expert contributors, demonstrates how to turn existing applications into portable GPU-accelerated programs that show immediate speedups. The book also helps users get the most from the latest NVIDIA and AMD GPUs plus multicore CPU architectures (and soon for Intel® Xeon Phi™ as well). Downloadable example codes provide hands-on OpenACC experience for common problems in scientific, commercial, big-data, and real-time systems.
Topics include writing reusable code, asynchronous capabilities, using libraries, multicore clusters, and much more. Each chapter explains how a specific aspect of OpenACC technology fits, how it works, and the pitfalls to avoid. Throughout, the book demonstrates the use of simple working examples that can be adapted to solve application needs.
- Presents the simplest way to leverage GPUs to achieve application speedups
- Shows how OpenACC works, including working examples that can be adapted for application needs
- Allows readers to download source code and slides from the book’s companion web page
Author(s): Rob Farber
Author(s): Yoshiyasu Takefuji