Course on Advanced GPU Computing

 

Dr. Manuel Carcenac

(manuel.carcenac@gmail.com)

 

 

The following course lecture notes are freely available to anyone interested.

They may be used and modified without any restrictions.

  


Chapters:

 

Presentation  [ pdf ; doc ]

Compute Unified Device Architecture (CUDA) - Graphics Processing Unit (GPU) -

coprocessor - General-Purpose Graphics Processing Unit (GPGPU) -

compute capability - Tesla C2050 - NVIDIA

 

CUDA architecture and programming model  [ pdf ; doc ]

shared memory system - Single Instruction Multiple Threads (SIMT) -

data-based parallelism - grid - block - thread - multiprocessor - core - host - device

 

CUDA programming interface - CUDA C  [ pdf ; doc ]

host memory - device memory - host code - device code -

kernel - thread synchronization - nvcc

 

Optimization of a CUDA code  [ pdf ; doc ]

cache memory - shared memory

 

CUBLAS linear algebra library  [ pdf ; doc ]

CUda Basic Linear Algebra Subprograms (CUBLAS) - helper function -

core function - leading dimension - column-major storage - matrix multiplication

 

Application: solving linear systems with the Gauss method  [ pdf ; doc ]

CUDA - CUBLAS

 

Using OpenGL with CUDA  [ pdf ; doc ]

installation - basic use - interoperability

 

Application: all-pairs n-body problem  [ pdf ; doc ]

time integration - law of gravity - collision of galaxies

 


Some programs:

 

matrix multiplication with either CUDA or CUBLAS

 

Gauss method with either CUDA or CUBLAS

 

all-pairs n-body problem

 

Important: batch file for compilation on 64-bit Windows 7 with CUDA 3.2, CUBLAS and compute capability 2.0 (C2050)

 


 

Other available courses (Java, Graphics in Java, Advanced 3D Graphics in Java, Analysis of Algorithms)

 

 

Manuel Carcenac's Homepage