Numba for CUDA Programmers

Presented internally at NVIDIA during 2020.


The course focuses on using CUDA concepts in Python rather than on teaching basic CUDA concepts. Its primary audience is those who are familiar with CUDA C/C++ programming, but perhaps less so with Python and its ecosystem. That said, it should also be useful to those who are already familiar with the Python and PyData ecosystem.

Those unfamiliar with CUDA may want to build a base understanding by working through Mark Harris’s An Even Easier Introduction to CUDA blog post, and by briefly reading through Chapters 1 and 2 of the CUDA Programming Guide (Introduction and Programming Model).

There are five sessions:

  • An introduction to Numba and CUDA Python
  • Typing
  • Porting strategies, performance, interoperability, and debugging
  • Extending Numba
  • Memory Management


The materials are available in a repository on the Numba GitHub account: https://github.com/numba/nvidia-cuda-tutorial

Customising a RISC-V Core

Presented at OSHCamp 2019.


Starting from an open-source RISC-V core, add new instructions of your own design! This workshop walks through the process of getting started with simulating an open-source RISC-V core and making the necessary modifications to decode and execute new instructions.

A processor that supports a new instruction is not much good if you can’t write any code for it, so the workshop also leads you through using the assembler to encode your new instructions. You can then write programs that use them and check that they execute correctly (or, if they do not, use the results to work out the bugs in your implementation).
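
As a rough illustration of what "encoding" a new instruction involves, the sketch below packs the fields of an R-type instruction into a 32-bit word. It is not taken from the workshop materials: the function name encode_r_type and the example instruction are hypothetical, and it assumes the new instruction uses the standard R-type layout with the custom-0 major opcode that the RISC-V specification reserves for non-standard extensions.

```python
# Hypothetical sketch: pack R-type instruction fields into a 32-bit word.
# Assumes the standard RISC-V R-type layout:
#   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
# and the "custom-0" major opcode reserved for non-standard extensions.

CUSTOM_0 = 0b0001011


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM_0):
    """Return the 32-bit encoding of an R-type instruction."""
    # (value, field width in bits, bit position of the field's LSB)
    fields = [
        (funct7, 7, 25),
        (rs2, 5, 20),
        (rs1, 5, 15),
        (funct3, 3, 12),
        (rd, 5, 7),
        (opcode, 7, 0),
    ]
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
    return word


# A made-up instruction with funct7=0 and funct3=0, using rd=x10, rs1=x11, rs2=x12.
word = encode_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10)
print(f"{word:#010x}")  # prints 0x00c5850b
```

An assembler performs the same packing after parsing the mnemonic and operands, and the decoder in the core reverses it by slicing the same bit ranges out of the fetched word.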


The tutorial materials provide enough of the implementation, and sufficient guidance, that it is possible to work through them with a little experience of Verilog and C++. For those new to Verilog, the materials from last year’s talk and workshop (see below) provide a more accessible starting point.


An Introduction to cycle-accurate Verilog simulation of open-source RISC-V cores

Presented at OSHCamp 2018.


Developing hardware designs in Verilog is tricky, for both FPGA platforms and ASIC hardware targets. Understanding the behaviour of a design, testing it, and debugging it are made much easier by simulating in software. This tutorial gives a brief overview of approaches, focusing on cycle-accurate modelling, which is a relatively fast approach that is robustly implemented in an open-source tool called Verilator. The main focus is on working with CPU designs, but the software and techniques are generally applicable to other areas.


The slides give a brief overview of how to use Verilator to simulate a design, to develop testbenches, and to visualise simulation output using GTKWave. The exercises begin with a simple Verilog example and walk through generating simulations of some popular open-source RISC-V cores. Although this tutorial focuses on simulation, the cores can in general be instantiated on FPGAs for use in real applications (and with higher performance!).

Two different RISC-V implementations are used: Clifford Wolf’s PicoRV32, and RI5CY from the PULP Platform. Loading programs onto these bare-metal systems and executing them, both through a testbench and through a debugger (GDB), is covered, along with some examples of interacting with the cores and inspecting their state. Gathering accurate performance measurements is also possible, because the simulations are cycle-accurate.


The workshop should be of interest to people with a background in software who would like to tinker with open-source processor core development, and people with a background in hardware who would like to tinker with software toolchains.

The tutorial materials provide enough of the implementation that it is possible to follow this workshop without prior experience of hardware design or of Verilog specifically. However, some understanding of programming and of the organisation of computer hardware is required.


Accelerating Scientific Code with Numba

Presented at PyData London 2015 and PyCon UK 2015.