Monday, October 5, 2015

GPU path tracing tutorial 1: Drawing First Blood

In early 2011 I developed a simple real-time path traced Pong game together with Kerrash, built on top of an open source GPU path tracer called tokaspt (developed by Thierry Berger-Perrin) which could only render spheres, but was bloody fast at it. The physics were bodged, but the game proved that path tracing of very simple scenes at 30 fps was feasible, although a bit noisy. You can still download it. Since that time I've always wanted to write a short and simple tutorial about GPU path tracing, and now is a good time to do exactly that.

This tutorial is not meant as an introduction to ray tracing or path tracing as there are plenty of excellent ray tracing tutorials for beginners online such as Scratch-a-Pixel (also check out the old version which contains more articles) and Minilight (more links at the bottom of this article). The goal of this tutorial is simply to show how incredibly easy it is to turn a simple CPU path tracer into a CUDA accelerated version. Being a fan of the KISS principle from engineering (Keep It Simple Stupid), I've chosen to cudafy Kevin Beason's smallpt, the most basic (but still fully functional) CPU path tracer around. It's a very straightforward piece of code and doesn't require the user to install any tedious libraries to compile the code (apart from Nvidia's CUDA Toolkit).

The full CPU version of smallpt can be found on Kevin Beason's website. Due to its compactness the code is not very easy to read, but fortunately David Cline made a great PowerPoint presentation explaining what each line in smallpt does, with references to Peter Shirley's "Realistic Ray Tracing" book. 

To keep things simple and free of needless clutter, I've stripped out the code for the tent filter, supersampling, Russian roulette and the BRDFs for reflective and refractive materials, leaving only the diffuse BRDF. The 3D vector class from smallpt is replaced by CUDA's built-in float3 type together with the vector math functions from cutil_math.h, which provide the same linear algebra operations (addition, subtraction, multiplication, normalize, length, dot product and cross product). The only thing needed to compile the code is Nvidia's CUDA Toolkit (GPU Computing Toolkit). For reasons of code clarity there are no safety checks when initialising CUDA (such as CUDA_SAFE_CALL()); the code simply assumes a CUDA capable device with up-to-date drivers.

After reading the slides from David Cline, the commented code below should speak for itself, but feel free to ask any questions in the comments section if some things are still not clear.

So without further ado, here's the full CUDA code:

// smallptCUDA by Sam Lapere, 2015
// based on smallpt, a path tracer by Kevin Beason, 2008  
#include <iostream>
#include <cuda_runtime.h>
#include <vector_types.h>
#include "device_launch_parameters.h"
#include "cutil_math.h"   // float3 vector math (helper header from the CUDA SDK samples, not part of the toolkit itself)

#define M_PI 3.14159265359f
#define width 512  // screenwidth
#define height 384 // screenheight
#define samps 1024 // samples per pixel

struct Ray { 
 float3 orig; // ray origin
 float3 dir;  // ray direction 
 __device__ Ray(float3 o_, float3 d_) : orig(o_), dir(d_) {} 
};

enum Refl_t { DIFF, SPEC, REFR };  // material types, used in radiance(), only DIFF used here

struct Sphere {

 float rad;           // radius 
 float3 pos, emi, col; // position, emission, color 
 Refl_t refl;          // reflection type 

__device__ float intersect(const Ray &r) const { 
 // ray/sphere intersection
 // returns distance to intersection point, 0 if no hit  
 // ray equation: p(x,y,z) = ray.orig + t*ray.dir
 // general sphere equation: x^2 + y^2 + z^2 = rad^2 
 // classic quadratic equation of form ax^2 + bx + c = 0 
 // solution x = (-b +- sqrt(b*b - 4ac)) / 2a
 // solve t^2*dir.dir + 2*t*(orig-p).dir + (orig-p).(orig-p)-rad^2 = 0 
 // more details in "Realistic Ray Tracing" book by P. Shirley

  float3 op = pos - r.orig;   
  float t, epsilon = 0.0001f;
  float b = dot(op, r.dir);
  float disc = b*b - dot(op, op) + rad*rad; 
  if (disc<0) return 0; 
   else disc = sqrtf(disc);
  return (t = b - disc)>epsilon ? t : ((t = b + disc)>epsilon ? t : 0);
 }
};

// 9 spheres forming a Cornell box
// small enough to be in constant GPU memory
__constant__ Sphere spheres[] = {
 { 1e5f, { 1e5f + 1.0f, 40.8f, 81.6f }, { 0.0f, 0.0f, 0.0f }, { 0.75f, 0.25f, 0.25f }, DIFF }, //Left 
 { 1e5f, { -1e5f + 99.0f, 40.8f, 81.6f }, { 0.0f, 0.0f, 0.0f }, { .25f, .25f, .75f }, DIFF }, //Rght 
 { 1e5f, { 50.0f, 40.8f, 1e5f }, { 0.0f, 0.0f, 0.0f }, { .75f, .75f, .75f }, DIFF }, //Back 
 { 1e5f, { 50.0f, 40.8f, -1e5f + 600.0f }, { 0.0f, 0.0f, 0.0f }, { 1.00f, 1.00f, 1.00f }, DIFF }, //Frnt 
 { 1e5f, { 50.0f, 1e5f, 81.6f }, { 0.0f, 0.0f, 0.0f }, { .75f, .75f, .75f }, DIFF }, //Botm 
 { 1e5f, { 50.0f, -1e5f + 81.6f, 81.6f }, { 0.0f, 0.0f, 0.0f }, { .75f, .75f, .75f }, DIFF }, //Top 
 { 16.5f, { 27.0f, 16.5f, 47.0f }, { 0.0f, 0.0f, 0.0f }, { 1.0f, 1.0f, 1.0f }, DIFF }, // small sphere 1
 { 16.5f, { 73.0f, 16.5f, 78.0f }, { 0.0f, 0.0f, 0.0f }, { 1.0f, 1.0f, 1.0f }, DIFF }, // small sphere 2
 { 600.0f, { 50.0f, 681.6f - .77f, 81.6f }, { 2.0f, 1.8f, 1.6f }, { 0.0f, 0.0f, 0.0f }, DIFF }  // Light
};

__device__ inline bool intersect(const Ray &r, float &t, int &id){

 float n = sizeof(spheres) / sizeof(Sphere), d, inf = t = 1e20;
 for (int i = int(n); i--;)  // test all scene objects for intersection
  if ((d = spheres[i].intersect(r)) && d<t){
    t = d;  // keep track of distance along ray to closest intersection point 
    id = i; // and closest intersected object
   }
 return t<inf; // returns true if an intersection with any scene object occurred
}

// hash-based random number generator (two interleaved multiply-with-carry sequences; each thread uses its own pair of seeds)

__device__ static float getrandom(unsigned int *seed0, unsigned int *seed1) {
 *seed0 = 36969 * ((*seed0) & 65535) + ((*seed0) >> 16);
 *seed1 = 18000 * ((*seed1) & 65535) + ((*seed1) >> 16);

 unsigned int ires = ((*seed0) << 16) + (*seed1);

 /* Convert to float */
 union {
  float f;
  unsigned int ui;
 } res;
 res.ui = (ires & 0x007fffff) | 0x40000000;  // bit pattern of a random float in [2, 4)

 return (res.f - 2.f) / 2.f;  // remap to [0, 1)
}

// radiance function, the meat of path tracing 
__device__ float3 radiance(Ray &r, unsigned int s1, unsigned int s2){ // returns ray color

 float3 pixelcolor = make_float3(0.0f, 0.0f, 0.0f); // accumulated radiance returned by this function
 float3 accucolor = make_float3(1.0f, 1.0f, 1.0f);  // colour mask (path throughput), multiplied by the surface colour at each bounce

 // ray bounces (no Russian Roulette used) 
 for (int bounces = 0; bounces < 4; bounces++){  // iteration up to 4 bounces (replaces recursion in CPU code)

  float t;           // distance to intersection 
  int id = 0;        // index of intersected sphere 

// intersect ray with scene
  if (!intersect(r, t, id))
   return make_float3(0.0f, 0.0f, 0.0f); // if miss, return black

  // else, when HIT:
  const Sphere &obj = spheres[id];  // hitobject
  float3 x = r.orig + r.dir*t;          // hitpoint 
  float3 n = normalize(x - obj.pos);    // normal
  float3 nl = dot(n, r.dir) < 0 ? n : n * -1; // front facing normal

  pixelcolor += accucolor * obj.emi;

  // calculate new DIFFUSE ray with hitpoint of last ray as origin
  // and with random direction in hemisphere above hitpoint (see "Realistic Ray Tracing", P. Shirley)

  // create 2 random numbers
  float r1 = 2 * M_PI * getrandom(&s1, &s2);
  float r2 = getrandom(&s1, &s2);
  float r2s = sqrtf(r2); 

  // compute orthonormal coordinate frame uvw with hitpoint as origin 
  float3 w = nl;
  float3 u = normalize(cross((fabs(w.x) > .1 ? make_float3(0, 1, 0) : make_float3(1, 0, 0)), w)); 
  float3 v = cross(w,u);  

  // compute random ray direction on hemisphere (using polar coordinates)
  float3 d = normalize(u*cos(r1)*r2s + v*sin(r1)*r2s + w*sqrtf(1 - r2));
  // new ray origin is hitpoint of last ray with scene
  r.orig = x + nl*0.05f; // offset ray origin slightly to prevent self intersection
  r.dir = d;

  accucolor *= obj.col;    // multiply with colour of object       
  accucolor *= dot(d,nl);  // dot product of ray direction and normal: weighting w.r.t. angle of incident light
  accucolor *= 2;          // fudge factor
 }

 return pixelcolor;
}

// this kernel runs in parallel on all the CUDA threads
__global__ void render_kernel(float3 *output){

 // assign a CUDA thread to every pixel by using the threadIndex
 unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;   
 unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

 unsigned int i = (height - y - 1)*width + x; // index of current pixel (calculated using thread index) 

 unsigned int s1 = x;  // seeds for random number generator
 unsigned int s2 = y;

 Ray cam(make_float3(50, 52, 295.6), normalize(make_float3(0, -0.042612, -1))); // first hardcoded camera ray(position, direction) 
 float3 cx = make_float3(width * .5135 / height, 0.0f, 0.0f); // ray direction offset in x direction
 float3 cy = normalize(cross(cx, cam.dir)) * .5135; // ray direction offset in y direction (.5135 is FOV angle)
 float3 r; // r is final pixel color       
 r = make_float3(0.0f); // reset r to zero for every pixel 

 for (int s = 0; s < samps; s++){  // samples per pixel
  getrandom(&s1, &s2); // creates new seeds for radiance() function with every loop iteration 
  // compute primary ray direction
  float3 d = cam.dir + cx*((.25 + x) / width - .5) + cy*((.25 + y) / height - .5);
  // create primary ray, add incoming radiance to pixelcolor
  r = r + radiance(Ray(cam.orig + d * 40, normalize(d)), s1, s2)*(1. / samps); 
 }       // Camera rays are pushed ^^^^^ forward to start in interior 

 // write rgb value of pixel to image buffer on the GPU 
 output[i] = make_float3(clamp(r.x, 0.0f, 1.0f), clamp(r.y, 0.0f, 1.0f), clamp(r.z, 0.0f, 1.0f));
}

inline float clamp(float x){ return x < 0.0f ? 0.0f : x > 1.0f ? 1.0f : x; }  // clamp x to [0, 1] on the host

inline int toInt(float x){ return int(pow(clamp(x), 1 / 2.2) * 255 + .5); }  // RGB float in range [0,1] to int in range [0, 255]

int main(){

 float3* output_h = new float3[width*height]; // pointer to memory for image on the host (system RAM)
 float3* output_d;    // pointer to memory for image on the device (GPU VRAM)

 // allocate memory on the CUDA device (GPU VRAM)
 cudaMalloc(&output_d, width * height * sizeof(float3));
 // dim3 is CUDA specific type, block and grid are required to schedule CUDA threads over streaming multiprocessors
 dim3 block(4, 4, 1);   
 dim3 grid(width / block.x, height / block.y, 1);

 printf("CUDA initialised.\nStart rendering...\n");
 // schedule threads on device and launch CUDA kernel from host
 render_kernel <<< grid, block >>>(output_d);  

 // copy results of computation from device back to host
 cudaMemcpy(output_h, output_d, width * height *sizeof(float3), cudaMemcpyDeviceToHost);  
 // free CUDA memory
 cudaFree(output_d);


 // Write image to PPM file, a very simple image file format
 FILE *f = fopen("smallptcuda.ppm", "w");          
 fprintf(f, "P3\n%d %d\n%d\n", width, height, 255);
 for (int i = 0; i < width*height; i++)  // loop over pixels, write RGB values
  fprintf(f, "%d %d %d ", toInt(output_h[i].x), toInt(output_h[i].y), toInt(output_h[i].z));
 fclose(f);
 printf("Saved image to 'smallptcuda.ppm'\n");

 delete[] output_h;

 return 0;
}

In this example it's pretty easy to turn C/C++ code into CUDA code (CUDA C/C++ is essentially standard C/C++ with a handful of extensions). The differences with the CPU version of smallpt are as follows:

  • the CUDA specific keyword __device__ before functions that should run on the GPU
  • CUDA specific keyword __global__ in front of the kernel that is called from the host (CPU) and which runs in parallel on all CUDA threads
  • a custom random number generator that runs on the GPU
  • as GPUs don't handle recursion well, the radiance function is converted from a recursive into an iterative function (see Richie Sam's blogpost for more details) with a fixed number of bounces. Russian roulette could be used here to terminate paths with a certain probability, but I took it out for simplicity (a hedged sketch of what it could look like is shown right after this list)
  • in a CPU ray tracer you loop over each pixel of the image with two nested loops (one for image rows and one for image columns). On the GPU these loops are replaced by a kernel which runs for each pixel in parallel: a global thread index is computed from the grid dimensions, block dimensions and local thread index, and mapped to a pixel
  • the main() function calls CUDA specific functions to allocate memory on the CUDA device (cudaMalloc()), launch the CUDA kernel using the "<<< grid, block >>>" syntax and copy the results (in this case the rendered image) from the GPU back to the CPU, where the image is saved in PPM format (a very simple image file format)
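
For the curious, here is a minimal sketch of what Russian roulette termination could look like inside the bounce loop of radiance(). This is my own addition and not part of smallpt or the code above; it reuses accucolor, s1 and s2 from the tutorial code and assumes the fixed bounce count is raised (to 10, for example) so the roulette actually has something to terminate:

 // Russian roulette (sketch only): after a few guaranteed bounces, terminate the path
 // with a probability based on the current path throughput, and boost the paths that
 // survive so the estimate stays unbiased
 if (bounces > 3) {
  float p = fmaxf(accucolor.x, fmaxf(accucolor.y, accucolor.z)); // survival probability
  if (getrandom(&s1, &s2) > p) break;   // terminate this path
  accucolor *= 1.0f / p;                // compensate the surviving paths
 }

This keeps the average path length short while still allowing the occasional long light path, which is exactly what makes it attractive for GPU path tracers.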

When running the code above, we get the following image (1024 samples per pixel, brute force path tracing):

Path traced color bleeding rendered entirely on the GPU! On my laptop's GPU (GeForce 840M) it renders about 24x faster than the multithreaded CPU version (a laptop Core i7 clocked at 2.00 GHz). The neat thing here is that it only took about 100 lines (not counting comments) to get path tracing working on the GPU. The beauty lies in its simplicity.

Even though the path tracing code already works well, it is actually very unoptimized and there are many techniques to speed it up:

  • explicit light sampling (or next event estimation): sample the light source directly instead of using brute force path tracing. This makes an enormous difference in reducing noise.
  • jittered sampling (also called stratified sampling): instead of sampling a pixel purely at random, divide the pixel into a number of subregions (strata) and draw a random sample inside each stratum (see the sketch after this list). According to Peter Shirley this is one of the most important noise reduction methods
  • better random number generators
  • ray tracing acceleration structures: kd-trees, octrees, grids, bounding volume hierarchies provide massive speedups
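
To make the jittered sampling idea concrete, here is a minimal sketch (again my own, not taken from smallpt) of how the sample loop in render_kernel could be stratified with a 2x2 grid of strata per pixel. It reuses cam, cx, cy, x, y, s1, s2, samps and r from the kernel above and assumes samps is divisible by 4:

  // stratified (jittered) sampling sketch: spread the samples evenly over 2x2 subpixel strata
  for (int sy = 0; sy < 2; sy++)        // 2 strata in y
   for (int sx = 0; sx < 2; sx++)       // 2 strata in x
    for (int s = 0; s < samps / 4; s++){
     float jx = (sx + getrandom(&s1, &s2)) * 0.5f; // jittered offset inside the stratum, in [0,1)
     float jy = (sy + getrandom(&s1, &s2)) * 0.5f;
     float3 d = cam.dir + cx*((jx + x) / width - .5) + cy*((jy + y) / height - .5);
     r = r + radiance(Ray(cam.orig + d * 40, normalize(d)), s1, s2)*(1. / samps);
    }

Each pixel still gets samps samples in total, but they are now spread more evenly over the pixel area, which reduces the variance of the estimate at no extra cost per sample.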

GPU specific optimisations (see Karl Li's course slides linked below):
  • using shared memory and registers whenever possible is many times faster than using global/local memory
  • memory alignment for coalesced reads from GPU memory
  • thread compaction: CUDA executes threads in groups of 32 ("warps") that run in lockstep, so threads within a warp that take different code paths cause thread divergence, which serialises execution and wastes compute. Thread compaction mitigates the effects of thread divergence by regrouping threads that follow similar code paths (for example, only the rays that are still alive)

I plan to cover the following topics (with CUDA implementations) in upcoming tutorials whenever I find some time:
  • an interactive viewport camera with progressive rendering, 
  • textures (and bump mapping), 
  • environment lighting, 
  • acceleration structures,  
  • triangles and triangle meshes
  • building more advanced features on top of Aila and Laine's GPU ray tracing framework which is also used by Blender's Cycles GPU renderer
  • dissecting some code snippets from the Cycles GPU renderer or SmallLuxGPU 

References used:

Tuesday, August 18, 2015

FireRays, AMD's OpenCL based high performance ray tracing renderer

Pretty big news for GPU rendering: about 6 years after Nvidia released the source code of their high performance GPU ray tracing kernels and 4 years after Intel released Embree (high performance CPU ray tracing kernels), last week at Siggraph AMD finally released their own GPU rendering framework in the form of FireRays, an OpenCL based ray tracing SDK, first shown in prototype form at Siggraph 2014 by Takahiro Harada (who also conducted research into foveated ray tracing for VR):

The FireRays SDK can be downloaded from the AMD Developer site.

The acceleration structure is a BVH with spatial splits, with the option to build the BVH with or without the surface area heuristic (SAH). For instances and motion blur a two level BVH is used, which enables very efficient object transformations (translation, rotation, scaling) at virtually no cost. 

AMD's own graphs show that their OpenCL renderer is roughly 10x faster running on 2 D700 FirePro GPUs than Embree running on the CPU:

There are already a few OpenCL based path tracers available today, such as Blender's Cycles engine and LuxRays (even V-Ray RT GPU was OpenCL based at some point), but none of them have been able to challenge their CUDA based GPU rendering brethren. AMD's OpenCL dev tools have historically been lagging behind Nvidia's CUDA SDK tools, which made compiling large and complex OpenCL kernels a nightmare (splitting the megakernel into smaller parts was the only option). Hopefully the OpenCL developer tools have gotten a makeover as well with the release of this SDK, but at least I'm happy to see AMD taking GPU ray tracing seriously. This move could truly bring superfast GPU rendering to the masses, and with the two big GPU vendors in the ray tracing race, there will hopefully be more ray tracing specific hardware improvements in future GPU architectures.

(thanks heaps to CPFUUU for pointing me to this)

UPDATE: Alex Evans from Media Molecule gave a great talk at Siggraph 2015 about his research into raymarching signed distance fields for Dreams. Alex Evans is currently probably the biggest innovator in real-time game rendering since John Carmack (especially since Carmack spends all his time on VR now, which is a real shame). Alex's presentation is well worth reading: it sums up a bunch of approaches to rendering voxels, signed distance fields and global illumination in real-time that ultimately were not as successful as hoped, but came very close to real-time on the PS4 (and research is still ongoing).

For people interested in the real-world physics of light bouncing, there was also a very impressive video from Karoly Zsolnai about ultra high speed femto-photography cameras, which can effectively capture the propagation of light itself, demonstrating how light travels and is transported as an electromagnetic wave through a scene, illuminating objects a fraction of a nanosecond before their mirror image becomes visible:

Thursday, May 21, 2015

Brand new GPU path tracing research from Nvidia and AMD

A very interesting paper called "Gradient domain path tracing" was just published by Nvidia researchers (coming from the same incredibly talented Helsinki university research group as Timo Aila, Samuli Laine and Tero Karras who developed highly optimized open source CUDA ray tracing kernels for Tesla, Fermi and Kepler GPUs), describing a new technique derived from the ideas in the paper Gradient domain Metropolis Light Transport, which drastically reduces noise without blurring details. 
We introduce gradient-domain rendering for Monte Carlo image synthesis. While previous gradient-domain Metropolis Light Transport sought to distribute more samples in areas of high gradients, we show, in contrast, that estimating image gradients is also possible using standard (non-Metropolis) Monte Carlo algorithms, and furthermore, that even without changing the sample distribution, this often leads to significant error reduction. This broadens the applicability of gradient rendering considerably. To gain insight into the conditions under which gradient-domain sampling is beneficial, we present a frequency analysis that compares Monte Carlo sampling of gradients followed by Poisson reconstruction to traditional Monte Carlo sampling. Finally, we describe Gradient-Domain Path Tracing (G-PT), a relatively simple modification of the standard path tracing algorithm that can yield far superior results. 
This picture shows a noise comparison between gradient domain path tracing (GPT) and regular path tracing (PT). Computing a sample with the new technique is about 2.5x slower, but path tracing noise seems to clear up much faster, far outweighing the computational overhead: 

More images and details of the technique can be found in the paper.

Related to the previous post about using real-time ray tracing for augmented reality, a brand new Nvidia paper titled "Filtering Environment Illumination for Interactive Physically-Based Rendering in Mixed Reality" demonstrates the feasibility of real-time Monte Carlo path tracing for augmented or mixed reality: 
Physically correct rendering of environment illumination has been a long-standing challenge in interactive graphics, since Monte-Carlo ray-tracing requires thousands of rays per pixel. We propose accurate filtering of a noisy Monte-Carlo image using Fourier analysis. Our novel analysis extends previous works by showing that the shape of illumination spectra is not always a line or wedge, as in previous approximations, but rather an ellipsoid. Our primary contribution is an axis-aligned filtering scheme that preserves the frequency content of the illumination. We also propose a novel application of our technique to mixed reality scenes, in which virtual objects are inserted into a real video stream so as to become indistinguishable from the real objects. The virtual objects must be shaded with the real lighting conditions, and the mutual illumination between real and virtual objects must also be determined. For this, we demonstrate a novel two-mode path tracing approach that allows ray-tracing a scene with image-based real geometry and mesh-based virtual geometry. Finally, we are able to de-noise a sparsely sampled image and render physically correct mixed reality scenes at over 5 fps on the GPU.

While Nvidia is certainly at the forefront of GPU path tracing research (with CUDA), AMD has recently begun venturing into GPU rendering as well and previewed its own OpenCL based path tracer at the Siggraph 2014 conference. The path tracer is developed by Takahiro Harada, who is a bit of an OpenCL rendering genius. He recently published an article in GPU Pro 6 about rendering on-the-fly vector displacement mapping with OpenCL based GPU path tracing. Vector displacement mapping differs from regular displacement mapping in that it allows the extrusion of overlapping geometry (e.g. a mushroom), which is not possible with the heightfield-like displacement provided by traditional displacement mapping (the RenderMan vector displacement documentation explains this nicely with pictures).

Slides from Takahiro Harada's presentation:

This video shows off the new technique, rendering in near-realtime on the GPU:

There's more info on Takahiro's personal page, along with some really interesting slideshow presentations about OpenCL based ray tracing. He also developed a technique called "foveated real-time ray tracing for virtual reality devices" (paper), which progressively concentrates more samples on the parts of the image where the eyes are looking (determined by eye/pupil tracking), "reducing the number of pixels to shade by 1/20, achieving 75 fps while preserving the same visual quality". Foveated rendering takes advantage of the fact that the human retina is most sensitive in its center (the "fovea", which contains densely packed colour sensitive cones), where objects' contours and colours are sharply observed, while the periphery of the retina consists mostly of sparsely distributed, colour insensitive rods, which cause objects in the periphery of the visual field to be represented by the brain as blurry blobs (although we do not consciously perceive it like that, thinking that our entire visual field is sharply defined and in colour).
This graph shows that the resolution of the retina is highest at the fovea and drops off quickly with increasing distance from the center. This is because the fovea contains only cones, each of which sends its own signal down an individual nerve fibre (maximizing resolution), while the inputs from several rods in the periphery of the retina are merged by the retinal nerve cells before reaching the optic nerve.

Foveated rendering has the potential to make high quality real-time raytraced imagery feasible on VR headsets that support eye tracking like the recently Kickstarted FOVE VR headset. Using ray tracing for foveated rendering is also much more efficient than using rasterisation: ray tracing allows for sparse loading and sampling of the scene geometry in the periphery of the visual field, while rasterisation needs to load and project all geometry in the viewplane, whether it's sampled or not.

Slides from the foveated ray tracing presentation:

This video shows a working prototype of the FOVE VR headset with a tracking beam to control which parts of the scene are in focus, so this type of real-time ray traced (or path traced) foveated rendering should be possible right now (which is pretty exciting):

It's good to finally see AMD stepping up its OpenCL game with its own GPU path tracer. Another example of this greater engagement is that AMD recently released a large patch to fix the OpenCL performance of Blender's Cycles renderer on AMD cards. Hopefully it will put some pressure on Nvidia and make GPU rendering as exciting as in 2010 with the release of the Fermi GPU, a GPGPU computing monster which effectively doubled the CUDA ray tracing performance compared to the previous generation. 

Rendering stuff aside, today is a very important day: for the first time in their 115 year long existence, the Buffalo's from AA Gent, my hometown's football team, have won the title in the Belgian Premier League, giving them a direct ticket to the Champions League. This calls for a proper celebration!

Wednesday, February 18, 2015

Real-time ray traced augmented reality for surgeons

With last month's unveiling of Microsoft's augmented reality glasses project dubbed "HoloLens", today's announcement of Sony's plans to release a similar AR device called the "SmartEyeGlass", and with more details surfacing on Magic Leap's retina projecting fiber optic AR glasses (cleverly reconstructed from publicly available patents by a Gizmodo journalist), the hype around augmented reality seems to be reaching a peak at the moment. Unfortunately, most of the use cases for these technologies that have been demonstrated so far, for example husbands assisting their wives with screwing a new syphon on a sink, projecting the weather forecast for Maui on the kitchen wall or casually investigating a suspicious rock on the surface of Mars, look either gimmicky, far-fetched or both. 

The area where I see a real and immediate use for these high tech AR devices is in the operating room. In my previous life as a medical student, I've spent quite some time in the operating theatre watching surgeons frantically checking if they were cutting the right part of the brain by placing a sharp needle-like pointer (with motion capture dots) on or inside the brain of the patient. The position of the pointer was picked up by 3 infrared cameras and a monitor showed the position of the needle tip in real-time on three 2D views (front, top and side) of the brain reconstructed from CT or MRI scans. This 3D navigation technique is called stereotactic neurosurgery and is an invaluable tool to guide neurosurgical interventions.    

Instruments for stereotactic surgery (from here)

While I was amazed at the accuracy and usefulness of this high tech procedure, I was also imagining ways to improve it, because every time the surgeon checks the position of the pointer on the monitor, he or she loses visual contact with the operating field and "blindly" guiding instruments inside the body is not recommended. A real-time three-dimensional augmented reality overlay that can be viewed from any angle, showing the relative position of the organs of interest (which might be partially or fully covered by other organs and tissues like skin, muscle, fat or bone) would be tremendously helpful provided that the AR display device minimally interferes with the surgical intervention and the augmented 3D images are of such a quality that they seamlessly blend with the real world. The recently announced wearable AR glasses by MS, Sony and Magic Leap seem to take care of the former, but for the latter there is no readily available solution yet. This is where I think real-time ray tracing will play a major role: ray tracing is the highest quality method to visualise medical volumetric data reconstructed from CT and MRI scans. It's actually possible to extend a volume ray caster with physically accurate lighting (soft shadows, ambient occlusion and indirect lighting) to add visual depth cues and have it running in real-time on a machine with multiple high end GPUs. The results are frighteningly realistic. I for one can't wait to test it with one of these magical glasses.

As an update to my previous post, the people behind Scratch-a-Pixel have launched a v2.0 website, featuring improved and better organised content (still work in progress, but the old website can still be accessed). It's by far the best resource to learn ray tracing programming for both novices (non engineers) and experts. Once you've conquered all the content on Scratch-a-Pixel, I recommend taking a look at the following ray tracing tutorials that come with source code:

- smallpt  from Kevin Beason: an impressively tiny path tracer in 100 lines of C++ code. Make sure to read David Cline's slides which explain the background details of this marvel. 

- Rayito, by Mike Farnsworth from Renderspud (currently at Solid Angle working on Arnold Render): a very neatly coded ray/path tracer in C++, featuring path tracing, stratified sampling, lens aperture (depth of field), a simple BVH (with median split), Qt GUI, triangle meshes with obj parser, diffuse/glossy materials, motion blur and a transformation system. Not superfast because of code clarity, but a great way to get familiar with the architecture of a ray tracer

-  Renderer 2.x: a CUDA and C++ ray tracer, featuring a SAH BVH (built with the surface area heuristic for better performance), triangle meshes, a simple GUI and ambient occlusion

- Peter and Karl's GPU path tracer: a simple, but very fast open source GPU path tracer which supports sphere primitives, raytraced depth of field and subsurface scattering (SSS)
If you're still not satisfied after that and want a deeper understanding, consider the following books:
- "Realistic ray tracing" by Peter Shirley, 
- "Ray tracing from the ground up" by Kevin Suffern, 
- "Principles of Digital Image Synthesis" by Andrew Glassner, a fantastic and huge resource, freely available here, which also covers signal processing techniques like Fourier transforms and wavelets (if your calculus is a bit rusty, check out Khan academy, a great open online platform for engineering level mathematics)
- "Advanced global illumination" by Philip Dutré, Kavita Bala and Philippe Bekaert, another superb resource, covering finite element radiosity and Monte Carlo rendering techniques (path tracing, bidirectional path tracing, Metropolis light transport, importance sampling, ...)

Sunday, October 19, 2014

Scratch-a-pixel and more

Having left Otoy some time ago and after enjoying a sweet-as holiday, it's time for things new and exciting. Lots of interesting rendering related stuff has happened in the past months; below are some of the most fascinating developments in my opinion:

- starting off, there's an excellent online tutorial series on computer graphics (mostly ray tracing) for both beginners and experts called Scratch-a-Pixel. The authors are veterans from the VFX, animation and game industry with years of experience writing production rendering code (for renderers like RenderMan). The tutorials deal with all the features expected from a production renderer and contain a lot of background and insights into the science of light, plus tips and tricks on how to write performant and well optimized ray tracing code. Rendering concepts like the CIE xyY colorspace and esoteric mathematical subjects like discrete Fourier transforms, harmonics and integration of orthonormal polynomials are explained in an easy-to-digest manner. Most tutorials also come with C++ source code. At the moment some sections are missing or incomplete, but the author told me there's a revamp of the website coming very soon... 

- hybrid rendering (rasterization mixed with ray tracing) for games has finally arrived with the recent release of Unreal Engine 4.5 which supports ray traced soft shadows and ambient occlusion via signed distance fields (which can be faster to compute than traditional shadow mapping, but works only for static geometry): 

A nice video of the technique in action:
Like voxels or triangles, distance fields are another way to represent scene geometry. Just like voxels, distance fields approximate the scene geometry, and they are more efficient to trace than triangles for low frequency effects like soft shadows, ambient occlusion and global illumination, which don't require 100% geometric accuracy (distance fields also have inherent multiresolution characteristics because they approximate the geometry). Inigo Quilez wrote a few interesting articles on rendering with distance fields back in 2008 (a minimal sphere tracing sketch follows the links below):

Free penumbra shadows for raymarching distance fields

More on distance fields:
Distance fields in Unreal Engine
Alex Evans from Media Molecule invented a neat trick to approximate AO and GI with distance fields in "Fast Approximations for Global Illumination for Dynamic Scenes"

There's also a very recent paper about speeding up sphere tracing for rendering and path tracing of signed distance fields: Enhanced Sphere Tracing.
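
To give a flavour of how tracing a distance field actually works, here is a minimal, hypothetical sphere tracing sketch written in the same CUDA style as the tutorial at the top of this blog. The scene SDF is just a single sphere, all names are my own, and the float3 helpers are assumed to come from cutil_math.h:

__device__ float sceneSDF(float3 p) {
 // signed distance from point p to the scene: here just a unit sphere centered at (0,1,0)
 return length(p - make_float3(0.0f, 1.0f, 0.0f)) - 1.0f;
}

__device__ bool sphereTrace(float3 orig, float3 dir, float &t) {
 t = 0.0f;
 for (int i = 0; i < 128; i++) {        // cap the number of marching steps
  float d = sceneSDF(orig + dir * t);   // distance to the closest surface
  if (d < 0.001f) return true;          // close enough: count it as a hit
  t += d;                               // safe step: we can never overshoot the surface
  if (t > 100.0f) break;                // ray escaped the scene
 }
 return false;
}

The "free" penumbra shadows from Quilez's article fall out of the same loop: by tracking the minimum of d/t along a shadow ray you get a cheap estimate of how close the ray passes to occluding geometry, which directly gives a soft shadow factor.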

- one of the most interesting Siggraph 2014 surprises must be the announcement from Weta (the New Zealand based visual effects studio that created the CG effects for blockbusters like the Lord of the Rings, King Kong, Avatar, Tintin and the Hobbit movies) that they are developing their own production path tracer called Manuka (the Maori name for New Zealand's healing tea tree) in conjunction with Gazebo, a physically plausible realtime GPU renderer. While Manuka has been used to render just a couple of shots in "The Hobbit: The Desolation of Smaug", it will be the main renderer for the next Hobbit film. More details are provided in an extensive fxguide article. Another surprise was Solid Angle (creators of Arnold) unveiling an OpenCL accelerated renderer prototype running on the GPU. There's not much info to be found apart from a forum comment by Solid Angle's Mike Farnsworth ("This is a prototype written by another Solid Angle employee (not Brecht), and it is not Arnold core itself. It's pretty obvious we're experimenting, though. We've been keeping a close eye on GPUs and have active communication with both AMD and Nvidia (and also obviously Intel). I wouldn't speculate on what the prototype is, or what Brecht is up to, because you're almost certainly going to be wrong.")

- Alex St John, ex-Microsoft and one of the creators of the DirectX API, has recently moved to New Zealand and aims to create the next standard API for real-time graphics rendering using CUDA GPGPU technology. More details are on his blog; his post about his visit to Weta contains some great insights into the CUDA accelerated CG effects created for The Desolation of Smaug. 

- Magic Leap, an augmented reality company founded by a biomedical engineer, recently got an enormous investment from Google and is working with a team at Weta in New Zealand to create imaginative experiences. The info available on the net suggests they are developing a wearable device that directly projects 3D images onto the viewer's retina and seamlessly integrates them with the real-life scene by projecting multiple images with a depth offset. Combined with Google Glass it could create games that are grounded in the real world like this one (the augmented reality objects are rendered with Octane Render). 

- the Lab for Animate Technologies at the University of Auckland in New Zealand is doing cutting edge research into the first real-time autonomously animated AI avatar: 
The facial animation is driven in real-time by artificial intelligence using concepts from computational neuroscience and is based on a physiological simulation of the human brain which is incredibly deep and complex (I was lucky to get a behind-the-scenes look): it includes the information exchange pathways between the retina, the thalamic nuclei and the visual cortex, including all the feedback loops, and also mimics low level single neuron phenomena such as the release of neurotransmitters and hormones like dopamine, epinephrine and cortisol. All of these neurobiological processes together drive the avatar's thoughts, reactions and facial animation through a very detailed facial muscle system, which is probably the best in the industry (Mark Sagar, the person behind this project, was one of the original creators of the USC Lightstage and pioneered facial capturing and rendering for Weta on King Kong and Avatar). It's one of the most impressive things I've ever seen, and it's something that is actually happening now. 

Wednesday, January 22, 2014

Object-order ray tracing for fully dynamic scenes

Today the GPU Pro blog posted a very interesting article about a novel technique which seamlessly unifies rasterization and ray tracing based rendering for fully dynamic scenes. The technique, entitled "Object-order Ray Tracing for Fully Dynamic Scenes", will be described in the upcoming GPU Pro 5 book (to be released on March 25, 2014 during the GDC conference) and was developed by Tobias Zirr, Hauke Rehfeld and Carsten Dachsbacher.

Abstract:
This article presents a method for tracing incoherent secondary rays that integrates well with existing rasterization-based real-time rendering engines. In particular, it requires only linear scene access and supports fully dynamic scene geometry. All parts of the method that work with scene geometry are implemented in the standard graphics pipeline. Thus, the ability to generate, transform and animate geometry via shaders is fully retained. Our method does not distinguish between static and dynamic geometry. Moreover, shading can share the same material system that is used in a deferred shading rasterizer. Consequently, our method allows for a unified rendering architecture that supports both rasterization and ray tracing. The more expensive ray tracing can easily be restricted to complex phenomena that require it, such as reflections and refractions on arbitrarily shaped scene geometry. Steps in rendering that do not require the tracing of incoherent rays with arbitrary origins can be dealt with using rasterization as usual.

This is to my knowledge the first practical implementation of the so-called hybrid rendering technique which mixes ray tracing and rasterization by plugging a ray tracer in an existing rasterization based rendering framework and sharing the traditional graphics pipeline. Since no game developer in his right mind will switch to pure ray tracing overnight, this seems to be the most sensible and commercially viable approach to introduce real ray traced high quality reflections of dynamic objects into game engines in the short term, without having to resort to complicated hacks like screen space raytracing for reflections (as seen in e.g. Killzone Shadow Fall, UE4 tech demos and CryEngine) or cubemap arrays, which never really look right and come with a lot of limitations and artifacts. For example, in this screenshot of the new technique you can see the reflection of the sky, which would simply be impossible with screen space reflections from this camera angle:  

Probably the best thing about this technique is that it works with fully dynamic geometry (accelerating ray intersections by coarsely voxelizing the scene) and - judging from the abstract - with dynamically tesselated geometry as well, which is a huge advantage for DX11 based game engines. It's very likely that the PS4 is capable of real-time raytraced reflections using this technique and when optimized, it could not only be used for rendering reflections and refractions, but for very high quality soft shadows and ambient occlusion as well. 

The ultimate next step would be global illumination with path tracing for dynamic scenes, which is a definite possibility on very high end hardware, especially when combined with another technique from a very freshly released paper (by Ulbrich, Novak, Rehfeld and Dachsbacher) entitled Progressive Visibility Caching for Fast Indirect Illumination which promises a 5x speedup for real-time progressively path traced GI by cleverly caching diffuse and glossy interreflections  (a video can be found here). Incredibly exciting if true!

Monday, December 23, 2013

Real-time rendered animations with OctaneRender 1.5

OctaneRender 1.5 has some really powerful features like support for Alembic animations and a fully scriptable user interface. The Alembic file support allows for real-time rendered animations in the standalone version of Octane for scenes with both rigid and deformable animated geometry. Seeing your animations rendered in final quality in real-time with GI, glossy reflections and everything is a blast:

You may remember the actor in the following video from blockbuster movies like "Ultra high detailed dynamic character test" and "4968 daftly dancing dudes on Stanford bunny". Even in Octane, his dancing prowess remains unrivalled. The actor is made up of 66k triangles and there are 730 clones of him (48 million triangles in total). For every frame of the animation, Octane loads the animated geometry from an Alembic file, builds the scene and renders the animation sequence with a script, all in real-time. 

Some examples of rigid body animations:

With support for Alembic animations and Lua scripting, Octane has now a very solid foundation for animation rendering in place, allowing for some very cool stuff (yet to be announced) that can be done fully in real-time on a bunch of GPUs (inspired and fueled by earlier Brigade experiments). In 2014, Octane will blow minds like never before.

Friday, December 20, 2013

Lamborghini Test Drive (OctaneRender animation)

Recently I was blown away by a video posted on the Octane forum called "Lamborghini Test Drive", created as a tribute to celebrate the 50th anniversary of Lamborghini. The realism you can achieve with Octane is just batshit crazy, as evidenced by the video.

Try to spot the 7 differences with reality:

Some specs:

- the scene is 100% 3D, all rendered with Octane
- rendered on 4x GTX Titan
- render resolution 1280 x 538, Panavision format (2.39:1)
- average rendertime per frame: from 1 minute for the large shots with the cars to 15 minutes for the helmet shots at night
- over 5,000,000 triangles for both cars
- instances for the landscape.

Tuesday, December 10, 2013

Real-time path tracing with OctaneRender 1.5

Just want to share a couple of real-time rendered videos made with the upcoming OctaneRender 1.5. The scene used in the videos is the same one that was used for the Brigade 3 launch videos. The striking thing about Octane is that you can navigate through this scene in real-time while having an instant final quality preview image. It converges in just a few seconds to a noise free image, even with camera motion blur enabled. It's both baffling and extremely fun. 

The scene geometry contains 3.4 million triangles without the Lamborghini model and 7.4 million triangles with it (the Lamborghini alone has over 4 million triangles). All videos below were rendered in real-time on 4 GTX 680 GPUs. Because of the 1080p video capture, the framerate you see in the videos is less than half the framerate you get in real life, where it's incredibly smooth. 

There are a bunch more real-time rendered videos and screenshots of the upcoming OctaneRender 1.5 in this thread on the Octane forum (e.g. on page 7).

Monday, November 11, 2013

Shiny Toy pathmarcher on Shadertoy

This looks incredible, raymarching with GI, glossy road, glossy car:

The future of real-time graphics is noisy!

New GPU path tracer announced

Jacco Bikker just announced a new GPU based path tracer on the ompf forum. There's also a demo version available that you can grab from that post.

Tuesday, October 22, 2013

Le Brigade nouveau est arrivé!

Time for an update on Brigade 3 and what we've been working on: until now, we have mostly shown scenes with limited materials, i.e. either perfectly diffuse or perfectly specular surfaces. The reason we didn't show any glossy (blurry) reflections so far is that they generate a lot of extra noise and fireflies (overbright pixels), and the glossy material in Brigade 2 was far from perfect. Over the past months we have reworked Brigade's material system and replaced it with the one from OctaneRender, which contains an extraordinarily fast converging and high quality glossy material. The sky system was also replaced with a custom physical sky where sky and sun colour vary with the sun position. And there's a bunch of brand new custom post effects, tone mapping filters and real camera effects like fish eye lens distortion (without the need for image warping).

We've had a lot of trouble finding a good way to present the face melting awesomeness that is Brigade 3 in video form: we've tried both YouTube and Vimeo at different upload resolutions and samplecounts (samples per pixel). Suffice to say that both sites have ultra shitty video compression, turning all our videos into a blocky mess (although Vimeo is still much better than YT). We also decided to go nuts on glossy materials and Fresnel on every surface in this scene, which makes everything look a lot more realistic (in particular the Fresnel effect, which causes surfaces to look more or less reflective depending on the viewing angle), but the downside of this extra realism is a lot of extra noise.

So feast your eyes on the first videos of Brigade 3 (1280x720 render resolution):

Vimeo video (less video compression artefacts):

Youtube vids:

Another one using an Xbox controller:

The scene in the video is the very reason why I started this blog five years ago and is depicted in one of my very first blog posts from 2008. The scene was created by Big Lazy Robot to be used in a real-time tech demo for ATI's Radeon HD 4870 GPU. Back then, the scene used baked lightmaps rendered with V-Ray for the diffuse lighting and an approximate real-time ray tracing technique for all reflective surfaces like cars and building windows. Today, more than five years later, we can render the same scene noise free using brute force path tracing on the GPU in less than half a second, and we can navigate through the entire scene at 30 fps with a bit of noise (mostly apparent in shadowy areas). When I started this blog my dream was to be able to render that specific scene fully in real-time in photoreal quality, and I'm really glad I've come very close to that goal. 

UPDATE: Screenshot bonanza! No less than 32 screenshots, each of them rendered for 0.5 - 1 second. The problem with Brigade 3 is that it's so much fun mucking around with the lighting, the time of day, depth of field and field of view with lens distortion. Moreover, everything looks so photoreal that it's extremely hard to stop playing and taking screenshots. It feels like you're holding a camcorder.

We plan to show more videos of Brigade 3 soon, so stay tuned... 

Update: I've uploaded the direct feed version of the second video to MEGA (a New Zealand based cloud storage service, completely anonymous, fast, no registration required and free, just excellent :). You can grab the file here: brigade3_purely_random_osumness (it's 2.40 GB)

Update 2: The direct feed version of the first video can be downloaded here: brigade3_launch_vid_HD.avi (2.90 GB). This video has a higher samplecount per pixel per frame (and thus less noise and a lower framerate).