Eliminating branches in shaders

You might have heard that GPUs hate branches (if-else statements and, consequently, dynamic for and while loops). See more details here: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html. Instead of trying to figure out how best to set up your branches, how about getting rid of them completely? That is possible and easy to do in most cases.

I am going to show you how to eliminate branches in shaders with one example. This is usually done with GLSL built-in functions such as mix() (lerp() in HLSL), clamp(), sign() and abs(). In this example I will write a fragment shader that “overlays” one texture on top of another, where the overlay is the Photoshop Overlay blending mode. I am using the formulas from this classic post: http://mouaif.wordpress.com/2009/01/05/photoshop-math-with-glsl-shaders/. The naïve implementation of Overlay blending requires one branch for each of the RGB channels. For each channel, if the value in the base image is smaller than 0.5, it returns 2 * base * blend; otherwise, it returns 1 - 2 * (1 - base) * (1 - blend), where blend is the image on top of the base image. Conceptually, if the base value is lower than 0.5, multiply; otherwise, screen. The fragment shader ends up looking like this:

// Note: This is GLSL ES
precision highp float;

uniform sampler2D s_texture; // Base texture
uniform sampler2D s_overlay; // Overlay texture

varying vec2 v_texCoord;

float overlayf(float base, float blend)
{
    if (base < 0.5) {
        return 2.0 * base * blend;
    }
    else {
        return 1.0 - 2.0 * (1.0 - base) * (1.0 - blend);
    }
}

vec3 overlay(vec3 base, vec3 blend)
{
    return vec3(overlayf(base.r, blend.r), overlayf(base.g, blend.g), overlayf(base.b, blend.b));
}

void main()
{
    vec4 base = texture2D(s_texture, v_texCoord);
    vec4 blend = texture2D(s_overlay, v_texCoord);
    // Named 'overlaid' so the local variable does not hide the overlay() function
    vec3 overlaid = overlay(base.rgb, blend.rgb); // Overlay'd color
    // Linearly interpolate between the base color and overlay'd color
    // because the blend texture might have transparency. The built-in
    // mix does the job since mix(x, y, a) = x*(1.0 - a) + y*a
    vec3 finalColor = mix(base.rgb, overlaid, blend.a);
    gl_FragColor = vec4(finalColor, base.a);
}

That’s quite a lot of code. It’s possible to make it much simpler with smart use of the built-in functions. The general technique is to convert the comparison (base < 0.5) into a float that is zero if base is below 0.5 and one otherwise. We then use this float as the interpolation parameter (the third argument) of the mix() built-in. Since the parameter is either zero or one, it lets us select between the first two arguments: when it is zero, mix() returns the first argument, and when it is one, mix() returns the second. For example:

float a = 0.0;
float x = 1.0;
float y = 2.0;
float z = mix(x, y, a); // z is equal to x
float b = 1.0;
float k = mix(x, y, b); // k is equal to y
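
mix() also accepts a vector as its third argument, in which case the selection happens independently for each component. Here is a minimal sketch of that behavior, written as a tiny standalone fragment shader. The values and the names x, y, sel and r are made up for illustration only:

```glsl
// Note: This is GLSL ES. Illustrative sketch only.
precision highp float;

void main()
{
    vec3 x = vec3(0.1, 0.2, 0.3);
    vec3 y = vec3(0.7, 0.8, 0.9);
    // Per-component selector: 0.0 picks the component from x, 1.0 picks it from y
    vec3 sel = vec3(0.0, 1.0, 0.0);
    vec3 r = mix(x, y, sel); // r is (0.1, 0.8, 0.3)
    gl_FragColor = vec4(r, 1.0);
}
```

This per-component form is what lets a single mix() call stand in for one branch per RGB channel.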

Then, our final overlay shader code will be:

precision highp float;

uniform sampler2D s_texture;
uniform sampler2D s_overlay;

varying vec2 v_texCoord;

void main()
{
    vec4 base = texture2D(s_texture, v_texCoord);
    vec4 blend = texture2D(s_overlay, v_texCoord);
    // This is our 'selection parameter'. For each of the rgb components, we subtract 0.5
    // and take the sign() of the result, which is -1.0 if the difference is negative,
    // 0.0 if it is zero, and 1.0 if it is positive. We want 0.0 when base is below 0.5
    // and 1.0 otherwise, so we clamp() the sign to the [0.0, 1.0] interval, which turns
    // -1.0 into 0.0. When base is exactly 0.5 the result is also 0.0, but that is harmless:
    // at 0.5 the multiply and screen formulas both evaluate to blend.
    vec3 br = clamp(sign(base.rgb - vec3(0.5)), vec3(0.0), vec3(1.0));
    vec3 multiply = 2.0 * base.rgb * blend.rgb;
    vec3 screen = vec3(1.0) - 2.0 * (vec3(1.0) - base.rgb)*(vec3(1.0) - blend.rgb);
    // If br is 0.0, overlay will be multiply (which translates to if (base < 0.5) { return multiply; }).
    // If br is 1.0, overlay will be screen (which translates to else { return screen; }).
    vec3 overlay = mix(multiply, screen, br);
    vec3 finalColor = mix(base.rgb, overlay, blend.a);
    gl_FragColor = vec4(finalColor, base.a);
}
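
As a possible variation on the shader above, the selection parameter could probably also be computed with the built-in step(), since step(edge, x) returns 0.0 where x < edge and 1.0 otherwise. The sketch below assumes the same uniforms and varying as before; the helper name overlayStep is hypothetical and only used for illustration:

```glsl
// Note: This is GLSL ES. A sketch of an alternative, not the shader from this post.
precision highp float;

uniform sampler2D s_texture; // Base texture
uniform sampler2D s_overlay; // Overlay texture

varying vec2 v_texCoord;

// Hypothetical helper: branchless overlay using step() as the selector
vec3 overlayStep(vec3 base, vec3 blend)
{
    // 0.0 for components below 0.5, 1.0 for components at or above 0.5
    vec3 sel = step(vec3(0.5), base);
    vec3 multiply = 2.0 * base * blend;
    vec3 screen = vec3(1.0) - 2.0 * (vec3(1.0) - base) * (vec3(1.0) - blend);
    return mix(multiply, screen, sel);
}

void main()
{
    vec4 base = texture2D(s_texture, v_texCoord);
    vec4 blend = texture2D(s_overlay, v_texCoord);
    // Blend by the overlay texture's alpha, as in the shaders above
    vec3 finalColor = mix(base.rgb, overlayStep(base.rgb, blend.rgb), blend.a);
    gl_FragColor = vec4(finalColor, base.a);
}
```

The only behavioral difference from the clamp(sign()) version is at base exactly equal to 0.5, where step() selects screen instead of multiply. Both formulas evaluate to blend at that point, so the output should be the same.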

Now the code is much shorter and cleaner, and it performs exactly the same function. The difference is that we compute the values of both the if and the else branches and then select one of them. With a branch, we would avoid computing the unused value, which in this example has a negligible cost. On the other hand, we have no branches, which keeps the GPU happy. Depending on your hardware and shader complexity, you might notice a performance improvement.