The Advantages of Working in 32-bit Float

First of all, let me get one thing straight – I’m not referring to a 32-bit vs 64-bit OS, CPU or software. I am referring to the 32-bit ‘environment’ found in applications like Photoshop, After Effects or Nuke.

Exclusive Bonus: Download my free guide (with examples) on how to find the best camera angles for dialogue scenes when your mind goes blank.

Many high-end applications allow one to work in a 32-bit environment. Is that a good thing? Before we find out what a 32-bit workflow can mean to you, we need to know what it is.

What is a 32-bit environment?

0s and 1s that make up data are called bits. A bit can be a 0 or a 1.

8 bits make a byte. A collection of bits handled as one unit is called a ‘word’. A word doesn’t always have to be a byte – it can be 4 bits or 400 bits, but its size is traditionally a power of 2. Okay, why is a word size important?

If I wanted to represent all the letters of the English alphabet, I’d need 26 combinations of 0s and 1s.

2⁰ = 1
2¹ = 2
2² = 4
2³ = 8
2⁴ = 16
2⁵ = 32

I’ll need a word size of 5 bits at least. When computers became mainstream 8 bits was more than enough to represent all the characters of the English alphabet, punctuation marks, numbers from 0 to 9, special keys, etc. The total number of combinations was less than 256.

2⁸ = 256
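The word-size arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `bits_needed` is made up for illustration.

```python
def bits_needed(symbol_count: int) -> int:
    """Smallest word size (in bits) that offers at least symbol_count combinations."""
    # (n - 1).bit_length() is the number of bits needed to count from 0 to n - 1.
    return max(1, (symbol_count - 1).bit_length())

print(bits_needed(26))   # letters of the English alphabet
print(bits_needed(256))  # every value a byte can hold
```

The first call gives the 5-bit minimum for 26 letters, and the second shows that 256 combinations fit exactly in 8 bits.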

Most computer systems were built on this idea. When video became digital everyone tried to fit the video signal into this word size. For color, they allotted 8 bits per channel – Red had 8 bits, Blue had 8 bits and Green had 8 bits – or 256 values from dark (0) to maximum intensity (255).

Imagine a digital cable to be a water slide with many lanes. Each color has its own lane. Each color is a channel.

This model turned out to be adequate in practice for most consumer applications – television, computers, photography, games, etc. It is still used today in most applications.

When we say an 8-bit image, we actually mean 8-bits per color channel (8×3 = 24 total bits). When working with an 8-bit image from acquisition to delivery, we are said to be working in an 8-bit environment. All image processing applications – for stills or video – at least support 8-bit.

If R, G and B are three dimensions, they can be represented by a cube – doesn’t it look like a room (environment)?

Now imagine if each channel had access to 32 bits.

2³² = 4,294,967,296

That’s over 4 billion values for each channel. Three channels add up to 96 bits per pixel. The room just got a whole lot bigger.
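To compare bit depths side by side, here is a small Python sketch. It counts plain integer code values only (float behaves differently, as the next section explains), and the helper names are hypothetical.

```python
def levels_per_channel(bits: int) -> int:
    """Number of distinct integer code values a channel of this bit depth can hold."""
    return 2 ** bits

def bits_per_pixel(bits_per_channel: int, channels: int = 3) -> int:
    """Total bits for one pixel, assuming every channel has the same depth."""
    return bits_per_channel * channels

for depth in (8, 16, 32):
    print(depth, levels_per_channel(depth), bits_per_pixel(depth))
```

For an RGB pixel this gives 24 bits at 8 bits per channel and 96 bits at 32 bits per channel.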

What is 32-bit float?

Numbers in scientific work are often written in a notation that looks something like:

1.2345 = 12345 × 10⁻⁴

A large number will need a large number of bits. No matter what the word size, the key concept is that this size is fixed. If the decimal point has to sit in different places within the same fixed word size, the system has to store numbers in something like the scientific notation shown above: a set of digits plus an exponent.

The computer system that can represent real numbers in many ways is a floating point system. Because of this flexibility, the advantage of floating-point representation over fixed-point and integer representation is that it can support a much wider range of values.

Is there a catch? Yes. Under the floating point system numbers are usually ‘rounded off’ to the nearest value that fits, and for this reason floating point systems are also given an additional qualification called precision. E.g., the IEEE 754 32-bit format is called Single Precision, with a precision of 24 bits.
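The rounding is easy to see in Python. This is a sketch that uses the standard `struct` module to squeeze a value into a 32-bit float and read it back; the helper name `to_float32` is made up for illustration.

```python
import struct

def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float, then read it back."""
    return struct.unpack("f", struct.pack("f", value))[0]

# 2**24 still fits in the 24 bits of precision...
print(to_float32(16_777_216.0))
# ...but 2**24 + 1 does not, so it is rounded to a neighbouring value.
print(to_float32(16_777_217.0))
# Fractions such as 0.1 are rounded too.
print(to_float32(0.1) == 0.1)
```

Whole numbers above 2²⁴ (16,777,216) can no longer all be stored exactly, which is where the 24-bit precision figure shows up in practice.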

The banking industry (where ‘rounding off’ is frowned upon) uses a fixed-point system. Scientists often use the floating point system. To get past the rounding-off errors, scientists carefully study how they use the floating point system, and design their algorithms to minimize errors. Where the error is unacceptable, a fixed-point system is adopted.

Floating point algorithms used in high-end video applications take these errors into account, and most of the time they are negligible. What we get instead is the ability to go much higher than what 2³² is capable of.
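How much higher? A quick Python sketch, assuming the IEEE 754 single precision layout, decodes the bit pattern of the largest finite 32-bit float and compares it with the largest 32-bit integer.

```python
import struct

# 0x7F7FFFFF is the bit pattern of the largest finite IEEE 754 single precision value.
max_float32 = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
max_uint32 = 2**32 - 1

print(max_float32)               # roughly 3.4 followed by 38 zeros
print(max_uint32)                # a little over 4 billion
print(max_float32 > max_uint32)
```

Same 32 bits, vastly larger range. The price is the rounding described above.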

Today, the ‘big-daddy’ is 64-bit – which is why operating systems, CPUs and software are built on this architecture. To get an idea why, look at these RAM sizes:

1 GB RAM = 8,589,934,592 bits = 1.07 billion bytes = 2³⁰ bytes = 30-bit addresses
4 GB RAM = 34.36 billion bits = 4.29 billion bytes = 2³² bytes = 32-bit addresses
16 Exabytes (EB) = 2⁶⁴ bytes = 64-bit addresses

As you can see, 4 GB of RAM is exactly 2³² bytes, the most a 32-bit address can reach, and this is why a 32-bit OS/CPU can’t use more than 4 GB of RAM. Part of that address space is typically reserved by the system, so on a 32-bit system After Effects can use at most 3 GB RAM.

Note: This isn’t the precise and correct explanation of why a 32-bit system can’t read more than 4 GB RAM. For a thorough answer, click here.
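For the curious, the RAM figures can be reproduced in Python. The sketch assumes one byte per address and binary gigabytes (2³⁰ bytes each); the function name is hypothetical.

```python
GIB = 2**30  # bytes in one (binary) gigabyte

def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable when every byte has its own address of this many bits."""
    return 2 ** address_bits

for bits in (30, 32, 64):
    total = addressable_bytes(bits)
    print(f"{bits}-bit addresses: {total:,} bytes = {total // GIB:,} GB")
```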

The advantages of working in 32-bit float

Let’s recall our water slide example. If the three primary colors (RGB) need three lanes, can we create more lanes for different kinds of data? E.g., if I wanted one lane for the alpha channel (transparency), one for a special outline mask I’ve created, one for a secret message I want to pass on with the image, etc., can I do it?

One way to add additional data along with a file is by using metadata. But at what stage does metadata compete with or overtake data in size? For this reason, we use extra channels to pass on data.

The file format OpenEXR was developed by ILM for this purpose. By storing different kinds of data in separate channels, you don’t have to ‘bake in’ an image.

There are three areas in which 32-bit float can be used:

  • Visual Effects (VFX)
  • Color Grading
  • Mastering



32-bit float in VFX

Visual Effects, by definition, is data manipulation and creation. A single image in a video might be a mixture of:

  • A 3D object
  • Many lighting layers on the object
  • A matte painted background
  • 2D objects
  • Masks
  • RGB and A channels
  • Live action footage, chroma keyed
  • Additional effects

Not all of this is created by one person or station. Data can come from many sources. The advantage of using channels is that a single file can not only store all this data in a very useful way, but also allow the compositor (the person mixing everything for the final image) to pick and choose at will.
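To picture what ‘pick and choose’ means, here is a toy Python sketch of a multi-channel frame. The layout, the channel names and the helper functions are all made up for illustration; this is not how OpenEXR or any compositing package actually exposes channels.

```python
# Hypothetical 2x2 frame: every named channel is a flat list of float samples.
frame = {
    "R": [0.8, 0.2, 0.5, 1.6],
    "G": [0.7, 0.3, 0.5, 1.2],
    "B": [0.6, 0.4, 0.5, 0.9],
    "A": [1.0, 1.0, 0.5, 1.0],
    "outline_mask": [1.0, 0.0, 0.0, 1.0],  # an extra lane riding along with the image
}

def pick(channels: dict, names: list) -> dict:
    """Return only the requested channels; the others stay untouched in the source."""
    return {name: channels[name] for name in names}

def apply_mask(channel: list, mask: list) -> list:
    """Multiply a channel by a mask, sample by sample."""
    return [value * m for value, m in zip(channel, mask)]

rgb = pick(frame, ["R", "G", "B"])
masked_red = apply_mask(frame["R"], frame["outline_mask"])
print(sorted(rgb))
print(masked_red)
```

Nothing is baked in: the mask is applied only when the compositor asks for it, and the original red channel is still there afterwards.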

As you can imagine, every channel adds to the size of the file. To store such a file in memory would require a lot of space – and a processing engine (software and hardware) will, in today’s world, be at least 64-bit to allow the maximum usage of RAM.

But what about 32-bit float per channel? How would that help?

High end video work is 10-bit linear. Film scanned is 10-bit log (about 12-bit linear). 16-bits per channel should be more than adequate to work with existing material, except:

  • If the material is created by a 32-bit 3D engine
  • If the material is HDR and has a large dynamic range that cannot be covered in 2¹⁶ = 65,536 tonal values per channel.
  • If the footage has poor color for chroma keying
  • If precise visual effects are added for photo realistic effect
  • If log and linear data have to be matched
  • If you are working in a wide gamut color space

In such scenarios it is advisable to work in a 32-bit floating point environment. When you get data that is coming from various sources, not all of it needs to conform to 10-bit shot footage. Mixing and matching colors, color spaces, gamma values, etc. can be more accurately calculated if the algorithm has ample ‘elbow room’.
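The HDR case is the easiest one to demonstrate. The Python sketch below assumes a 16-bit integer encoding where 1.0 (‘white’) maps to 65,535, so anything brighter has nowhere to go. The function names and sample values are hypothetical, and Python’s own 64-bit floats stand in for a float pipeline.

```python
def encode_uint16(value: float) -> int:
    """Store a value as a 16-bit integer code; anything outside 0.0..1.0 is clipped."""
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped * 65535)

def decode_uint16(code: int) -> float:
    """Turn a 16-bit integer code back into a 0.0..1.0 value."""
    return code / 65535

highlights = [0.5, 1.0, 4.0, 16.0]  # scene values; above 1.0 is brighter than white

stored_as_int = [decode_uint16(encode_uint16(v)) for v in highlights]
stored_as_float = list(highlights)

# Pull the exposure down by two stops (multiply by 0.25) in both pipelines.
print([v * 0.25 for v in stored_as_int])    # the three brightest samples are now identical
print([v * 0.25 for v in stored_as_float])  # the highlight detail is still there
```

Once the integer version has clipped, no amount of grading brings the difference between 4.0 and 16.0 back.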

What 32-bit float offers is fewer rounding errors from the various algorithms when working with 8 to 16-bit data. In the end, most delivery standards don’t ask for more than 10 or 12-bit.
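Here is a rough Python sketch of that kind of error. It darkens an 8-bit ramp to 10% and then brightens it back, once with the intermediate result stored as 8-bit integers and once kept in float. The gain value and the function names are arbitrary choices for illustration, and Python’s 64-bit floats stand in for a float pipeline.

```python
def roundtrip_8bit(code: int, gain: float) -> int:
    """Darken, store the result as an 8-bit integer, then brighten back."""
    darker = round(code * gain)            # rounding happens when the result is stored
    return min(255, round(darker / gain))  # clip to the 8-bit ceiling

def roundtrip_float(value: float, gain: float) -> float:
    """The same two operations with no integer storage in between."""
    return (value * gain) / gain

GAIN = 0.1
codes = range(256)

levels_left_8bit = len({roundtrip_8bit(c, GAIN) for c in codes})
levels_left_float = len({round(roundtrip_float(c / 255, GAIN) * 255) for c in codes})

print(levels_left_8bit)   # only a few dozen distinct levels survive
print(levels_left_float)  # all 256 levels survive
```

The 8-bit version comes back posterized, because most of the tonal steps were thrown away at the intermediate stage.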

This is why high end image manipulating packages like After Effects, Photoshop and Nuke can work in 32-bit float. Nuke works purely in 32-bit float, no matter what the source footage or data.

32-bit float in Color Grading

Color grading involves manipulating color information, mainly in the RGB channels, an additional Y’ (Luma) channel, and sometimes even an alpha channel. Obviously, there is no restriction on this, since you can use masks and other effects in most high-end color grading applications.

Whether footage has come straight from camera or from a VFX pipeline, grading in 32-bit float has the following benefits:

  • Greater precision in calculations and color operations
  • A lot more colors to choose from, and results which are visible on a high-end monitor
  • Allowance for a greater tonal range, which helps in giving footage a better highlight or shadow roll-off, for example, that mimics how tones behave in the real world
  • One can work in true HDR
  • True compatibility with wide gamut color spaces, including the CIE XYZ space for DCI
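To illustrate the roll-off point from the list above, here is a small Python sketch. The curve x / (1 + x) is just one simple textbook choice, assumed here for illustration; it is not what any particular grading application uses. It needs values above 1.0 to work with, which is exactly what a float pipeline keeps.

```python
def hard_clip(x: float) -> float:
    """What happens when values above 1.0 simply cannot be stored."""
    return min(x, 1.0)

def soft_rolloff(x: float) -> float:
    """Compress highlights smoothly toward 1.0 instead of cutting them off."""
    return x / (1.0 + x)

scene = [0.25, 1.0, 2.0, 4.0, 8.0]  # scene values; above 1.0 is brighter than white

clipped = [hard_clip(v) for v in scene]
rolled = [soft_rolloff(v) for v in scene]

print(clipped)  # the three brightest values all collapse to the same tone
print(rolled)   # every value stays distinct and approaches 1.0 gradually
```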

Many people erroneously believe that by changing the color space to wide gamut (e.g. ProPhoto RGB) the software will automatically show more colors. But if the software is only 16-bit, let’s assume, then each channel has only 2¹⁶ = 65,536 tones. 32-bit gives about 4 billion tones per channel. How will working in a wide gamut color space help, if the software can’t read, create or write that many color values? Food for thought.

As with VFX, working in a 32-bit grading environment takes software errors out of the equation. But beware, such precision puts a lot of strain on the hardware to perform – most high end monitors are only 10-bit!

Mastering in 32-bit float

Feature films are usually mastered in DPX or TIFF. Still images are almost always TIFF or JPEG. The DCI standard encodes the master into a JPEG 2000 compressed package for distribution. Broadcast stations require an MPEG-4 compressed HDCAM SR master for delivery.

None of this requires anything more than 12-bit maximum. The average professional requirement is 10-bit (or even 8-bit). For videos that are not cinema or broadcast bound, 8 bit is more than enough. In such a scenario one might wonder (quite validly) why there is a need for 32-bit masters.

I see an advantage for 32-bit masters in the following situations:

  • If the footage has been processed in a 32-bit environment, then it makes sense to keep that information
  • To future-proof footage, if it is considered worthy enough by its makers

I hope I have clearly highlighted under what scenarios working in a 32-bit float environment might help a particular workflow. Unfortunately, one has to see it in action to believe it.

As far as I’m concerned, any high end workflow – commercials, broadcast television or cinema – deserves a 32-bit float environment. Hopefully, in the future, the entire pipeline – from camera to screen – will be in 32-bit (or even 64-bit)!

