Everybody talks of luma sampling and chroma subsampling, but few understand what these numbers really signify. When someone uses the notations 4:4:4, 4:2:2 and 4:2:0, etc., they usually mean chroma sub-sampling. But what do these numbers *really* mean, and why?

## How is 1920 x 1080 sampled?

Before the advent of digital HDTV, there was analog HDTV. The problem engineers faced was that they had to make this higher resolution standard work within the systems already established, mainly PAL and NTSC.

The major problem they faced was the marriage between PAL and NTSC. The new HDTV standard would need to account for both PAL and NTSC through a common multiple. E.g., if PAL is 2 and NTSC is 3, then HDTV couldn’t be 5 or 7 or 8. It had to be divisible by both 2 and 3, in this case resulting in 6, the lowest possible value.

Standard definition luma (Y) was sampled at 13.5 MHz, and the lowest common value that pleased both PAL and NTSC was 2.25 MHz (6 x 2.25 = 13.5).

This new system, which saw many variations, is now established as having 1125 vertical lines, of which 1080 are reserved for the image. The highest frame rate at the time for 1080 was 29.97 fps (NTSC), and for 720 it was 59.94 fps (NTSC).

What about the horizontal size? The uncorroborated legend goes that 16:9 (1.78:1) wastes the least space when you want to fit in every kind of aspect ratio, from 4:3 to 2.39:1.

For 1125 vertical lines, the horizontal size for this aspect ratio (taking 16:9 as 1.78:1) is about 2002. Multiplying 2002 x 1125 x 30 (frame rate), you get 67.57 MHz, which is not a whole multiple of 2.25 MHz. The closest that you get is 74.25 MHz, while still keeping the horizontal size a whole number. In this case it happens to be 2200.
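The arithmetic above can be checked with a short Python sketch. It works in whole hertz to avoid floating-point rounding. The constant and function names are illustrative only, and the 2002 and 2200 line widths are simply the figures quoted in this article.

```python
# Figures quoted in the article; all names here are illustrative.
BASE_HZ = 2_250_000   # 2.25 MHz base frequency
TOTAL_LINES = 1125    # total vertical lines, including non-image lines
FRAME_RATE = 30       # frames per second


def sampling_rate_hz(samples_per_line: int) -> int:
    """Samples per second = samples per line x lines x frames per second."""
    return samples_per_line * TOTAL_LINES * FRAME_RATE


def is_multiple_of_base(rate_hz: int) -> bool:
    """True when the rate is a whole multiple of 2.25 MHz."""
    return rate_hz % BASE_HZ == 0


for width in (2002, 2200):
    rate = sampling_rate_hz(width)
    print(width, rate / 1e6, "MHz", "multiple of 2.25 MHz:", is_multiple_of_base(rate))
```

With 2200 samples per line the rate lands on 74.25 MHz, which is 33 x 2.25 MHz.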

Therefore, 74.25 MHz is the sampling rate for HDTV 1080p at 30fps, and 720p at 60fps. Out of 2200 pixels, we only need 1920 pixels for the image.

For SMPTE 292M (HD-SDI), i.e., 10-bit 4:2:2 1080p29.97 HDTV, the luma is sampled at 74.25 MHz, while each chrominance value is sampled at half that, 37.125 MHz.

For SMPTE 372M (Dual HD-SDI) or 424M (3G-SDI), the sampling rate is doubled, to 148.5 MHz, to account for the increased frame rate of 60 fps.
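As a rough cross-check on these sampling rates, the sketch below adds the luma rate to the two chroma rates and multiplies by the bit depth. It assumes every sample, blanking included, is carried as one 10-bit word, and the function name is hypothetical. Under that assumption the totals come out at 1.485 Gbit/s and 2.97 Gbit/s, which match the nominal HD-SDI and 3G-SDI link rates.

```python
def link_rate_bits_per_second(luma_hz: int, chroma_divisor: int, bit_depth: int) -> int:
    """Total bit rate for one luma channel plus two chroma channels.

    chroma_divisor is 2 for 4:2:2, where each chroma channel runs at
    half the luma sampling rate.
    """
    chroma_hz = luma_hz // chroma_divisor
    samples_per_second = luma_hz + 2 * chroma_hz
    return samples_per_second * bit_depth


hd_sdi = link_rate_bits_per_second(74_250_000, 2, 10)
three_g = link_rate_bits_per_second(148_500_000, 2, 10)
print(hd_sdi / 1e9, "Gbit/s")
print(three_g / 1e9, "Gbit/s")
```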

## Why 4?

You’ve all seen 4:4:4, 4:2:0 and 4:2:2. First of all, just for the purposes of this article, let’s call this method of notation the ‘4-system’. The 4-system is essentially a digital notation, which isn’t supposed to have a direct analog counterpart (though it can be equated to, or derived from, one).

While defining color bit depth, we understood that in order to make digital words, the formula for the number of combinations with n letters is 2^{n}. This is the hallmark of digital numbers.

Let’s assume n = 1.

In this case, in the Y’CbCr model, Y’ can have only 2^{1} = 2 combinations or values. It can be a 0 or 1. Cb and Cr, too, can have only 2 values each – 0 and 1.

But if Cb and Cr are to be compressed (sampled at a lower rate than Y’ to save bandwidth), the compressed values will have to be represented by a fraction, which, as we know in computer land, is a big boo-boo.

So n=1 won’t do. The next in line is 2. When n=2, 2^{n} = 4.

This gives Y’ Cb Cr 4 values each, represented as 4:4:4 when not compressed. If you want to sample chroma at half rate, you can represent them by 4:2:2. A quarter rate will give us 4:1:1 or 4:2:0. It gives us four options for each channel in the luma-chroma model.

Is the number 4 good enough? It works. Like all things in broadcast, this too, is a compromise. Tomorrow, if someone feels the sampling rate will need to be changed, this system will become obsolete.

Here are some of the digital chroma sub-sampling values of popular formats:

| Format | Chroma sub-sampling |
|---|---|
| HDCAM | 3:1:1 |
| NTSC | 4:1:1 |
| PAL, DV, DVCAM, HDTV, etc. | 4:2:0 |
| Internet video | 4:2:0 |
| HDTV broadcast quality | 4:2:2 |
| Uncompressed (full information) | 4:4:4 |

## Is 3:1:1 better than 4:2:2?

The old 1080p HDCAM format was 3:1:1, and at the time, broadcast 720p was (still is) 4:2:2. Which one was better?

On the surface, it looks easy. 4:2:2 is double 3:1:1, as far as color sampling is concerned, so if you’re like me you’d say 4:2:2 is the winner, hands down.

But it’s not that simple. The sampling numbers in the 4-system don’t take into account the *size* of the image. Here’s something to ponder over:

**Color information in 1080p 3:1:1**

The 1080p image size is 1920×1080 = 2,073,600 pixels.

That’s 2,073,600 possible colors.

Since color values are only sampled at one-fourth the maximum frequency,

Cb = 1/4 * 2,073,600 * 1/2 = 259,200 colors

Cr = 1/4 * 2,073,600 * 1/2 = 259,200 colors

Total colors = Cb+Cr = 518,400

Color Density per pixel = 518,400 / 2,073,600 = 0.25 or 25%

**Color information in 720p 4:2:2**

The 720p image size is 1280×720 = 921,600 pixels.

That’s 921,600 possible colors

Chroma sampling is half the maximum frequency, therefore

Cb = 1/2 * 921,600 * 1/2 = 230,400 colors

Cr = 1/2 * 921,600 * 1/2 = 230,400 colors

Total colors = Cb+Cr = 460,800

Color Density per pixel = 460,800 / 921,600 = 0.5 or 50%
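The two calculations above follow the same pattern, so they can be reproduced with one small Python function. This is only a sketch of this article's own formula (chroma fraction x pixel count x 1/2 per channel), not a standard measure of color resolution, and the function name is made up for illustration.

```python
from fractions import Fraction


def color_information(width: int, height: int, chroma_fraction: Fraction):
    """Apply the article's formula and return (per_channel, total, density)."""
    pixels = width * height
    # Per-channel figure as used above: fraction x pixels x 1/2
    per_channel = chroma_fraction * pixels * Fraction(1, 2)
    total = 2 * per_channel        # Cb + Cr
    density = total / pixels       # color density per pixel
    return int(per_channel), int(total), float(density)


print(color_information(1920, 1080, Fraction(1, 4)))  # 1080p 3:1:1
print(color_information(1280, 720, Fraction(1, 2)))   # 720p 4:2:2
```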

What do you see? The 3:1:1 1080p image has more color information (518,400) than the 4:2:2 720p image (460,800). But in quality (color density per pixel), obviously 4:2:2 is better than 3:1:1.

So, which is better? An image with more color information, or an image with better sampled color but less color information? I don’t know. The point of this little exercise was to show you exactly this, that there is more to these things than meets the eye.

Obviously, if you sample an image at 4:4:4 that gives you the best possible quality based on your sampling frequency.

## Okay, if everything else is equal, is 4:2:2 better than 4:2:0?

Can’t say! You see, in the digital world, you can play a lot of tricks. A manufacturer can sample an image at 4:2:0, and then repackage it as 4:2:2, just to make up the numbers.

Nobody can prove them wrong, because they are giving you 4:2:2. It’s just that this might be an ‘empty shell’, sort of like a Ferrari chassis running on a crappy third-party engine.

The only way you can know for sure whether you’ve been conned or not is to run comparison tests between the two.

## Using chroma sampling numbers to calculate data size

I personally calculate how much information is lost as follows:

Maximum possible quality is 4+4+4 = 12

A full color image is 4:4:4 = 4+4+4 = 12, or 100% of maximum possible quality. From this, you can derive the rest:

- 4:2:2 = 4+2+2 = 8, which is 66.7% of 4:4:4 (12)
- 4:2:0 = 4+2+0 = 6, which is 50% of 4:4:4 (12)
- 4:1:1 = 4+1+1 = 6, which is 50% of 4:4:4 (12)
- 3:1:1 = 3+1+1 = 5, which is 42% of 4:4:4 (12)

So, if a 4:4:4 uncompressed frame is 24 MB, then a 4:2:2 frame will reduce to 16 MB, a 4:2:0 or 4:1:1 image will be 12 MB, and a 3:1:1 image will be 10 MB. Now you know why chroma sub-sampling is still around. For television and internet video, it reduces the file size by half, even before any compression has been applied.
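Here is the same rule of thumb as a minimal Python sketch: add up the three numbers in the notation and compare the sum with 12. The function names and the 24 MB default are taken from the example above or invented for illustration, and the result is only the rough guideline described here, not an exact file size.

```python
def relative_size(notation: str) -> float:
    """Sum of the three sampling numbers as a fraction of 4+4+4 = 12."""
    parts = [int(p) for p in notation.split(":")]
    return sum(parts) / 12


def frame_size_mb(notation: str, full_frame_mb: float = 24.0) -> float:
    """Estimated frame size, given the size of the 4:4:4 frame."""
    parts = [int(p) for p in notation.split(":")]
    return full_frame_mb * sum(parts) / 12


for notation in ("4:4:4", "4:2:2", "4:2:0", "4:1:1", "3:1:1"):
    percent = round(100 * relative_size(notation), 1)
    print(notation, f"{percent}%", frame_size_mb(notation), "MB")
```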

I’ve used this calculation methodology in the costs of working with 4K and 2K. It’s not a foolproof method and should only be used as a general guideline for quick calculations, but it gets you damn close.
