Compression is the elimination of data to reduce file size. The word ‘elimination’ is used with full intent. It’s final – those pixels are gone forever.
True Lossless Compression
When a compressed file can be decoded back into its original form with zero loss of information, the compression is said to be lossless.
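That definition can be demonstrated in a few lines. Here is a minimal sketch using Python's standard-library zlib – a general-purpose byte compressor, not an image codec – where "lossless" means the round trip is exact, bit for bit:

```python
import zlib

# A tiny synthetic "image": a byte stream with lots of repeated pixel values.
original = bytes([200, 200, 200, 40, 40, 200] * 1000)

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless means we get every byte back, exactly.
assert restored == original
print(len(original), "->", len(compressed), "bytes")
```

For generic byte streams this kind of exact round trip is routine; the question below is whether the same thing can be achieved for images.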
Is this just a pipe dream, or are there any scenarios where a true lossless compression can be achieved? I can think of two:
- Vector images are scaled based on pure mathematical principles. Simple shapes can be made quite small in file size, but complex shapes can end up larger than raster images!
- An algorithm that stores duplicate pixel data once, in one place, can restore it just as easily on decode. My 256×256 image example in Interframe vs Intraframe Compression is one good example. Just like vectors though, once the image gets complicated, the level of compression becomes negligible.
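The duplicate-pixel idea above can be sketched with a toy run-length encoder (the function names are mine, and real codecs are far more sophisticated). On a flat 256×256 image it works wonders; on a "complicated" image it gains nothing:

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original pixel stream."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A flat 256x256 single-channel image: one run for 65,536 pixels.
flat = [128] * (256 * 256)
assert rle_decode(rle_encode(flat)) == flat   # perfectly lossless
print(len(rle_encode(flat)))                  # prints 1

# A busy image (alternating values): one run per pixel, no savings at all.
busy = [0, 255] * (128 * 128)
print(len(rle_encode(busy)))
```

Both cases decode back exactly – the compression is lossless either way – but only the flat image actually gets smaller.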
Most compression algorithms don’t understand the subject matter being presented. They apply generalized formulas and systems – very good ones, mind you – but still under pressure from the whip of having to reduce file size. That’s the whole point.
If, after compression, the original file cannot be put back together again (like Humpty Dumpty), then the compression is said to be Lossy.
Let me dispel a myth: As far as video is concerned, I have yet to see lossless compression. Name your codec – it is lossy. Compression, if your intention is to reduce file size, is always lossy.
What about Visually Lossless Compression?
Friend, let me ask you a question: When was the last time you compressed an image to get a visually lossy result?
The aim of every algorithm is to achieve a visually lossless look. These algorithms put power into your hands through software, so you can use them to the degree you see fit. If you go too far, your images will look like mush.
Look at the following image, a test TIFF file that has been compressed to JPEGs under five settings – Maximum (12), High (8), Medium (5), Low (3) and Zero (0).
The original TIFF file is 1920×1080 8-bit sRGB, with a file size of 5.95 MB. You can estimate the perfect file size according to my formula here. The numbers in the second row show you the size in KB (5.95 MB = 6096 KB). The numbers in blue show the size relative to the original size.
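The arithmetic behind that "perfect" size is straightforward. A sketch of the usual uncompressed-raster calculation – the small gap between this result and the 5.95 MB TIFF is presumably container overhead:

```python
width, height = 1920, 1080
channels = 3                  # 8-bit sRGB: one byte each for R, G, B

raw_bytes = width * height * channels
print(raw_bytes)                        # 6,220,800 bytes
print(round(raw_bytes / 1024))          # ≈ 6075 KB
print(round(raw_bytes / 1024 / 1024, 2))  # ≈ 5.93 MB, close to the 5.95 MB TIFF
```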
What do you see? I see this:
- After a point, compression makes the image worse but does not reduce file size proportionally. Quality falls off far faster than the file shrinks – diminishing returns, not a linear trade.
- Large expanses of color get worse faster than fine detail. The first and last boxes show degradation at level 8, while the fine detail holds up till level 5. One of the fundamental tenets of compression is the elimination of duplicate data. Fine detail makes that difficult.
- Once fine detail has been compressed to the limit, compression artifacts give birth to patterns of their own, giving the illusion of detail and sharpness.
- The maximum quality setting for JPEG is visually lossless, while at the same time being only about 10% of the size of the full raster image!
- If the quality of the original image is poor (the example is a perfect image created in Photoshop), even the best compression level will contain artifacts.
- Level 12 is 575 KB – visually lossless JPEG at maximum quality. If you had a JPEG image sequence (or an intraframe codec using similar algorithms), at 30 fps this would result in 134 Mbps. At 25 fps, it would mean 112 Mbps. This explains why broadcast-quality delivery requirements for intraframe codecs are set at 100 Mbps and above.
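Those bitrate figures check out with simple arithmetic (using binary, 1024-based units, which is how they land on 134 and 112):

```python
def bitrate_mbps(frame_kb, fps):
    """Bitrate of an intraframe stream: every frame carries its full size."""
    bits_per_frame = frame_kb * 1024 * 8
    return bits_per_frame * fps / (1024 * 1024)

frame_kb = 575   # the level-12 JPEG frame from the test above

print(round(bitrate_mbps(frame_kb, 30), 1))   # 134.8 Mbps
print(round(bitrate_mbps(frame_kb, 25), 1))   # 112.3 Mbps
```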
You can perform similar tests on any algorithm that claims to reduce file size. Some algorithms are designed to stop after a certain point – giving the illusion that they are 'better' somehow – even as their file sizes increase! Really cool codecs like JPEG get bad publicity from misinformed souls who don't know how to test or when to stop.
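Sweeping an algorithm through its quality levels is the quickest way to find its sweet spot. A hedged stand-in using stdlib zlib's levels 0–9 (not an image codec, but the diminishing-returns pattern is the same):

```python
import zlib

# Stand-in "image": structured data with repetition, so the codec has work to do.
data = (b"skyline-gradient-" * 2000) + bytes(range(256)) * 200

sizes = {level: len(zlib.compress(data, level)) for level in range(10)}
for level, size in sizes.items():
    print(level, size)

# Diminishing returns: the jump from level 0 (stored) to level 1 is enormous;
# the jump from level 8 to 9 is barely anything.
assert sizes[0] - sizes[1] > sizes[8] - sizes[9]
```

Run the sweep, note where the curve flattens, and stop there – squeezing further buys almost nothing.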
Have you ever wondered why the broadcast requirement for interframe codecs is 50 Mbps while intraframe codecs is 100 Mbps? Aren’t interframe codecs worse? Shouldn’t they need more space if this were true?
The reality is different. In fact, interframe codecs like MPEG-2, H.264 and AVCHD are better than intraframe codecs – in the sense that they can provide similar visually lossless results at a smaller file size! Crazy world, huh?
How to never be taken for a ride
Whenever you are faced with a choice of codecs, conduct your own tests:
- Create a perfect image: Either in Photoshop or take a RAW photograph of a scene.
- Convert to an uncompressed TIFF file (Why TIFF? It can be opened in most software). This is A.
- If you want to test video, you will also need to test for motion or temporal artifacts. If you have uncompressed RAW video (as far as I know, only ARRIRAW is true uncompressed RAW), use that. Otherwise, create a simple vector animation – a few seconds' worth. You could combine it with a still image or another vector background. This is B.
- Transcode both A and B to all the codecs you want to compare, from best to worst. Leave out the nitty-gritties – try one slider at a time. Do this systematically – take notes and name each baby correctly. If you plan ahead this step is easy.
- Finally, study each result and share it with people. Then you’ll know what’s good for you. This exercise will take a few hours, but will set you up for many years.
- Share it with the world.
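The "name each baby correctly" step is worth doing up front. A minimal sketch – the codec and setting names here are placeholders for whatever you are actually comparing – that generates one unambiguous output name per source × codec × setting, so results can't get mixed up later:

```python
import itertools

sources = ["A_still", "B_motion"]             # the A and B files from the steps above
codecs = ["prores", "dnxhd", "h264"]          # placeholder: your codecs under test
settings = ["best", "medium", "worst"]        # placeholder: one slider at a time

# One filename per test, encoding exactly what produced it.
plan = [
    f"{src}__{codec}__{setting}.mov"
    for src, codec, setting in itertools.product(sources, codecs, settings)
]
for name in plan:
    print(name)
print(len(plan), "transcodes to run")         # 2 x 3 x 3 = 18
```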
Have you conducted tests of your own that you want to share? Tell me!