Building the Ideal Audiovisual System: Camera, Resolution and Frame Rate

Those who cannot change their minds cannot change anything – George Bernard Shaw

Humans inspire humans. Cameras and audiovisual tools are no different.

If a camera and display system is to transcend the limits of human experience, it must first deliver the human experience. From the notes of Professor Sampler, Dr. Optoglass and Grandma Moon, we can put together a digital audiovisual system that uses the human analog system as a benchmark.


The Ideal Camera Sensor

Sensors can be of many sizes and shapes (from less than one inch to 8×10 and beyond), and there is no strict limit on the ideal size, shape or aspect ratio.

We know that the human retina mapped as an ellipse gives us a sensor size of 51.56 mm x 25.57 mm (2″ x 1″), at an aspect ratio of about 2:1.

The above sensor size is what corresponds to the eye, nothing more. For aesthetic reasons, one is free to deviate from it – but only for aesthetic reasons.

The distance from the back of the lens to the sensor, the flange focal distance, could be about 16.67 mm. Today, the Sony FZ-mount achieves a flange focal distance of 19 mm.

The Ideal Lens

The lens will have a minimum f-number of f/2 and a focal length range of 10 mm to 1100 mm (8 mm to 850 mm in 35mm-equivalent terms), without distortion or aberration. Its peak performance must be at 42 mm (32 mm in 35mm-equivalent terms).

The Ideal Display

The ideal display should have a minimum viewing angle of 155° by 135°.


The Ideal Sensor

The pixel density of the sensor must be no less than 12,700 ppi or 250 lp/mm.

At this rate, the megapixel count of the ‘human eye’ sensor (2″ x 1″ sensor) is approximately 25,400 x 12,700 = 323 Megapixels.

This sensor must be diffraction-limited, with a pixel pitch of 2 microns or less.
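The sensor figures above are linked by simple unit conversions. The following is a minimal sketch of that arithmetic, assuming two pixels per line pair; the function names are illustrative and not part of the original notes.

```python
MM_PER_INCH = 25.4

def ppi_to_lp_per_mm(ppi):
    """Line pairs per mm, assuming two pixels are needed per line pair."""
    return ppi / MM_PER_INCH / 2

def pixel_pitch_microns(ppi):
    """Centre-to-centre pixel spacing in microns for a given pixel density."""
    return MM_PER_INCH * 1000 / ppi

def megapixels(width_in, height_in, ppi):
    """Total pixel count, in millions, of a sensor measured in inches."""
    return (width_in * ppi) * (height_in * ppi) / 1e6

# The 2" x 1" 'human eye' sensor at 12,700 ppi
print(round(ppi_to_lp_per_mm(12_700), 3))     # about 250 lp/mm
print(round(pixel_pitch_microns(12_700), 3))  # about 2 microns
print(round(megapixels(2, 1, 12_700)))        # about 323 Megapixels
```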

The Ideal Lens

The ideal lens will out-resolve the sensor, with a resolution of 250 lp/mm or more, across the entire spectrum.

Formula for Ideal PPI for any Display Monitor

Display ppi depends on viewing distance, regardless of the size of the display. At 4 inches, the minimum focus distance, the display should have no less than about 2,190 ppi (the rounded rule of thumb below gives 2,250).

The simple formula for ppi vs viewing distance is:

PPI = 9000/d

where d is the viewing distance from the screen in inches.

Instead of PPI, if you want to calculate the necessary resolution based on the size of a screen and the viewing distance, the full formula is:

R/L = 9000/d

where R is the Resolution in one direction (Horizontal or Vertical)
L is the Length of the screen in that direction, in inches
d is the viewing distance from the screen in inches.

To find the ‘good enough’ ppi, replace 9,000 with 3,600.
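The two formulas above can be sketched as a pair of small functions. This is only an illustration: the function names and the 24-inch monitor are hypothetical, and the constants 9,000 and 3,600 are taken directly from the text.

```python
SHARPEST_EYE = 9000  # constant from PPI = 9000/d
GOOD_ENOUGH = 3600   # 'good enough' constant from the text

def required_ppi(distance_in, k=SHARPEST_EYE):
    """PPI = k / d, with d the viewing distance in inches."""
    return k / distance_in

def required_resolution(length_in, distance_in, k=SHARPEST_EYE):
    """R = k * L / d, the pixel count along one side of length L inches."""
    return k * length_in / distance_in

# Hypothetical example: a monitor 24 inches wide, viewed from 24 inches
print(required_ppi(24))                          # 375.0
print(required_resolution(24, 24))               # 9000.0
print(required_ppi(24, GOOD_ENOUGH))             # 150.0
print(required_resolution(24, 24, GOOD_ENOUGH))  # 3600.0
```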

Fully ‘Immersive’ Display

A fully immersive display will offer the field of view of the human eye. One can use the following formulas as rules of thumb:

Length = 10d and Height = 5d

where d is the viewing distance from the screen in inches.

To obtain such an experience, the resolution of the display must be 90,000 x 45,000, or about 4 Gigapixels. If this display is 30 feet long, the human with the sharpest eye can watch it from a distance of 3 feet.

For the average human with 20/20 vision, the immersive display must be 36,000 x 18,000, or about 650 Megapixels. On a display 30 feet long, the average person can also watch it from a distance of 3 feet.
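Combining the rule of thumb Length = 10d, Height = 5d with the ppi formula shows why the immersive resolution does not depend on the viewing distance: the distance cancels out. A minimal sketch, with an illustrative function name and the constants 9,000 and 3,600 taken from the text:

```python
def immersive_display(distance_in, k=9000):
    """Size and resolution of a display that fills the field of view at distance d."""
    length_in = 10 * distance_in   # Length = 10d
    height_in = 5 * distance_in    # Height = 5d
    ppi = k / distance_in          # PPI = k / d
    h_pixels = round(length_in * ppi)
    v_pixels = round(height_in * ppi)
    return length_in, height_in, h_pixels, v_pixels

# Viewing from 3 feet (36 inches)
print(immersive_display(36))          # (360, 180, 90000, 45000)
print(immersive_display(36, k=3600))  # (360, 180, 36000, 18000)
```

A length of 360 inches is the 30-foot display mentioned above. Changing the distance changes the physical size but leaves the pixel counts the same.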

The camera that produces 323 Megapixels (25,400 x 12,700) cannot be used to create an immersive display. At this resolution, the sharpest-eyed human should stand about 10 feet from a 30-foot display. At that distance, however, the display will not fill his or her field of view, so it will not be immersive.
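The 10-foot figure follows from rearranging R/L = 9000/d for d. The sketch below, with illustrative names, compares that distance with the one at which a 30-foot display would fill the field of view (Length = 10d):

```python
def min_viewing_distance_in(pixels, length_in, k=9000):
    """d = k * L / R: closest distance at which individual pixels are not resolved."""
    return k * length_in / pixels

display_length_in = 30 * 12  # 30 feet in inches

sharp_distance_ft = min_viewing_distance_in(25_400, display_length_in) / 12
immersive_distance_ft = display_length_in / 10 / 12  # from Length = 10d

print(round(sharp_distance_ft, 1))  # a little over 10 feet
print(immersive_distance_ft)        # 3.0 feet
```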

All said and done, for a fully immersive experience, both camera and display must be able to produce imagery on the order of 4 Gigapixels, at 90,000 x 45,000.

For such a resolution, the camera sensor must be of the size 180 mm x 90 mm (7″ x 3.5″) at 2 microns. At one micron, it can be 90 mm x 45 mm (3.5″ x 1.75″).
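The sensor dimensions above are simply the pixel count multiplied by the pixel pitch. A minimal sketch, with an illustrative function name:

```python
def sensor_size_mm(h_pixels, v_pixels, pitch_um):
    """Sensor width and height in mm for a given pixel count and pitch in microns."""
    return h_pixels * pitch_um / 1000, v_pixels * pitch_um / 1000

print(sensor_size_mm(90_000, 45_000, 2))  # (180.0, 90.0)
print(sensor_size_mm(90_000, 45_000, 1))  # (90.0, 45.0)
```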

Frame Rate and Scan Mode

The display system must be capable of 120 fps. The camera can have any frame rate based on aesthetic criteria, for under- or over-cranking, etc., but it must have a default/normal timebase frame rate of 120 fps.

For display purposes, the video can be shown at variable frame rates. For static scenes or scenes with slow movement, it might be okay to show certain sections at a lower frame rate. The choice is purely aesthetic. The display system, however, should be able to show video at 120 fps as standard.

Scan Mode and Shutter Technology

The scan mode is perfectly progressive, with zero temporal distortion.

Even though it is convenient to record and display images as frames, one must not discount the possibility of motion+light detection technology in a sensor, whose output changes only when either motion or light changes. This works very similarly to the human eye. Such a technology will eliminate redundant imagery, thus significantly reducing file sizes and bandwidth requirements.
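The bandwidth saving can be illustrated with a toy example. The sketch below is not how such a sensor would work internally: it compares two frames that have already been captured and reports only the pixels that changed. The function name, the threshold value and the sample frames are all hypothetical.

```python
def changed_pixels(previous, current, threshold=10):
    """Return (row, col, new_value) for pixels whose brightness changed by more than threshold."""
    events = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (old, new) in enumerate(zip(prev_row, cur_row)):
            if abs(new - old) > threshold:
                events.append((r, c, new))
    return events

# Two tiny greyscale 'frames'; only one pixel differs between them
frame_a = [[10, 10, 10],
           [10, 10, 10]]
frame_b = [[10, 10, 10],
           [10, 200, 10]]

events = changed_pixels(frame_a, frame_b)
print(events)  # [(1, 1, 200)]
```

Here a single event replaces the six values of a full frame. In a static scene the list would be empty.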

In the next part, we’ll look at ISO, Dynamic Range and Color.
