File Format Overview

Last updated June 18, 2019

With the near-complete shift from tape-based digital (DAT) and analog recording formats to file-based digital audio recording, the choice of audio file types has exploded. In the recent past, audio engineers would set their DAT recorder to 48 kHz, record its two audio tracks as either dual-mono or stereo, and then concentrate on the business of getting great sound. With a file-based workflow, other elements need to be considered to ensure a smooth transition from the field or studio to the editing suite.

For productions where sound is recorded with the picture, the sound file format isn’t much of a consideration. The audio is already married to the picture, so sync is not an issue. When budgets allow, many film and television projects opt for dual-system sound, where the sound is recorded separately from the camera. This gives the audio engineer far more control and independence from the camera operator. It is in dual-system recording that compatibility issues can arise. Get two sound mixers together and they'll share horror stories about how their sound files didn't show up, or showed up incorrectly, in the editing suite. This is not only an annoyance for the mixers and editors working on the project; it can also cost time and money. To avoid complications, file formats, sampling rates, and metadata all need to be considered.

File Formats: The Digital Container

In dual-system sound, WAV files are the standard format for uncompressed audio. There are others, such as Apple’s AIFF, but WAV is king because it is universally supported by production audio recorders and editing software. As a data container, a WAV file can be either monophonic (one track per file) or polyphonic (several tracks interleaved in one file). One of the first considerations for the audio engineer is whether post can accept a polyphonic WAV file of more than two tracks. In the latest versions of popular editing software from companies such as Avid and Apple, multi-track poly files import with no trouble. Problems do arise in older, legacy editing environments, where monophonic WAV files become the least common denominator.

A typical production can generate anywhere from one to four GB of uncompressed audio data per day. While this is a fraction of what HD video generates, some applications can benefit from data reduction through compression. Lossy compression algorithms such as AAC, MP2, and MP3 can be used for transcription or for e-mailing samples, but may not be suitable for important production work because the encoding permanently discards some audio information.

Many workflows require the portability that lossy compression provides but with the sonic quality of uncompressed audio. Lossless compression reduces data without throwing anything away. Computer “Zip” files have used this technique for years, but general-purpose .zip compression is not effective on sound files. File formats such as FLAC (Free Lossless Audio Codec) do reduce file sizes while preserving the original audio data exactly. The benefit is the ability to squeeze more record time out of a CompactFlash card or a hard drive. Using FLAC does add a step in post: FLAC files must be converted to WAV/BWF files. FLAC and other lossless algorithms are still a newer concept, so there is no industry-wide support for these file formats as of yet. However, as productions grow more complex and file sizes grow with them, widespread adoption of lossless algorithms is expected in the immediate future.
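The storage figures discussed above can be sanity-checked with simple arithmetic. The sketch below estimates uncompressed PCM data rates and record time for a given card size. The function names, the 8 GB card, and the 0.6 compression ratio are illustrative assumptions, not measured values; real lossless ratios depend on the program material.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, tracks):
    """Raw PCM data rate; ignores the small WAV header and any metadata chunks."""
    return sample_rate_hz * (bit_depth // 8) * tracks


def gigabytes_per_hour(sample_rate_hz, bit_depth, tracks):
    """Decimal gigabytes (10**9 bytes) of audio written per hour of recording."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, tracks) * 3600 / 1e9


def record_hours(media_gb, sample_rate_hz, bit_depth, tracks, compression_ratio=1.0):
    """Hours of record time on media of the given size.

    compression_ratio is compressed size divided by original size;
    1.0 means uncompressed. Any value below 1.0 is an assumption here.
    """
    effective_gb_per_hour = (
        gigabytes_per_hour(sample_rate_hz, bit_depth, tracks) * compression_ratio
    )
    return media_gb / effective_gb_per_hour


# Four tracks of 24-bit audio at 48 kHz on a hypothetical 8 GB card.
rate = gigabytes_per_hour(48000, 24, 4)
print(f"Uncompressed: {rate:.2f} GB per hour")
print(f"Record time, uncompressed: {record_hours(8, 48000, 24, 4):.1f} h")
# 0.6 is an assumed lossless compression ratio, used only for illustration.
print(f"Record time, assumed 0.6 ratio: {record_hours(8, 48000, 24, 4, 0.6):.1f} h")
```

At roughly 2 GB per hour for four 24-bit tracks, a shooting day with one to two hours of actual roll time falls within the one-to-four GB range quoted above.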

Sampling Rates

Because most SD and HD video formats use a 48 kHz sampling rate for their native audio, sound files are generally recorded at 48 kHz. Nonetheless, there are numerous exceptions to the 48 kHz rule. In NTSC-land, one exception is the sampling rate of 48048 Hz (48.048 kHz), 0.1% greater than 48 kHz. It is used in select applications when film or video is shot at true 30 or 24 frames per second and is pulled down to 29.97 or 23.976 (often written 23.98) in post. When those 48.048 kHz sound files are played back at 48 kHz, the 0.1% pull-down is achieved without additional sample rate conversion. To add to the confusion, some editors will want the audio files recorded at 48.048 kHz but stamped as 48 kHz. Sound Devices calls this intentional mislabeling of the sampling rate “F” mode. This mode generates a spoofed file that fools the editing software into playing the sound file at 48 kHz, so the file plays back 0.1% slower than recorded with no additional steps. If nosebleed sampling rates, such as 96 kHz, are being considered for a project, you may want to select a lower rate. Post will, at best, sample rate convert the files or, at worst, refuse them.
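The pull-down relationship described above comes down to a single ratio, 1000/1001. The following sketch works through the arithmetic with exact fractions. The function and variable names are illustrative only and are not taken from any recorder or editing application.

```python
from fractions import Fraction

# NTSC pull-down factor: 30 fps becomes 29.97, 24 fps becomes 23.976.
PULLDOWN = Fraction(1000, 1001)


def pulled_down_rate(nominal_fps):
    """Frame rate after the 0.1% NTSC pull-down."""
    return Fraction(nominal_fps) * PULLDOWN


def playback_duration(num_samples, playback_rate_hz):
    """Seconds a clip lasts when its samples are clocked out at playback_rate_hz."""
    return Fraction(num_samples, playback_rate_hz)


recorded_rate_hz = 48048  # rate actually used while recording
stamped_rate_hz = 48000   # rate the editing software is told to use

# One minute of real time on set, captured at 48.048 kHz.
samples = recorded_rate_hz * 60
duration = playback_duration(samples, stamped_rate_hz)

# Ratio of playback speed to recording speed; equals the pull-down factor.
speed_ratio = Fraction(stamped_rate_hz, recorded_rate_hz)

print("30 fps pulled down:", float(pulled_down_rate(30)))
print("24 fps pulled down:", float(pulled_down_rate(24)))
print("Playback duration of one recorded minute (s):", float(duration))
print("Speed ratio:", speed_ratio)
```

Because 48000/48048 reduces to exactly 1000/1001, audio recorded at 48.048 kHz and played at 48 kHz slows by the same factor as picture pulled down from 30 to 29.97 or from 24 to 23.976, which is why the two stay in sync.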

BWF and iXML Metadata

As with any collaborative project, communication among everyone involved is key. One means of keeping communication lines open is to incorporate metadata into the workflow. File-based production sound recorders, such as 7-Series recorders, write files in the Broadcast Wave Format (BWF). These files are identical to WAV files with the addition of a BEXT (broadcast extension) chunk in the file header. This added metadata chunk is only viewable by Broadcast Wave-aware applications. BWF files include data such as scene and take information, time code value, time code rate, unique file identifiers, and all sorts of other data. BWF data enables WAV files to carry time code information. The time code value in a BWF file is represented as the number of samples since midnight. Because the latest generation of metadata-crazed editors requires further informational options beyond what BWF can provide, iXML was introduced as an additional piece of metadata generated by file-based audio recorders. Additional information stored in the iXML “chunk” includes track names (such as “boom,” “char-1 lav,” and “char-2 lav”) and notes taken on set. One important element written in iXML is the original file name. If the file name is accidentally changed somewhere in the editing process, the original will still exist in the file’s iXML metadata.

Key to Audio Post Success: Preparation

With careful pre-planning, a file-based digital audio file can merrily move through the editing process in perfect synchronization. Additionally, newer formats such as FLAC, and MP3 proxies, can extend recording times and speed decision-making by putting smaller files in circulation. The key to successful dual-system production is quite elementary: the production sound mixer needs to know, before first pressing record, what is expected in the post-production environment.