Mp3 Encoding with Lame

There is a lot of confusion surrounding the terms audio compression, audio encoding, and audio decoding. This section gives you an overview of what audio coding (another one of these terms...) is all about.

The purpose of audio compression

Up to the advent of audio compression, high-quality digital audio data took a lot of hard disk space to store. Let us go through a short example.

You want to, say, sample your favorite 1-minute song and store it on your hard disk. Because you want CD quality, you sample at 44.1 kHz, stereo, with 16 bits per sample. 44100 Hz means that you have 44100 values per second coming in from your sound card (or input file). Multiply that by two because you have two channels. Multiply by another factor of two because you have two bytes per value (that's what 16 bit means). The song will take up 44100 samples/s * 2 channels * 2 bytes/sample * 60 s/min ~ 10 Mbytes of storage space on your hard disk. If you wanted to download that over the internet, given a good 56k modem connected at 44k, it would take you (at least) 10000000 bytes * 8 bits/byte / (44000 bits/s) / (60 s/min) ~ 30 minutes just to download one minute of music!

Digital audio coding, which in this context is also called digital audio compression, is the art of minimizing storage space (or channel bandwidth) requirements for audio data. Modern perceptual audio coding techniques (like MPEG Layer III) exploit the properties of the human ear (the perception of sound) to achieve a size reduction by a factor of 11 with little or no perceptible loss of quality. Such schemes are therefore the key technology for high-quality, low-bitrate applications, like sound tracks for CD-ROM games, solid-state sound memories, Internet audio, digital audio broadcasting systems, and the like.
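The arithmetic of the 1-minute song example can be checked with a few lines of Python. This is only a sketch: the function and constant names are made up for illustration. It uses the exact byte count rather than the rounded 10 Mbytes, so the download time comes out slightly above 30 minutes.

```python
# Uncompressed (PCM) size and modem download time for the 1-minute example.
SAMPLE_RATE_HZ = 44_100      # CD-quality sampling rate
CHANNELS = 2                 # stereo
BYTES_PER_SAMPLE = 2         # 16 bits per sample
DURATION_S = 60              # one minute of music
MODEM_BITS_PER_S = 44_000    # 56k modem connected at 44k


def pcm_size_bytes(sample_rate_hz, channels, bytes_per_sample, duration_s):
    """Storage needed for uncompressed audio, in bytes."""
    return sample_rate_hz * channels * bytes_per_sample * duration_s


def download_minutes(size_bytes, link_bits_per_s):
    """Transfer time in minutes, ignoring any protocol overhead."""
    return size_bytes * 8 / link_bits_per_s / 60


size = pcm_size_bytes(SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE, DURATION_S)
print(size)                                                 # 10584000
print(round(download_minutes(size, MODEM_BITS_PER_S), 1))   # 32.1
```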

The two parts of audio compression

Audio compression really consists of two parts. The first part, called encoding, transforms the digital audio data that resides, say, in a WAVE file, into a highly compressed form called a bitstream. To play the bitstream on your sound card, you need the second part, called decoding. Decoding takes the bitstream and re-expands it to a WAVE file. The program that performs the first part is called an audio encoder. LAME is such an encoder. The program that performs the second part is called an audio decoder. Decoders can be found at http://www.mp3-tech.org.
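As a rough illustration of the two parts, the sketch below builds command lines for the LAME command-line frontend. It assumes a `lame` executable is installed, which is a separate program from the lame_enc.dll that m3w loads. The helper names and file names are hypothetical. The code only constructs the argument lists and does not run anything.

```python
def encode_command(wav_path, mp3_path, bitrate_kbps=128):
    """Argument list for encoding a WAVE file to an mp3 bitstream.

    Uses lame's -b option to request a bitrate in kbps.
    """
    return ["lame", "-b", str(bitrate_kbps), wav_path, mp3_path]


def decode_command(mp3_path, wav_path):
    """Argument list for re-expanding an mp3 bitstream to a WAVE file.

    Uses lame's --decode option.
    """
    return ["lame", "--decode", mp3_path, wav_path]


# Hypothetical file names, for illustration only.
print(encode_command("song.wav", "song.mp3"))
print(decode_command("song.mp3", "song_decoded.wav"))
```

On a machine where `lame` is on the PATH, either list could be handed to `subprocess.run(cmd, check=True)`. The decoded WAVE file will not be bit-identical to the original, since the encoding is lossy.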

Compression ratios, bitrate, and quality

It has not been explicitly mentioned up to now: what you end up with after encoding and decoding is not the same sound file anymore. All superfluous information has been squeezed out, so to speak. It is not the same file, but it will sound the same - more or less, depending on how much compression has been performed on it. Generally speaking, the lower the compression ratio, the better the sound quality will be in the end - and vice versa. The table below gives you an overview of the achievable quality. Because compression ratio is a somewhat unwieldy measure, experts use the term bitrate when speaking of the strength of compression. Bitrate denotes the average number of bits that one second of audio data will take up in your compressed bitstream. The usual unit is kbps (kilobits per second), where 1 kbps is 1000 bits/s. To calculate the number of bytes per second of audio data, simply divide the number of bits per second by eight.

Table: Bitrate versus sound quality

Bitrate	Bandwidth	Quality comparable to or better than
16 kbps	4.5 kHz	short-wave radio
32 kbps	7.5 kHz	AM radio
96 kbps	11 kHz	FM radio
128 kbps	16 kHz	near CD
160-180 kbps	20 kHz	perceptual transparency
256 kbps	22 kHz	studio
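The relation between bitrate, bytes per second, and compression ratio can be sketched as follows. The compression ratio here is computed against uncompressed CD audio (44.1 kHz, stereo, 16 bit), which is an assumption made for this example. The low-bitrate rows of the table would normally use mono or lower sample rates, so the ratios for them are only a rough guide. The function names are illustrative.

```python
# Uncompressed CD audio: 44100 samples/s * 2 channels * 16 bits = 1411200 bits/s
CD_BITRATE_BPS = 44_100 * 2 * 16


def bytes_per_second(bitrate_kbps):
    """Bytes of compressed data per second of audio (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 / 8


def compression_ratio(bitrate_kbps):
    """Size reduction relative to uncompressed CD audio."""
    return CD_BITRATE_BPS / (bitrate_kbps * 1000)


for kbps in (16, 32, 96, 128, 256):
    print(kbps, bytes_per_second(kbps), round(compression_ratio(kbps), 1))
```

At 128 kbps this gives 16000 bytes per second and a ratio of about 11, which is the factor quoted for MPEG Layer III earlier in this section.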

Table: MPEG-Version versus Samplerate

MPEG1	MPEG2	MPEG2.5
44100 Hz	22050 Hz	11025 Hz
48000 Hz	24000 Hz	12000 Hz
32000 Hz	16000 Hz	8000 Hz
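The sample rate table can be expressed as a simple lookup. This is a minimal sketch with hypothetical names. It only mirrors the table above and is not how LAME itself selects the MPEG version.

```python
# Sample rate in Hz -> MPEG version, copied from the table above.
MPEG_VERSION_BY_SAMPLE_RATE = {
    44100: "MPEG1", 48000: "MPEG1", 32000: "MPEG1",
    22050: "MPEG2", 24000: "MPEG2", 16000: "MPEG2",
    11025: "MPEG2.5", 12000: "MPEG2.5", 8000: "MPEG2.5",
}


def mpeg_version(sample_rate_hz):
    """Return the MPEG version that carries the given sample rate."""
    try:
        return MPEG_VERSION_BY_SAMPLE_RATE[sample_rate_hz]
    except KeyError:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz} Hz")


print(mpeg_version(44100))   # MPEG1
print(mpeg_version(22050))   # MPEG2
print(mpeg_version(8000))    # MPEG2.5
```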

Table: Valid Bitrates in kbit/second

MPEG1 Layer I	MPEG1 Layer II	MPEG1 Layer III	MPEG2/MPEG2.5 Layer I	MPEG2/MPEG2.5 Layer II and III
32	32	32	32	8
64	48	40	48	16
96	56	48	56	24
128	64	56	64	32
160	80	64	80	40
192	96	80	96	48
224	112	96	112	56
256	128	112	128	64
288	160	128	144	80
320	192	160	160	96
352	224	192	176	112
384	256	224	192	128
416	320	256	224	144
448	384	320	256	160
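A bitrate can be checked against the table with a small lookup. The sketch below copies the five columns of the table and assumes that the last two columns apply to both MPEG2 and MPEG2.5. The names are illustrative and are not part of LAME or m3w.

```python
# Valid bitrates in kbit/s, one tuple per column of the table above.
MPEG1_RATES = {
    1: tuple(range(32, 449, 32)),  # 32, 64, ..., 448
    2: (32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    3: (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
}
# Assumption: MPEG2 and MPEG2.5 share these two columns.
LOW_RATE_L1 = (32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256)
LOW_RATE_L2_L3 = (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160)
MPEG2_RATES = {1: LOW_RATE_L1, 2: LOW_RATE_L2_L3, 3: LOW_RATE_L2_L3}

VALID_BITRATES = {
    "MPEG1": MPEG1_RATES,
    "MPEG2": MPEG2_RATES,
    "MPEG2.5": MPEG2_RATES,
}


def is_valid_bitrate(version, layer, kbps):
    """True if kbps appears in the table for this MPEG version and layer."""
    return kbps in VALID_BITRATES[version][layer]


print(is_valid_bitrate("MPEG1", 3, 128))    # True
print(is_valid_bitrate("MPEG1", 3, 144))    # False
print(is_valid_bitrate("MPEG2.5", 3, 8))    # True
```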

The mp3 Standard

The reason MP3 took off and became the audio standard on the Web is that the original patent holders made it freely available for anyone to develop a decoder, or player, for it. So the early MP3 innovators hacked around and developed players and other cool software that spread fast and wide. By contrast, several other digital audio formats, which are more efficient or sound better than MP3, are proprietary formats, developed by companies like Lucent, Yamaha, and Microsoft, which have restrictions on how outside developers can employ their technology. These other audio formats may gain wider acceptance in the future, as record companies use them to distribute popular music, but for now MP3 still has the momentum.

Lame is just one of many encoders for the mp3 format. The name stands for "Lame Ain't an Mp3 Encoder" and has historical reasons. While the format is public and decoding is a standardized process, encoding is neither standardized nor public, so many variations in speed and quality exist. For patent reasons, Lame therefore started out not as an encoder but as a patch to the reference implementation, an implementation that illustrated the principle but wasn't particularly good. Later Lame developed into a full implementation. Details can be found at lame.sourceforge.net.

m3w uses the Lame encoder DLL, called lame_enc.dll, and loads it at program start from either the current working directory or one of the default directories for DLLs, like c:\windows. If you want the latest version, download it from the internet and copy it to a place on your machine where m3w can find it. m3w always shows the loaded version in the main window. Lame comes with many options, and not all of them are available in m3w.

The differences between encoders that matter most are the differences in the quality they can achieve. The quality depends not only on the available bitrate but also on the psycho-acoustic model of the encoder. All encoders will produce good quality at 256 kbit/s and lousy quality at 16 kbit/s, but in between there are noticeable differences. Lame (since version 3.7) is considered to be one of the better encoders, about as good as the encoder of the Fraunhofer Institute, the inventors of mp3. Lame, however, is free software, that is, you can download it and use it without paying anything for it. That, and some philosophical concerns, is why m3w uses Lame.

The psycho-acoustic model is a model of the human ear and hearing process. It is used to determine which parts of the sound are not audible and can therefore be dropped, if necessary, to achieve the desired compression. A good model will discard exactly those parts of the signal whose removal causes the least distortion to the audible result. Of course, this is all a matter of subjective judgment.