Voice Communications: Our Own Solution

If I can't find an adequate existing solution, I may just start developing my own. Apparently, there is some support for real-time voice chat in DirectX 8; I'll have to look into that. Pulling this off would require some skill with audio input (under Windows) and audio compression.

Name

I think we should name our software Combat Radio.

Audio Compression

As far as I know, there are basically two kinds of audio compression. The simplest kind is to reduce the sampling rate and the sample size. The other kind is more mysterious to me: it uses encoding techniques such as MP3, which I know nothing about.

A standard telephone line is digitized at 8000 samples per second. With 7-bit samples, this translates to a 56 kbit/s data stream (the nominal sample size is 8 bits, or 64 kbit/s, but on some North American trunks the low bit is borrowed for signaling). This would be terrible for music, but it is fine for a voice conversation. A common method of "compressing" an audio stream is to reduce the number of samples or the size of the samples. Obviously, this reduces the quality of the voice stream. I have listened to a demonstration of this kind of voice compression on phone lines, and I've learned that 9600 bps (8x1200 or maybe 4x2400) is reasonably acceptable.

On a telephone line, the audio levels are not linear. Sample levels at lower (quieter) amplitudes are closer together. In other words, the samples are taken along a curve. This is part of what makes a 7-bit sample acceptable. The standard algorithm is called mu-law (mu, like the Greek letter). I will have to find out the formula for that.
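For reference, the usual continuous form of the mu-law curve is y = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), with mu = 255 in telephone use. The sketch below is only that textbook formula applied to samples scaled to the range -1.0 to 1.0; real telephone equipment uses a piecewise-linear approximation that produces integer codes, which is not shown here. The function names are my own.

```python
import math

MU = 255  # value used for mu-law in telephone networks

def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1.0, 1.0] onto the mu-law curve (also [-1.0, 1.0])."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: recover the linear sample."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

Quiet samples are spread over much more of the output range than loud ones (an input of 0.01 comes out above 0.2), which is why a small sample size still sounds acceptable for voice.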

Mixing audio signals is easy. All one has to do is add the samples together (after converting them back to linear values). Unfortunately, this can result in overflows. The simplest thing to do with an overflow is to clip it, transmitting the highest (or lowest) value possible for the sample size. This is generally considered undesirable for digital music, but for this application it should be fine, especially since few people will be talking over each other.
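A minimal sketch of add-and-clip mixing, assuming both streams have already been converted to signed linear integer samples of the same size. The function name and the 16-bit default are assumptions for illustration.

```python
def mix(a, b, bits=16):
    """Add two equal-length lists of signed linear samples, clipping overflows."""
    hi = (1 << (bits - 1)) - 1   # highest value for the sample size
    lo = -(1 << (bits - 1))      # lowest value for the sample size
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]
```

For example, mix([30000, -30000, 100], [10000, -10000, 50]) clips the first two sums to 32767 and -32768 and leaves the third at 150.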

In theory, one could employ LZW compression. Unfortunately, that is the compression used in GIF files, for which Unisys wants patent license fees. I don't really know how effective LZW compression would be for audio anyway. I wonder what PNG uses.

One could also employ some kind of sliding bit-size compression. If we use 7-bit samples, but the next 20 samples have 00 in the two high bits, we could indicate that the next 20 samples have 5 bits instead of 7, a saving of 40 bits or 5 bytes (before counting the bits needed to signal the change). Not terribly significant.
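The saving is easy to estimate for any block of samples. The sketch below assumes the samples are non-negative integer codes and that each block would carry a small marker giving its bit width; the marker size (header_bits) is a made-up parameter, since no format has been decided.

```python
def bits_needed(samples):
    """Smallest width in bits that holds every sample in a non-empty block."""
    return max(1, max(samples).bit_length())

def block_saving(samples, full_bits=7, header_bits=0):
    """Bits saved by sending the block at its minimum width instead of full_bits."""
    width = min(full_bits, bits_needed(samples))
    return len(samples) * (full_bits - width) - header_bits
```

With 20 samples that all fit in 5 bits, block_saving gives 40 bits. If the marker cost, say, 8 bits, the net saving would drop to 32, which supports the conclusion that this is not worth much.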

Our data stream should include header information that allows multiple compression types. For now, we'll just use the standard compression, but we want to leave room for other types in the future.
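One possible shape for such a header, sketched with Python's struct module. Every field here (version, codec id, sample rate, sample size, sequence number, payload length) and the codec id value are hypothetical; nothing about the real packet format has been decided.

```python
import struct

# Hypothetical layout, network byte order, no padding (9 bytes total):
# version (1), codec id (1), sample rate in Hz (2), bits per sample (1),
# sequence number (2), payload length in bytes (2)
HEADER = struct.Struct("!BBHBHH")
CODEC_BASIC = 0  # assumed id for the plain rate/size-reduced stream

def pack_packet(codec, rate, bits, seq, payload):
    """Prefix a payload with the header described above."""
    return HEADER.pack(1, codec, rate, bits, seq, len(payload)) + payload

def unpack_packet(data):
    """Split a packet back into its header fields and payload."""
    version, codec, rate, bits, seq, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return codec, rate, bits, seq, payload
```

A receiver that sees a codec id it does not understand can skip the packet by its length, which is what leaves room for new compression types later.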

The user should be able to select their output and input sampling rates and sizes. The user should also be able to select a start delay. The start delay, along with the sample rate, would translate to a packet size. For instance, using 8-bit samples at 8000 samples per second, a 50 millisecond delay would cause 3200 bits (or 400 bytes) to be accumulated before sending the first packet. The second and remaining packets should be smaller, maybe less than half that size, which would allow time for out-of-order or dropped packets to be reassembled, thus eliminating gaps in the event of network failures.
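The packet size arithmetic above is just sample size times sample rate times delay. A small sketch, with a function name of my own choosing, and rounding up to whole bytes as an assumption:

```python
def packet_bytes(bits_per_sample, samples_per_second, delay_ms):
    """Bytes of audio accumulated during delay_ms, rounded up to a whole byte."""
    bits = bits_per_sample * samples_per_second * delay_ms // 1000
    return (bits + 7) // 8
```

packet_bytes(8, 8000, 50) gives the 400 bytes from the example; with 7-bit samples the same delay gives 350 bytes.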

The server should fold all of the samples together into output streams. Each user gets their own output stream, because they don't want to hear their own voice. The server will have to keep a timeline so that packets are assembled without overlapping each other.
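A rough sketch of the per-user fold, assuming the timeline work is already done: every user's samples for the same time slice are available as equal-length lists of signed linear values. The function name and the dictionary layout are assumptions. Each user's output is the sum of everyone, minus their own samples, clipped to the sample size.

```python
def mix_minus(frames, bits=16):
    """frames maps user name -> samples for one time slice.
    Returns user name -> mix of every other user's samples."""
    if not frames:
        return {}
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    length = len(next(iter(frames.values())))
    # Sum all users once, then subtract each user's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    return {
        user: [max(lo, min(hi, t - s)) for t, s in zip(total, own)]
        for user, own in frames.items()
    }
```

Summing once and subtracting keeps the work proportional to the number of users, rather than mixing a separate stream from scratch for each listener.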

User Interface

There should be a simple user interface which gives all the standard functionality of joining their favorite server, using their normal user name, and joining their favorite channel (or maybe a default channel).

Name:_____________________
Server:_____________________
Channel:_____________________
Password:_____________________

There can also be an advanced user interface that lets them set up controls for speaking to a specific individual, being connected to multiple servers or multiple channels, etc. There should also be a server user interface that lets someone act as a server.

I like TeamSound's session display the best (it's about the only thing I like about it). I particularly like the visual indication that a user is transmitting: the little circle changes from red to green.

Controls

TeamSound and BattleCom both allow the user to set up hotkeys that trigger events. BattleCom includes a function to "whisper" to a user. Unfortunately, it works only while the key is held down, rather than taking effect after a single key press.

Unix Server

In addition to the server option in the Windows client, there should be dedicated servers available: obviously one for FreeBSD, which should also be made available for Linux, but also one for Windows.
