
Analysing the radio alphabet
If you have tried to spell out the letters of a name, or give a (UK) post code or car registration number over the phone, the person at the other end of the conversation might have misheard what you said. One solution to this is to use the radio alphabet (also known as the NATO phonetic alphabet, the ICAO phonetic alphabet etc.). It’s the one where you say alpha to mean A, bravo to mean B and so on. It has appeared in names in UK TV and film and US TV and film.
In this article I will look at the sounds of the words chosen, to see how similar or different they are. This will touch a bit on transcribing sounds as symbols and how transcriptions could be compared.
Limitations
It’s really important that I describe the ways I know that this analysis is limited or flawed. There are so many that I’ll summarise them here and go into more detail in the different sections below.
If I were doing it properly, I would design, conduct and evaluate experiments. This would involve lots of people listening to sounds in lots of different conditions. I’m not doing that; instead I’m looking at the phonetic transcription of the sounds, and then comparing the transcriptions. This is something I can do fairly quickly and easily sat at a computer.
Transcriptions
The words and their transcription using the International Phonetic Alphabet (IPA) are given in the table below.
Word | Transcription |
alpha | alfə |
bravo | brɑːvəʊ |
charlie | tʃɑːli |
delta | dɛltə |
echo | ɛkəʊ |
foxtrot | fɒkstrɒt |
golf | ɡɒlf |
hotel | həʊˈtɛl |
india | ɪndɪə |
juliet | dʒuːlɪˌɛt |
kilo | kiːləʊ |
lima | liːmə |
mike | mʌɪk |
november | nəʊˈvɛmbə |
oscar | ɒskə |
papa | pɑːpə |
quebec | kwɪbɛk |
romeo | rəʊmɪəʊ |
sierra | sɪˈɛrə |
tango | taŋɡəʊ |
uniform | juːnɪfɔːm |
victor | vɪktə |
whiskey | wɪski |
xray | ɛksreɪ |
yankee | jaŋki |
zulu | zuːluː |
IPA transcriptions can take some getting used to – there are some odd symbols, the link between sound and symbol might be odd, and transcriptions can reveal details of speech sounds that you might not have noticed. If you look at the symbols at the start of yankee, uniform and juliet you’ll notice that the j symbol is for the sound at the start of yankee and uniform. This is similar to its use in Scandinavian languages, in words such as fjord and Mjolnir.
If the j symbol is used for this sound, what about the sound at the beginning of juliet? It’s dʒ, which suggests it’s two sounds, one after the other. A consonant’s sound being formed of two smaller sounds isn’t all that far-fetched. For instance, if you look at the symbols for xray, you see ks in the middle. This makes sense to me as I can’t see how you can make an x sound without either a k or an s. However, to me the consonant at the beginning of juliet makes one sound that can’t be broken down, so dʒ seems to be just choosing to encode one sound using a pair of symbols.
The symbols for vowel sounds are a bit of a mix. There are a few words where vowels are transcribed using ː, such as iː in lima. This shows that the previous symbol’s sound has been lengthened. This makes sense to me, but needing an extra symbol rather than e.g. making the symbol upper case is a choice, and one that affects the number of symbols needed to transcribe one sound.
There are also several words that include ə, such as at the end of victor. This represents schwa, a generic unstressed vowel. If you pay attention to how people actually speak, rather than how spelling suggests they speak, many vowel sounds fade into this lowest-common-denominator unstressed sound. The same letter could be transcribed as schwa in some contexts and as a specific symbol in others. Consider the word banana – it has three a vowels, and is transcribed as bəˈnɑːnə, so the first and last are schwa but the stressed middle one is not.
Lastly on the topic of vowels, some vowel sounds can be thought of as two simpler vowel sounds combined, so it makes sense to transcribe them using two symbols. These vowels are called diphthongs, and are in mike and the end of kilo. It’s easier to notice the two components if you try to hold the sound for a long time, and compare how that feels and sounds to when you hold the first vowel in lima (which is a monophthong).
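To make the point about symbol counts concrete, here is a small Python sketch (my own illustration, using three transcriptions copied from the table above). It assumes that one "symbol" means one Unicode code point, which is how Python measures string length.

```python
# Transcriptions copied from the table above.
# Python's len() counts Unicode code points, so the length mark ː
# and each half of a diphthong are counted as separate symbols.
transcriptions = {
    "lima": "liːmə",  # long monophthong: i followed by the length mark ː
    "mike": "mʌɪk",   # diphthong: two vowel symbols, ʌ then ɪ
    "golf": "ɡɒlf",   # short monophthong: a single vowel symbol
}

for word, ipa in transcriptions.items():
    # Show the transcription, its symbol count, and the individual symbols.
    print(word, ipa, len(ipa), list(ipa))
```

So lima has four letters but five symbols, and the one vowel sound in mike takes up two of its four symbols.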
Limitations of transcription
As well as the things I mention above, there are two bigger problems with transcription. The first is that you need to decide which sounds you’re transcribing (and which you’re not). The sounds I’m transcribing here are all Standard English, so foxtrot is transcribed as fɒkstrɒt. People who use different versions of English would say these words differently, for instance by lengthening the first vowel or by using a glottal stop instead of the last consonant. For simplicity I’m leaving out other versions of English.
The second problem is the assumption that speech sounds can be built from smaller parts joined together, like Lego bricks. Normally these parts are referred to as phonemes. Unfortunately this is an approximation that doesn’t always work (at least not without extra detail added to the phonemes). For instance, the words cap and cab are very similar, which suggests we need only 4 phonemes / Lego bricks to make them.
They start with the same consonant, have the same vowel, and the last consonants are different but similar – your lips, tongue etc. are doing the same thing, but your vocal cords are turned on for different amounts of time (they are a voiced / unvoiced pair). The difference in voicing of the following consonant affects the duration of the vowel sound – it’s longer in cab than it is in cap. If you were, for instance, trying to make a speech synthesiser, its output would sound odd if you didn’t include this sort of detail.
The words pit and pin are also similar but not identical. Not only does the voicing of the following n lengthen the vowel in pin, your mouth gets ready to make the nasal n sound while it’s still making the vowel sound. This means the vowel is a bit more nasal than it is in pit. I.e. the duration and nasal-ness of the two vowel sounds are different if you look at them closely enough. Simply snapping together a standard set of phonemes doesn’t capture this detail.
Alternative transcriptions
When I started thinking about doing this comparison, I looked for ways of transcribing sounds that used only ASCII characters rather than the more exotic characters used in the IPA. This led me to Soundex, Metaphone etc. They are useful for some applications, but not for this one. The major problem is that they leave out vowels (other than when a word starts with a vowel), much as written Biblical Hebrew does. This means that the Metaphone transcription BT represents the sounds of all these words: bat, bait, bet, bit, bite, but, bot, boat, bought, batter, better, biter, bitter, butter (and probably others I haven’t thought of). I needed something that included all vowel sounds as well as the consonants, so I chose IPA.
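To show the vowel-dropping in action, here is a minimal sketch of Soundex in Python. It is a simplified version of the usual American Soundex rules rather than a reference implementation, it is not Metaphone, and it assumes the input is a non-empty word made of unaccented letters.

```python
# Build the consonant-to-digit table used by Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Return a 4-character Soundex code (assumes a non-empty alphabetic word)."""
    word = word.lower()
    result = word[0].upper()            # the first letter is kept as it is
    previous = CODES.get(word[0], "")   # its code still counts for de-duplication
    for letter in word[1:]:
        if letter in "hw":
            continue                    # h and w do not separate repeated codes
        code = CODES.get(letter, "")    # vowels (and y) have no code
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]         # pad with zeros, then cut to 4 characters


for word in ["bat", "bait", "bet", "bit", "bite", "but", "bot", "boat"]:
    print(word, soundex(word))
```

All eight of these words come out as B300, because every vowel after the first letter is thrown away.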
Comparison
I compared transcriptions by computing the Levenshtein distance between each pair. (Please see the linked article for details.) In summary – it’s the smallest number of characters that need to be added, removed or changed to turn one word into the other. So a word compared with itself produces a distance of 0, and the more different the two words are the higher the number. The largest possible distance is the length of the longer word.
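For readers who would rather see it as code, here is a minimal sketch of the standard dynamic-programming way of computing Levenshtein distance in Python. It is my illustration of the technique, not necessarily the code behind the results below, and the function name is my own.

```python
def levenshtein(a, b):
    """Smallest number of single-symbol insertions, deletions or
    substitutions needed to turn sequence a into sequence b."""
    # previous_row[j] is the distance between the part of a seen so far
    # and the first j symbols of b.
    previous_row = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current_row = [i]  # turning i symbols of a into nothing takes i deletions
        for j, symbol_b in enumerate(b, start=1):
            substitution_cost = 0 if symbol_a == symbol_b else 1
            current_row.append(min(
                previous_row[j] + 1,                      # delete symbol_a
                current_row[j - 1] + 1,                   # insert symbol_b
                previous_row[j - 1] + substitution_cost,  # keep or substitute
            ))
        previous_row = current_row
    return previous_row[-1]


print(levenshtein("wɪski", "jaŋki"))  # whiskey compared with yankee
print(levenshtein("liːmə", "liːmə"))  # a word compared with itself
```

Whiskey and yankee share only their final ki, so the first three symbols of each have to be changed.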
This is where we remember why the details of the transcription matter. For instance, turning a long vowel such as iː into a different short vowel such as ɒ adds 2 to the distance because we need to sort out the length marker as well as the main symbol for the vowel. If a long vowel were indicated by making its symbol upper case then the distance would only be 1. Also, it treats all transcription symbol differences as equal (everything costs 1), and pays no attention to the underlying linguistics. Vowels can be plotted along two axes depending on where the sound is made in the mouth, which can mean some vowel sounds are more similar to each other than others. A simple Levenshtein comparison ignores this.
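The effect of the length marker can be shown with a short Python sketch. The tokenise function is a hypothetical helper of mine that glues ː onto the symbol before it (standing in for the upper-case idea), and lɒmə is a made-up string used only for comparison, not a real word from the alphabet.

```python
LENGTH_MARK = "ː"


def tokenise(ipa):
    """Split a transcription into units, attaching ː to the symbol before it."""
    tokens = []
    for symbol in ipa:
        if symbol == LENGTH_MARK and tokens:
            tokens[-1] += symbol   # "i" followed by "ː" becomes the single unit "iː"
        else:
            tokens.append(symbol)
    return tokens


def levenshtein(a, b):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    previous_row = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current_row = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current_row.append(min(previous_row[j] + 1,          # deletion
                                   current_row[j - 1] + 1,       # insertion
                                   previous_row[j - 1] + cost))  # substitution
        previous_row = current_row
    return previous_row[-1]


lima = "liːmə"
made_up = "lɒmə"  # hypothetical: lima with its long vowel swapped for short ɒ

print(levenshtein(lima, made_up))                      # symbol by symbol
print(levenshtein(tokenise(lima), tokenise(made_up)))  # long vowel as one unit
```

Compared symbol by symbol the distance is 2, but with the long vowel treated as one unit it drops to 1. Every mismatch still costs 1 either way, so this does nothing about the second problem of ignoring how similar two vowels actually sound.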
Results
How similar each word is to the other words in the alphabet is shown in the following table (no, it’s not an attempt to create a new tartan).

Most words have two syllables – golf and mike are the only one-syllable words, and india, juliet, november, romeo, sierra and uniform have three syllables. The word with the lowest average distance is oscar with 5.16 and the word with the highest average distance is uniform with 8.28.
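Here is a rough Python sketch of how an average distance per word could be worked out. It assumes each word is averaged over the other 25 words, and that the stress marks ˈ and ˌ are counted as ordinary symbols exactly as they appear in the table. Both are my assumptions, so the numbers it prints may not match the figures quoted above.

```python
# Transcriptions copied from the table earlier in the article.
TRANSCRIPTIONS = {
    "alpha": "alfə", "bravo": "brɑːvəʊ", "charlie": "tʃɑːli", "delta": "dɛltə",
    "echo": "ɛkəʊ", "foxtrot": "fɒkstrɒt", "golf": "ɡɒlf", "hotel": "həʊˈtɛl",
    "india": "ɪndɪə", "juliet": "dʒuːlɪˌɛt", "kilo": "kiːləʊ", "lima": "liːmə",
    "mike": "mʌɪk", "november": "nəʊˈvɛmbə", "oscar": "ɒskə", "papa": "pɑːpə",
    "quebec": "kwɪbɛk", "romeo": "rəʊmɪəʊ", "sierra": "sɪˈɛrə", "tango": "taŋɡəʊ",
    "uniform": "juːnɪfɔːm", "victor": "vɪktə", "whiskey": "wɪski", "xray": "ɛksreɪ",
    "yankee": "jaŋki", "zulu": "zuːluː",
}


def levenshtein(a, b):
    """Levenshtein distance between two strings, one symbol at a time."""
    previous_row = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current_row = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current_row.append(min(previous_row[j] + 1,          # deletion
                                   current_row[j - 1] + 1,       # insertion
                                   previous_row[j - 1] + cost))  # substitution
        previous_row = current_row
    return previous_row[-1]


def average_distance(word):
    """Mean distance from one word's transcription to all the others."""
    others = [other for other in TRANSCRIPTIONS if other != word]
    total = sum(levenshtein(TRANSCRIPTIONS[word], TRANSCRIPTIONS[other])
                for other in others)
    return total / len(others)


averages = {word: average_distance(word) for word in TRANSCRIPTIONS}

# Print the words from lowest to highest average distance.
for word, average in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{word:10s} {average:.2f}")
```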
There’s a set of words that all end in schwa – alpha, delta, india, lima, november, oscar, papa, sierra, and victor. There are two words that end in ki – whiskey and yankee. Five words end in əʊ – bravo, echo, kilo, romeo, and tango. There are two words that start with the same sound – echo and xray both start with ɛk (which is the beginning of the ɛks syllable in xray).
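These groups can be checked mechanically. The sketch below is my own Python illustration, with hypothetical helper names, and it simply looks for transcriptions from the table that end or start with a given run of symbols.

```python
# Transcriptions copied from the table earlier in the article.
TRANSCRIPTIONS = {
    "alpha": "alfə", "bravo": "brɑːvəʊ", "charlie": "tʃɑːli", "delta": "dɛltə",
    "echo": "ɛkəʊ", "foxtrot": "fɒkstrɒt", "golf": "ɡɒlf", "hotel": "həʊˈtɛl",
    "india": "ɪndɪə", "juliet": "dʒuːlɪˌɛt", "kilo": "kiːləʊ", "lima": "liːmə",
    "mike": "mʌɪk", "november": "nəʊˈvɛmbə", "oscar": "ɒskə", "papa": "pɑːpə",
    "quebec": "kwɪbɛk", "romeo": "rəʊmɪəʊ", "sierra": "sɪˈɛrə", "tango": "taŋɡəʊ",
    "uniform": "juːnɪfɔːm", "victor": "vɪktə", "whiskey": "wɪski", "xray": "ɛksreɪ",
    "yankee": "jaŋki", "zulu": "zuːluː",
}


def words_ending_with(symbols):
    """Words whose transcription ends with the given symbols, in alphabet order."""
    return [word for word, ipa in TRANSCRIPTIONS.items() if ipa.endswith(symbols)]


def words_starting_with(symbols):
    """Words whose transcription starts with the given symbols, in alphabet order."""
    return [word for word, ipa in TRANSCRIPTIONS.items() if ipa.startswith(symbols)]


print(words_ending_with("ə"))     # the schwa group
print(words_ending_with("əʊ"))    # the əʊ group
print(words_ending_with("ki"))    # whiskey and yankee
print(words_starting_with("ɛk"))  # echo and xray
```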
The following diagram shows all the nearest neighbours for each word.

The lines have arrows on for a reason: just because B is the nearest thing to A, A isn’t necessarily the nearest thing to B. If you had three things – A, B, and C – in a line, where the gap between A and B is bigger than the gap between B and C, then B is the nearest thing to A but C (and not A) is the nearest thing to B.
In the middle of the diagram you can see the words that all end in schwa, and in the bottom right are the two words that end in ki. In the top left is a set of words that all end in əʊ (mixed in with words that don’t).
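The three-things-in-a-line example can be written as a few lines of Python. The positions are made-up numbers of mine, chosen so that the gap between A and B (3) is bigger than the gap between B and C (1).

```python
# Hypothetical positions along a line: A to B is a gap of 3, B to C a gap of 1.
positions = {"A": 0.0, "B": 3.0, "C": 4.0}


def nearest(name):
    """Return the name of the closest other point on the line."""
    others = [other for other in positions if other != name]
    return min(others, key=lambda other: abs(positions[other] - positions[name]))


for name in positions:
    print(name, "->", nearest(name))
```

The arrow from A points to B, but the arrow from B points to C rather than back to A. B and C point at each other, which is the only kind of pair where the arrow runs both ways.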
Trying to be more confusing
Several years ago, some friends and I tried to come up with an alternative to the radio alphabet that was as unhelpful as possible. I am probably mangling it, but it was this kind of thing:
- A: Are
- B: Bee
- C: Cess
- E: Eye
- I: I
- K: Knight
- M: Mnemonic
- P: Ptarmigan
- S: Sea
- T: Tea
- W: Wrong
- X: Xylophone
I’m glad that the standard one is less confusing.
Summary
This is a far from scientifically rigorous analysis of how similar the sounds of these words are to each other, as I mentioned earlier. The limitations of how sounds are transcribed and compared are two big problems. Nonetheless I found it interesting to look at the two diagrams to see how words cluster together.