How to learn Mandarin tones

A practical guide to the four tones and the neutral tone — what each one sounds like, why minimal pairs trip learners up, and the drilling routine that actually moves the needle.

Why tones are not optional

The single feature that surprises new Mandarin learners most is that the pitch contour of a syllable is not stylistic: it is part of the word. mā (妈), má (麻), mǎ (马), and mà (骂) are four different words: mother, hemp, horse, and to scold. Drop the tone and you have not said the word imprecisely; you have said something else, or nothing at all. Mandarin has roughly 400 distinct syllables before tone, and roughly 1,200 with tone factored in, so tone roughly triples the number of syllables you can tell apart.

The good news is that the system is small. There are four tones plus a neutral, and that is it. Once you can produce them in isolation and recognise them in context, the rest is volume.

The four tones, briefly

First tone (高平 — high level)

Hold a steady, high pitch for the duration of the syllable. Think of the way a doctor asks you to say “ah” while they look at your throat: that flat, sustained note. The first tone does not climb or fall. Pinyin mark: ā. Example: 一 (yī, “one”).

Second tone (升 — rising)

Start in the middle of your range and climb to a high pitch. The closest English analogue is the questioning intonation in “What?” The rise should be brisk and committed; a tentative half-rise is the most common failure mode and frequently gets misheard as third tone. Pinyin mark: á. Example: 国 (guó, “country”).

Third tone (降升 — dipping)

Start low, dip lower, and rise again. In citation form (the way it appears in a vocabulary list), the contour is a full V-shape. In running speech the tail of the rise often disappears and the third tone reduces to a sustained low pitch. Many learners over-produce the rising tail; native speakers usually shorten or drop it. Pinyin mark: ǎ. Example: 我 (wǒ, “I”).

Fourth tone (降 — falling)

Start at the top of your range and fall sharply to the bottom. The contour is short, decisive, and slightly louder than the others; think of the English “No!” said firmly. Learners coming from non-tonal languages often under-energise the fourth tone, leaving it sounding like a flat or second-tone syllable. Commit to the drop. Pinyin mark: à. Example: 下 (xià, “down”).

Neutral tone (轻声 — light)

A short, light syllable with no inherent pitch shape; its actual pitch is determined by the tone that came before it. The neutral tone appears on grammatical particles such as 的 (de), 了 (le), and 吗 (ma), and on the second syllable of many compound words (māma, bàba). Pinyin convention: no diacritic. Get used to producing this one as genuinely short, not just “quietly”.
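The pinyin marks described above map mechanically onto tone numbers, which is handy if you keep vocabulary in a spreadsheet or flashcard deck. The following is a minimal Python sketch, assuming the common convention of writing the neutral tone as 5 (some sources use 0). The function names `tone_of` and `strip_tone` are illustrative and do not come from any particular library.

```python
import unicodedata

# Combining diacritics used by pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": 1,  # macron, as in ā
    "\u0301": 2,  # acute accent, as in á
    "\u030c": 3,  # caron, as in ǎ
    "\u0300": 4,  # grave accent, as in à
}

def tone_of(syllable: str) -> int:
    """Return 1-4 for a marked syllable, or 5 (neutral) when no mark is present."""
    # NFD splits a letter like "ǎ" into "a" plus a separate combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5

def strip_tone(syllable: str) -> str:
    """Remove the tone mark but keep other diacritics, such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

For example, `tone_of("mǎ")` gives 3 and `tone_of("ma")` gives 5, while `strip_tone("guó")` gives `"guo"`.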

Tone sandhi: the rules that bend

Mandarin has a small number of tone sandhi rules — automatic adjustments that happen when certain tones meet. The two you must internalise early are:

  • Third + third → second + third. When two third tones come back-to-back, the first one is pronounced as a second tone. nǐ hǎo is in fact spoken as ní hǎo. The pinyin is conventionally written with both third tones because that is the citation form, but no native speaker actually says it that way.
  • 不 (bù) and 一 (yī) shift. Both become second tone before a fourth tone: bù + shì is spoken bú shì (不是), and yī + yàng is spoken yí yàng (一样). Most textbooks drill these, but the rule is mechanical enough that you can apply it on autopilot once you notice it.

Other sandhi rules exist (the half-third-tone rule, the optional yī-before-non-fourth shift), but if you can hear and produce the two above reliably, you are ahead of most intermediate learners.
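Because the two core rules are mechanical, they can be written down as a short function. The sketch below is a simplification under stated assumptions: each syllable is a hypothetical `(character, toneless pinyin, citation tone)` tuple, tones are numbered 1 to 5, and every third tone followed by another third tone is raised. Real speech groups longer runs of third tones by phrasing, and 一 keeps its first tone when used as a number in counting, so treat this as an illustration of the rules rather than a complete model.

```python
from typing import List, Tuple

# (character, toneless pinyin, citation tone 1-5); a made-up format for this sketch.
Syllable = Tuple[str, str, int]

def apply_sandhi(syllables: List[Syllable]) -> List[Syllable]:
    """Return the spoken tones for a sequence given in citation tones."""
    spoken = []
    for i, (hanzi, base, tone) in enumerate(syllables):
        # Look ahead at the citation tone of the following syllable, if any.
        next_tone = syllables[i + 1][2] if i + 1 < len(syllables) else None
        if tone == 3 and next_tone == 3:
            tone = 2  # third + third -> second + third
        elif hanzi == "不" and tone == 4 and next_tone == 4:
            tone = 2  # bù -> bú before a fourth tone
        elif hanzi == "一" and tone == 1 and next_tone == 4:
            tone = 2  # yī -> yí before a fourth tone
        spoken.append((hanzi, base, tone))
    return spoken
```

Passing in `[("你", "ni", 3), ("好", "hao", 3)]` returns the first syllable with tone 2, matching the spoken ní hǎo. The check uses the character rather than the pinyin alone because other words pronounced yī or bù do not follow these rules.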

Why minimal pairs are the bottleneck

Producing tones in isolation, in front of a mirror, is the easy half. The hard half is hearing them in real time when the difference between two words is one tone, both syllables are familiar, and the speaker is moving fast. A few classic pairs:

  • mǎi (买, to buy) versus mài (卖, to sell) — third versus fourth.
  • shuǐjiǎo (水饺, dumplings) versus shuìjiào (睡觉, to sleep) — third+third versus fourth+fourth.
  • wèn (问, to ask) versus wěn (吻, to kiss) — fourth versus third.

Telling a shopkeeper you want to sell when you mean to buy is recoverable. Mixing up dumplings and sleep is funny exactly once. The remedy is targeted minimal-pair drilling, not more vocabulary.
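If you keep your own vocabulary list, you can find the minimal pairs hiding in it by grouping words on their toneless spelling. The sketch below assumes the list is a sequence of `(hanzi, pinyin)` tuples with tone marks; the function names are illustrative, and the approach compares citation-form spelling only, so it ignores sandhi.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations

# Combining macron, acute, caron, and grave: the four pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tones(pinyin):
    """Drop tone marks from a pinyin string, keeping letters such as ü intact."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

def minimal_pairs(vocab):
    """Return pairs of (hanzi, pinyin) entries that differ only in tone."""
    groups = defaultdict(list)
    for hanzi, pinyin in vocab:
        # Normalise so visually identical strings compare equal.
        pinyin = unicodedata.normalize("NFC", pinyin)
        groups[strip_tones(pinyin)].append((hanzi, pinyin))
    pairs = []
    for entries in groups.values():
        for a, b in combinations(entries, 2):
            if a[1] != b[1]:  # same letters, different tone marks
                pairs.append((a, b))
    return pairs
```

Run on a list containing 买 (mǎi), 卖 (mài), 问 (wèn), and 吻 (wěn), this returns the buy/sell and ask/kiss pairs, which you can then feed into a discrimination drill.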

A drilling routine that works

The routine that produces measurable improvement in a few weeks has three ingredients, in this order:

  1. Tone-only recognition, daily. Hear a syllable, decide which of the four (or five) tones it carries. Five minutes a day is enough. The goal is to push the recognition latency below half a second — not to be accurate when you have time to think.
  2. Minimal-pair discrimination, weekly. Hear two syllables that differ only in tone and decide which is which. This is the drill that maps most directly onto comprehension in conversation.
  3. Production, with feedback. Read a syllable aloud and have something — a partner, a tutor, or a software meter — tell you whether your contour matched. Production lags recognition by months for most learners, which is fine; do not skip the recognition step trying to catch up.
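To know whether the recognition drill in step 1 is working, it helps to log each attempt and look at speed as well as accuracy. The sketch below assumes you record each attempt as a `(true tone, answered tone, seconds)` tuple; the record format, the `summarise` name, and the choice of median over mean are assumptions made for illustration, while the half-second default comes from the target in step 1.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

# (true tone, answered tone, seconds taken to answer)
Attempt = Tuple[int, int, float]

def summarise(attempts: List[Attempt], latency_target: float = 0.5) -> Dict[str, object]:
    """Summarise a drill session: accuracy, speed, and most frequent mix-ups."""
    if not attempts:
        raise ValueError("no attempts recorded")
    # Only correct answers count towards latency; fast wrong guesses are not progress.
    correct_times = [secs for true, answer, secs in attempts if true == answer]
    confusions = Counter((true, answer) for true, answer, _ in attempts if true != answer)
    med = median(correct_times) if correct_times else None
    return {
        "accuracy": len(correct_times) / len(attempts),
        "median_latency": med,
        "fast_enough": med is not None and med < latency_target,
        "top_confusions": confusions.most_common(3),
    }
```

The `top_confusions` entry lists `((true tone, answered tone), count)` items, so a session dominated by `(2, 3)` tells you to spend the week's minimal-pair time on second versus third tone.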

The HanziFluency tone drill handles steps 1 and 2; the minimal-pair drill isolates the discrimination step against the characters you have already learned, so you are not trying to disambiguate vocabulary and tone at the same time. For step 3, a tutor on iTalki or HelloTalk is faster than any software currently available; budget twenty minutes of speaking feedback per week if you can.

Common pitfalls

  • Treating colour as the answer. Tone-coloured pinyin is a useful crutch for early reading, but if you can only get the tone right when it is colour-coded, you are not yet hearing it. HanziFluency renders tones with both colour and a colourblind-safe glyph for each tone (for example, ˅ for the third tone and · for the neutral tone), and the colour can be turned off entirely in Settings once you outgrow it.
  • Memorising the tone with the character but not the syllable. If you can produce mǎi when you see 买, but cannot decide which tone the syllable mai carries when you hear it cold, the tone has not transferred from a visual cue to an auditory one. Work the recognition drill.
  • Practising tones only on familiar words. Recognition on known vocabulary is helpful but easier than the real task, because the lexical context narrows the candidates. Practise on nonsense syllables or unfamiliar words too.

When you are done

You are not aiming for native intuition; you are aiming for a working ear and a reliable mouth. A practical bar: you can take dictation of an unfamiliar two-syllable Mandarin word read at a normal speaking pace and get both syllables and both tones right, eight times out of ten. Most learners reach that bar within three to six months of consistent drilling. After that, tones stop being a discrete thing you study and start blending into ordinary listening practice — which is exactly the point.