Guides·8 min read·April 26, 2026

How to Edit Audio in Video (Sync, Levels, Cleanup)

Audio quality makes or breaks video. This is a complete guide to editing video audio: syncing external recordings, leveling dialogue, removing noise, and adding music.

Why audio matters more than video quality

Viewers tolerate mediocre video with great audio. They abandon great video with bad audio.

Studies of YouTube and TikTok engagement consistently show that audio problems — background noise, inconsistent levels, distortion, sync issues — drive higher abandonment rates than visual problems of similar severity.

Investing in audio editing has higher returns than investing in better cameras for most creators.

The audio editing pipeline

A complete audio edit covers five stages:

  1. Sync: align external audio to video
  2. Cleanup: remove noise, breaths, and unwanted sounds
  3. Level: balance dialogue volume across the video
  4. Mix: blend dialogue, music, and effects
  5. Master: final loudness normalization for the platform

Skipping any stage produces noticeably worse output.

Stage 1: Sync external audio

If you record audio separately (lavalier mic, shotgun, recorder), you need to sync it to the camera audio.

The clap method. Clap once at the start of recording. Both camera and external audio capture the clap. Align the spike on both waveforms.

Auto-sync tools. Some editors detect matching audio signatures. Useful when you forgot to clap.

Manual sync. Match a sharp transient (door close, footstep, plosive) on both tracks.

Once synced, mute the camera audio and use the external recording.
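Auto-sync tools generally work by finding the shift at which the two waveforms line up best. The sketch below illustrates that idea with a brute-force cross-correlation in plain Python. It is not how any particular editor implements it; the function name `find_offset` and the toy clap data are made up for illustration, and real tools use much faster FFT-based methods on full-length audio.

```python
def find_offset(camera, external, max_lag):
    """Return the lag (in samples) at which `external` best matches `camera`.

    A positive lag means events appear later in `external` than in `camera`.
    Brute-force cross-correlation: fine for short snippets, slow for long audio.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, c in enumerate(camera):
            j = i + lag
            if 0 <= j < len(external):
                score += c * external[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy data (hypothetical): the same clap lands at sample 10 on the camera
# track and at sample 25 on the external recorder.
camera = [0.0] * 100
external = [0.0] * 100
camera[10] = 1.0
external[25] = 1.0

lag = find_offset(camera, external, max_lag=40)
print(lag)  # 15: slide the external track 15 samples earlier to align it
```

Dividing the lag by the sample rate gives the offset in seconds, which is the amount to nudge the external clip on the timeline.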

Stage 2: Cleanup

Common cleanup tasks:

Remove background noise. Air conditioner hum, refrigerator buzz, traffic. Use noise reduction sparingly — heavy reduction creates artifacts worse than the noise.

Remove breaths and lip smacks. Cut between phrases to remove distracting sounds. Do not cut every breath — some breathing sounds natural.

Remove filler words. Um, uh, like. Automated tools make this fast.

Remove plosives. P and B sounds create low-frequency pops. Use volume automation on the affected syllable, or apply a high-pass filter at around 100 Hz.
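To show what a high-pass filter at 100 Hz does, here is a minimal first-order filter in plain Python. Treat it as a sketch: editors typically use steeper filters than this 6 dB/octave design, and the helper names `high_pass` and `tone` are invented for the example.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order (one-pole) high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def tone(freq_hz, sample_rate=48000, seconds=1.0):
    """Full-scale sine wave, used here only as test material."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


rumble = high_pass(tone(20), 48000)    # low thump, similar in range to a plosive pop
voice = high_pass(tone(1000), 48000)   # well inside the dialogue range

# Compare peak levels once the filter has settled (second half of each signal).
print(max(abs(v) for v in rumble[24000:]))  # strongly reduced
print(max(abs(v) for v in voice[24000:]))   # almost unchanged
```

The 20 Hz tone comes out at a fraction of its original level while the 1 kHz tone passes nearly untouched, which is why this filter tames pops without thinning the voice much.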

Stage 3: Level dialogue

Volume should be consistent across the entire video. Sudden loud or quiet sections feel amateur.

Target levels:

  • Dialogue peaks: -6 dBFS to -3 dBFS
  • Dialogue average: -16 dBFS to -12 dBFS
  • Music under dialogue: -24 dBFS to -18 dBFS
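If you want to check these targets outside an editor's meters, peak and average level can be computed directly from the samples. The sketch below assumes samples are floats in the range -1.0 to 1.0 (so 0 dB is digital full scale) and uses RMS as the "average", which is one common reading of that term; the function names are illustrative.

```python
import math


def to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to decibels."""
    return 20.0 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def peak_db(samples):
    """Level of the single loudest sample."""
    return to_db(max(abs(s) for s in samples))


def rms_db(samples):
    """Average (root-mean-square) level across all samples."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))


# Test signal: one second of a 440 Hz sine at half of full scale.
half_scale = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]

print(round(peak_db(half_scale), 1))  # about -6.0
print(round(rms_db(half_scale), 1))   # about -9.0
```

Speech has a much larger gap between peak and average than a sine wave does, which is why the peak and average targets above sit roughly 10 dB apart.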

Use volume automation or compression to even out delivery. A speaker who whispers then shouts needs both compression (to even dynamics) and gain riding (manual volume changes).

Compression settings for dialogue:

  • Threshold: -18dB
  • Ratio: 3:1
  • Attack: 5ms
  • Release: 50ms

These are starting points. Adjust by ear.
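It can help to see what the threshold and ratio do to a level in numbers. The sketch below models only the static behavior of a simple hard-knee compressor, using the starting values above as defaults. Attack and release, which control how quickly the gain change kicks in and lets go, are left out, and `compress_db` is a hypothetical name.

```python
def compress_db(level_db, threshold_db=-18.0, ratio=3.0):
    """Output level of a hard-knee compressor for a given input level.

    Below the threshold the signal is untouched. Above it, every `ratio` dB
    of input over the threshold yields only 1 dB of output over the threshold.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


quiet = compress_db(-30.0)  # below threshold
loud = compress_db(-6.0)    # 12 dB over threshold

print(quiet)  # -30.0 (unchanged)
print(loud)   # -14.0 (12 dB over becomes 4 dB over)
```

The shout at -6 dB is pulled down by 8 dB while the quiet line is left alone, so the gap between them shrinks from 24 dB to 16 dB. Gain riding or makeup gain then handles the rest.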

Stage 4: Mix dialogue, music, and effects

Dialogue is primary. Everything else supports dialogue.

Music ducking. Music volume drops when dialogue plays. Most editors automate this.

Sound effects. Match the perspective of the visual. A door close on screen should be loud; the same sound off-screen should be quieter.

Stereo placement. Dialogue typically center. Music can be wider. Effects positioned by visual location.

Frequency balance. Dialogue lives in 200Hz-3kHz. Music should leave that range relatively clear during dialogue.
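The music ducking described above boils down to a gain envelope that follows the dialogue. Here is a rough sketch of that logic, assuming you already have a per-frame flag saying whether someone is speaking. The 12 dB duck depth and 3 dB-per-frame ramp are arbitrary illustration values, not settings taken from any editor.

```python
def duck_gains(dialogue_active, duck_db=-12.0, step_db=3.0):
    """Per-frame music gain in dB.

    Ramps down toward `duck_db` while dialogue is active and back up to 0 dB
    when it stops, moving at most `step_db` per frame so changes are not abrupt.
    """
    gains = []
    current = 0.0
    for active in dialogue_active:
        target = duck_db if active else 0.0
        if current > target:
            current = max(target, current - step_db)
        elif current < target:
            current = min(target, current + step_db)
        gains.append(current)
    return gains


# Hypothetical activity flags: silence, five frames of speech, then silence.
speech = [False, True, True, True, True, True, False, False]
gains = duck_gains(speech)
print(gains)  # [0.0, -3.0, -6.0, -9.0, -12.0, -12.0, -9.0, -6.0]
```

The ramp is the important part: music that snaps down and back up instantly is more distracting than music that stays slightly too loud.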

Stage 5: Master for the platform

Different platforms have different loudness standards.

YouTube: -14 LUFS integrated loudness target.

TikTok/Instagram: -16 to -14 LUFS.

Broadcast: -23 LUFS (EBU R128) or -24 LUFS (US ATSC).

Spotify: -14 LUFS.

If your video is too quiet, viewers turn it up — then ads blast them. If it is too loud, platforms automatically attenuate, which can hurt perceived quality.
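Once a loudness meter has given you the integrated LUFS of your mix, the correction is simple arithmetic. The sketch below assumes the measured value comes from such a meter (measuring LUFS itself requires the weighting and gating defined in ITU-R BS.1770, which is not implemented here). The function names are illustrative.

```python
def normalization_gain(measured_lufs, target_lufs=-14.0):
    """Gain needed to move a mix from its measured loudness to the target.

    Returns the gain in dB and as a linear multiplier for the samples.
    """
    gain_db = target_lufs - measured_lufs
    linear = 10 ** (gain_db / 20.0)
    return gain_db, linear


def peak_after_gain(peak_db, gain_db):
    """Where the loudest peak ends up after applying the gain."""
    return peak_db + gain_db


# Hypothetical mix: measured at -20 LUFS with peaks at -3 dB.
gain_db, linear = normalization_gain(-20.0)
new_peak = peak_after_gain(-3.0, gain_db)

print(gain_db)   # 6.0
print(new_peak)  # 3.0, which is above full scale
```

In this example the mix needs 6 dB of gain, but its peaks would then exceed full scale and clip. That is the case where a limiter or more compression is needed before raising the level.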

Common audio mistakes

Inconsistent volume. The most common issue. Different scenes at different levels.

Music too loud. Music that competes with dialogue forces viewers to keep adjusting volume.

Excessive noise reduction. Creates underwater or robotic artifacts.

Wrong sample rate. Mismatched sample rates between recordings cause sync drift.

No mastering. Exporting raw mixes without final loudness normalization.

Forgetting captions. Caption everything — a large portion of viewers watch muted.
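To put a number on the sample-rate mistake listed above, the drift can be estimated from the two rates. This sketch assumes the simplest failure mode: audio captured at one rate is played back as if it had another, with no resampling. `drift_seconds` is a hypothetical helper name.

```python
def drift_seconds(duration_s, assumed_rate, actual_rate):
    """Sync error at the end of a clip when audio recorded at `actual_rate`
    is played back as if it were recorded at `assumed_rate`."""
    played_duration = duration_s * actual_rate / assumed_rate
    return played_duration - duration_s


# A 10-minute take recorded at 48 kHz but treated as 44.1 kHz.
gross = drift_seconds(600, 44100, 48000)

# The same take from a recorder whose clock runs 50 parts per million fast.
subtle = drift_seconds(600, 48000, 48002.4)

print(round(gross, 2))   # 53.06
print(round(subtle, 3))  # about 0.03
```

The first case is obvious within seconds. The second is only about 30 ms over ten minutes, roughly one frame at 30 fps, which is why long takes can start in sync and end visibly off.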

Audio for muted viewers

70%+ of social media video is watched with sound off. Two implications:

Captions are mandatory. Auto-generate them at minimum.

Visual storytelling matters. Show, do not just tell. Visuals must communicate without audio support.

This does not mean audio does not matter. The 30% who watch with sound have stronger engagement and conversion. But your video must work both ways.
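If your caption tool gives you timed text rather than a finished file, converting it is straightforward. The sketch below writes cues in the widely used SubRip (.srt) layout. Assuming SRT is the format you need is my assumption, not something this guide specifies, and the cue data is made up.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp style)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical cues for illustration.
cues = [
    (0.0, 1.5, "Audio makes or breaks video."),
    (1.5, 4.0, "Here is how to fix it."),
]
srt_text = to_srt(cues)
print(srt_text)
```

Always review auto-generated cue text before publishing, since names and technical terms are the words most often transcribed incorrectly.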

Tools

Browser-based: v8eo handles sync, leveling, and basic cleanup. Caption generation is built in.

Dedicated audio editing: Audacity (free, full-featured), Reaper (paid, professional).

For dialogue-heavy content: Descript pioneered text-based audio editing. v8eo offers a similar filler-word removal workflow.

Quick checklist

Before exporting any video:

  • [ ] Audio synced to video
  • [ ] Background noise reduced (lightly)
  • [ ] Filler words and breaths cleaned
  • [ ] Dialogue levels consistent
  • [ ] Music ducked under dialogue
  • [ ] Final loudness around -14 LUFS
  • [ ] Captions added


