
Some audiobooks play a 45-second publisher introduction before every single chapter. Some podcast archives have the same sponsor read baked into every episode. If you have a 12-hour file and that intro plays 80 times, you're going to hear it 80 times, unless you do something about it.
Audio Excise does something about it.
Give it a short reference clip and a long MP3, and it finds every matching occurrence in the file and cuts them all out. The result is a clean file with no re-encoding, no quality loss, and no manual editing.
Usage
bash run.sh --intro reference.mp3 --input long_file.mp3
That produces long_file.clean.mp3. The reference clip can be anything you can isolate: trim 10 seconds of the jingle from one of the chapters and hand it over. Audio Excise takes care of the rest.
Three parameters you can tune if needed:
--threshold 0.7 # match confidence, 0.0–1.0 (default 0.7)
--padding_ms 100 # milliseconds of buffer around each cut
--min_gap_sec 30 # minimum seconds between two detected matches
The threshold is the one worth adjusting. A value of 0.7 catches clear matches while ignoring coincidental similarities in background noise. If the tool is missing occurrences, lower it slightly. If it is cutting things it shouldn't, raise it.
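To make the interaction between the threshold and the minimum gap concrete, here is a minimal sketch of one plausible filtering rule. It is an assumption for illustration, not the tool's actual code: the function name select_matches and the "strongest candidate wins" policy are hypothetical.

```python
def select_matches(candidates, threshold=0.7, min_gap_sec=30.0):
    """Filter (time_sec, score) candidates down to accepted match times.

    Hypothetical policy: drop anything below the threshold, then keep the
    strongest candidates first and reject any candidate that lies closer
    than min_gap_sec to one already kept.
    """
    strong = [(t, s) for t, s in candidates if s >= threshold]
    strong.sort(key=lambda c: c[1], reverse=True)  # best score first

    kept = []
    for t, _score in strong:
        if all(abs(t - k) >= min_gap_sec for k in kept):
            kept.append(t)
    return sorted(kept)


candidates = [(10.0, 0.95), (10.4, 0.80), (25.0, 0.65), (600.0, 0.72), (615.0, 0.90)]
print(select_matches(candidates))
```

With these sample values, the candidate at 25.0 s falls below the threshold, and the ones at 10.4 s and 600.0 s are dropped because each sits within 30 seconds of a stronger match. Lowering the threshold admits weaker candidates; raising the gap merges nearby detections into one.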
How it works
The detection is FFT-based normalized cross-correlation, a standard signal-processing technique for locating a known pattern inside a longer signal. The reference clip gets converted to a mono 8 kHz float32 signal, and the long file streams through in 5-minute chunks. For each chunk, scipy.signal.fftconvolve computes the correlation, and peaks above the threshold are recorded as matches. Only peak positions get stored, not full audio buffers, so memory use stays bounded even for multi-hour files.
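The correlation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's source: the function names normalized_xcorr and find_matches are hypothetical, the input is assumed to be already decoded to mono samples, and peak picking uses scipy.signal.find_peaks, which the original text does not mention.

```python
import numpy as np
from scipy.signal import fftconvolve, find_peaks


def normalized_xcorr(signal, template):
    """Score every alignment of template inside signal, roughly in [-1, 1]."""
    signal = np.asarray(signal, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    n = len(template)

    # Zero-mean template; its energy is the template part of the denominator.
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t * t))

    # Cross-correlation equals convolution with the time-reversed template.
    num = fftconvolve(signal, t[::-1], mode="valid")

    # Sliding-window sum and sum of squares via cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(signal)))
    csum2 = np.concatenate(([0.0], np.cumsum(signal * signal)))
    win_sum = csum[n:] - csum[:-n]
    win_sum2 = csum2[n:] - csum2[:-n]

    # Per-window energy after removing the window mean.
    win_energy = np.maximum(win_sum2 - win_sum ** 2 / n, 0.0)
    denom = t_norm * np.sqrt(win_energy)

    scores = np.zeros_like(num)
    ok = denom > 1e-9  # leave silent windows at score 0
    scores[ok] = num[ok] / denom[ok]
    return scores


def find_matches(signal, template, threshold=0.7, min_gap_samples=1):
    """Return (sample_index, score) for peaks at or above the threshold.

    Note: find_peaks does not report a peak at the very first or last
    sample of the score array.
    """
    scores = normalized_xcorr(signal, template)
    peaks, props = find_peaks(scores, height=threshold, distance=min_gap_samples)
    return [(int(p), float(h)) for p, h in zip(peaks, props["peak_heights"])]
```

At the 8 kHz rate mentioned above, dividing a returned sample index by 8000 gives the match position in seconds within the chunk. A chunked implementation would also need consecutive chunks to overlap by at least the template length so that a match straddling a boundary is not missed; how the tool handles that is not stated here.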
Once all positions are collected, FFmpeg does the actual cutting. Rather than re-encoding, it copies the audio stream directly between detected segments and concatenates the pieces. The codec never touches the samples — no generation loss, no re-compression artifacts. Whatever quality the original had is exactly what the output has.
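The cutting step can be pictured as "compute the segments to keep, copy each one, then join them". The sketch below only builds the FFmpeg command lines and does not run them. It is illustrative rather than the tool's actual implementation: the helper names, the file names, and the reading of padding as widening each cut on both sides are all assumptions.

```python
def keep_segments(match_starts, clip_len, total_len, padding=0.1):
    """Return (start, end) pairs in seconds for the audio to keep.

    Assumption: each cut spans the matched clip plus `padding` seconds
    on either side, clamped to the file boundaries.
    """
    cuts = sorted(
        (max(0.0, s - padding), min(total_len, s + clip_len + padding))
        for s in match_starts
    )
    segments, cursor = [], 0.0
    for start, end in cuts:
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)  # also merges overlapping cuts
    if cursor < total_len:
        segments.append((cursor, total_len))
    return segments


def segment_cmd(src, start, end, out):
    """FFmpeg command that copies one segment without re-encoding."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
            "-i", src, "-c", "copy", out]


def concat_cmd(list_file, out):
    """FFmpeg concat-demuxer command; list_file holds lines like: file 'part0.mp3'"""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]


segments = keep_segments([0.0, 100.0], clip_len=45.0, total_len=300.0, padding=0.5)
commands = [segment_cmd("long_file.mp3", a, b, f"part{i}.mp3")
            for i, (a, b) in enumerate(segments)]
commands.append(concat_cmd("parts.txt", "long_file.clean.mp3"))
```

The key flag is -c copy, which tells FFmpeg to pass the encoded audio packets through untouched. One consequence is that cuts can only land on MP3 frame boundaries (1152 samples per frame for MPEG-1 Layer III, about 26 ms at 44.1 kHz), so cut points are accurate to roughly that granularity.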
Installation
Python 3.9+ and FFmpeg are the only prerequisites. On macOS:
brew install ffmpeg
On Linux:
apt install ffmpeg
The first run sets up a virtual environment and installs NumPy and SciPy automatically. No manual pip install needed.
What it is good for
The obvious cases are audiobooks and podcasts — long files with predictable repeated patterns. But anything that follows that structure works: lecture recordings with institutional bumpers, radio show archives, training materials with repeated intro sequences. If the reference clip is consistent enough that your ears can recognize it every time, the cross-correlation will too.
The source is on GitHub at simonjenny/audio-excise. MIT licensed.