
![audio-excise on GitHub](/storage/audio-excise.png)

Some audiobooks open with a 45-second publisher introduction before every single chapter. Some podcast archives have the same sponsor read baked into every episode. If you have a 12-hour file and that jingle plays 80 times, you're going to hear it 80 times — unless you do something about it.

**Audio Excise** does something about it.

Give it a short reference clip and a long MP3, and it finds every matching occurrence in the file and cuts them all out. The result is a clean file with no re-encoding, no quality loss, and no manual editing.

## Usage

```bash
bash run.sh --intro reference.mp3 --input long_file.mp3
```

That produces `long_file.clean.mp3`. The reference clip can be anything you can isolate — trim 10 seconds of the jingle from one of the chapters and hand it over. Audio Excise takes care of the rest.

Three parameters you can tune if needed:

```bash
--threshold 0.7       # match confidence, 0.0–1.0 (default 0.7)
--padding_ms 100      # milliseconds of buffer around each cut
--min_gap_sec 30      # minimum seconds between two detected matches
```

The threshold is the one worth adjusting. A value of 0.7 catches clear matches while ignoring chance resemblances in background noise. If the tool is missing occurrences, lower it slightly. If it is cutting things it shouldn't, raise it.
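To make the interplay between `--threshold` and `--min_gap_sec` concrete, here is a minimal Python sketch. The function `select_matches` and its strongest-first rule are assumptions for illustration, not Audio Excise's actual code. The scores are made-up numbers.

```python
def select_matches(scores, threshold=0.7, min_gap_sec=30.0):
    """Pick match times from (time_sec, score) pairs.

    Hypothetical helper: keeps candidates scoring at least `threshold`,
    strongest first, and drops any candidate closer than `min_gap_sec`
    to a match that has already been kept.
    """
    candidates = [(t, s) for t, s in scores if s >= threshold]
    candidates.sort(key=lambda ts: ts[1], reverse=True)
    kept = []
    for t, _ in candidates:
        if all(abs(t - k) >= min_gap_sec for k in kept):
            kept.append(t)
    return sorted(kept)


# Illustrative (time in seconds, correlation score) pairs.
scores = [(12.0, 0.91), (12.4, 0.88), (300.0, 0.65), (905.2, 0.74)]

print(select_matches(scores))                 # [12.0, 905.2]
print(select_matches(scores, threshold=0.6))  # [12.0, 300.0, 905.2]
```

In this sketch the near-duplicate at 12.4 s is absorbed by the stronger peak at 12.0 s, and lowering the threshold to 0.6 lets the weaker candidate at 300 s through.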

## How it works

The detection uses FFT-based normalized cross-correlation, a standard signal-processing technique for locating a known signal inside a longer one. The reference clip is converted to a mono 8 kHz float32 signal, and the long file streams through in 5-minute chunks. For each chunk, `scipy.signal.fftconvolve` computes the correlation against the reference, and peaks above the threshold are recorded. Only peak positions are stored, not full audio buffers, so memory use stays low even for multi-hour files.
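For readers who want to see the idea in code, the following is a rough sketch of normalized cross-correlation built on `scipy.signal.fftconvolve`. It is not taken from the Audio Excise source: the function name `normalized_xcorr`, the normalization details, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import fftconvolve


def normalized_xcorr(chunk, ref):
    """Score every window chunk[i : i + len(ref)] against ref, in [-1, 1]."""
    chunk = np.asarray(chunk, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    n = len(ref)
    ref = ref - ref.mean()                      # zero-mean reference
    ref_norm = np.sqrt(np.sum(ref ** 2))

    # Correlation equals convolution with the time-reversed reference.
    num = fftconvolve(chunk, ref[::-1], mode="valid")

    # Per-window energy of the chunk, also computed with FFT convolution.
    ones = np.ones(n)
    win_sum = fftconvolve(chunk, ones, mode="valid")
    win_sq = fftconvolve(chunk ** 2, ones, mode="valid")
    win_var = np.maximum(win_sq - win_sum ** 2 / n, 0.0)
    denom = ref_norm * np.sqrt(win_var)

    # Leave the score at 0 where the window is (nearly) silent.
    out = np.zeros_like(num)
    np.divide(num, denom, out=out, where=denom > 1e-6)
    return out


# Synthetic demo: hide a 0.5 s noise burst at t = 2 s inside 5 s of quieter noise.
rng = np.random.default_rng(0)
sr = 8000
ref = rng.standard_normal(sr // 2)
chunk = 0.1 * rng.standard_normal(sr * 5)
start = 2 * sr
chunk[start:start + len(ref)] += ref

scores = normalized_xcorr(chunk, ref)
peak = int(np.argmax(scores))
print(peak / sr, scores[peak] >= 0.7)
```

With this synthetic signal the strongest score should land at the 2-second mark and clear the default 0.7 threshold. Real recordings are messier, which is why the threshold is tunable.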

Once all positions are collected, FFmpeg does the actual cutting. Rather than re-encoding, it copies the audio stream directly between detected segments and concatenates the pieces. The codec never touches the samples — no generation loss, no re-compression artifacts. Whatever quality the original had is exactly what the output has.
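The cutting step can be sketched roughly as follows. This is a dry run under stated assumptions, not the tool's implementation: the helper names (`keep_segments`, `plan_ffmpeg`), the temporary file names, and the choice to let the padding widen each cut are all hypothetical. The FFmpeg options used (`-ss`, `-t`, `-c copy`, and the concat demuxer with `-f concat -safe 0`) are standard, but the exact commands Audio Excise runs may differ.

```python
def keep_segments(matches, clip_sec, total_sec, padding_ms=100):
    """Return the (start, end) ranges, in seconds, that survive the cuts.

    `matches` holds the start time of each detected occurrence.
    Assumption: the padding widens every cut on both sides.
    """
    pad = padding_ms / 1000.0
    segments, cursor = [], 0.0
    for m in sorted(matches):
        cut_start = max(m - pad, 0.0)
        cut_end = min(m + clip_sec + pad, total_sec)
        if cut_start > cursor:
            segments.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < total_sec:
        segments.append((cursor, total_sec))
    return segments


def plan_ffmpeg(src, segments, out):
    """Build (concat list text, ffmpeg argument lists) without running them."""
    parts, cmds = [], []
    for i, (start, end) in enumerate(segments):
        part = f"part_{i:04d}.mp3"
        parts.append(part)
        # -c copy passes the compressed stream through, so nothing is re-encoded.
        cmds.append(["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
                     "-i", src, "-c", "copy", part])
    list_text = "".join(f"file '{p}'\n" for p in parts)
    # The concat demuxer reads the list (saved as parts.txt) and joins the parts.
    cmds.append(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                 "-i", "parts.txt", "-c", "copy", out])
    return list_text, cmds


segments = keep_segments([10.0, 100.0], clip_sec=5.0, total_sec=200.0, padding_ms=500)
print(segments)  # [(0.0, 9.5), (15.5, 99.5), (105.5, 200.0)]

list_text, cmds = plan_ffmpeg("long_file.mp3", segments, "long_file.clean.mp3")
for cmd in cmds:
    print(" ".join(cmd))
```

Because the stream is copied rather than decoded, cuts can only fall on MP3 frame boundaries, so the actual cut points may differ from the requested times by a few milliseconds.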

## Installation

Python 3.9+ and FFmpeg are the only prerequisites. On macOS:

```bash
brew install ffmpeg
```

On Debian or Ubuntu Linux:

```bash
sudo apt install ffmpeg
```

The first run sets up a virtual environment and installs NumPy and SciPy automatically. No manual pip install needed.

## What it is good for

The obvious cases are audiobooks and podcasts — long files with predictable repeated patterns. But anything that follows that structure works: lecture recordings with institutional bumpers, radio show archives, training materials with repeated intro sequences. If the reference clip is consistent enough that your ears can recognize it every time, the cross-correlation will too.

The source is on GitHub at [simonjenny/audio-excise](https://github.com/simonjenny/audio-excise). MIT licensed.
