Merge pull request #59 from bbc/remove_default_trim
Remove default trim
stephenjolly authored Jun 19, 2024
2 parents f3802ce + ec3c61c commit dfb2f1d
Showing 5 changed files with 61 additions and 38 deletions.
28 changes: 25 additions & 3 deletions README.md
@@ -26,8 +26,8 @@ If you are installing on macOS and use the third-party package manager [HomeBrew

You will need to [install FFmpeg](https://ffmpeg.org/download.html) to use the command-line tool, or to use the file-related functions in the library.

-Usage
------
+Basic Usage
++-----------

To use the command-line tool:

@@ -40,6 +40,27 @@ To use the command-line tool:
Offset: -12.26 (seconds)
Standard score: 28.99

+Command-Line Options
+--------------------
+The following command-line options can be provided to alter the behaviour of the tool:
+
+| Option | Description |
+| ------ | ----------- |
+| -h, --help | Show a help message and exit |
+| --find-offset-of audio file | Find the offset of this file... |
+| --within audio file | ...within this file |
+| --sr sample rate | Target sample rate in Hz during downsampling (default: 8000) |
+| --trim seconds | Only use the first n seconds of each audio file |
+| --resolution samples | Resolution (maximum accuracy) of search in samples (default: 128) |
+| --show-plot | Display a plot of the cross-correlation results |
+| --save-plot filename | Save a plot of the cross-correlation results to a file (in a format that matches the extension you provide - png, ps, pdf, svg) |
+| --json | Output in JSON for further processing |
+
+You can fine-tune the results for your application by tweaking the sample rate, trim and resolution parameters:
+* The _sample rate_ option refers to a resampling operation that is carried out before the audio offset search is carried out. It does not refer to the sample rate(s) of the audio files being compared. Resampling at a higher sample rate retains higher audio frequencies, but increases the time required to search for an offset. The default sample rate is 8000Hz, which is a good compromise for most audio.
+* The audio search is carried out by comparing the two audio files at a given offset, then skipping forward by a certain number of samples and then comparing them again. This is repeated for all valid positions of one file compared to another, and then the best match is chosen and presented to the user. The size of the skip is the _resolution_ of the search. At a sample rate of 8000Hz (the default, as described above), a resolution of 128 samples (also the default) corresponds to a skip size of 128 / 8000 = 0.016 seconds. This sets a limit on the precision of the offsets calculated by the tool. You can make the search more precise by decreasing the value of _resolution_, but at the cost of increasing the processing time.
+* An optional _trim_ operation can be carried out before processing. If you specify a value here, only the given number of seconds from the beginning of each file will be searched for an offset. This will prevent the tool from finding offsets unless they are somewhat less than the trim size. It will also prevent the tool from finding offsets unless the similarities between the two audio files are present in the trimmed parts of the files. If in doubt ensure that you select a trim size at least twice as large as the maximum possible offset, or leave it unspecified (the default) to search the whole range of each file.

To provide additional information about the accuracy of the result in addition to the standard score, the `--show-plot` option shows a plot of the cross-correlation curve, and the `--save-plot` option saves one to a file. The two options can be used separately, or together if you want to both view the plot and save a copy of it:

$ audio-offset-finder --find-offset-of file2.wav --within file1.wav --show-plot --save-plot example.png
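The _resolution_ and _trim_ bullets in this README change boil down to two small calculations. The sketch below is illustrative only and is not part of this commit. `search_step_seconds` and `trim_is_safe` are hypothetical helper names. They encode the README's formula (resolution divided by the resampling rate) and its rule of thumb (a trim at least twice the largest expected offset, or no trim at all, which is now the default).

```python
from typing import Optional


def search_step_seconds(resolution_samples: int, sample_rate_hz: int) -> float:
    """Hypothetical helper: time between successive candidate offsets.

    Follows the README: skip size = resolution / resampling rate.
    """
    return resolution_samples / sample_rate_hz


def trim_is_safe(trim_seconds: Optional[float], max_offset_seconds: float) -> bool:
    """Hypothetical helper: apply the README rule of thumb for --trim.

    None means "no trim" (the new default), which searches the whole of
    each file and is therefore always safe by this rule.
    """
    if trim_seconds is None:
        return True
    return trim_seconds >= 2 * max_offset_seconds


if __name__ == "__main__":
    # Defaults: 128-sample resolution at 8000 Hz.
    print(search_step_seconds(128, 8000))  # 0.016
    # Halving the resolution halves the step (more precise, but slower).
    print(search_step_seconds(64, 8000))  # 0.008
    # The README example reports an offset of 12.26 s.
    print(trim_is_safe(20, 12.26))  # False: the rule asks for at least 24.52 s
    print(trim_is_safe(None, 12.26))  # True
```

With the defaults, candidate offsets are 16 ms apart. A 20-second trim would be too short for the 12.26-second offset shown in the README output, while omitting `--trim` sidesteps the question entirely.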
@@ -68,6 +89,7 @@ have in memory.

Testing
-------
A number of automated unit tests are included (and run before any pull requests are accepted) to try and validate the basic functionality of the tool in different scenarios. You can run them yourself by simply installing pytest and running it in this repository's root folder:

$ pytest

@@ -95,4 +117,4 @@ For details of how to contribute changes, see [CONTRIBUTING.md](CONTRIBUTING.md)
The audio files used in the tests were downloaded from
[Wikimedia Commons](https://commons.wikimedia.org/wiki/Main_Page):
* [A recording of Tim Berners-Lee](https://commons.wikimedia.org/wiki/File:Tim_Berners-Lee_-_Today_(ffmpeg_FLAC_in_OGG).oga), originally extracted from the 9 July 2008 episode of the BBC [Today programme](https://www.bbc.co.uk/programmes/b00cddwc). This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution 3.0 Unported](https://creativecommons.org/licenses/by/3.0/deed.en) license
-* [A spoken word version of the Wikipedia article on BBC Radio 4](https://commons.wikimedia.org/wiki/File:BBC_Radio_4.ogg). This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en) license, and hence the excerpts created for use in the tests are also covered by that license.
+* [A spoken word version](https://commons.wikimedia.org/wiki/File:BBC_Radio_4.ogg) of [the Wikipedia article on BBC Radio 4](https://en.wikipedia.org/wiki/BBC_Radio_4), spoken and recorded by Wikimedia Commons user [Tom Morris](https://commons.wikimedia.org/wiki/User:Tom_Morris). This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en) license, and hence the excerpts created for use in the tests are also covered by that license.
43 changes: 18 additions & 25 deletions audio_offset_finder/audio_offset_finder.py
@@ -38,7 +38,7 @@ def mfcc(audio, win_length=256, nfft=512, fs=16000, hop_length=128, numcep=13):
     ]


-def find_offset_between_files(file1, file2, fs=8000, trim=60 * 15, hop_length=128, win_length=256, nfft=512, max_frames=2000):
+def find_offset_between_files(file1, file2, fs=8000, trim=None, hop_length=128, win_length=256, nfft=512, max_frames=2000):
     """Find the offset time offset between two audio files.
     This function takes in two file paths, and (assuming they are media files with a valid audio track)
@@ -54,7 +54,7 @@ def find_offset_between_files(file1, file2, fs=8000, trim=60 * 15, hop_length=12
     fs: int
         The sampling rate that the audio should be resampled to prior to MFCC calculation, in Hz
     trim: int
-        The length to which input files will be truncated before processing, in seconds
+        The length to which input files will be truncated before processing, in seconds. A value of "None" indicates no trimming.
     hop_length: int
         The number of samples (at the resampled rate "fs") to skip between each calculated MFCC frame
     win_length: int
@@ -209,7 +209,7 @@ def std_mfcc(array):
     return (array - np.mean(array, axis=0)) / np.std(array, axis=0)


-def convert_and_trim(afile, fs, trim):
+def convert_and_trim(afile, fs, trim=None):
     """Converts the input media to a temporary 16-bit WAV file and trims it to length.
     Parameters
@@ -220,6 +220,7 @@ def convert_and_trim(afile, fs, trim):
         The sample rate that the audio should be converted to during the conversion
     trim: float
         The length to which the output audio should be trimmed, in seconds. (Audio beyond this point will be discarded.)
+        A value of "None" implies no trimming.
     Returns
     -------
@@ -228,28 +229,20 @@ def convert_and_trim(afile, fs, trim):
     tmp = tempfile.NamedTemporaryFile(mode="r+b", prefix="offset_", suffix=".wav")
     tmp_name = tmp.name
     tmp.close()
-    psox = Popen(
-        [
-            "ffmpeg",
-            "-loglevel",
-            "error",
-            "-i",
-            afile,
-            "-ac",
-            "1",
-            "-ar",
-            str(fs),
-            "-ss",
-            "0",
-            "-t",
-            str(trim),
-            "-acodec",
-            "pcm_s16le",
-            tmp_name,
-        ],
-        stderr=PIPE,
-        text=True,
-    )
+
+    ffmpeg_command = ["ffmpeg"]
+    ffmpeg_command += ["-loglevel", "error"]
+    ffmpeg_command += ["-i", afile]
+    ffmpeg_command += ["-ac", "1"]
+    ffmpeg_command += ["-ar", str(fs)]
+    ffmpeg_command += ["-ss", "0"]
+    if trim:
+        ffmpeg_command += ["-t", str(trim)]
+    ffmpeg_command += ["-acodec", "pcm_s16le"]
+    ffmpeg_command += [tmp_name]
+
+    psox = Popen(ffmpeg_command, stderr=PIPE, text=True)
+
     stdout, stderr = psox.communicate()
     if psox.returncode != 0:
         raise Exception("FFMpeg failed:\n" + stderr.strip())
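Because the new `convert_and_trim` assembles the FFmpeg argument list step by step, the only difference between the trimmed and untrimmed cases is whether `-t <seconds>` is appended. The sketch below isolates that logic in a hypothetical `build_ffmpeg_command` helper. The helper is not part of the library. It lets the logic be checked without running FFmpeg. The commit tests `if trim:`, so a trim of `0` is treated as "no trim" just like `None`, and the sketch keeps that behaviour.

```python
from typing import List, Optional


def build_ffmpeg_command(afile: str, fs: int, out_path: str, trim: Optional[float] = None) -> List[str]:
    """Hypothetical helper mirroring the argument list built in this commit.

    Converts to mono (-ac 1), resamples to fs (-ar), starts at 0 s (-ss 0)
    and writes 16-bit PCM (-acodec pcm_s16le); -t is added only when trim
    is truthy.
    """
    command = ["ffmpeg", "-loglevel", "error", "-i", afile]
    command += ["-ac", "1", "-ar", str(fs), "-ss", "0"]
    if trim:  # same truthiness test as the commit: None and 0 both mean "no trim"
        command += ["-t", str(trim)]
    command += ["-acodec", "pcm_s16le", out_path]
    return command


if __name__ == "__main__":
    print(build_ffmpeg_command("in.mp3", 8000, "out.wav"))
    print(build_ffmpeg_command("in.mp3", 8000, "out.wav", trim=60))
```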
18 changes: 10 additions & 8 deletions audio_offset_finder/cli.py
@@ -30,19 +30,17 @@ def main(argv):
         ),
         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
     )
-    parser.add_argument("--find-offset-of", metavar="audio file", type=str, help="Find the offset of file")
-    parser.add_argument("--within", metavar="audio file", type=str, help="Within file")
-    parser.add_argument("--sr", metavar="sample rate", type=int, default=8000, help="Target sample rate during downsampling")
-    parser.add_argument(
-        "--trim", metavar="seconds", type=int, default=60 * 15, help="Only uses first n seconds of audio files"
-    )
+    parser.add_argument("--find-offset-of", metavar="audio file", type=str, help="Find the offset of this file...")
+    parser.add_argument("--within", metavar="audio file", type=str, help="...within this file.")
+    parser.add_argument("--sr", metavar="sample rate", type=int, default=8000, help="Resample to this rate before searching")
+    parser.add_argument("--trim", metavar="seconds", type=int, help="Only consider the first n seconds of the audio files")
     parser.add_argument(
         "--resolution", metavar="samples", type=int, default=128, help="Resolution (maximum accuracy) of search in samples"
     )
     parser.add_argument("--show-plot", action="store_true", dest="show_plot", help="Display plot of cross-correlation results")
     parser.add_argument(
         "--save-plot",
-        metavar="plot file",
+        metavar="filename",
         dest="plot_file",
         type=str,
         help=("Save a plot of cross-correlation results to a file " "(format matches extension - png, ps, pdf, svg)"),
@@ -53,8 +51,12 @@ def main(argv):
         parser.error("Please provide input audio files")

     try:
+        trim = None
+        if args.trim:
+            trim = int(args.trim)
+
         results = find_offset_between_files(
-            args.within, args.find_offset_of, fs=int(args.sr), trim=int(args.trim), hop_length=int(args.resolution)
+            args.within, args.find_offset_of, fs=int(args.sr), trim=trim, hop_length=int(args.resolution)
         )
     except Exception as e:
         print(e, file=sys.stderr)
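With `default=60 * 15` removed from `--trim`, argparse leaves `args.trim` as `None` when the flag is omitted, and `type=int` converts the value when it is given. The standalone sketch below reproduces only the `--trim` option to show both cases. The other options are left out, and `make_parser` is a hypothetical name used for illustration.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reproducing only the --trim option from cli.py after this commit."""
    parser = argparse.ArgumentParser()
    # No default= here, so argparse stores None when --trim is not given.
    parser.add_argument("--trim", metavar="seconds", type=int, help="Only consider the first n seconds of the audio files")
    return parser


if __name__ == "__main__":
    parser = make_parser()
    print(parser.parse_args([]).trim)  # None: search the whole of each file
    print(parser.parse_args(["--trim", "60"]).trim)  # 60, already an int
```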
6 changes: 6 additions & 0 deletions tests/audio_offset_finder_test.py
@@ -52,6 +52,12 @@ def test_find_offset_between_files():
     results = find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160, trim=1)
     assert results["standard_score"] == pytest.approx(2.60, rel=1e-2) # No good results["offset"] found

+    results = find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160)
+    assert results["time_offset"] == pytest.approx(12.26)
+    assert results["standard_score"] == pytest.approx(
+        30.09, rel=1e-2
+    ) # standard score increases with more audio in cross-correlation
+
     with pytest.raises(InsufficientAudioException):
         find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160, trim=0.1)

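The new untrimmed assertion accepts a standard score within 1% of 30.09 (`rel=1e-2`). The sketch below is a rough illustration of that window and does not reproduce pytest's own code. `within_relative` is a hypothetical helper that performs the same relative check by hand. It ignores the very small default absolute tolerance that `pytest.approx` also applies.

```python
def within_relative(actual: float, expected: float, rel: float) -> bool:
    """Hypothetical helper: True if actual lies within rel * |expected| of expected."""
    return abs(actual - expected) <= rel * abs(expected)


if __name__ == "__main__":
    expected, rel = 30.09, 1e-2
    low, high = expected - rel * expected, expected + rel * expected
    print(f"accepted range: {low:.2f} to {high:.2f}")  # accepted range: 29.79 to 30.39
    print(within_relative(30.2, expected, rel))  # True
    print(within_relative(29.5, expected, rel))  # False
```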
4 changes: 2 additions & 2 deletions tests/tool_test.py
@@ -26,10 +26,10 @@

 def test_reorder_correlations():
     input_array1 = np.array([0, 1, 2, 3])
-    np.testing.assert_array_equal(reorder_correlations(input_array1, 2), np.array([2, 3, 0, 1]), 2)
+    np.testing.assert_array_equal(reorder_correlations(input_array1, 2), np.array([2, 3, 0, 1]))

     input_array2 = np.array([0, 1, 2, 3, 4])
-    np.testing.assert_array_equal(reorder_correlations(input_array2, 2), np.array([2, 3, 4, 0, 1]), 2)
+    np.testing.assert_array_equal(reorder_correlations(input_array2, 2), np.array([2, 3, 4, 0, 1]))


 def test_tool():
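The `tool_test.py` change drops a stray third positional argument. In `numpy.testing.assert_array_equal`, the third positional parameter is `err_msg`. The `2` was therefore only ever used as a failure message, and it never acted as a tolerance or shift. On the two inputs tested, the expected output of `reorder_correlations(a, 2)` is a left rotation by two places. The pure-Python `rotate_left` below is a hypothetical reference for those expected arrays. It is not the library's implementation.

```python
from typing import List, Sequence


def rotate_left(values: Sequence[int], k: int) -> List[int]:
    """Hypothetical reference: move the first k items to the end."""
    if not values:
        return []
    k %= len(values)
    return list(values[k:]) + list(values[:k])


if __name__ == "__main__":
    print(rotate_left([0, 1, 2, 3], 2))  # [2, 3, 0, 1]
    print(rotate_left([0, 1, 2, 3, 4], 2))  # [2, 3, 4, 0, 1]
```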