Merge pull request #59 from bbc/remove_default_trim

Remove default trim
bbc · Jun 19, 2024 · dfb2f1d
2 parents f3802ce + ec3c61c
commit dfb2f1d
Showing 5 changed files with 61 additions and 38 deletions.
diff --git a/README.md b/README.md
@@ -26,8 +26,8 @@ If you are installing on macOS and use the third-party package manager [HomeBrew
 
 You will need to [install FFmpeg](https://ffmpeg.org/download.html) to use the command-line tool, or to use the file-related functions in the library.
 
-Usage
------
+Basic Usage
+-----------
 
 To use the command-line tool:
 
@@ -40,6 +40,27 @@ To use the command-line tool:
     Offset: -12.26 (seconds)
     Standard score: 28.99
 
+Command-Line Options
+--------------------
+The following command-line options can be provided to alter the behaviour of the tool:
+
+| Option | Description |
+| ------ | ----------- |
+| -h, --help  |  Show a help message and exit |
+| --find-offset-of audio file | Find the offset of this file... |
+| --within audio file  |  ...within this file |
+| --sr sample rate |  Target sample rate in Hz during downsampling (default: 8000) |
+| --trim seconds  |  Only use the first n seconds of each audio file |
+| --resolution samples  |  Resolution (maximum accuracy) of search in samples (default: 128) |
+| --show-plot  |  Display a plot of the cross-correlation results |
+| --save-plot filename |  Save a plot of the cross-correlation results to a file (in a format that matches the extension you provide - png, ps, pdf, svg) |
+| --json  |  Output in JSON for further processing |
+
+You can fine-tune the results for your application by tweaking the sample rate, trim and resolution parameters:
+* The _sample rate_ option refers to a resampling operation that is carried out before the audio offset search.  It does not refer to the sample rate(s) of the audio files being compared.  Resampling at a higher sample rate retains higher audio frequencies, but increases the time required to search for an offset.  The default sample rate is 8000Hz, which is a good compromise for most audio.
+* The audio search is carried out by comparing the two audio files at a given offset, then skipping forward by a certain number of samples and then comparing them again.  This is repeated for all valid positions of one file compared to another, and then the best match is chosen and presented to the user.  The size of the skip is the _resolution_ of the search.  At a sample rate of 8000Hz (the default, as described above), a resolution of 128 samples (also the default) corresponds to a skip size of 128 / 8000 = 0.016 seconds.  This sets a limit on the precision of the offsets calculated by the tool.  You can make the search more precise by decreasing the value of _resolution_, but at the cost of increasing the processing time.
+* An optional _trim_ operation can be carried out before processing.  If you specify a value here, only the given number of seconds from the beginning of each file will be searched for an offset.  This will prevent the tool from finding offsets unless they are somewhat less than the trim size.  It will also prevent the tool from finding offsets unless the similarities between the two audio files are present in the trimmed parts of the files.  If in doubt ensure that you select a trim size at least twice as large as the maximum possible offset, or leave it unspecified (the default) to search the whole range of each file.
+
 To provide additional information about the accuracy of the result in addition to the standard score, the `--show-plot` option shows a plot of the cross-correlation curve, and the `--save-plot` option saves one to a file.  The two options can be used separately, or together if you want to both view the plot and save a copy of it:
 
     $ audio-offset-finder --find-offset-of file2.wav --within file1.wav --show-plot --save-plot example.png
@@ -68,6 +89,7 @@ have in memory.
 
 Testing
 -------
+A number of automated unit tests are included (and run before any pull requests are accepted) to validate the basic functionality of the tool in different scenarios.  You can run them yourself by installing pytest and running it in this repository's root folder:
 
     $ pytest
 
@@ -95,4 +117,4 @@ For details of how to contribute changes, see [CONTRIBUTING.md](CONTRIBUTING.md)
 The audio files used in the tests were downloaded from
 [Wikimedia Commons](https://commons.wikimedia.org/wiki/Main_Page):
 * [A recording of Tim Berners-Lee](https://commons.wikimedia.org/wiki/File:Tim_Berners-Lee_-_Today_(ffmpeg_FLAC_in_OGG).oga), originally extracted from the 9 July 2008 episode of the BBC [Today programme](https://www.bbc.co.uk/programmes/b00cddwc).  This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution 3.0 Unported](https://creativecommons.org/licenses/by/3.0/deed.en) license
-* [A spoken word version of the Wikipedia article on BBC Radio 4](https://commons.wikimedia.org/wiki/File:BBC_Radio_4.ogg).  This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en) license, and hence the excerpts created for use in the tests are also covered by that license.
+* [A spoken word version](https://commons.wikimedia.org/wiki/File:BBC_Radio_4.ogg) of [the Wikipedia article on BBC Radio 4](https://en.wikipedia.org/wiki/BBC_Radio_4), spoken and recorded by Wikimedia Commons user [Tom Morris](https://commons.wikimedia.org/wiki/User:Tom_Morris).  This file is licensed under the [Creative Commons](https://en.wikipedia.org/wiki/en:Creative_Commons) [Attribution-Share Alike 3.0 Unported](https://creativecommons.org/licenses/by-sa/3.0/deed.en) license, and hence the excerpts created for use in the tests are also covered by that license.
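
Reviewer note: the resolution arithmetic in the new README text can be checked with a few lines of Python. This is a minimal sketch, not code from the repository. The function names `skip_size_seconds` and `suggested_min_trim` are hypothetical, and the second one only encodes the README's rule of thumb that the trim should be at least twice the maximum possible offset.

```python
def skip_size_seconds(resolution=128, sample_rate=8000):
    """Return the time between successive comparison positions, in seconds.

    The defaults mirror the README: 128 samples at 8000 Hz.
    """
    return resolution / sample_rate


def suggested_min_trim(max_offset_seconds):
    """Return the README's rule-of-thumb minimum trim: twice the largest expected offset."""
    return 2 * max_offset_seconds


# Default settings: 128 / 8000 seconds between comparison positions.
default_skip = skip_size_seconds()

# Halving the resolution halves the skip size (finer precision, more work).
finer_skip = skip_size_seconds(resolution=64)

# If the offset could be up to 30 seconds, trim to at least 60 seconds.
min_trim = suggested_min_trim(30)
```

Leaving `--trim` unset (the new default) avoids having to estimate the maximum offset at all, at the cost of processing the whole of each file.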
diff --git a/audio_offset_finder/audio_offset_finder.py b/audio_offset_finder/audio_offset_finder.py
@@ -38,7 +38,7 @@ def mfcc(audio, win_length=256, nfft=512, fs=16000, hop_length=128, numcep=13):
     ]
 
 
-def find_offset_between_files(file1, file2, fs=8000, trim=60 * 15, hop_length=128, win_length=256, nfft=512, max_frames=2000):
+def find_offset_between_files(file1, file2, fs=8000, trim=None, hop_length=128, win_length=256, nfft=512, max_frames=2000):
     """Find the time offset between two audio files.
 
     This function takes in two file paths, and (assuming they are media files with a valid audio track)
@@ -54,7 +54,7 @@ def find_offset_between_files(file1, file2, fs=8000, trim=60 * 15, hop_length=12
     fs: int
         The sampling rate that the audio should be resampled to prior to MFCC calculation, in Hz
     trim: int
-        The length to which input files will be truncated before processing, in seconds
+        The length to which input files will be truncated before processing, in seconds.  A value of "None" indicates no trimming.
     hop_length: int
         The number of samples (at the resampled rate "fs") to skip between each calculated MFCC frame
     win_length: int
@@ -209,7 +209,7 @@ def std_mfcc(array):
     return (array - np.mean(array, axis=0)) / np.std(array, axis=0)
 
 
-def convert_and_trim(afile, fs, trim):
+def convert_and_trim(afile, fs, trim=None):
     """Converts the input media to a temporary 16-bit WAV file and trims it to length.
 
     Parameters
@@ -220,6 +220,7 @@ def convert_and_trim(afile, fs, trim):
         The sample rate that the audio should be converted to during the conversion
     trim: float
         The length to which the output audio should be trimmed, in seconds.  (Audio beyond this point will be discarded.)
+        A value of "None" implies no trimming.
 
     Returns
     -------
@@ -228,28 +229,20 @@ def convert_and_trim(afile, fs, trim):
     tmp = tempfile.NamedTemporaryFile(mode="r+b", prefix="offset_", suffix=".wav")
     tmp_name = tmp.name
     tmp.close()
-    psox = Popen(
-        [
-            "ffmpeg",
-            "-loglevel",
-            "error",
-            "-i",
-            afile,
-            "-ac",
-            "1",
-            "-ar",
-            str(fs),
-            "-ss",
-            "0",
-            "-t",
-            str(trim),
-            "-acodec",
-            "pcm_s16le",
-            tmp_name,
-        ],
-        stderr=PIPE,
-        text=True,
-    )
+
+    ffmpeg_command = ["ffmpeg"]
+    ffmpeg_command += ["-loglevel", "error"]
+    ffmpeg_command += ["-i", afile]
+    ffmpeg_command += ["-ac", "1"]
+    ffmpeg_command += ["-ar", str(fs)]
+    ffmpeg_command += ["-ss", "0"]
+    if trim:
+        ffmpeg_command += ["-t", str(trim)]
+    ffmpeg_command += ["-acodec", "pcm_s16le"]
+    ffmpeg_command += [tmp_name]
+
+    psox = Popen(ffmpeg_command, stderr=PIPE, text=True)
+
     stdout, stderr = psox.communicate()
     if psox.returncode != 0:
         raise Exception("FFMpeg failed:\n" + stderr.strip())
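
Reviewer note: the conditional command construction in this hunk can be exercised without running FFmpeg by isolating it in a function. The sketch below is an illustration under assumptions: `build_ffmpeg_command` is a hypothetical helper that does not exist in the repository, and the file names are placeholders. The FFmpeg arguments themselves are the ones shown in the diff.

```python
def build_ffmpeg_command(afile, fs, tmp_name, trim=None):
    """Build the argument list for converting `afile` to a mono 16-bit WAV.

    Hypothetical helper: it mirrors the list built in convert_and_trim,
    adding "-t <seconds>" only when a truthy trim value is given.
    """
    command = ["ffmpeg"]
    command += ["-loglevel", "error"]
    command += ["-i", afile]
    command += ["-ac", "1"]  # downmix to a single channel
    command += ["-ar", str(fs)]  # resample to the target rate
    command += ["-ss", "0"]  # start from the beginning of the file
    if trim:
        command += ["-t", str(trim)]  # limit the output duration
    command += ["-acodec", "pcm_s16le"]
    command += [tmp_name]
    return command


# No trim: the whole file is converted, so "-t" is absent.
untrimmed = build_ffmpeg_command("in.mp3", 8000, "out.wav")

# Trim to the first 900 seconds (the previous default of 60 * 15).
trimmed = build_ffmpeg_command("in.mp3", 8000, "out.wav", trim=900)
```

Because the check is a truthiness test, `trim=0` is treated the same as `trim=None`. Testing `trim is not None` instead would distinguish the two cases, if that distinction is ever wanted.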

diff --git a/audio_offset_finder/cli.py b/audio_offset_finder/cli.py
@@ -30,19 +30,17 @@ def main(argv):
         ),
         formatter_class=argparse.ArgumentDefaultsHelpFormatter,
     )
-    parser.add_argument("--find-offset-of", metavar="audio file", type=str, help="Find the offset of file")
-    parser.add_argument("--within", metavar="audio file", type=str, help="Within file")
-    parser.add_argument("--sr", metavar="sample rate", type=int, default=8000, help="Target sample rate during downsampling")
-    parser.add_argument(
-        "--trim", metavar="seconds", type=int, default=60 * 15, help="Only uses first n seconds of audio files"
-    )
+    parser.add_argument("--find-offset-of", metavar="audio file", type=str, help="Find the offset of this file...")
+    parser.add_argument("--within", metavar="audio file", type=str, help="...within this file.")
+    parser.add_argument("--sr", metavar="sample rate", type=int, default=8000, help="Resample to this rate before searching")
+    parser.add_argument("--trim", metavar="seconds", type=int, help="Only consider the first n seconds of the audio files")
     parser.add_argument(
         "--resolution", metavar="samples", type=int, default=128, help="Resolution (maximum accuracy) of search in samples"
     )
     parser.add_argument("--show-plot", action="store_true", dest="show_plot", help="Display plot of cross-correlation results")
     parser.add_argument(
         "--save-plot",
-        metavar="plot file",
+        metavar="filename",
         dest="plot_file",
         type=str,
         help=("Save a plot of cross-correlation results to a file " "(format matches extension - png, ps, pdf, svg)"),
@@ -53,8 +51,12 @@ def main(argv):
         parser.error("Please provide input audio files")
 
     try:
+        trim = None
+        if args.trim:
+            trim = int(args.trim)
+
         results = find_offset_between_files(
-            args.within, args.find_offset_of, fs=int(args.sr), trim=int(args.trim), hop_length=int(args.resolution)
+            args.within, args.find_offset_of, fs=int(args.sr), trim=trim, hop_length=int(args.resolution)
         )
     except Exception as e:
         print(e, file=sys.stderr)
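
Reviewer note: with the `default=60 * 15` removed, `argparse` leaves `args.trim` as `None` when the flag is omitted, which is what lets the CLI pass "no trimming" through to the library. A minimal sketch of that behaviour follows; `parse_trim` is a hypothetical function written for illustration, and only the `--trim` argument definition is taken from the diff.

```python
import argparse


def parse_trim(argv):
    """Parse only the --trim option and return its value (None when omitted)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--trim", metavar="seconds", type=int, help="Only consider the first n seconds of the audio files"
    )
    args = parser.parse_args(argv)
    return args.trim


# Flag omitted: argparse falls back to its implicit default of None.
no_trim = parse_trim([])

# Flag supplied: type=int has already converted the string to an integer.
fifteen_minutes = parse_trim(["--trim", "900"])
```

Since `type=int` already performs the conversion, the extra `int(args.trim)` in the CLI is harmless but not strictly needed.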

diff --git a/tests/audio_offset_finder_test.py b/tests/audio_offset_finder_test.py
@@ -52,6 +52,12 @@ def test_find_offset_between_files():
     results = find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160, trim=1)
     assert results["standard_score"] == pytest.approx(2.60, rel=1e-2)  # No good results["offset"] found
 
+    results = find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160)
+    assert results["time_offset"] == pytest.approx(12.26)
+    assert results["standard_score"] == pytest.approx(
+        30.09, rel=1e-2
+    )  # standard score increases with more audio in cross-correlation
+
     with pytest.raises(InsufficientAudioException):
         find_offset_between_files(path("timbl_1.mp3"), path("timbl_2.mp3"), hop_length=160, trim=0.1)
 

diff --git a/tests/tool_test.py b/tests/tool_test.py
@@ -26,10 +26,10 @@
 
 def test_reorder_correlations():
     input_array1 = np.array([0, 1, 2, 3])
-    np.testing.assert_array_equal(reorder_correlations(input_array1, 2), np.array([2, 3, 0, 1]), 2)
+    np.testing.assert_array_equal(reorder_correlations(input_array1, 2), np.array([2, 3, 0, 1]))
 
     input_array2 = np.array([0, 1, 2, 3, 4])
-    np.testing.assert_array_equal(reorder_correlations(input_array2, 2), np.array([2, 3, 4, 0, 1]), 2)
+    np.testing.assert_array_equal(reorder_correlations(input_array2, 2), np.array([2, 3, 4, 0, 1]))
 
 
 def test_tool():