documentation
alanngnet committed Mar 20, 2024
1 parent f93a898 commit edfa3e9
Showing 2 changed files with 17 additions and 4 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -78,7 +78,10 @@ The important output from that is full.txt and the cqt_feat subfolder's contents
3. Run the evaluation script:<br>
`python3 -m tools.eval_testset pretrained_model data/covers80/dataset.txt data/covers80/dataset.txt`

CoverHunter only implemented evaluation for the case when query and reference data are identical. But there is an optional 4th parameter for `query_in_ref_path` that would be relevant if query and reference are not identical. See the "query_in_ref" heading below under "Input and Output Files."
CoverHunter only shared an evaluation example for the case when query and reference data are identical(!). But there is an optional 4th parameter for `query_in_ref_path` that would be relevant if query and reference are not identical. See the "query_in_ref" heading below under "Input and Output Files."
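As a sketch, an invocation passing that 4th parameter might look like this (the query, ref, and index file names here are hypothetical):<br>
`python3 -m tools.eval_testset pretrained_model data/covers80/query.txt data/covers80/ref.txt data/covers80/query_in_ref.txt`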

This script saves embeddings generated by the `eval_for_map_with_feat` function as .npy arrays in a new subfolder of the `pretrained_model` folder named `embed_{epoch#}_tmp` where epoch# is the highest-numbered epoch subfolder in the `pretrained_model` folder.
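For example, a minimal sketch of loading one of those saved embeddings with NumPy (the epoch number and utt filename here are hypothetical; match them to your own run):

```python
import numpy as np

# Hypothetical epoch number and utt name; adjust to your own run.
emb = np.load("pretrained_model/embed_25_tmp/query_embed/cover80_00000000.npy")
print(emb.shape)
```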


## Coarse-to-Fine Alignment Training

@@ -175,6 +178,14 @@ Listed in the order that the script creates them:
| song_name_num.map | Text file, not used by train.py, maybe not by anything else? |
| full.txt | See above detailed description.|

## Training checkpoint output

Using the default configuration, training saves checkpoints after each epoch in the egs/covers80 folder.

The `pt_model` folder gets two files per epoch: `do_000000NN` and `g_000000NN`, where NN is the epoch number. The do_ files contain the AdamW optimizer state. The g_ files contain the model's state dictionary. "g" is perhaps an abbreviation for "generator," given that a transformer architecture is involved.
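To peek inside a checkpoint, here is a minimal sketch using standard PyTorch loading (the epoch number is hypothetical, and the top-level key names are whatever the repo's save code chose):

```python
import torch

# Load on CPU so this works regardless of the training device.
ckpt = torch.load("egs/covers80/pt_model/g_00000042", map_location="cpu")

# A torch.save checkpoint is typically a dict; list its top-level keys.
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```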

The `eval_for_map_with_feat()` function, called at the end of each epoch, also saves the embeddings for every sample in the training data, plus the embeddings for time-chunked sections of those samples, named with a suffix of `...__start-N.npy`, where N is the time in seconds at which the chunk starts. These arrays are placed in a `query_embed` subfolder. An accompanying `query.txt` (with an identical copy as `ref.txt`) is also saved: a text file listing the attributes of every utt represented in the `query_embed` subfolder, in the same format described above for `full.txt`.
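As a sketch, one way to gather the full-length and chunked embeddings saved for a single utt (the embed folder and utt name here are hypothetical):

```python
import glob
import numpy as np

# Chunk files share the utt's name plus a "__start-N" suffix.
pattern = "egs/covers80/embed_25_tmp/query_embed/cover80_00000000*.npy"
for path in sorted(glob.glob(pattern)):
    print(path, np.load(path).shape)
```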

## query_in_ref

The file you can prepare to pass to the tools/eval_testset.py script as the 4th parameter, `query_in_ref_path` (CoverHunter did not provide an example file or documentation), assumes:
8 changes: 5 additions & 3 deletions src/eval_testset.py
@@ -255,8 +255,8 @@ def eval_for_map_with_feat(hp, model, embed_dir, query_path, ref_path,
hp: dict contains hparams
model: nnet model, should have method 'infer'
embed_dir: dir for saving embedding, None for not saving anything
query_path: contains query info
ref_path: contains ref info
query_path: text file with query utt info
ref_path: text file with ref utt info
query_in_ref_path: path to prepared query in ref index. None means that
query index equals ref index
batch_size: for nnet infer
@@ -318,12 +318,13 @@ def eval_for_map_with_feat(hp, model, embed_dir, query_path, ref_path,
query_embed_dir = os.path.join(embed_dir, "query_embed")
query_chunk_lines = _cut_lines_with_dur(query_lines, chunk_s, query_embed_dir)
write_lines(os.path.join(embed_dir, "query.txt"), query_chunk_lines, False)
# select query utts for which there is not yet a saved embedding
to_cal_lines = [l for l in query_chunk_lines
if not os.path.exists(line_to_dict(l)["embed"])]
if logger:
logger.info("query chunk lines: {}, to compute lines: {}".format(
len(query_chunk_lines), len(to_cal_lines)))

# generate any missing embeddings
if len(to_cal_lines) > 0:
data_loader = DataLoader(AudioFeatDataset(hp, data_lines=to_cal_lines,
mode="defined",
@@ -341,6 +342,7 @@ def eval_for_map_with_feat(hp, model, embed_dir, query_path, ref_path,
ref_chunk_lines = _cut_lines_with_dur(ref_lines, chunk_s, ref_embed_dir)
write_lines(os.path.join(embed_dir, "ref.txt"), ref_chunk_lines, False)
if ref_path != query_path:
# select ref utts for which there is not yet a saved embedding
to_cal_lines = [l for l in ref_chunk_lines
if not os.path.exists(line_to_dict(l)["embed"])]
if logger: