Last thought: training the discriminator without updating G should actually work (does it?)
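A minimal sketch of that check, with `generator`, `discriminator`, and `d_optimizer` as assumed placeholders (not the repo's actual API); only the discriminator's variables receive gradients, so G stays frozen:

```python
import tensorflow as tf

# Sketch only: all names (generator, discriminator, d_optimizer) are assumptions.
@tf.function
def discriminator_step(batch, generator, discriminator, d_optimizer):
  # Run G outside the tape and in inference mode: it receives no updates.
  audio_gen = generator(batch, training=False)
  with tf.GradientTape() as tape:
    logits_real = discriminator(batch['audio'])
    logits_gen = discriminator(audio_gen)
    # LSGAN-style MSE objective: real -> 1, generated -> 0.
    d_loss = (tf.reduce_mean((logits_real - 1.0) ** 2)
              + tf.reduce_mean(logits_gen ** 2))
  grads = tape.gradient(d_loss, discriminator.trainable_variables)
  d_optimizer.apply_gradients(zip(grads, discriminator.trainable_variables))
  return d_loss
```

If `d_loss` drops while G's outputs stay fixed, the discriminator is learning on its own.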
# Next steps:
- TIMBRE INTERPOLATION
  Target: specify a model and two monophonic wav files
  Preprocess the samples, collect dataset statistics
  Copy the relevant parts from the timbre transfer notebook
- move the timbre interpolation notebook to Colab
- add a script to download from S3
- test the GAN:
  - train the discriminator only to see if it works (cf. the sketch above)
  - verify that the decoder actually changes when trained as a GAN
- implement a gated residual block (see the sketch after this list)
- 04_hierarchical_timbre_painting
- 05_multi_room_reverb
- implement a reverb module that takes latent parameters
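For the gated residual block item above, a WaveNet/ParallelWaveGAN-style sketch (channel counts and layer layout are my assumptions):

```python
import tensorflow as tf

class GatedResidualBlock(tf.keras.layers.Layer):
  """Gated residual block sketch: dilated conv -> tanh*sigmoid gate ->
  1x1 convs for the residual and skip paths."""

  def __init__(self, channels=64, kernel_size=3, dilation=1, **kwargs):
    super().__init__(**kwargs)
    # One dilated conv producing both the filter and the gate halves.
    self.dilated_conv = tf.keras.layers.Conv1D(
        2 * channels, kernel_size, dilation_rate=dilation, padding='same')
    self.res_conv = tf.keras.layers.Conv1D(channels, 1)   # residual path
    self.skip_conv = tf.keras.layers.Conv1D(channels, 1)  # skip path

  def call(self, x):
    # x: [batch, time, channels]; the residual add assumes the input
    # already has `channels` channels.
    h = self.dilated_conv(x)
    filt, gate = tf.split(h, num_or_size_splits=2, axis=-1)
    h = tf.tanh(filt) * tf.sigmoid(gate)
    return x + self.res_conv(h), self.skip_conv(h)
```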
# Later:
- set the correct learning rate and optimizer for the discriminator
- timbre painting training stages will not be handled inside the trainer loop; instead, the trainer is called once per stage, and we use some generic way to create a model from pretrained modules (see the sketch after this list)
- add hyperparams for d_optimizer to trainer
- add hyperparams for d_optimizer to timbrepainting.gin
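A sketch of that per-stage trainer invocation; `build_model`, `get_trainer`, the sample-rate schedule, and `steps_per_stage` are all hypothetical placeholders:

```python
# Hypothetical stage-wise loop: the trainer is called once per stage, and
# each stage is built from the previous stage's pretrained modules.
steps_per_stage = 10000  # placeholder
pretrained_modules = None
for stage, sample_rate in enumerate([2000, 4000, 8000, 16000]):
  model = build_model(stage=stage,
                      sample_rate=sample_rate,
                      pretrained=pretrained_modules)
  trainer = get_trainer(model)            # fresh trainer per stage
  trainer.train(num_steps=steps_per_stage)
  pretrained_modules = model.modules      # hand over to the next stage
```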
# Training jobs:
With harmonic audio (urmp-mono) only:
Target: find a reverb module that can learn to handle arbitrary room acoustics; find the best configuration for the standard DDSP Autoencoder
- Standard DDSP Autoencoder with trainable reverb for single room
- Standard DDSP Autoencoder with custom reverb
Target: evaluate extensions for Timbre Painting (multiple instruments, reverb)
- Timbre Painting on a single instrument (original)
- Timbre Painting with multiple instruments and custom reverb
- Evaluate Timbre Painting with harmonic sine waves and random noise as input channels
Target: DDSP GAN vs. extended Timbre Painting GAN
- Train DDSP GAN
Target: find the effect of the adversarial loss
- Best model with adversarial loss only
With percussion:
- Timbre Painting + z vector + reverb vector
- DDSP (GAN/AE) + z vector + reverb vector
With singing:
- Timbre Painting + z vector + reverb vector
- DDSP (GAN/AE) + z vector + reverb vector
# Questions
- Why is scale_fn included in, e.g., the Additive synth? Why wouldn't I just scale the output in the network that produces the input? (See the sketch below.)
- Where is this scale_fn actually used (by which applications)? Loudness is scaled by default, but isn't it computed by a fixed algorithm?
- What is the difference between loudness and amplitude?
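For context on the scale_fn questions: in ddsp, the synth processors apply a scale_fn to the raw network outputs, so the valid range (positive values, soft upper bound) is enforced by the synth itself rather than by whichever network produced the controls. The default is ddsp.core.exp_sigmoid, roughly:

```python
import tensorflow as tf

# Roughly ddsp.core.exp_sigmoid, the default scale_fn of the synth
# processors: maps unconstrained outputs into (threshold, max_value + threshold)
# with an exponential-ish response curve.
def exp_sigmoid(x, exponent=10.0, max_value=2.0, threshold=1e-7):
  return max_value * tf.nn.sigmoid(x) ** tf.math.log(exponent) + threshold
```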
# Random ideas
- learn an encoding of voice vs. lyrics (see the sketch below):
  the voice component is constant over time for a training sample
  the lyrics component is not averaged over time (i.e., as it is now)
  there is some cost to using the lyrics encoder (L2 loss + noise)
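One possible instantiation of that idea; the mean/residual split, penalty weight, and noise level are all my assumptions:

```python
import tensorflow as tf

# Sketch: z_voice is pooled to one time-constant vector per sample; z_lyrics
# stays time-varying but is made "expensive" via an L2 penalty plus noise.
def split_voice_lyrics(z, noise_stddev=0.1, l2_weight=1e-3):
  # z: [batch, time, depth] from the shared encoder.
  z_voice = tf.reduce_mean(z, axis=1, keepdims=True)  # constant over time
  z_lyrics = z - z_voice                              # time-varying residual
  # Using the lyrics channel costs something: an L2 penalty plus noise.
  lyrics_cost = l2_weight * tf.reduce_mean(z_lyrics ** 2)
  z_lyrics += tf.random.normal(tf.shape(z_lyrics), stddev=noise_stddev)
  return z_voice, z_lyrics, lyrics_cost
```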
# New Classes:
synths.py:
  BasicUpsampler
  - like the Additive synth, but without harmonics
decoders.py:
  TimbrePaintingDecoder(Decoder) <- BasicUpsampler, ParallelWaveGANUpsampler
  - combines a BasicUpsampler with a stack of ParallelWaveGANUpsamplers
  Upsampler
  - upsamples all conditioning features, including the input audio, to the target sample rate, then applies dilated conv stacks to them (see the sketch below)
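A sketch of that Upsampler, assuming linear interpolation along time and hypothetical channel/dilation settings:

```python
import tensorflow as tf

def upsample(f, target_length):
  """Linearly interpolate [batch, time, depth] to [batch, target_length, depth]."""
  f = f[:, :, tf.newaxis, :]                  # -> [batch, time, 1, depth]
  f = tf.image.resize(f, [target_length, 1])  # bilinear along the time axis
  return f[:, :, 0, :]

class Upsampler(tf.keras.layers.Layer):
  """Sketch: bring every conditioning feature (f0, loudness, z, input audio)
  to the target sample rate, concatenate, then run a dilated conv stack."""

  def __init__(self, channels=64, dilations=(1, 2, 4, 8), **kwargs):
    super().__init__(**kwargs)
    self.convs = [tf.keras.layers.Conv1D(channels, 3, dilation_rate=d,
                                         padding='same', activation='relu')
                  for d in dilations]

  def call(self, features, target_length):
    # features: dict of [batch, time_i, depth_i] tensors at various rates;
    # audio should be shaped [batch, n_samples, 1].
    x = tf.concat([upsample(f, target_length) for f in features.values()],
                  axis=-1)
    for conv in self.convs:
      x = conv(x)
    return x
```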
discriminator.py:
  Discriminator
  - takes a dict with the controls and the target audio
  - can decide by itself which of these inputs to use
  - the evaluated audio is always called "discriminate_audio"
  - with this setup, we can model any kind of conditioning information (see the sketch below)
  ParallelWaveGANDiscriminator(Discriminator)
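The dict-based Discriminator interface above might look like this (everything beyond the 'discriminate_audio' key is an assumption):

```python
import tensorflow as tf

class Discriminator(tf.keras.Model):
  """Sketch of the dict-based interface. The audio to be judged always
  arrives under 'discriminate_audio'; subclasses pick whichever conditioning
  keys they care about (f0_hz, loudness_db, z, ...) and ignore the rest."""

  def call(self, inputs):
    audio = inputs['discriminate_audio']  # the one required key
    return self.score(audio, inputs)

  def score(self, audio, conditioning):
    raise NotImplementedError

# Usage: real and generated audio flow through the same interface, e.g.
#   logits_real = disc({**controls, 'discriminate_audio': batch['audio']})
#   logits_gen  = disc({**controls, 'discriminate_audio': audio_gen})
```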
gan.py:
  implements the GAN train step; otherwise behaves like an autoencoder
losses.py:
  AdversarialMSELoss(Loss) <- Discriminator (see the sketch below)
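A sketch of AdversarialMSELoss as an LSGAN-style generator loss wrapped around the dict interface above; written as a layer here, mirroring how ddsp implements losses as modules:

```python
import tensorflow as tf

class AdversarialMSELoss(tf.keras.layers.Layer):
  """LSGAN-style generator loss around a dict-based discriminator (sketch)."""

  def __init__(self, discriminator, **kwargs):
    super().__init__(**kwargs)
    self.discriminator = discriminator

  def call(self, controls, audio_gen):
    # Generator objective: the discriminator should score fakes as real (1).
    logits = self.discriminator({**controls, 'discriminate_audio': audio_gen})
    return tf.reduce_mean((logits - 1.0) ** 2)
```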
trainer.py:
  __init__:
  - create the discriminator optimizer
  - define a learning schedule for the different steps and the number of generators to use in each step
  scheduler:
  - create a tf.function with:
    - control flow to execute the G or D step
    - control flow to set the number of upsamplers
    - control flow that depends only on tf.Tensor objects (see the sketch after this outline)
step functions of the different models:
  Autoencoder:
  - classical train_step
  ParallelWaveGANUpsampler:
  - can take another upsampler from which to copy the initial weights
  - downsamples the target to the output sample rate
  Discriminator:
  - classical train_step
  TimbrePainting:
  - sub-models: ParallelWaveGANUpsampler, Discriminator
  - upsampler.train_step(batch)
  - split the batch and reconstructions into audio_real, audio_gen
  - discriminator.train_step(audio_real, audio_gen)
  - instead of creating a new discriminator and copying the weights, we just keep using the one we already have
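A sketch of the scheduler idea: one tf.function whose branches depend only on tensor values, so alternating G/D steps and growing the number of upsamplers doesn't trigger retracing. Names and the stage schedule are assumptions:

```python
import tensorflow as tf

@tf.function
def scheduled_step(batch, step, g_step_fn, d_step_fn,
                   steps_per_stage=10000, max_upsamplers=4):
  # Number of active upsamplers grows with the tensor-valued step counter.
  num_upsamplers = tf.minimum(step // steps_per_stage + 1, max_upsamplers)
  # Alternate updates: even steps train G, odd steps train D.
  return tf.cond(step % 2 == 0,
                 lambda: g_step_fn(batch, num_upsamplers),
                 lambda: d_step_fn(batch, num_upsamplers))
```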
model.py:
  the losses dict gets an additional entry: 'discriminator_loss'
# Components:
MonoEncoder: Audio -> (f0, loudness_db, z_timbre, z_reverb)
- Pretrained CREPE + MFCC-RNN
- Pretrained CREPE + Dilated Gated Conv Architecture
MonoDecoder: (f0, loudness_db, z_timbre, z_reverb) -> Audio
- NN and Harmonic+noise
- TP
MonoDiscriminator: (f0, loudness_db, z_timbre, z_reverb, Audio) -> [0, 1]
PolyEncoder: Audio -> (f0, loudness_db, z_timbre, z_reverb)^num_tracks
PolyDecoder: (f0, loudness_db, z_timbre, z_reverb)^num_tracks -> Audio
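A minimal interface sketch of the mono components above (class names follow the outline; the dict keys are assumptions):

```python
import tensorflow as tf
from typing import Dict

class MonoEncoder(tf.keras.Model):
  def call(self, audio: tf.Tensor) -> Dict[str, tf.Tensor]:
    # audio [batch, n_samples] ->
    # {'f0_hz', 'loudness_db', 'z_timbre', 'z_reverb'}
    raise NotImplementedError

class MonoDecoder(tf.keras.Model):
  def call(self, controls: Dict[str, tf.Tensor]) -> tf.Tensor:
    # (f0, loudness_db, z_timbre, z_reverb) -> audio [batch, n_samples]
    raise NotImplementedError

class MonoDiscriminator(tf.keras.Model):
  def call(self, inputs: Dict[str, tf.Tensor]) -> tf.Tensor:
    # controls plus 'discriminate_audio' -> realness score in [0, 1]
    raise NotImplementedError
```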
# Random old notes
MonoAutoencoder
  MonoEncoder:
    (MFCC) => (z, loudness)
  f0 Encoder:
    (MFCC) => (f0)
    CREPE
  FeatureDecoder:
    (f0, z) => ('amps', 'harmonic_distribution', 'noise_magnitudes')
  Synthesizer: Processor
    ('f0_hz', 'amps', 'harmonic_distribution', 'noise_magnitudes') => (audio)
MonoTimbreUpsamplingDecoder(GanDecoder): (f0, loudness, z) => (audio)
  InitialSampler: Processor
    (f0, loudness) => (audio)
  ParallelWaveGANUpsampler:
    (f0, loudness, z, audio) => (audio)
PolyAutoEncoder
  PolyEncoder:
    (MFCC) => (z: (N_synth, z_dims, time), f0: (N_synth, time), loudness: (N_synth, time))
  PolyDecoder: (f0, loudness, z) => audio
    stacked applications of MonoDecoder, with shared weights