[audioworklet] Number of channels in output buffers #1325

Closed
hoch opened this issue Sep 18, 2017 · 21 comments

hoch commented Sep 18, 2017

Consider the following:

// In an AudioWorkletProcessor class definition.
process(inputs, outputs, parameters) {
  for (let i = 0; i < inputs.length; ++i) {
    let inputBuffer = inputs[i];  // an array of Float32Arrays, one per channel
    let inputChannelCount = inputBuffer.length;
  }
  for (let o = 0; o < outputs.length; ++o) {
    let outputBuffer = outputs[o];
    let outputChannelCount = outputBuffer.length;
  }
}

In the example above, what you get from inputChannelCount is determined by the channel count computing algorithm. (spec) However, what you get for outputChannelCount is not well-defined.

The simplest approach is to make matching input/output buffers have the same channel count. For example, if inputChannelCount gives you n, you can expect outputChannelCount to be n as well.

Although there are more elegant solutions for this, I am against creating AudioBuffer objects (filling in outputs every render quantum) in the process method, because that would generate garbage at an alarming rate.

@hoch hoch self-assigned this Sep 18, 2017
@joeberkovitz joeberkovitz added this to the Web Audio V1 milestone Sep 21, 2017

hoch commented Sep 21, 2017

We can utilize AudioWorkletNodeOptions to handle the channel configuration. Consider this example:

dictionary AudioWorkletNodeOptions : AudioNodeOptions {
  unsigned long numberOfInputs = 1;
  unsigned long numberOfOutputs = 1;
  sequence<unsigned long> outputChannelCount = [];
  record<DOMString, double> parameterData;
};
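For concreteness, here is a sketch of how these options might be passed when constructing a node (the processor name 'merger-like' is made up purely for illustration). Note that outputChannelCount needs exactly one entry per output:

```javascript
// Hypothetical options for a 4-input, 1-output node; 'merger-like' is a
// made-up processor name used only for illustration.
const options = {
  numberOfInputs: 4,
  numberOfOutputs: 1,
  outputChannelCount: [2],  // one entry per output: a single stereo output
};

// In a page with a running AudioContext and a registered processor,
// one would then write:
// const node = new AudioWorkletNode(context, 'merger-like', options);
```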

Consider these 4 cases:

1 input, 1 output

Similar to GainNode. The input channel count is computed from the AudioNodeOptions values, and the output channel count follows it.

n inputs, 1 output

Similar to ChannelMergerNode. The input channel count will be identical for all inputs once it is computed from the AudioNodeOptions values. The output channel count must be specified by outputChannelCount with a length of 1.

1 input, m outputs

Similar to ChannelSplitterNode. The input channel count is computed from the AudioNodeOptions values. The channel count of each output must be specified by outputChannelCount with a length of m.

n inputs, m outputs

The input channel count will be identical for all inputs, computed from the AudioNodeOptions values. The channel count of each output must be specified by outputChannelCount with a length of m.

To sum up, dynamic channel configuration will not be supported on the output side when you have multiple inputs or outputs, and outputChannelCount MUST be specified in such cases.


padenot commented Sep 22, 2017

Looks like it would work. Does this extend to 0 inputs/n outputs (a multi-channel source), or n inputs/0 outputs (a multi-channel analyser)? I think it would work well.


hoch commented Sep 22, 2017

  1. We should throw on (0 inputs && 0 outputs).
  2. For 0 inputs and m outputs, there is nothing to do with the inputs. The output channel count is configured by the user-supplied info.
  3. For n inputs and 0 outputs, the input channel count is computed from the node options. There is nothing to do with the outputs.

Yeah, I think it works correctly.

@joeberkovitz

The thrust of this seems right, but it feels underspecified to me. I think we need to say what happens across this matrix of possibilities in two different cases: 1) where outputChannelCount is specified in AudioWorkletNodeOptions, and 2) where outputChannelCount isn't specified, in which case either an error is thrown or a default has to be supplied. And in one case (the 1:m case) there is a reasonable default that I think was overlooked.

So I would propose that the combinations behave like this:

0 inputs, 0 outputs

throw an error

0 inputs, m outputs

use outputChannelCount with length m if specified.
throw if specified, but length doesn't match m
throw if outputChannelCount is not specified.

1 input, 1 output

(per your observation above, this is the case where the output can simply follow the input)
use outputChannelCount with length 1 if specified.
throw if specified, but length isn't 1
use the computed input channel count if outputChannelCount is not specified.

n inputs, 1 output

use outputChannelCount with length 1 if specified.
throw an error if specified, but length isn't 1
throw if outputChannelCount is not specified.

(NOTE: Your comment "The input channel count will be identical for all inputs, once it's computed by AudioNodeOptions values" doesn't seem right -- in the max or clamped-max modes, different inputs on a node can have different numbers of channels according to the channel mixing spec. We cannot take this behavior away from AudioWorkletNodes -- which means there is no workable default.)

1 input, m outputs

use outputChannelCount with length m if specified
throw an error if specified, but length isn't m
use the computed input channel count for each output if outputChannelCount not specified.

n inputs, m outputs

use outputChannelCount with length m if specified
throw an error if specified, but length isn't m
throw if outputChannelCount is not specified.
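The matrix above can be condensed into a small sketch (plain JS for illustration, not spec text; `computedInputChannelCount` stands in for the result of the channel count computing algorithm). With outputChannelCount specified, only its length needs validation; unspecified, only the single-input cases have a workable default:

```javascript
// Sketch of the proposed rules; error names mirror the DOMException names
// the spec would use.
function resolveOutputChannels(numberOfInputs, numberOfOutputs,
                               outputChannelCount, computedInputChannelCount) {
  if (numberOfInputs === 0 && numberOfOutputs === 0)
    throw new Error('NotSupportedError: must have inputs or outputs');

  if (outputChannelCount !== undefined) {
    if (outputChannelCount.length !== numberOfOutputs)
      throw new Error('IndexSizeError: length must match numberOfOutputs');
    return outputChannelCount.slice();
  }

  // Unspecified: only the 1-input cases can fall back to the input count.
  if (numberOfInputs === 1)
    return new Array(numberOfOutputs).fill(computedInputChannelCount);

  throw new Error('NotSupportedError: outputChannelCount is required here');
}
```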

@joeberkovitz

One side point: why is the options dictionary called AudioWorkletNodeOptions rather than AudioWorkletOptions? This is different from every other options dictionary.


hoch commented Sep 22, 2017

The thrust of this seems right, but it feels underspecified to me.

We were merely exchanging ideas here, but your last comment shaped this thread into the actual spec text.

NOTE: Your comment "The input channel count will be identical for all inputs, once it's computed by AudioNodeOptions values" doesn't seem right -- in the max or clamped-max modes, different inputs on a node can have different numbers of channels according to the channel mixing spec.

Well, the "up/down-mixing" section of the spec is not really clear on how multiple inputs are handled. So my interpretation was to compute the channel count once, based on the options, and apply that single value to all the inputs. This is where my confusion begins:

"Once this number is computed, all of the connections will be up or down-mixed to that many channels."

I think your interpretation is correct - and we should add more text in the spec to clarify that each input gets to have its own computed channel count.

@joeberkovitz

But it seems to me the spec is actually pretty clear that each input has a distinct count. I read the text here: https://webaudio.github.io/web-audio-api/#channel-up-mixing-and-down-mixing

And at the end, what I see is this clear instruction:

"For each input of an AudioNode, an implementation must:

  1. Compute computedNumberOfChannels.
  2. For each connection to the input:
    [etc.]"

The words for each input to me convey that the computation of channel count is separate for each input.


hoch commented Sep 25, 2017

@joeberkovitz Oh, I missed that part. Then I think your comment above nicely sums up the behavior. Should we proceed to an actual PR with the more refined text?

By the way, I think we might encounter some issues because of this dynamic mechanism. It means every render quantum gets a newly created sequence<sequence<Float32Array>> object, which in turn generates garbage. The channel count changes dynamically, so I think there is no way to optimize this.

@joeberkovitz

@hoch: I don't agree that garbage is inevitable under this dynamic approach. There is a way to optimize this, and I think it's important to include something like the following as a non-normative guideline:

The minimization of garbage in the audio thread is of great importance to the efficiency of AudioNode implementations. In the absence of connectivity changes to an AWN that affect the number of input channels, implementations are strongly encouraged to pool buffers, only allocating new ones when the channel counts are different from those in the previous quantum.

If UAs generate garbage every quantum, it'll be a big step backward.


hoch commented Sep 26, 2017

@joeberkovitz Consider this example:

let evilGlobal;

class MyProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    if (!evilGlobal)
      evilGlobal = inputs[0];  // cache the input buffers in global scope
    return true;
  }
}

That's where "buffer pooling" fails. Without creating new sequence and Float32Array objects every render quantum, a user can cache the reference in the global scope and inspect/use it at any time.

From my point of view, the "let's make allocation more efficient" approach doesn't really help this situation. An allocation is still an allocation, and garbage accumulates rapidly once it starts happening.

The good news is that only one process method can be invoked at any given moment. There is no concurrency here, so we can reuse a pool of Float32Arrays without worrying about garbage. However, as shown above, the user can then see the array contents being changed by the implementation.

Perhaps we should open another issue for this matter.
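The failure mode can be demonstrated without any real AudioWorklet machinery. In this plain-JS sketch, the "implementation" reuses a single Float32Array across quanta, and the cached reference observes the mutation:

```javascript
// One pooled "output buffer" reused across render quanta.
const pooledBuffer = new Float32Array(4);
let evilGlobal = null;

// Stand-in for the implementation invoking process() each quantum.
function renderQuantum(sampleValue) {
  pooledBuffer.fill(sampleValue);              // implementation writes samples
  if (!evilGlobal) evilGlobal = pooledBuffer;  // user caches the reference
}

renderQuantum(1);
const firstSeen = evilGlobal[0];  // 1
renderQuantum(2);
const laterSeen = evilGlobal[0];  // 2: the cached array changed under the user
```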

@joeberkovitz

@hoch I'm still not seeing how this presents a problem (and it could be because I'm just plain slow).

If we're worried that people will write broken code that just holds onto the buffer, that can be defeated in the UA by switching between different (but still pooled) sets of buffers, or by re-allocating them every N render quanta, where N is large enough to really cut down on the garbage but small enough to break code that tries to cache buffers.

The sin of allowing code to cache buffers seems far outweighed by the sin of generating lots of garbage on every quantum.
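A minimal sketch of that rotation idea (plain JS, pool size of 2 for brevity): alternating between pooled buffer sets means a cached reference goes stale after one quantum, while allocation stays bounded:

```javascript
// Two pooled buffers handed out alternately, one per render quantum.
const pool = [new Float32Array(128), new Float32Array(128)];
let quantum = 0;

function currentOutputBuffer() {
  return pool[quantum++ % pool.length];
}

const first = currentOutputBuffer();
const second = currentOutputBuffer();
// first !== second: code that cached `first` no longer tracks the live
// output, yet no per-quantum allocation occurred.
```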


hoch commented Sep 26, 2017

@joeberkovitz I completely agree with you on this; I am only making the counter-argument for the sake of discussion.

The sin of allowing code to cache buffers seems far outweighed by the sin of generating lots of garbage on every quantum.

My concern is whether this "implicit yet visible" change in the buffer is acceptable from an architecture/specification perspective. I haven't seen any other Web API that allows this behavior.

Also, randomizing the underlying scope or storage seems rather risky due to compat/interop issues:

I don't think AudioWorklet poses such a problem, but at least we have to address it before it's too late.

@joeberkovitz

Thanks, now I understand better. Yes, deterministic behavior could be a long-term architectural problem and we don't want it to get baked in. I'd agree that it deserves a separate issue.

My sense is that some form of non-determinism would have to be specified regardless of whether it's random scopes, random re-allocations, random pool selections... but I don't think any of these need to change the API that we're designing in this issue right here...


hoch commented Sep 26, 2017

Returning to the original discussion...

While I was thinking about an efficient way to describe the behavior, I found some more corner cases. The following proposal is slightly different from @joeberkovitz's idea above, but I think it is simpler:

AudioWorkletNodeOptions.outputChannelCount

  • The valid range for a channel count value is between 1 and the UA's maximum number of supported channels. A NotSupportedError exception will be thrown for an invalid channel count.
  • An IndexSizeError will be thrown if outputChannelCount.length does not match numberOfOutputs.
  1. 0 inputs, 0 outputs
    NotSupportedError

  2. 0 inputs, 1 output
    Optional. The channel count will be mono if unspecified.

  3. 0 inputs, m outputs
    Optional. The channel count will be mono for all outputs if unspecified.

  4. 1 input, 0 outputs
    N/A

  5. 1 input, 1 output
    Optional. The channel count will match the computed input channel count if unspecified.

  6. 1 input, m outputs
    Optional. The channel count will match the computed input channel count for all outputs if unspecified.

  7. n inputs, 0 outputs
    N/A

  8. n inputs, 1 output
    Optional. The channel count will be mono if unspecified.

  9. n inputs, m outputs
    Optional. The channel count will be mono for all outputs if unspecified.

This way, we can make outputChannelCount optional for the majority of cases while providing the user with a sensible default channel count. Also, I believe this scheme is easier to grasp.
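As a sketch (again plain JS for illustration, not spec text), the defaulting rule in this scheme reduces to: throw on 0:0, propagate the computed input count when there is exactly one input, and otherwise default every output to mono:

```javascript
// Default per-output channel counts under the proposal above, for the case
// where outputChannelCount is left unspecified.
function defaultOutputChannels(numberOfInputs, numberOfOutputs,
                               computedInputChannelCount) {
  if (numberOfInputs === 0 && numberOfOutputs === 0)
    throw new Error('NotSupportedError');
  // Cases 5 and 6: a single input propagates its computed channel count.
  const perOutput =
      numberOfInputs === 1 ? computedInputChannelCount : 1;  // else mono
  return new Array(numberOfOutputs).fill(perOutput);  // [] when 0 outputs (N/A)
}
```

(Note that the WG resolution further down revises case 6 so that the 1:m outputs default to mono as well.)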


rtoy commented Sep 26, 2017 via email


rtoy commented Sep 27, 2017

Hmm. My last comment didn't come out quite right. I think option 6 with 1 input and m outputs should default to mono outputs. Then it defaults to mono for all cases except the 1-input/1-output case.

Also, what does N/A in, say, item 4 really mean? There are no outputs, but what if I specify a set of channel counts? You say earlier that an IndexSizeError is thrown if the number of outputs doesn't match the array length.

@joeberkovitz

@rtoy I'll tell you my reasoning for making the 1:m case use computedChannelCount.

The reason is that if you want 1:m to use mono outputs, it's very easy to provide an explicit array [1, 1, ...]. But if you actually want to split a multichannel input into m outputs that are dynamically adjusted to be channel-congruent with the input, there's no way to do that other than to have the node do it for you. So I thought it would make a good default behavior.

@joeberkovitz joeberkovitz added the Needs Discussion The issue needs more discussion before it can be fixed. label Sep 27, 2017

hoch commented Sep 27, 2017

@rtoy By "N/A", I meant outputChannelCount is not applicable there. As you stated, IndexSizeError will be thrown if outputChannelCount.length is non-zero.


rtoy commented Sep 27, 2017

@joeberkovitz Nothing wrong with multiple channels. I was just going for default behavior that is simple. I also couldn't (after a few seconds) think of a use case where I would really want this kind of dynamic output count that couldn't be handled with multiple worklets using the 1-input/1-output dynamic case.

@mdjp mdjp removed the Needs Discussion The issue needs more discussion before it can be fixed. label Sep 28, 2017

hoch commented Sep 28, 2017

WG resolution: leave 1:M case as mono output for each of the M channels

@joeberkovitz

Note: we need to file a new issue for v.next to support adjustable multichannel counts on an AWN with multiple outputs.
