
Very choppy playback when building MumbleGuiClient #52

Open
trevorirwin opened this issue Mar 9, 2020 · 25 comments

Comments

@trevorirwin
Contributor

trevorirwin commented Mar 9, 2020

Hi,

When building MumbleGuiClient with Visual Studio 2019, I'm getting very choppy playback. I'd estimate a fraction of a second of audio (so maybe a handful of packets at most) followed by silence, ad nauseam. Recording seems to work fine, as other clients are receiving transmissions from MumbleGuiClient loud and clear.

Could this be a severe case of jitter, as in #46, or is this something more extreme? Perhaps an issue with the build?

@martindevans
Owner

Perhaps try putting some logging into the Audio Decoding Buffer to see if it's running out of samples to play back.

@trevorirwin
Contributor Author

@martindevans I’m trying to understand the structure of the Audio Decoding Buffer. Should I be logging in Read()? Am I correct in my understanding that that function moves data out of the “decoded buffer”, and effectively “hands it off” to NAudio?

@martindevans
Owner

Yeah, basically: encoded audio gets dumped into this buffer when it arrives from the network, and then the audio device pulls from it (in Read) when it needs something to play back.
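That flow can be sketched roughly like this (in Python for brevity; the class and method names are illustrative, not MumbleSharp's actual API):

```python
# Rough sketch (for illustration only) of the flow described above:
# the network side writes decoded PCM in, the playback device drains it.
class DecodedAudioBuffer:
    def __init__(self, capacity: int):
        self._buf = bytearray()
        self._capacity = capacity

    def write(self, pcm: bytes) -> int:
        """Network side: decoded PCM is dumped in as packets arrive."""
        space = self._capacity - len(self._buf)
        accepted = pcm[:space]
        self._buf.extend(accepted)
        return len(accepted)

    def read(self, count: int) -> bytes:
        """Device side: NAudio pulls samples when it needs to play.
        Returning fewer bytes than requested is an underrun, which is
        heard as a gap in playback."""
        out = bytes(self._buf[:count])
        del self._buf[:count]
        return out
```

When read() comes back short because the network side hasn't kept up, the device plays silence for the remainder, which is exactly what choppy playback sounds like.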

@trevorirwin
Contributor Author

trevorirwin commented Mar 10, 2020

Is anyone else experiencing this behavior? It would be good to know if the bug is widespread, or if it's just in my build. (For what it's worth, all unit tests passed.)

Some logging reveals that even with constant transmission of a stream of noise (blowing into the mic) on another client, the readCount value is usually zero, and FillBuffer() is usually returning false. readCount is non-zero for one "frame" almost exactly every second. If I stop and start transmitting, it will get a couple more "frames" and then continue with the once-per-second behavior.

edit: Curiously, it looks like the buffer is sized to be about 1 second long... coincidence? I think not.

@martindevans
Owner

That's definitely the problem then: every time FillBuffer() returns false, it's an underrun that results in audio artifacts.

Fixing it ideally requires a jitter buffer (as I described here). A simpler workaround would be to delay the playback of all speech by a fixed amount (say 100ms), to allow the buffer to fill up a bit.

@trevorirwin
Contributor Author

Do you have any insight as to what an implementation of that delay might look like? I'd be happy to do it.

@martindevans
Owner

I'm afraid I can't remember the details of the playback pipeline (it's been a long time since I wrote this). I'd suggest finding all uses of AudioDecodingBuffer and tracing it out from there.

I think it's implemented as something that just reads from that buffer and plays whatever it gets. In that case, what you'd want to do is make the buffer a little more intelligent: count up writes, and only start returning audio once writes > threshold. As long as reads (happening in real time) and writes (also happening in real time) stay in sync, you're OK. The tricky bit will be handling the case where writes stop (because the remote end stops transmitting); you'll need to detect this and reset your counter, so the buffer has a chance to fill up a bit again when they start talking.
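The counting-writes idea might look something like this Python sketch; the warm-up threshold, names, and silence-padding behaviour are assumptions, not MumbleSharp's actual implementation:

```python
# Sketch of the "count writes, delay playback" workaround described above.
class BufferedPlayback:
    def __init__(self, warmup_packets: int = 5):
        self.warmup = warmup_packets
        self.writes = 0
        self.queue = bytearray()

    def write(self, pcm: bytes) -> None:
        """Called as decoded packets arrive from the network."""
        self.queue.extend(pcm)
        self.writes += 1

    def read(self, count: int) -> bytes:
        """Called by the playback device in real time."""
        if self.writes < self.warmup:
            # Warm-up phase: return silence so the buffer can fill.
            return bytes(count)
        if not self.queue:
            # Writes have stopped (end of transmission or underrun):
            # reset the counter so the next stream pre-fills again.
            self.writes = 0
            return bytes(count)
        out = bytes(self.queue[:count])
        del self.queue[:count]
        # Pad with silence if the buffer ran short.
        return out + bytes(count - len(out))
```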

@trevorirwin
Contributor Author

trevorirwin commented Mar 10, 2020

Thanks for the pointers. After some additional logging and experimentation, it looks like the rate at which the _decodedBuffer fills with network data is considerably slower than the rate at which it's consumed, regardless of whether there is any delay or jitter buffer.

If we look at the line where the array _decodedBuffer is defined, we see this:
_decodedBuffer = new byte[sampleRate * (sampleBits / 8) * sampleChannels];
which, when built and run on my machine, comes out to 96000, or twice the sample rate of my default hardware (48000). So this tells me the buffer is sized to hold one second of audio, with the factor of two coming from the two bytes per 16-bit sample.

However, if I have the Read method simply call FillBuffer every time it's called and log the number of bytes in the buffer, I find that the buffer only fills at a rate of ~5760 bytes/sec, which is 1/8th of ~~48000~~ 46080. Again, not a coincidence I'm sure.
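A quick check of these numbers (assuming 48 kHz, 16-bit, mono PCM, which is the reading that makes the buffer come out to 96000 bytes):

```python
# Consumption rate: what the playback device drains per second of audio.
sample_rate = 48000
bytes_per_sample = 2      # 16-bit samples
channels = 1
consumption_rate = sample_rate * bytes_per_sample * channels  # bytes/sec
assert consumption_rate == 96000

# Fill rate: the measured rate at which network data arrives in the buffer.
observed_fill_rate = 5760                 # bytes/sec
assert observed_fill_rate * 8 == 46080    # 1/8th of 46080, not of 48000

# The buffer drains roughly 16.7x faster than it fills, so constant
# underruns are inevitable regardless of any jitter buffering.
print(consumption_rate / observed_fill_rate)  # ≈ 16.7
```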

@martindevans
Owner

Ah, that's a bigger issue than a simple jitter/timing problem!

Which application are you using to play back audio (the core MumbleSharp library itself doesn't handle playback iirc)?

@trevorirwin
Contributor Author

I'm building the MumbleGuiClient example so... NAudio? If I'm understanding your question correctly.

@trevorirwin
Contributor Author

Also, math error: 5760 is close to 1/8th of 48000, but not quite; it's actually 1/8th of 46080. Maybe not wise to assume a correlation between those two numbers yet.

@trevorirwin
Contributor Author

trevorirwin commented Mar 10, 2020

So, some notes:
The second client I was using to transmit voice (the existing Mumble for iOS app) was set to use the "Low" quality preset (16 kbit/s, 60 ms audio per packet.)

I tried implementing a delay by counting writes, but maybe I wasn't doing it very well. Somewhere in this process, I set the quality back to "Normal" (40 kbit/s, 20 ms audio per packet.)

I then tried implementing the workaround described in #46. At the "Normal" quality setting, I'm getting good, clear audio! Jitter-free, I think. However, at the "Low" quality setting, I'm getting the behavior I originally described.

So, we've now teased out that my woes actually stem from two separate issues!

  • Without a jitter buffer, audio is unintelligible in the sample MumbleGuiClient
  • Audio packet lengths other than 20 ms are not currently supported

I'll open a separate issue for the packet length problem.

edit: After more testing, the "High" quality setting works too. Might be the bitrate and not the packet audio length?

@trevorirwin
Contributor Author

trevorirwin commented Mar 10, 2020

I've implemented a version of the "dynamic" jitter buffer described by kestasjk in #46. My work is in this branch of my fork. This implementation of the "jitter buffer" is contained entirely within AudioDecodingBuffer.cs, rather than relying on Model/User.cs and the EncodedVoiceDelegate, so it shouldn't require any work to set up from end-users of the library.

It needs more testing, and also some cleanup. I'd love some feedback on the following:

  1. Should I be using System.Diagnostics.Stopwatch instead of DateTime for measuring the elapsed time? Does precision really matter here? Doesn't seem to.
  2. How should the time delay be made configurable? Where should it be "broken out" for access by library users? Should it be static, or configurable per-buffer (and therefore per-Mumble user)?
  3. Speaking of the time delay: looking at it in more detail, I've noticed that I turn on isJitterDetected when readCount == 0. However, readCount will always be zero when a transmission starts, because FillBuffer() will not yet have been called. So, while in theory this could catch readCount dropping to zero mid-transmission and act as a "dynamic jitter buffer", for the most part it is just going to create a constant delay. Huh!

@martindevans
Owner

  1. Ideally, rather than using real time (Stopwatch or DateTime), you'd use sample counts, as that's the measure of time the audio system cares about. It also makes unit testing easier, since you don't have to worry about trying to fake time in a test!
  2. Making the delay configurable would be convenient; in the longer term we could measure network connection quality for each user and expand the jitter buffer for worse connections. I would probably expose this as a property on the buffer which can be set at any time, but is only applied when no audio stream is active (changing the size of the buffer while audio is flowing is tricky).
  3. Just from a skim of your code, it looks like you're using isJitterDetected to indicate whether you're in a buffer-filling phase (i.e. artificially returning silence to let the buffer fill). This seems like a pretty reasonable approach :)
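Points 1 and 2 together might look something like this sketch: a warm-up threshold measured in samples rather than wall-clock time, with a delay property that is stored immediately but only applied when the next stream starts. All names here are hypothetical.

```python
# Sample-count-based jitter delay with a deferred-apply configuration knob.
class JitterBuffer:
    def __init__(self, sample_rate: int = 48000, delay_ms: int = 100):
        self.sample_rate = sample_rate
        self._pending_delay_ms = delay_ms
        self._threshold_samples = 0
        self.buffered_samples = 0

    @property
    def delay_ms(self) -> int:
        return self._pending_delay_ms

    @delay_ms.setter
    def delay_ms(self, value: int) -> None:
        # Takes effect at the start of the next stream; resizing the
        # buffer while audio is flowing is tricky.
        self._pending_delay_ms = value

    def on_stream_start(self) -> None:
        self.buffered_samples = 0
        self._threshold_samples = (
            self.sample_rate * self._pending_delay_ms // 1000
        )

    def on_samples_written(self, n: int) -> None:
        self.buffered_samples += n

    def ready(self) -> bool:
        """True once enough samples are buffered to start playback."""
        return self.buffered_samples >= self._threshold_samples
```

Measuring in samples also makes this trivially unit-testable: a test just calls on_samples_written with known counts, with no clocks to fake.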

@trevorirwin
Contributor Author

I appreciate the feedback. I'll try to incorporate these improvements as I continue work on my app. Would you like me to submit a PR for the jitter buffer in its current state anyway, given that the example apps are unintelligible without it? (And perhaps open a new issue thread for "jitter buffer improvements"?)

@martindevans
Owner

Sure, it sounds like it's at least an improvement on what we've got!

@Grandpa-G

Did this jitter problem get fixed? I am now trying the Gui client and it is very choppy. Text messages work fine.

@trevorirwin
Contributor Author

My trivial jitter buffer was merged in #54, so it should be better, but it doesn't dynamically scale with latency or anything. There's a property on AudioDecodingBuffer that should let you adjust the buffer time; maybe play with that?

@Grandpa-G

What value should be adjusted? jitterMillis? Make it bigger or smaller?

@trevorirwin
Contributor Author

You’ll want your application to set the value of the JitterDelay property on the AudioDecodingBuffer. Larger values will give you more buffer time, and smoother playback, at the cost of more latency.

jitterMillis is overridden by JitterDelay, so don’t modify that directly.

@Grandpa-G

Grandpa-G commented Jul 4, 2020

I am just trying to get two instances of the MumbleGuiClient to talk to each other, and the audio is very choppy. I took the JitterDelay up to 800 and got a lot of latency, but still didn't get all of the voice. Receiving on a standard Mumble client worked fine.

@kestasjk

kestasjk commented Jul 5, 2020 via email

@dextercai

I have checked the code from the other repo, which is implemented with Unity. I don't understand: in BananaHemic/Mumble-Unity, we just put the audio buffer into the AudioSource (provided by the Unity SDK), and the sound plays very smoothly. Is AudioSource doing the jitter buffer work for us implicitly?

@martindevans
Owner

> It's a fine proof of concept or starting point, but it's a development project not something to take and use.

That's a very fair assessment of the state of this project. Everything not to do with voice is pretty solid (text message, channel membership etc) but all the audio stuff is very much in need of more development!

> Is AudioSource doing the jitter buffer work for us implicitly?

The AudioSource itself is not doing the jitter buffering. Instead that appears to be implemented here - the buffer returns silence until it has got InitialSampleBuffer number of packets of audio stored in it. This is a fixed size jitter buffer.
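That fixed-size scheme can be sketched like so. INITIAL_PACKET_BUFFER is a stand-in for the InitialSampleBuffer threshold mentioned above; the real code counts samples rather than whole packets, and the names here are otherwise hypothetical.

```python
from collections import deque

# Fixed-size jitter buffer: emit silence until an initial number of
# packets has been queued, then play them back in arrival order.
class FixedJitterBuffer:
    INITIAL_PACKET_BUFFER = 3   # illustrative threshold

    def __init__(self):
        self.packets = deque()
        self.primed = False

    def push(self, packet: bytes) -> None:
        self.packets.append(packet)
        if len(self.packets) >= self.INITIAL_PACKET_BUFFER:
            self.primed = True

    def pop(self, packet_size: int) -> bytes:
        if not self.packets:
            # Stream ended or underran: re-prime before playing again.
            self.primed = False
            return bytes(packet_size)
        if not self.primed:
            return bytes(packet_size)   # still filling: play silence
        return self.packets.popleft()
```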

@Meetsch Meetsch added this to the Voice Support milestone Mar 22, 2021
@StarGate01

Are there any updates on this? I would be interested in helping out.

7 participants