jbuf: fix for adaptive mode #960
Conversation
The wish size for the adaptive mode depends on how much the packets are re-ordered. This can only be measured by looking at the sequence number in the RTP header and how much it jumps around because of re-ordered packets. Counting video frames is not needed and causes failures when the wish size is computed.
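As a minimal sketch of that idea (illustrative helper name, not the actual libre API): the re-order distance can be derived from the 16-bit RTP sequence number alone, with wrap-safe signed arithmetic, and no knowledge of video frames is needed.

```c
#include <stdint.h>

/* Wrap-safe signed distance between the arriving sequence number and
 * the one expected next: 0 means in-order, positive means packets were
 * skipped, negative means a late (re-ordered) packet. The magnitude of
 * these jumps is what the adaptive wish size has to cover; counting
 * video frames plays no role here.
 * (Hypothetical sketch, not the actual jbuf code.) */
static int16_t seq_diff(uint16_t seq, uint16_t expect)
{
	return (int16_t)(seq - expect);
}
```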
The macOS tests are failing. This has nothing to do with this PR. @sreimers, could you please have a look?
But you want to specify, for video, how many frames are held. As you say, a video frame can be x packets (depending on keyframes and frame changes), and you want to make sure that a frame is ready (all packets in order) before decoding starts.
I think it is not relevant whether the frame is ready. The decoder should know what to do if the frame is not complete: it waits for more packets. The only job of the jbuf is to re-order packets, thus observing the sequence number is enough.
Ok, I think I can also move the logic for correct video frame handling outside of jbuf, to keep jbuf simple.
Is it really relevant to the decoder whether the chunks of a video frame follow each other very quickly or with small interruptions (waiting for the next RTP packets)?
It is important that a frame is complete, because otherwise, if say the last packet (or a new-frame packet) arrives before the frame is finished (since jbuf doesn't keep all relevant packets by default), the entire frame (all video chunks) is corrupted. Since keyframes can be hundreds of packets within a short period of time, the probability is high.
Here is an example of how pjsip handles this: https://github.com/pjsip/pjproject/blob/master/pjmedia/src/pjmedia/vid_stream.c#L924-L981
Another use case is Generic RTP Feedback - RTPFB (RTCP), since the encoder can try to resend lost packets (which only makes sense for video). Therefore the jitter buffer has to hold all packets that belong to a video frame until they all arrive, in order to sort the packets.
In a next PR, we could implement this:
We could enable this behavior for …
Not sure if I understand the APIs correctly, but jbuf always has to keep a whole frame (at least one or more), otherwise a video frame easily gets corrupted, as mentioned (if you move only chunks to the decoder). The adaptive mode wish size calculation based on packets does not make much sense for video, I think. Edit: Maybe we should use a ptime- and timestamp-based calculation instead.
I do not agree. The job of … The current implementation in … Edit:
Only in adaptive mode?
Yes, and that's not the point. Let's take this example:
If the video decoder consumes chunks of packets from jbuf, the frame can be corrupted, right? You have to wait for at least one frame's worth of packets, to ensure all packets are in the correct order, before processing within the video decoder. For audio it's easy, because 1 frame = 1 packet.
Yes, only in adaptive mode, because the computation of the
I think I know what you want and that it is enough to wait until:
We could add this behavior in a following PR. Edit: Of course only if jbuf min is configured to zero. As soon as one out-of-order case is detected, the
Low-latency video is necessary to achieve a lip-synchronous, low-latency audio+video conversation.
You always have to wait for a frame to be complete before passing it to the video decoder. I would prefer not to hold packets within the video decode module itself (avcodec uses an mbuf for this) until a frame is completely received; if jbuf handles any out-of-order packets, at least for that frame, this adds no extra latency, and jbuf can be emptied at once. For real lip sync you need RTCP SR and calculated playout buffers to keep audio and video in sync.
Okay. In order to decide that a frame with timestamp
We should check both; that's what pjsip does, as mentioned earlier:

```c
/* Quickly see if there may be a full picture in the jitter buffer, and
 * decode them if so. More thorough check will be done in decode_frame().
 */
ts_diff = pj_ntohl(hdr->ts) - stream->dec_frame.timestamp.u32.lo;
if (ts_diff != 0 || hdr->m) {
```
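A compact C sketch of that combined check (illustrative names and structure, not baresip's or pjsip's actual implementation): a new frame starts when the RTP timestamp changes, and the marker bit additionally flags the last packet of a frame.

```c
#include <stdbool.h>
#include <stdint.h>

/* State for collecting the packets of one video frame.
 * (Hypothetical sketch, not the real jbuf API.) */
struct frame_state {
	uint32_t ts;       /* RTP timestamp of the frame being collected */
	bool have_frame;   /* currently collecting a frame? */
};

/* Returns true when the collected frame can be flushed to the decoder:
 * either the timestamp jumped (a new frame started) or the current
 * packet carries the marker bit (the frame ends here). */
static bool frame_boundary(struct frame_state *st, uint32_t ts, bool marker)
{
	bool flush = false;

	if (st->have_frame && ts != st->ts)
		flush = true;    /* timestamp changed: new frame began */

	st->ts = ts;
	st->have_frame = true;

	if (marker)
		flush = true;    /* marker bit: last packet of the frame */

	return flush;
}
```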
Should I add this in this PR?
Currently this PR breaks fixed-buffer video frame handling, since it reverts it. See this example: you want to define a minimum latency for the jitter buffer, say at least 100 ms for audio; since you know one packet represents 20 ms, you can configure a minimum of 5 packets. But if you want to achieve the same for video, you can't specify minimum packets, since video frames are fragmented. So you have to count/specify frames instead.
That's not true; see pjsip as mentioned, or webrtc.
jbuf: trace data for plot
Another, more detailed explanation of why video frame handling is so useful in jbuf (besides jitter timing calculations): if you only handle packets, the decoder pulls packets from jbuf without frame knowledge, so some packets are in the decode buffer while others are still in jbuf. If an out-of-order packet arrives that was already consumed/cached by decode, jbuf detects a late packet and it's lost. So a single out-of-order packet results in the loss of an entire frame, which can be prevented if all the packets that are part of a frame are kept in jbuf. I added a PR with a better average ratio calculation (inspired by webrtc code): #962
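To illustrate the fix for that scenario (a hypothetical sketch with made-up names, not the actual jbuf data structures): packets are kept sorted by sequence number and only drained in whole frames, grouped by RTP timestamp, so a late in-frame packet can still be sorted in before the decoder sees the frame.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP 32

struct pkt { uint16_t seq; uint32_t ts; };

/* Frame-aware buffer sketch: packets stay sorted by sequence number
 * and the newest frame is never released while it might still grow. */
struct fbuf {
	struct pkt p[CAP];
	size_t n;
};

static void fbuf_put(struct fbuf *b, uint16_t seq, uint32_t ts)
{
	size_t i = b->n;

	if (b->n == CAP)
		return;                       /* sketch: drop on overflow */

	/* insertion sort by wrap-safe sequence distance */
	while (i > 0 && (int16_t)(seq - b->p[i - 1].seq) < 0) {
		b->p[i] = b->p[i - 1];
		i--;
	}
	b->p[i].seq = seq;
	b->p[i].ts  = ts;
	b->n++;
}

/* Move all packets of completed frames (timestamp older than the
 * newest one) into `out`; returns the number of packets drained.
 * The (possibly still incomplete) newest frame stays buffered. */
static size_t fbuf_drain(struct fbuf *b, struct pkt *out)
{
	uint32_t newest = b->n ? b->p[b->n - 1].ts : 0;
	size_t k = 0, i, j;

	for (i = 0; i < b->n && b->p[i].ts != newest; i++)
		out[k++] = b->p[i];

	for (j = 0; i < b->n; i++, j++)
		b->p[j] = b->p[i];
	b->n -= k;

	return k;
}
```

With this scheme, a late packet of the newest frame is inserted in order instead of being declared lost, at the cost of holding at most one extra frame of latency.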
My argument was that if
I will have a look tomorrow. |
I am trying to implement a frame processing that guarantees that frames are complete. For now I am converting this to a draft.
I don't fully understand why a completely new solution is needed. Can you explain before putting too much work into this?
The field … I saw that the frame/packet ratio is not a good measure to adjust the wish size, even if we compute a moving average. Further, I would avoid linear effort in … I will close this and re-open a draft for discussion.
I think this works best if the video decoder pulls by timed callback (like an audio source/destination) and per frame. At the moment it's packet/frame/push driven.
The adaptive mode was broken. Video frames are not relevant for the wish size. The estimation of the number of frames was not accurate, because key frames consist of many more packets than differential frames.
The wish size depends only on `rdiff`, the measure of how much the packet sequence number differs from the expected sequence number when packets arrive out of order. The plots generated by `baresip/tools/jbuf` from PR baresip/baresip#2724 visualize how the adaptive mode works.
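As a rough sketch of how a wish size could follow `rdiff` (names, constants and the grow-fast/shrink-slow policy are all illustrative assumptions, not the actual libre implementation):

```c
#include <stdint.h>

/* Hypothetical adaptive wish-size update: grow immediately so the
 * buffer covers the largest observed re-order distance, decay slowly
 * toward the configured minimum when the network calms down, and
 * always stay within the [min, max] bounds. */
static uint32_t wish_update(uint32_t wish, uint32_t min, uint32_t max,
			    uint32_t rdiff)
{
	if (rdiff + 1 > wish)
		wish = rdiff + 1;  /* grow fast on re-ordering */
	else if (wish > min)
		wish--;            /* shrink slowly otherwise */

	if (wish > max)
		wish = max;
	if (wish < min)
		wish = min;

	return wish;
}
```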