Bizarre Uplink Latency measurements #19
Comments
What synchronization protocol runs between the master and slave? PTP?
Are the hardware timestamps taken on the radio interfaces? Do you know what the timestamping point is for these packets, and whether it could explain a fundamental delay asymmetry?
No adjustment is being done, and none should be necessary. RX - TX should be the one-way path delay. The igb driver, however, does adjust the reported timestamps.
Isn't that only true when the two devices are synchronized to 0 ns (i.e. the master and slave are on the exact same time base)? Reading the igb_ptp code, it looks like they just adjust it to account for the MAC-to-PHY delay (the amount of time it takes for the frame to be sent out onto the wire). I could be wrong. I have seen online that some people have encountered negative path delays while using the igb driver. Nonetheless, I'm not sure the adjustment made by the igb driver is causing those drastic changes in the latency measured by isochron, but it's possible.
Not for the drastic fluctuations, no (since the adjustment value is constant, the error comes from somewhere else). Just for the negative part of it. To see how, let's refer to any diagram of the Pdelay calculation, and apply (asymmetric) timestamp corrections:
((t2 - t1) + (t4 - t3)) / 2 = Pdelay_1
Pdelay_2 - Pdelay_1 = (B_rx - A_tx + A_rx - B_tx) / 2
Here Pdelay_1 is the uncorrected value and Pdelay_2 the corrected one; A and B are the two ports, and the _rx and _tx terms are the corrections applied to their RX and TX timestamps. In our case, A_rx is IGB_I210_RX_LATENCY_1000 (448) and A_tx is IGB_I210_TX_LATENCY_1000 (178). If the uncorrected Pdelay is smaller than 270 ns, then the corrected one may well be negative.
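A minimal Python sketch of the relation above, for anyone who wants to plug in numbers. It treats each correction as a signed offset added to the corresponding timestamp (t1 += A_tx, t2 += B_rx, t3 += B_tx, t4 += A_rx); how the igb driver actually signs its RX and TX adjustments is not modelled here, and all timestamps and offsets in the example are hypothetical.

```python
def pdelay(t1, t2, t3, t4):
    """Mean path delay from the four peer-delay timestamps (ns)."""
    return ((t2 - t1) + (t4 - t3)) / 2


def pdelay_shift(a_tx, b_rx, b_tx, a_rx):
    """Change in Pdelay when each timestamp gets a signed offset (ns).

    Assumed mapping: t1 += a_tx, t2 += b_rx, t3 += b_tx, t4 += a_rx.
    """
    return (b_rx - a_tx + a_rx - b_tx) / 2


# Hypothetical raw timestamps and offsets, all in ns.
t1, t2, t3, t4 = 0, 100, 1_000, 1_100
a_tx, b_rx, b_tx, a_rx = 30, -50, 20, -40

raw = pdelay(t1, t2, t3, t4)
corrected = pdelay(t1 + a_tx, t2 + b_rx, t3 + b_tx, t4 + a_rx)

# The difference matches the closed-form shift, and it is constant:
# it can push a small Pdelay below zero but cannot make it fluctuate.
assert corrected - raw == pdelay_shift(a_tx, b_rx, b_tx, a_rx)
print(raw, corrected, corrected - raw)
```

Because the shift depends only on the offsets and not on the timestamps, a constant driver adjustment moves every sample by the same amount.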
Yeah, I was assuming precise sync.
Makes sense. So, there are still the drastic fluctuations I'm seeing in the uplink latency. Weirdly enough, this doesn't always happen. Sometimes I see completely reasonable uplink and downlink latencies, but this is causing me to doubt whether those measurements are legitimate (before and after adjustment by Time Error). I do think it's worth noting that I am also measuring latency in 3 ways: pdelay from linuxptp (the average of uplink and downlink latencies), uplink latency from isochron, and downlink latency from isochron. The downlink latency from isochron, pdelay, and PPS are nearly identical to one another in terms of trend. The confounding variable here is the fact that they are all based on the PTP HW Clock (PHC). I would expect that the average of the adjusted uplink latency and the adjusted downlink latency would yield the linuxptp pdelay value, or something near it. But because the downlink latency is always drastically different from the uplink latency, the average of the two is nowhere near the linuxptp pdelay. Because of the asymmetry inherent in 5G networks, the uplink and downlink are fully expected to differ, but their average should be the pdelay. This is probably out of your area of expertise, so my main question for you is: do you have a theory as to why the uplink latency measured by isochron seems so wildly incorrect (i.e. negative values after adjustment, and drastic fluctuations)?
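The expectation that the two one-way measurements should average to the pdelay can be sketched as follows. This is an illustrative Python model, not isochron or linuxptp code: it assumes a single clock offset (slave minus master) that is the same for both directions at the moment of measurement, and the delays and offsets are made-up values.

```python
def measured_latencies(d_ms, d_sm, offset):
    """Apparent one-way latencies (ns) given true delays and a clock offset.

    offset is the slave clock minus the master clock (an assumed convention).
    """
    meas_ms = d_ms + offset  # TX stamped by master, RX stamped by slave
    meas_sm = d_sm - offset  # TX stamped by slave, RX stamped by master
    return meas_ms, meas_sm


def mean_path_delay(meas_ms, meas_sm):
    """Average of the two apparent latencies; the offset cancels out."""
    return (meas_ms + meas_sm) / 2


# Hypothetical asymmetric link: 4 ms downlink, 9 ms uplink.
d_ms, d_sm = 4_000_000, 9_000_000
for offset in (0, 2_500_000, -12_000_000):
    meas_ms, meas_sm = measured_latencies(d_ms, d_sm, offset)
    print(offset, meas_ms, meas_sm, mean_path_delay(meas_ms, meas_sm))
```

In this model a large offset can make one direction appear negative while the average stays at the true mean delay. The cancellation only holds if both directions see the same offset, so samples taken at different times on a drifting clock would not average out this cleanly.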
I mentioned earlier how I can't use VLAN interfaces due to limitations with my network, which prevents me from using the TAPRIO traffic shaper. Would this make my results meaningless, since isochron expects to have a different traffic class than PTP? Since I'm using this over a 5G network, the HW Timestamp latency is really the only thing I care much about. MAC latencies are negligible compared to the latency over the air. What I'm doing right now is running isochron send and receive on both machines, and calculating the latency between HW Timestamps using |
I'm thinking the apparent negative latency can be caused by PTP, which expects the path delay to be symmetric. Correlating the 2 streams of data might be a problem.
It depends on what you want to measure. The program doesn't necessarily expect to run alone on a traffic class.
I am absolutely confused by the latency figures you've posted being so low (<= 2.5 ns), so there is some physical phenomenon I'm not understanding. But delay asymmetry breaks a lot of the math behind PTP and I simply don't know what to expect.
Hello,
I'm using isochron to measure the unidirectional latency (master to slave, and slave to master). I am doing this over a 5G network, so the synchronization quality is not very good. Sometimes the latency is measured as negative (as in, the TX timestamp is later than the RX timestamp). I assume this happens because isochron assumes the devices are synchronized, which they effectively are not since the Time Error is large, so I figured I just needed to adjust isochron's latency measurement by the Time Error (measured by PPS).
When looking at the uplink and downlink latencies (calculated by just taking ts_rx - ts_tx), I noticed that the uplink latency was very bizarre. It looks like this:
We see in the picture above that the latency not only goes negative, but is also inversely correlated with the PPS output. Even if I were to adjust by the PPS Time Error, we would still see latencies that dip drastically from one second to the next. Do you have any idea why we see those drastic dips in latency?
I adjusted the latency calculation based on the Time Error measured by the PPS:
t_ms = ts_rx - PPS Time Error - ts_tx
t_sm = ts_rx + PPS Time Error - ts_tx
Where t_ms is the master-to-slave latency and t_sm is the slave-to-master latency as measured by isochron, ts_rx is the HW timestamp on receive, and ts_tx is the HW timestamp on transmit.
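For clarity, here is the adjustment written out as a small Python sketch. It assumes the PPS Time Error is defined as the slave clock minus the master clock, which is the sign convention under which the two formulas above are consistent; the function names and sample timestamps are hypothetical and not part of isochron.

```python
def master_to_slave_latency(ts_tx, ts_rx, time_error):
    """t_ms: TX stamped on the master clock, RX on the slave clock (ns)."""
    return ts_rx - time_error - ts_tx


def slave_to_master_latency(ts_tx, ts_rx, time_error):
    """t_sm: TX stamped on the slave clock, RX on the master clock (ns)."""
    return ts_rx + time_error - ts_tx


# Hypothetical case: slave runs 20 us ahead, true delay is 5 us each way.
time_error = 20_000

# Master sends at 1_000_000 (master time); slave stamps arrival in slave time.
t_ms = master_to_slave_latency(1_000_000, 1_025_000, time_error)

# Slave sends at 2_020_000 (slave time); master stamps arrival in master time.
raw_sm = 2_005_000 - 2_020_000  # unadjusted ts_rx - ts_tx is negative here
t_sm = slave_to_master_latency(2_020_000, 2_005_000, time_error)

print(t_ms, raw_sm, t_sm)
```

If the Time Error used for a given packet does not match the actual offset at the instant that packet was timestamped, the residual shows up directly in the adjusted latency.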
We still see those large dips in latency. Just to be clear, I want to measure unidirectional latency because I want to measure the delay asymmetry as a function of time. But I can't trust these measurements, as they don't look right.
For reference, the command I'm using for isochron on the sender side in this example is:
sudo isochron send -i enp2s0 -s 64 --client 10.10.10.2 -c 1.0 -t 1 -w 1.0 -F isochron.dat -n 300 -o -O 37 --cpu-mask $((1 << 1)) -4 -J 10.10.10.2 -S 0.0
For practical reasons, I can't use VLAN interfaces, so isochron doesn't have its own traffic class.