ECP5 / RGMII doesn't meet timing closure #27

Open
ximinity opened this issue Jan 25, 2020 · 14 comments

Comments

@ximinity
Contributor

ximinity commented Jan 25, 2020

The README file mentions the following:

3. Check out /examples/versa_ecp5_udp_loopback for a good practical example of how to get
started with the Liteeth core solo in an FPGA.

I've tested the example in examples/targets/udp_loopback by performing the following steps:

  1. Build example bitstream:

$ ./versa_ecp5.py

Full output: timing.txt
Note that timing closure is not met for crg_clkout.
Versions:
yosys: 3c41599ee1f62e4d77ba630fa1a245ef3fe236fa
nextpnr: 247e18cf027334d5201be00735aa607250e6253d
trellis: e2e10bfdfaa29fed5d19e83dc7460be9880f5af4

  2. Load bitstream to FPGA:

$ ./versa_ecp5.py load

  3. Set local IP:
$ sudo ifconfig enp7s0 192.168.1.100 netmask 255.255.255.0

$ ifconfig
enp7s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.100  netmask 255.255.255.0  broadcast 192.168.1.255
        ether e8:6a:64:c7:84:3b  txqueuelen 1000  (Ethernet)
        RX packets 1699  bytes 1050972 (1.0 MiB)
        RX errors 0  dropped 529  overruns 0  frame 0
        TX packets 1641  bytes 119511 (116.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 33
  4. Set ARP entry:
$ sudo arp -s 192.168.1.50 10:e2:d5:00:00:00 -i enp7s0
$ arp -a
? (192.168.1.50) at 10:e2:d5:00:00:00 [ether] PERM on enp7s0
  5. Ping the FPGA:
$ ping 192.168.1.50
PING 192.168.1.50 (192.168.1.50) 56(84) bytes of data.

This gives no response.

  6. Run listener and sender:
    (Note: when the UDP example was added in "Versa ecp5 udp loopback example" #23, the IP addresses in listener.py and sender.py were switched relative to the original example; the current version has them the wrong way around.)
$ sudo ./listener.py & ./sender.py 
[4] 9009
2020-01-25 20:58:29
2020-01-25 20:58:30
2020-01-25 20:58:30
2020-01-25 20:58:31
2020-01-25 20:58:31
2020-01-25 20:58:32

This also gives no response from the FPGA.
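
For anyone reproducing this without the example scripts, a loopback check of this kind can be sketched in a few lines of Python. This is only a minimal sketch, not the contents of listener.py or sender.py: the function name `udp_roundtrip` and the port number in the usage comment are hypothetical, and the only value taken from the steps above is the FPGA address 192.168.1.50.

```python
import socket


def udp_roundtrip(host, port, payload, timeout=1.0):
    """Send one UDP datagram and return the first reply, or None if nothing comes back.

    Hypothetical helper for illustration; it is not part of LiteEth.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(payload, (host, port))
            reply, _addr = sock.recvfrom(2048)
        except OSError:
            # Covers the receive timeout as well as errors such as an
            # unreachable network or a refused port.
            return None
        return reply


# Example use against the board (the port number 1234 is an assumption):
#   reply = udp_roundtrip("192.168.1.50", 1234, b"hello")
#   print("no response" if reply is None else reply)
```

A `None` result corresponds to the "no response" behaviour described above; a working loopback design would return the payload unchanged.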
See the following wireshark trace: wireshark

On a final note, both PHY status indicators of the Ethernet interface on the FPGA turn off as soon as I connect the network cable (the orange link state is turned on when the cable is unconnected).

@shuffle2
Contributor

shuffle2 commented May 6, 2020

There are multiple culprits causing timing problems (at least when using the Lattice toolchain and the LFE5UM-45F (non-5G) Versa, which is what I've been doing).

  1. There are a few registers in the IP path which are initialized to a non-zero value. In Diamond you'll notice it complains that these registers cannot be packed efficiently into slices. This applies to some of the checksum modules, as well as to some counters which count down instead of up (should be easy to invert).
  2. LiteEthIPV4Checksum results in the longest critical path. As a hack I've locally just replaced this module with a no-op that always has done = 1 and value = 0.
  3. I noticed trellis gives poor results. Maybe try Diamond to compare/debug.
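
For context on item 2, the IPv4 header checksum is the ones' complement of the ones' complement sum of the header's 16-bit words. The sketch below is a plain software reference model of that standard algorithm, useful for checking packets in a capture; it is not the LiteEth gateware, and how LiteEthIPV4Checksum arranges the additions in hardware is not shown in this thread. The function name is made up for illustration.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Return the 16-bit IPv4 header checksum of `header`.

    The checksum field inside `header` should be zero when generating a
    checksum. Running the function over a header that already carries a
    valid checksum returns 0.
    """
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")

    # Sum the header as big-endian 16-bit words.
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]

    # Fold the carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # The checksum is the ones' complement of the folded sum.
    return ~total & 0xFFFF
```

Note that replacing the module with a no-op, as in the hack above, means the value it would have produced is simply 0 rather than a checksum like this one.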

> On a final note, both PHY status indicators of the Ethernet interface on the FPGA turn off as soon as I connect the network cable (the orange link state is turned on when the cable is unconnected).

This is normal default behavior on my board as well. The link seems fine; perhaps Lattice have designed the board in a strange way (it wouldn't be the first part... :) ). Indeed, the Versa user guide schematic around PHY1_LED seems a bit strange...
See also this PDF for more info:
marvell-phys-transceivers-alaska-88e151x-datasheet-2018-02.pdf

@shuffle2
Contributor

shuffle2 commented May 9, 2020

Another hacky way I'm using to make timing closure easier is to disable ICMP (pass with_icmp=False to LiteEthUDPIPCore).

@enjoy-digital
Owner

The examples have been removed. The current issue is already tracked here: litex-hub/litex-boards#40; we'll try to improve this soon.

@enjoy-digital changed the title from "UDP loopback example doesn't meet timing closure" to "ECP5 / RGMII doesn't meet timing closure" on Nov 23, 2020
@rowanG077
Contributor

@enjoy-digital This can be closed. With the 32-bit + buffered CDC there are no timing issues anymore.

@ozel

ozel commented Oct 30, 2023

I still saw this issue, with $glbnet$eth_clocks_rx$TRELLIS_IO_IN failing well below 100 MHz, after all those improvements and even on a Butterstick board using an ECP5 with speed grade 8.
It took some time until I realised that the 'data_width' of the etherbone module is by default still 8 instead of 32. ECP5 examples in 'bench' should probably be updated, unless there are other side effects of using a wider Ethernet/Etherbone data_width by default.

Anyway, thank you all for fixing this!

@rowanG077
Contributor

@ozel Which version and config are you using? I get pretty stable timing closure with the current liteeth with this config:

phy: LiteEthECP5PHYRGMII
phy_tx_delay: 0e-9
phy_rx_delay: 2e-9
device: LFE5U-25F-6BG256C
vendor: lattice
toolchain: trellis
# Core -------------------------------------------------------------------------
clk_freq: 125e6
core: udp

mac_address: 0x10e2d5000000
ip_address: 172.30.0.1

tx_cdc_depth: 16
tx_cdc_buffered: True
rx_cdc_depth: 16
rx_cdc_buffered: True

udp_ports:
  raw:
    data_width: 32
    mode: raw

The CDC parameters are important for getting better timing.

@ozel

ozel commented Oct 30, 2023

@rowanG077 I meant the test bench folder projects, for example https://github.com/enjoy-digital/liteeth/blob/master/bench/butterstick.py

@rowanG077
Contributor

rowanG077 commented Oct 30, 2023

I see. It seems configuration at that level is not supported for the benchcore. You could copy-paste the add_etherbone method defined in https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1766 and change the config there so it passes timing.

@ozel

ozel commented Oct 30, 2023

The data width can be changed from the 8-bit default to 32 bits; this works:

self.add_etherbone(phy=self.ethphy, buffer_depth=255, data_width=32)

I just wanted to highlight that current ECP5 test bench examples still report timing violations unless modified. There might be good reasons for that of course, but newcomers to LiteX might wonder what's going on...

@TheZoq2

TheZoq2 commented Jul 3, 2024

> @ozel Which version and config are you using? I get pretty stable timing closure with the current liteeth with this config:
>
> phy: LiteEthECP5PHYRGMII
> phy_tx_delay: 0e-9
> phy_rx_delay: 2e-9
> device: LFE5U-25F-6BG256C
> vendor: lattice
> toolchain: trellis
> # Core -------------------------------------------------------------------------
> clk_freq: 125e6
> core: udp
>
> mac_address: 0x10e2d5000000
> ip_address: 172.30.0.1
>
> tx_cdc_depth: 16
> tx_cdc_buffered: True
> rx_cdc_depth: 16
> rx_cdc_buffered: True
>
> udp_ports:
>   raw:
>     data_width: 32
>     mode: raw
>
> The CDC parameters are important for getting better timing.

I'm having trouble reaching 125 MHz even with this config on a Butterstick (LFE5UM5G-85F); the best I can get to is about 104 MHz.

@rowanG077
Contributor

rowanG077 commented Jul 3, 2024

Are you sure you have the most recent liteeth? What is your system clock frequency? I'm using liteeth on an ECP5 part with the lowest speed grade. The ECP5 on the Butterstick has the highest speed grade, so it should have no trouble reaching 125 MHz. I just compared the logic timing of the two parts, and yours is almost twice as fast. Are you using 125 MHz for your system clock?

@TheZoq2

TheZoq2 commented Jul 3, 2024

I'm on liteeth downloaded today, so that should be no problem, but I probably misunderstood the clocking. Since I set core_freq to 125 MHz I figured I should drive sys_clk at 125 MHz, but I guess that's not the case?

@rowanG077
Contributor

rowanG077 commented Jul 3, 2024

sys_clk can be any clock you want. What is important is that you use the RGMII clock for the rx clock. I run liteeth at 50 MHz. That's the reason why you go to a higher bit width: you process multiple bytes in a single cycle, so you can lower your clock frequency.
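
The trade-off can be checked with simple arithmetic. The sketch below assumes a 1 Gbit/s line rate (gigabit RGMII) and ignores packet overhead and CDC behaviour, so the numbers are a lower bound rather than a guarantee; the function names are made up for illustration.

```python
GIGABIT_LINE_RATE_BPS = 1_000_000_000  # assumption: gigabit Ethernet over RGMII


def min_clk_hz(line_rate_bps: float, data_width: int) -> float:
    """Lowest clock frequency at which a data_width-bit datapath keeps up with the line rate."""
    return line_rate_bps / data_width


def datapath_bps(clk_hz: float, data_width: int) -> float:
    """Raw throughput of a data_width-bit datapath clocked at clk_hz."""
    return clk_hz * data_width


# With an 8-bit datapath the core has to run at the full 125 MHz.
clk_8bit = min_clk_hz(GIGABIT_LINE_RATE_BPS, 8)

# With a 32-bit datapath a quarter of that is enough, so a 50 MHz
# system clock leaves headroom.
clk_32bit = min_clk_hz(GIGABIT_LINE_RATE_BPS, 32)
headroom = datapath_bps(50e6, 32) / GIGABIT_LINE_RATE_BPS
```

Under these assumptions `clk_8bit` is 125 MHz and `clk_32bit` is 31.25 MHz, which is consistent with a 32-bit core running comfortably at 50 MHz.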

@TheZoq2

TheZoq2 commented Jul 3, 2024

Excellent, thanks for the quick reply!
