Fixed flow_alveo.sh environment issue #87

Merged on Nov 15, 2024 (3 commits)
10 changes: 6 additions & 4 deletions README.md
@@ -168,15 +168,15 @@ $ make bitgen
in the original build-directory as described before.

## Deploying on the ETHZ HACC-cluster
-The ETHZ HACC is a premiere cluster for research in systems, architecture and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware equipment provides the ideal environment to run Coyote-based experiments, since users can book up to 10 servers with U55C-accelerator cards connected via a fully switched 100G-network. User accounts for this platform can be obtained following the explanation on the previously cited homepage.
+The ETHZ HACC is a premiere cluster for research in systems, architecture, and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware equipment provides the ideal environment to run Coyote-based experiments, since users can book up to 10 servers with U55C-accelerator cards connected via a fully switched 100G-network. User accounts for this platform can be obtained following the explanation on the previously cited homepage.

-The interaction with the HACC-cluster can be simplified by using the sgutil-run time commands. They also allow to easily program the accelerator with a Coyote-bitstreamd and insert the driver. For this purpose, the script `program_coyote.sh` has been generated. Under the assumption that the hardware-project has been created in `examples_hw/build` and the driver is already compiled in `driver`, the workflow should look like this:
+The interaction with the HACC-cluster can be simplified by using the sgutil-run time commands. They also allow to easily program the accelerator with a Coyote-bitstreamd and insert the driver. For this purpose, the scripts `util/program_hacc_local.sh` and `util/program_hacc_remote.sh` have been created. Under the assumption that the hardware-project has been created in `examples_hw/build` and the driver is already compiled in `driver`, the workflow should look like this:

~~~
-$ bash program_coyote.sh examples_hw/build/bitstreams/cyt_top.bit driver/coyote_drv.ko
+$ bash util/program_hacc_local.sh examples_hw/build/bitstreams/cyt_top.bit driver/coyote_drv.ko
~~~

-Obviously, the paths to `cyt_top.bit` and `coyote_drv.ko` need to be adapted if a different build-structure has been chosen before.
+The paths to `cyt_top.bit` and `coyote_drv.ko` need to be adapted if a different build-structure has been chosen before.
A successful completion of this process can be checked via a call to

~~~
@@ -185,6 +185,8 @@ $ dmesg

If the driver insertion went through, the last printed message should be `probe returning 0`. Furthermore, the dmesg-printout should contain a line `set network ip XXXXXXXX, mac YYYYYYYYYYYY`, which displays IP and MAC of the Coyote-NIC if networking has been enabled in the system configuration.
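Note: the check described above can be scripted. The sketch below is a minimal illustration and not part of this PR. `check_coyote_log` is a hypothetical helper that reads kernel-log text on standard input and looks for the two messages the README names. It assumes the probe message really is the final log line; on a busy machine other kernel messages may follow it, in which case searching the whole log would be more forgiving.

```shell
# Sketch only: check_coyote_log is a hypothetical helper, not a Coyote script.
# It reads kernel-log text from stdin and reports what it finds.
check_coyote_log() {
    log=$(cat)                                  # whole log text
    last=$(printf '%s\n' "$log" | tail -n 1)    # README: the last message matters
    case "$last" in
        *"probe returning 0"*) ;;               # driver probe succeeded
        *) echo "driver probe not confirmed"; return 1 ;;
    esac
    # The network line only appears if networking was enabled in the configuration.
    if printf '%s\n' "$log" | grep -q "set network ip"; then
        echo "driver loaded, network line found"
    else
        echo "driver loaded, no network line found"
    fi
}
```

It could be used as `dmesg | check_coyote_log`; reading the kernel log may require elevated privileges on some systems.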

+To program Coyote to a remote server, `util/program_hacc_remote.sh` may be used in the same way. Additionally, that script will ask for a list of server ids (e.g., `3, 5`).
+
## Publication

#### If you use Coyote, cite us :
33 changes: 0 additions & 33 deletions util/find.py

This file was deleted.

12 changes: 12 additions & 0 deletions util/insmod_local.sh
@@ -0,0 +1,12 @@
+# Script to set up IP and MAC address environment variables and insert the driver because environment is not initialized when executing through ssh from a remote server (environment is only initialized when MOTD is shown).
+
+CLI_PATH=/opt/sgrt/cli
+
+IP_address=$($CLI_PATH/sgutil get network -d 1 | awk '$1 == "1:" {print $2}')
+MAC_address=$($CLI_PATH/sgutil get network -d 1 | awk '$1 == "1:" {print $3}' | tr -d '()')
+qsfp_ip=$($CLI_PATH/common/address_to_hex IP $IP_address)
+qsfp_mac=$($CLI_PATH/common/address_to_hex MAC $MAC_address)
+echo "** IP_ADDRESS: $qsfp_ip"
+echo "** MAC_ADDRESS: $qsfp_mac"
+
+sudo insmod coyote_drv.ko ip_addr=$qsfp_ip mac_addr=$qsfp_mac
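Note: the two `awk` filters in `insmod_local.sh` select the line whose first field is `1:`, take its second field as the IP address and its third field, minus parentheses, as the MAC address. The sketch below shows the same extraction in a single pass so the tool would only need to be called once. `parse_network_line` is a hypothetical helper, and the sample line is invented: its layout is inferred only from the filters in the script, not from actual `sgutil` output.

```shell
# Sketch only: parse_network_line is a hypothetical helper.
# Reads text on stdin and prints "<ip> <mac>" for the line starting with "1:".
parse_network_line() {
    awk '$1 == "1:" { gsub(/[()]/, "", $3); print $2, $3 }'
}

# Invented sample; the real tool output may look different.
sample='1: 10.0.0.5 (00:0a:35:00:00:01)'
fields=$(printf '%s\n' "$sample" | parse_network_line)
ip=${fields%% *}     # text before the first space
mac=${fields#* }     # text after the first space
echo "ip=$ip mac=$mac"
```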
File renamed without changes.
40 changes: 23 additions & 17 deletions util/flow_alveo.sh → util/program_hacc_remote.sh
@@ -4,23 +4,32 @@
## Args
##

-if [ "$1" == "-h" ]; then
-echo "Usage: $0 <bitstream_path_within_base> <driver_path_within_base> <qsfp_port>" >&2
+if [ "$1" == "-h" ] || [ $# -eq 0 ]; then
+echo "Usage: $0 <bitstream_path> [<driver_path> [<qsfp_port>]]" >&2
exit 0
fi

+BASE_PATH=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )/..
+
+PROGRAM_FPGA=1
+DRV_INSERT=1
+
+BIT_PATH="${1%%.bit}" # Strip .bit
+DRV_PATH=driver
+
+if [ ! -f ${BIT_PATH}.bit ]; then
+echo "Bitstream ${BIT_PATH}.bit does not exist."
+exit 1
+fi
+
if ! [ -x "$(command -v vivado)" ]; then
echo "Vivado does NOT exist in the system."
exit 1
fi

-BASE_PATH=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
-
-PROGRAM_FPGA=1
-DRV_INSERT=1
-
-BIT_PATH=$1
-DRV_PATH=$2
+if [ -n "$2" ]; then
+DRV_PATH=$2
+fi

if [ -z "$3" ]; then
QSFP_PORT=0
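
Note: the new argument handling makes only the bitstream path mandatory and accepts it with or without the `.bit` suffix. A minimal sketch of that behaviour follows. `resolve_args` is a hypothetical helper, and the sketch assumes that the `else` branch not shown in this hunk assigns `$3` to `QSFP_PORT`.

```shell
# Sketch only: resolve_args is a hypothetical helper mirroring the defaults above.
resolve_args() {
    bit_path="${1%%.bit}"      # strip a trailing ".bit", as the script does
    drv_path="${2:-driver}"    # same default as DRV_PATH=driver
    qsfp_port="${3:-0}"        # assumed to match the QSFP_PORT handling
    echo "${bit_path}.bit ${drv_path} ${qsfp_port}"
}

resolve_args examples_hw/build/bitstreams/cyt_top.bit
# prints: examples_hw/build/bitstreams/cyt_top.bit driver 0
resolve_args examples_hw/build/bitstreams/cyt_top my_driver 1
# prints: examples_hw/build/bitstreams/cyt_top.bit my_driver 1
```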
@@ -32,7 +41,7 @@ fi
## Server IDs (u55c)
##

-echo "*** Enter server IDs:"
+echo "*** Enter U55C server IDs [1,10] (e.g., <3, 5>):"
read -a SERVID

BOARDSN=(XFL1QOQ1ATTYA XFL1O5FZSJEIA XFL1QGKZZ0HVA XFL11JYUKD4IA XFL1EN2C02C0A XFL1NMVTYXR4A XFL1WI3AMW4IA XFL1ELZXN2EGA XFL1W5OWZCXXA XFL1H2WA3T53A)
@@ -53,7 +62,7 @@ alveo_program()
BOARDSN=$3
DEVICENAME=$4
BITPATH=$5
-vivado -nolog -nojournal -mode batch -source program_alveo.tcl -tclargs $SERVERADDR $SERVERPORT $BOARDSN $DEVICENAME $BITPATH
+vivado -nolog -nojournal -mode batch -source util/program_alveo.tcl -tclargs $SERVERADDR $SERVERPORT $BOARDSN $DEVICENAME $BITPATH
}

if [ $PROGRAM_FPGA -eq 1 ]; then
@@ -72,7 +81,7 @@ if [ $PROGRAM_FPGA -eq 1 ]; then
echo " ** "
for servid in "${SERVID[@]}"; do
boardidx=$(expr $servid - 1)
-alveo_program alveo-u55c-$(printf "%02d" $servid) 3121 ${BOARDSN[boardidx]} xcu280_u55c_0 $BASE_PATH/../$BIT_PATH &
+alveo_program alveo-u55c-$(printf "%02d" $servid) 3121 ${BOARDSN[boardidx]} xcu280_u55c_0 $BASE_PATH/$BIT_PATH &
done
wait
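
Note: the prompt suggests input such as `3, 5`, but with the default `IFS`, `read -a` splits only on whitespace, so an entry typed as `3,` keeps its comma and `expr` then rejects it as a non-integer. The sketch below shows one way to accept commas, validate the range [1,10], and derive the `alveo-u55c-NN` host names used in the loop above. `normalize_server_ids` is a hypothetical helper and not part of this PR.

```shell
# Sketch only: normalize_server_ids is a hypothetical helper.
# Accepts "3, 5", "3,5" or "3 5" and prints one host name per valid ID.
normalize_server_ids() {
    # Turn commas into spaces, then rely on word splitting.
    for id in $(printf '%s' "$1" | tr ',' ' '); do
        case "$id" in
            *[!0-9]*) echo "invalid server id: $id" >&2; return 1 ;;
        esac
        id=$(printf '%s' "$id" | sed 's/^0*//')   # "08" -> "8", avoids octal parsing
        if [ -z "$id" ] || [ "$id" -lt 1 ] || [ "$id" -gt 10 ]; then
            echo "server id out of range [1,10]" >&2
            return 1
        fi
        printf 'alveo-u55c-%02d\n' "$id"          # same naming as the script
    done
}

normalize_server_ids '3, 5'
# prints:
# alveo-u55c-03
# alveo-u55c-05
```

Inside the script itself, a comparable effect could be had with `IFS=', ' read -r -a SERVID`, which makes bash split the input on commas as well as spaces.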

@@ -97,14 +106,11 @@ if [ $DRV_INSERT -eq 1 ]; then

echo "*** Compiling the driver ..."
echo " ** "
-parallel-ssh -H "$hostlist" "make -C $BASE_PATH/../$DRV_PATH"
+parallel-ssh -H "$hostlist" "make -C $BASE_PATH/$DRV_PATH"

echo "*** Loading the driver ..."
echo " ** "
-qsfp_ip="DEVICE_1_IP_ADDRESS_HEX_$QSFP_PORT"
-qsfp_mac="DEVICE_1_MAC_ADDRESS_$QSFP_PORT"
-
-parallel-ssh -H "$hostlist" -x '-tt' "sudo insmod $BASE_PATH/../$DRV_PATH/coyote_drv.ko ip_addr=\$$qsfp_ip mac_addr=\$$qsfp_mac"
+parallel-ssh -H "$hostlist" -x '-tt' "cd $BASE_PATH/$DRV_PATH && ../util/insmod_local.sh"

echo "*** Driver loaded"
echo " ** "