diff --git a/Connecting_to_AWS_lab.md b/Connecting_to_AWS_lab.md index 0f399b6..ad4a234 100644 --- a/Connecting_to_AWS_lab.md +++ b/Connecting_to_AWS_lab.md @@ -14,128 +14,147 @@ After completing this lab, you will be able to: - Close the RDP session - Stop the instance -## Step 1: Login into the AWS and starting an F1 instance +## Steps +Each registered participant has been allocated a pre-configured EC2 F1 instance and should have received an email with the following details: -#### Each registered participant has been allocated a preconfigured EC2 F1 instance and should have received an email with the following details: - Account ID, - IAM username, - Link to access a preconfigured EC2 F1 instance -### 1.1. Open a web browser and login into AWS EC2 F1 instance using the provided credentials -**1.1.1.** Start a web browser session - -**1.1.2.** Either click on the provided link to open up an AWS login page OR enter [https://console.aws.amazon.com/ec2](https://console.aws.amazon.com/ec2) to open a login page - -If you had used the link then you should see a login page similar to shown here: - -![alt tag](./images/connecting_lab/FigConnectingLab-1.png) -#### Figure 1. Login page accessed through the provided link - -If you had not used the link you may be directed to the AWS standard login page - -![alt tag](./images/connecting_lab/FigConnectingLab-2.png) -#### Figure 2. Entering credentials manually - -Enter *xilinx-aws-f1-developer-labs* in the Email address field and click **Next** to see the login page similar to shown in **Figure 1** - -**1.1.3.** Enter _userxx_ in the **IAM user name** field and enter the provided password in the **Password** field - -**1.1.4.** Click **Sign In** -### 1.2. Make sure to select N. Virginia (or instructor indicated region) as the region and start the instance -**1.2.1.** In the top right corner, using the drop-down button, select a region with F1 instances, such as **N. Virginia** (**US East)** -![alt tag](./images/connecting_lab/FigConnectingLab-3.png) -#### Figure 3. Selecting region - -If you select different region other then where the accounts are created for then you may not see your instance as well as the source files which are pre-loaded for the workshop. - -**1.2.2.** Click on the **EC2** link on the dashboard or if not visible, then click on the _Services_ drop-down button and then click on **EC2** - -![alt tag](./images/connecting_lab/FigConnectingLab-4-1.png) ![alt tag](./images/connecting_lab/FigConnectingLab-4-2.png) -#### Figure 4. Accessing EC2 service - -**1.2.3.** Click on the **Instances** link on the left panel - -![alt tag](./images/connecting_lab/FigConnectingLab-5.png) -#### Figure 5. Accessing Instances - -You may see several instances - -**1.2.4.** Enter your username in the filter field just below the **Launch Instance** button and hit enter -![alt tag](./images/connecting_lab/FigConnectingLab-6.png) -#### Figure 6. Filtering your instance - -**1.2.5.** Making sure that your instance is selected, click on the **Actions > Instance State > Start** -![alt tag](./images/connecting_lab/FigConnectingLab-7.png) -#### Figure 7. Starting an instance - -**1.2.6.** Click on the **Yes, Start** button - -**1.2.7.** Click on the refresh button(![alt tag](./images/Fig-refresh.png)) to see the updated status to _Running_ - -![alt tag](./images/connecting_lab/FigConnectingLab-8.png) -#### Figure 8. 
Running state - -**1.2.8.** Make a note of the Public DNS and IPv4 Public IP which will be used by PuTTy and Remote Desktop (RDP) -![alt tag](./images/connecting_lab/FigConnectingLab-9.png) -#### Figure 9. Assigned IP to the running instance - -## Step 2: Interacting with the Instance using RDP +### Login into the AWS and starting an F1 instance + +1. Start a web browser session +1. Either click on the provided link to open up an AWS login page OR enter [https://console.aws.amazon.com/ec2](https://console.aws.amazon.com/ec2) to open a login page + If you had used the link then you should see a login page similar to shown here: +

+ Figure: Login page accessed through the provided link
+ If you had not used the link you may be directed to the AWS standard login page +

+ Figure: Entering credentials manually
+ Enter xilinx-aws-f1-developer-labs in the Email address field and click Next to see the login page similar to the one shown in the figure titled "Login page accessed through the provided link"
+1. Enter _userxx_ in the **IAM user name** field and enter the provided password in the **Password** field
+1. Click **Sign In**
+1. In the top right corner, using the drop-down button, select a region with F1 instances, such as **N. Virginia (US East)**, or the region indicated by your instructor
+

+ Figure: Selecting a region
+ If you select a region other than the one in which the accounts were created, you may not see your instance or the source files that are pre-loaded for the workshop.
+1. Click on the **EC2** link on the dashboard or, if it is not visible, click on the _Services_ drop-down button and then click on **EC2**
+

+ Figure: Accessing EC2 service
+1. Click on the **Instances** link on the left panel +

+ Figure: Accessing Instances
+ You may see several instances +1. Enter your username in the filter field just below the **Launch Instance** button and hit enter +

+ Figure: Filtering your instance
+1. Making sure that your instance is selected, click on the **Actions > Instance State > Start** +

+ Figure: Starting an instance
+1. Click on the **Yes, Start** button
+1. Click on the refresh button (![alt tag](./images/Fig-refresh.png)) to see the status update to _Running_
+

+ Figure: Running state
+1. Make a note of the Public DNS and IPv4 Public IP, which will be used by PuTTY and Remote Desktop (RDP); an optional CLI way to retrieve them is shown after the figure below
+

+ Figure: Assigned IP to the running instance

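+ If you prefer the command line, the same details can be retrieved with the AWS CLI. This is an optional sketch and assumes the CLI is configured with credentials for this account and region; the tag filter value (userxx) is a placeholder for whatever tag identifies your instance:
+ ```
+ # List public DNS name, public IP, and state for instances matching your username tag
+ aws ec2 describe-instances \
+   --filters "Name=tag:Name,Values=*userxx*" \
+   --query "Reservations[].Instances[].{DNS:PublicDnsName,IP:PublicIpAddress,State:State.Name}" \
+   --output table
+ ```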
+### Interacting with the Instance using RDP **You can communicate with the instance using command line through PuTTY or Git Bash, and using GUI through remote desktop (RDP) connection.** -**2.1. Start a remote desktop session** - -**2.1.1.** Start the remote desktop session - -**2.1.2.** Enter the _IPv4_ address - -**2.1.3.** Click on the **Show Options** - -![alt tag](./images/connecting_lab/FigConnectingLab-10.png) -#### Figure 10. Entering the IPv4 address - -**2.1.4.** Select the **Display** tab and select _True Color (24 bit)_ and click **Connect** -![alt tag](./images/connecting_lab/FigConnectingLab-11.png) -#### Figure 11. Selecting resolution and connecting - -**2.1.5.** A certificate warning will be displayed. Click **Yes** to open the RDP session - -**2.1.6.** Enter centos as the username and enter the provided password and click **OK** - -![alt tag](./images/connecting_lab/FigConnectingLab-12.png) -#### Figure 12. Entering username and password - -**2.1.7.** Right-click on the desktop and select **Open Terminal** to open a window - -**2.1.8.** You should enter the following commands in any newly opened terminal window to source the environments - +1. Start a remote desktop session +1. Enter the _IPv4_ address +1. Click on the **Show Options** +

+ Figure: Entering the IPv4 address
+1. Select the **Display** tab and select _True Color (24 bit)_ and click **Connect** +

+ Figure: Selecting resolution and connecting
+1. A certificate warning will be displayed. Click **Yes** to open the RDP session +1. Enter centos as the username and enter the provided password and click **OK** +

+ Figure: Entering username and password
+1. Right-click on the desktop and select **Open Terminal** to open a window +1. You should enter the following commands in any newly opened terminal window to source the environments ``` cd ~/aws-fpga source sdaccel_setup.sh source $XILINX_SDX/settings64.sh ``` -## Step 3: Stopping the Instance and Signing Out - -**It is important to shut down the instance in order to stop billing meter** +### Stopping the Instance and Signing Out -### You do not need to execute this command in this lab as you will continue the session for the supsequent labs - -### 3.1. Shut down the RDP instance from the terminal window. - -**3.1.1.** Type the following command to terminate the RDP session and shutdown the instance +**It is important to shut down the instance in order to stop billing meter. You do not need to execute this command in this lab as you will continue the session for the subsequent labs** +1. Type the following command to terminate the RDP session and shutdown the instance ``` sudo shutdown now ``` - -**3.1.5.** Check the browser window, you will see status as either **Stopping** or **Stopped.** Click on the refresh button to see the status update -![alt tag](./images/connecting_lab/FigConnectingLab-13-1.png) ![alt tag](./images/connecting_lab/FigConnectingLab-13-2.png) -#### Figure 13. Instance Status - -**3.1.6.** Once the instance is stopped, sign out by clicking on the drop-down button on the top bar and selecting **Sign Out** -![alt tag](./images/connecting_lab/FigConnectingLab-14.png) -#### Figure 14. Signing out +1. Check the browser window, you will see status as either **Stopping** or **Stopped.** Click on the refresh button to see the status update +

+ Figure: Instance Status
+ +1. Once the instance is stopped, sign out by clicking on the drop-down button on the top bar and selecting **Sign Out** +

+ Figure: Signing out

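+ If the RDP session is unresponsive, or you prefer not to shut down from inside the instance, the instance can also be stopped from the AWS CLI. A hedged sketch, assuming the CLI is configured for this account and that i-0123456789abcdef0 is replaced with your actual instance ID:
+ ```
+ # Stop the instance and wait until it reports the "stopped" state
+ aws ec2 stop-instances --instance-ids i-0123456789abcdef0
+ aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
+ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
+   --query "Reservations[].Instances[].State.Name" --output text
+ ```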
## Conclusion @@ -151,33 +170,32 @@ Start the next lab: 2. Makefile Flow ## Appendix: Interacting with the Instance using Putty -**A.1 Connect using PuTTY.** - -**A.1.1.** Start PuTTY program - -**A.1.2.** Enter _centos@<public\_dns\_entry>_ in the **Host Name** field and **22** in the _Port_ field - +1. Start PuTTY program +1. Enter _centos@<public\_dns\_entry>_ in the **Host Name** field and **22** in the _Port_ field Make sure that SSH is selected as the Connection type -![alt tag](./images/connecting_lab/FigConnectingLab-15.png) -#### Figure 15. Session settings in PuTTY - -**A.1.3.** Expand **SSH** under the _Connection_ in the left panel and click **Auth** - -**A.1.4.** Click on the **Browse…** button, browse to where the private key has been stored - +

+ Figure: Session settings in PuTTY
+1. Expand **SSH** under the _Connection_ in the left panel and click **Auth** +1. Click on the **Browse…** button, browse to where the private key has been stored If you don't have the private key file (as in workshop) you can skip this step - -**A.1.5.** Click **Open** - -![alt tag](./images/connecting_lab/FigConnectingLab-16.png) -#### Figure 16. Selecting private key file - -**A.1.6.** Click **Yes** - +1. Click **Open** +

+ Figure: Selecting private key file
+1. Click **Yes** The PuTTY window will open. It will ask for the password (in case of the workshop). Enter the provided password -![alt tag](./images/connecting_lab/FigConnectingLab-17.png) -#### Figure 17. The PuTTY window showing the connection - -**A.1.7.** Set password for the RDP connection by entering sudo passwd <your choice of password> command. You will use the same password in the RDP connection. - -**A.1.8.** Enter **exit** to close the session +

+ Figure: The PuTTY window showing the connection
+1. Set password for the RDP connection by entering sudo passwd <your choice of password> command. You will use the same password in the RDP connection. +1. Enter **exit** to close the session diff --git a/Creating_AFI.md b/Creating_AFI.md new file mode 100644 index 0000000..0c808a4 --- /dev/null +++ b/Creating_AFI.md @@ -0,0 +1,58 @@ +# Creating AFI Image after Building Full System + +This document guides you through the steps involved in creating an AFI which can be run AWS EC2 F1 instance to verify the deign works in hardware. It assumes that a full system is built which consists of an host application and xclbin. + +### Create an Amazon FPGA Image (AFI) + +To execute the application on F1, the following files are needed: + +- Host application (exe) +- FPGA binary (xclbin) +- Amazon FPGA Image (awsxclbin) + +The xclbin and the host applications must already have been generated + +1. Create a **xclbin** directory under the directory using the following commands: + ``` + cd /home/centos/aws-fpga/ + mkdir xclbin + ``` +1. Copy the generated **xclbin** file and the host application into the created **xclbin** directory, using the following commands + ``` + cd xclbin + cp /home/centos/aws-fpga//*.xclbin . + cp /home/centos/aws-fpga//*.exe . + ``` +### Create an AFI by running the create\_sdaccel\_afi.sh script and wait for the completion of the AFI creation process +1. Enter the following command to generate the AFI: + ``` + $SDACCEL_DIR/tools/create_sdaccel_afi.sh –xclbin=binary_container_1.xclbin –s3_bucket= -s3_dcp_key= -s3_logs_key= + ``` +In the above command, replace binary\_container\_1.xclbin with an appropriate name if it is different; <bucket-name>, <dcp-folder-name>, and <logs-folder-name> with the names you had given when running CLI script. Learn more about setting up S3 buckets at [https://github.com/aws/aws-fpga/blob/master/SDAccel/docs/Setup_AWS_CLI_and_S3_Bucket.md](https://github.com/aws/aws-fpga/blob/master/SDAccel/docs/Setup_AWS_CLI_and_S3_Bucket.md) +In the workshop environment this was already done. + +The create\_sdaccel\_afi.sh script does the following: + +- Starts a background process to create the AFI +- Generates a \_afi\_id.txt which contains the FPGA Image Identifier (or AFI ID) and Global FPGA Image Identifier (or AGFI ID) of the generated AFIs +- Creates the \*.awsxclbin AWS FPGA binary file which will need to be read by the host application to determine which AFI should be loaded in the FPGA. +- Enter the following command to note the values of the AFI IDs by opening the *\_afi\_id.txt file. + ``` + cat *afi_id.txt + ``` +1. Enter the **describe-fpga-images** API command to check the status of the AFI generation process: + ``` + aws ec2 describe-fpga-images --fpga-image-ids + ``` +Note: When AFI creation completes successfully, the output should contain: + ``` + ... + "State": { + "Code": "available" + }, + + ... + ``` + +Wait until the AFI becomes available before proceeding to execute on the F1 instance. + diff --git a/GUI_Flow_lab.md b/GUI_Flow_lab.md index 701ce92..6ba5367 100644 --- a/GUI_Flow_lab.md +++ b/GUI_Flow_lab.md @@ -2,230 +2,333 @@ ## Introduction -This lab guides you through the steps involved in using a GUI flow to create an SDAccel project. After creating a project you will run CPU and hardware emulations to verify the functionality. You will then use an AWS F1 instance to validate the design. +This lab guides you through the steps involved in using a GUI flow to create an SDAccel project. 
After creating a project you will run SW and hardware emulations to verify the functionality. You will then use an AWS F1 instance to validate the design, and perform profile and application timeline analysis. ## Objectives After completing this lab, you will be able to: - Create an SDAccel project through GUI flow -- Run CPU Emulation to verify the functionality of a design using a GUI flow +- Run SW Emulation to verify the functionality of a design using a GUI flow - Run HW Emulation to verify the functionality of a design using a GUI flow - Verify functionality in hardware on an AWS F1 instance +- Build system for hardware execution, and perform profile and application timeline analysis on the AWS F1 instance -## Procedure - -This lab is separated into steps that consist of general overview statements that provide information on the detailed instructions that follow. Follow these detailed instructions to progress through the lab. - -This lab comprises four primary steps: You will create an SDAccel project using one of the standard application templates. You will perform CPU emulation to validate application then perform HW emulation to see how much acceleration is possible. Next you will download the bitstream on F1 and validate application execution. The Appendix section lists steps involved in building the full hardware. - -## Step 1: Create an SDAccel Project -### 1.1. Source the SDAccel settings and create a directory called GUI\_flow under _~/aws-fpga_. Change the directory to the newly created directory. -**1.1.1.** Execute the following commands in a terminal window to source the required Xilinx tools: +## Steps +### Create an SDAccel Project +1. Execute the following commands, if it is not already done, in a terminal window to source the required Xilinx tools: ``` cd ~/aws-fpga source sdaccel_setup.sh source $XILINX_SDX/settings64.sh ``` -**1.1.2.** Execute the following commands to create a working directory: - +1. Execute the following commands to create a working directory: ``` mkdir GUI_flow cd GUI_flow ``` - -### 1.2. Launch SDx, create a workspace and create a project, called _GUI\_flow_, using the _Vector Addition_ template. -**1.2.1.** Launch SDAccel by executing **sdx** in the terminal window - -An Eclipse launcher widow will appear asking to select a directory as workspace - -**1.2.2.** Click on the **Browse…** button, browse to **/home/centos/aws-fpga/GUI\_flow**, click **OK** twice - -![alt tag](./images/guiflow_lab/FigGUIflowLab-1.png) -#### Figure 1. Selecting a workspace - -The Xilinx SDx IDE window will be displayed - -![alt tag](./images/FigSDXIDE.png) -#### Figure 2. The SDx IDE window - -**1.2.3.** Click on the **Add Custom Platform** link on the _Welcome_ page - -**1.2.4.** Click on the **Add Custom Platform** button, browse to **/home/centos/src/project_data/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1\_dynamic\_5\_0** , and click **OK** - -![alt tag](./images/FigPlatform.png) -#### Figure 3. 
Hardware platform selected - -**1.2.5.** Click **Apply** and then click **OK** - -**1.2.6.** Click on the **Create SDx Project** link on the _Welcome_ page - -**1.2.7.** In the _New Project_'s page enter **gui\_flow\_example** in the _Project name:_ field and click **Next** - -Note the AWS-VU9P-F1 board is displayed as the hardware platform - -**1.2.8.** Click **Next** - -**1.2.9.** Click **Next** with Linux on x86 as the System Configuration and OpenCL as the Runtime options - -**1.2.10.** Select **Vector Addition** from the _Available Templates_ pane and click **Finish** - -![alt tag](./images/guiflow_lab/FigGUIflowLab-4.png) -#### Figure 4. Selecting an application template - -The project IDE will be displayed with six main windows: Project Explorer, Project Settings, Reports, Outline, multi-tab console, and Emulation Console. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-5.png) -#### Figure 5. Project IDE - -## Step 2: Perform CPU Emulation -### **2.1.** Select the function(s) that needs to be accelerated. - -**2.1.1.** Click on the _Add Hardware Function_ button icon (![alt tag](./images/Fig-hw_button.png)) in the **Hardware Functions** tab to see the functions defined in the design - -**2.1.2.** Notice the _kml\_vadd_ function is the only function in the design and is already marked to be accelerated - -**2.1.3.** Make sure the **project.sdx** under _gui\_flow\_example_ in the **Project Explorer** tab is selected - -**2.1.4.** Either select **Project > Build Configurations > Set Active > Emulation-CPU** or click on the drop-down button of _Active build configuration_ and select **Emulation-CPU** - -![alt tag](./images/guiflow_lab/FigGUIflowLab-6.png) -#### Figure 6. Selecting CPU emulation build configuration - - -**2.1.5.** Either select **Project > Build Project** or click on the build (![alt tag](./images/Fig-build.png)) button - -This will build the project including gui\_flow\_example.exe file under the Emulation-CPU directory - -**2.1.6.** Run the application by clicking the Run button (![alt tag](./images/Fig-run.png)) - -The application will be run and the output will be displayed in the Console tab - -![alt tag](./images/guiflow_lab/FigGUIflowLab-7.png) -#### Figure 7. CPU Emulation run output - -## Step 3: Perform HW Emulation - -#### The SW Emulation flow checks functional correctness of the software application, but it does not guarantee the correctness of the design on the FPGA target. The Hardware Emulation flow can be used to verify the functionality of the generated logic. This flow invokes the hardware simulator in the SDAccel environment. As a consequence, the Hardware Emulation flow will take longer to run than the SW Emulation flow. - -#### The HW Emulation flow is not cycle accurate, but provides more detailed profiling information than software emulation and can be used to do some analysis and optimization of the performance of the application. - -### 3.1. Select the Emulation-HW build configuration and build the project. -**3.1.1.** Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** - -![alt tag](./images/guiflow_lab/FigGUIflowLab-8.png) -#### Figure 8. 
Selecting HW emulation build configuration - -**3.1.2.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - -This will build the project including gui\_flow\_example.exe file under the Emulation-HW directory - -**3.1.3.** Select **Run > Run Configurations…** to open the configurations window - -**3.1.4.** Click on the **Arguments** tab and notice that the _binary\_container\_1.xclbin_ is already assigned - -If no argument was assigned then you would have to explicitly assign the **xclbin** by clicking on the _Automatically add binary container(s) to arguments_, and click **Apply** - -![alt tag](./images/guiflow_lab/FigGUIflowLab-9.png) -#### Figure 9. Populated Arguments tab - -**3.1.5.** Click **Run** to run the application - -**3.1.6.** The Console tab shows that the test was completed successfully along with the data transfer rate - -![alt tag](./images/guiflow_lab/FigGUIflowLab-10.png) -#### Figure 10. Hardware emulation run output - -### 3.2. Understand the HLS Report, profile summary, and Application Timeline. -**3.2.1.** Double-click on the **HLS Report** entry under _Emulation-HW > binary\_container\_1 > krnl\_vadd_ in the _Reports_ view to open the report - -![alt tag](./images/guiflow_lab/FigGUIflowLab-11.png) -#### Figure 11. The Report view - -The multi-tab window will open showing the Synthesis report for krnl\_vadd accelerator. It includes the target device information - -![alt tag](./images/guiflow_lab/FigGUIflowLab-12.png) -#### Figure 12. Multi-tab HLS synthesis report for krnl\_vadd accelerator - -**3.2.2.** Scroll down the window and observe the timing, latency, and loop performance results. Observe that the target frequency is 250 MHz (4 ns period) and achieved period is 2.92 ns indicating that the timing has been met. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-13.png) -#### Figure 13. Performance estimate results - -**3.2.3.** Scroll further down and observe the resource utilization by the accelerator - -![alt tag](./images/guiflow_lab/FigGUIflowLab-14.png) -#### Figure 14. Resource utilization - -**3.2.4.** Scrolling down further shows the Interface summary indicating various ports, width, protocol that they are part of, type of object, and parameter type they belong to. - -As can be seen, there are three interfaces being used: control, s\_axi, and m\_axi. The s\_axi is 32-bit wide data, control provides necessary handshaking signals, and m\_axi has 32-bit data. The m\_axi is connected to gmem, the global memory which is DDR. The DDR memory uses 64-bit address. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-15-1.png) -![alt tag](./images/guiflow_lab/FigGUIflowLab-15-2.png) -![alt tag](./images/guiflow_lab/FigGUIflowLab-15-3.png) -#### Figure 15. Interface details showing ports, direction, size, protocol, object and data types - -### 3.3. Review the profile summary report -**3.3.1.** Double-click on the **Profile Summary** entry under _Emulation-HW > gui\_flow\_example-Default_ in the _Reports_ tab - -Notice a multi-tab report window is opened. It has four tabs: the Top Operations, Kernels and Compute Units, the Data Transfers, and the OpenCL APIs. The Top Operations tab shows the device being used, the number of transfers (2112), average bytes per transfer (5.818), and the transfer efficiency. It also shows the kernel (krnl\_vadd) being used, the location of the kernel, beside the context ID. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-16.png) -#### Figure 16. 
Top operation information in the profile summary - -**3.3.2.** Click on the **Kernels & Compute Units** tab and observe the number of Enqueues (1), the Global work size (1:1:1), the Local work size (1:1:1) and the number of calls to the kernel (1) - -![alt tag](./images/guiflow_lab/FigGUIflowLab-17.png) -#### Figure 17. Kernel and computer unit information in the profile summary - -**3.3.3.** Click on the **Data Transfers** tab and observe the number of read (1, result read), the number of write (2, two source operands being written), and the average size (4096 KB) between the host and memory. It also shows the number of read (2048, read data), the number write (64, write data), the transfer rates, and the average time. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-18.png) -#### Figure 18. Data transfer information in the profile summary - -### 3.4. Review the Application Timeline report -**3.4.1.** Double-click on the **Application Timeline** entry in the _Reports_ tab, expand all entries in the timeline graph, zoom appropriately and observe the transactions. You will see when the kernel is running, when the write transaction takes place between host and global memory, when the read transactions are taking place between global memory and kernel memory, when the write transactions are taking place between the kernel and global memory, and when the read transaction is taking place between the global memory and host. - -![alt tag](./images/guiflow_lab/FigGUIflowLab-19.png) -#### Figure 19. Timeline graph showing various activities in various region of the system - -### 3.5. Review the System Estimate report. -**3.5.1.** Double-click on the **System Estimate** entry under the _Emulation-HW_ in the _Reports_ tab - -**3.5.2.** The report shows the estimated frequency and the resource utilization for the given kernel (krnl\_vadd) - -![alt tag](./images/guiflow_lab/FigGUIflowLab-20.png) -#### Figure 20. The system estimate report - -## Step 4: Run the Application on F1 -### 4.1. Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality. -**4.1.1.** Change to the solution directory by executing the following command - +1. Launch SDAccel by executing **sdx** in the terminal window +An Eclipse launcher window will appear asking to select a directory as workspace +1. Click on the **Browse…** button, browse to **/home/centos/aws-fpga/GUI\_flow**, click **OK** twice +

+ Figure: Selecting a workspace
+ The Xilinx SDx IDE window will be displayed. Notice that the Welcome screen is gray-shaded as nothing has been done at this stage +

+ Figure: The SDx IDE window
+1. Click on the **Add Custom Platform** link on the _Welcome_ page +1. Click on the **Add Custom Platform** button, browse to **/home/centos/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1-04261818\_dynamic\_5\_0**, and click **OK** +

+ Figure: Hardware platform selected
+1. Click **Apply** and then click **OK** + Notice that the Welcome screen is no longer gray-shaded since a platform is already defined +1. Click on the **Create SDx Project** link on the _Welcome_ page + The _Project Type_ page will be displayed +1. Select **Application** and click **Next** +1. In the _New Project_'s page enter **gui\_flow\_example** in the _Project name:_ field and click **Next** +Note the _aws-vu9p-f1-04261818_ under the board column is displayed +1. Click **Next** +1. Click **Next** with _Linux on x86_ as the System Configuration and _OpenCL_ as the Runtime options +1. Select **Vector Addition** from the _Templates_ pane and click **Finish** +

+ Figure: Selecting an application template
+ The project IDE will be displayed with six main windows: Project Explorer, Project Settings, Assistant, Outline, multi-tab console, and Emulation Console. +

+ Figure: The Project IDE

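+ If you want to look at the sources the Vector Addition template added to the project outside the IDE, they are placed under the workspace. A minimal sketch, assuming the default project layout; the file names (host.cpp, krnl_vadd.cl) are typical for this template but may differ slightly between SDx versions:
+ ```
+ # Inspect the host and kernel sources created by the template
+ ls ~/aws-fpga/GUI_flow/gui_flow_example/src
+ less ~/aws-fpga/GUI_flow/gui_flow_example/src/krnl_vadd.cl
+ ```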
+
+### Perform SW Emulation
+1. Click on the _Add Hardware Function_ button icon (![alt tag](./images/Fig-hw_button.png)) in the **Hardware Functions** tab to see the functions defined in the design. Since there is only one function and it is already included, you won't see any listing
+1. Notice the _krnl\_vadd_ function is the only function in the design and is already marked to be accelerated
+1. Make sure the **project.sdx** under _gui\_flow\_example_ in the **Project Explorer** tab is selected
+1. Either select **Project > Build Configurations > Set Active > Emulation-SW** or click on the drop-down button of _Active build configuration_ and select **Emulation-SW**
+

+ Figure: Selecting SW Emulation build configuration
+1. Either select **Project > Build Project** or click on the build (![alt tag](./images/Fig-build.png)) button + This will build the project including gui\_flow\_example.exe file under the Emulation-SW directory +1. Run the application by clicking the Run button (![alt tag](./images/Fig-run.png)). + The application will be run and the output will be displayed in the Console tab +

+ Figure: SW Emulation run output
+ +### Perform HW Emulation +**The SW Emulation flow checks functional correctness of the software application, but it does not guarantee the correctness of the design on the FPGA target. The Hardware (HW) Emulation flow can be used to verify the functionality of the generated logic. This flow invokes the hardware simulator in the SDAccel environment. As a consequence, the HW Emulation flow will take longer to run than the SW Emulation flow.** + +**The HW Emulation flow is not cycle accurate, but provides more detailed profiling information than software emulation and can be used to do some analysis and optimization of the performance of the application.** + +1. Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** +

+ Figure: Selecting HW Emulation build configuration
+1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button
+ This will build the project, including the gui\_flow\_example.exe file, under the Emulation-HW directory
+1. Select **Run > Run Configurations…** to open the configurations window
+1. Click on the **Environment** tab and change LD_LIBRARY_PATH to **/opt/xilinx/xrt/lib**, leaving the other two variables at their default settings. This is required to see activities on the kernel side using the Xilinx Runtime library (XRT)
+

+ Figure: Editing Environment tab
+1. Click on the **Arguments** tab and notice that the _binary\_container\_1.xclbin_ is already assigned + If no argument was assigned then you would have to explicitly assign the **xclbin** by clicking on the _Automatically add binary container(s) to arguments_ +1. Click **Apply** +

+ Figure: Populated Arguments tab
+1. Click **Run** to run the application +1. The Console tab shows that the test was completed successfully along with the data transfer rate +

+ Figure: Hardware Emulation run output

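+ The same hardware-emulation run can also be launched outside the IDE, which is convenient for scripting. This is only a sketch: it assumes the executable, binary_container_1.xclbin, and the emconfig.json that the IDE normally generates are all present in the project's Emulation-HW build directory:
+ ```
+ cd ~/aws-fpga/GUI_flow/gui_flow_example/Emulation-HW
+ export LD_LIBRARY_PATH=/opt/xilinx/xrt/lib:$LD_LIBRARY_PATH
+ export XCL_EMULATION_MODE=hw_emu     # tell the OpenCL runtime to use the hardware emulator
+ ./gui_flow_example.exe binary_container_1.xclbin
+ ```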
+ +### Review the HLS Report +1. Double-click on the **HLS Report** entry under **Emulation-HW > binary\_container\_1 > krnl\_vadd** in the **Assistant** tab to open the report +

+ Figure: The Assistant tab content
+ The window will open showing the Synthesis report for the **krnl\_vadd** accelerator. It includes the target device information +

+ Figure: HLS synthesis report for the krnl_vadd accelerator
+1. Scroll down the window and observe the timing, latency, and loop performance results. Observe that the target frequency is 250 MHz (4 ns period) and achieved period is 2.92 ns indicating that the timing has been met. +

+ Figure: Performance estimate results
+1. Scroll down further and observe the resource utilization by the accelerator +

+ Figure: Resource utilization
+1. Scrolling down further shows the Interface summary indicating various ports, width, protocol that they are part of, type of object, and parameter type they belong to. + As can be seen, there are three interfaces being used: control, s\_axi, and m\_axi. The s\_axi is 32-bit wide data, control provides necessary handshaking signals, and m\_axi has 32-bit data. The m\_axi is connected to gmem, the global memory which is DDR. The DDR memory uses 64-bit address. +

+ Figure: Interface details showing ports, direction, size, protocol, object and data types
+ +### Review the profile summary report +1. Double-click on the **Profile Summary** entry under **Emulation-HW > gui\_flow\_example-Default** in the **Assistant** tab + Notice a multi-tab report window is opened. It has four tabs: the Top Operations, Kernels and Compute Units, the Data Transfers, and the OpenCL APIs. The Top Operations tab shows the device being used, the Global work size (1:1:1), and the Local work size (1:1:1). It also shows the kernel (krnl\_vadd) being used, the location of the kernel, beside the context ID. +

+ Figure: Top operation information in the profile summary
+1. Click on the **Kernels & Compute Units** tab and observe the number of Enqueues (1), and the execution time (0.051 ms) +

+ Figure: Kernel and compute unit information in the profile summary
+1. Click on the **Data Transfers** tab and observe the number of read (1, result read), the number of write (2, two source operands being written), and the average size (16.384 KB) between the host and memory. +

+ Figure: Data transfer information in the profile summary

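+ The figures in these tabs are also written to a sdaccel_profile_summary.csv file in the emulation run directory (the same file that is post-processed with sdx_analyze later in this lab), so they can be inspected or compared from a terminal. The run directory used below is an assumption and may differ on your setup:
+ ```
+ cd ~/aws-fpga/GUI_flow/gui_flow_example/Emulation-HW
+ column -s, -t < sdaccel_profile_summary.csv | less -S
+ ```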
+ +### Review the Application Timeline report +1. Double-click on the **Application Timeline** entry in the **Assistant** tab, expand all entries in the timeline graph, zoom appropriately and observe the transactions. You will see when the kernel is running, when the write transaction takes place between host and global memory, when the read transactions are taking place between global memory and kernel memory, when the write transactions are taking place between the kernel and global memory, and when the read transaction is taking place between the global memory and host. +

+ Figure: Timeline graph showing various activities in various regions of the system
+ +### Review the System Estimate report +1. Double-click on the **System Estimate** entry under the **Emulation-HW** in the **Assistant** tab +1. The report shows the estimated frequency and the resource utilization for the given kernel (krnl\_vadd) +

+ Figure: The system estimate report

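+ The same estimate is also emitted as a plain-text report alongside the emulation build, which is handy for quick checks without the IDE. The file name (system_estimate.xtxt) is typical for this release but is an assumption; the find command below avoids relying on the exact path:
+ ```
+ find ~/aws-fpga/GUI_flow/gui_flow_example/Emulation-HW -name "system_estimate*" -exec cat {} \;
+ ```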
+
+### Setup for System Build
+**Since hardware bitstream generation takes over two hours, this section walks you through the basic steps involved in setting up a System build so that you can perform profiling and application timeline analysis on AWS using the already pre-generated awsxclbin.**
+1. Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System**
+1. Click on the drop-down button of _Hardware optimization_ and select the **-Oquick** option, which will make compilation relatively faster
+

+ Figure: Selecting System build configuration and setting compilation option
+1. In the **Assistant** tab, expand **gui_flow_example > System > binary_container_1 > krnl_vadd**, right-click, and select **Settings...**
+1. In the **Hardware Function Settings** window, select **Counters + Trace** using the _Data Transfer_ drop-down button, click the **Execute Profiling** and **Stall Profiling** options, click **Apply**, and click **OK**
+

+ Figure: Setting kernel profiling options in the Hardware Function Settings window
+1. Normally, you would build the project, but since it will take long time **you will NOT BUILD it here** + +### Run the Application on F1 +**Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality.** + +1. Change to the solution directory by executing the following command ``` cd /home/centos/sources/gui_flow_solution - ``` - -**4.1.2.** Run the following commands to load the AFI and execute the application to verify the functionality - + ``` +1. Run the following commands to load the runtime environment and execute the application ``` sudo sh - source /opt/Xilinx/SDx/2017.4.rte.dyn/setup.sh + source /opt/xilinx/xrt/setup.sh ./gui_flow_example.exe xclbin/binary_container_1.awsxclbin ``` -**4.1.3.** The FPGA bitstream will be downloaded and the host application will be executed showing output something like: - -![alt tag](./images/guiflow_lab/FigGUIflowLab-21.png) -#### Figure 21. Execution output - -**4.1.4.** Enter **exit** in the teminal window to exit out of sudo shell. +1. The FPGA bitstream will be downloaded and the host application will be executed showing output something like: +

+ Figure: The Execution output
+1. It will also create two csv files; one for profile and another application timeline analysis +1. Open another terminal window (_non-sh_), source the environment settings, and execute the following two commands to create \*.xprf (Xilinx profile), and timeline.wdb and timeline.wcfg files + ``` + cd ~/aws-fpga + source sdaccel_setup.sh + source $XILINX_SDX/settings64.sh + sdx_analyze profile --input sdaccel_profile_summary.csv -o profile + sdx_analyze trace --input sdaccel_timeline_trace.csv -o timeline + ``` -**4.1.5.** Close the SDx by selecting **File > Exit** +### Perform Profiling and Application Timeline Analysis on AWS F1 Using the Generate Files +**You will use the generated timeline and profile files to perform the analysis** + +1. Select **File > Open File...** +1. Browse to **/home/centos/sources/gui_flow_solution** and select **profile.xprf** and **timeline.wdb** and click **OK** +The _Waveform and Profile Summary tabs_ will open. In the Waveform tab, notice that the actual activities starts after 4,960 ms since the FPGA loading takes time +1. Run the application again from the command line of the **sh** terminal window and observe the output +It indicates the AFI is already loaded and so it is skipping the loading +

+ Figure: The execution output of the second run
+1. Generate the profile and trace files from the _non-sh_ terminal window as done earlier +1. Close the Waveform and Profile Summary tabs and open the two files again +Notice that the activity starts at around 500 ms as no AFI loading took place. +1. Select the **Profile Summary** tab and observe the Total Data Transfer between Kernels and Global Memory, and Top Kernel Execution Duration +

+ Figure: Top operation information in the profile summary
+1. Click on the **Kernels & Compute Units** tab and observe the number of Enqueues (1) and the execution time (0.289 ms). Also review the **Compute Unit Utilization** and **Compute Units: Stall Information** sections
+

+ Figure: Kernel and compute unit information in the profile summary
+1. Click on the **Data Transfers** tab and observe the data transfer rate between the host and memory, and the data transfer rate between kernels and global memory +

+ Figure: Data transfer information in the profile summary
+
+1. Select the **Waveform** tab, expand all entries in the timeline graph, and see the various activities in each functional unit of the design
+

+ Figure: Timeline graph showing various activities in various regions of the system
+1. Using the left mouse button, select the region from about 1.150 ms to 1.154 ms to zoom into the view
+

+ Figure: Zoomed-in view showing various activities

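+ As observed above, the second run skips the AFI download because the image is already loaded in the FPGA. If you ever want to force a clean reload (for example, to capture the load phase in the timeline again), the F1 management tools from the aws-fpga SDK can inspect and clear the slot. A hedged sketch, run from the sudo shell and assuming the tools are installed and the kernel occupies slot 0:
+ ```
+ sudo fpga-describe-local-image -S 0 -H   # show what is currently loaded in slot 0
+ sudo fpga-clear-local-image -S 0         # clear the slot so the next run reloads the AFI
+ ```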
## Conclusion -In this lab, you used SDAccel IDE to create a project using one of the application templates. You then ran the design using the software emulation and hardware emulation flows, and reviewed the reports. You also read through the steps to generate the AFI. Since the system build and AFI creation takes over two hours, you used the provided solution to download the application and kernel on the F1 instance and validated the functionality. +In this lab, you used SDAccel IDE to create a project using one of the application templates. You then ran the design using the software and hardware emulation flows, and reviewed the reports. You also read through the steps to generate the AFI. Since the system build and AFI creation takes over two hours, you used the provided solution to download the application and kernel on the F1 instance and validated the functionality. --------------------------------------- @@ -237,71 +340,52 @@ Start the next lab: 4. Optimization Lab ## Appendix: Build Full Hardware -### A.1. Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment and move to next step). -**A.1.1.** Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** - -**A.1.2.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - -This will build the project under the **System** directory. The built project will include gui\_flow\_example.exe file along with binary\_container\_1.xclbin file - -This step takes about two hours - -### A.2. Create an Amazon FPGA Image (AFI). - -To execute the application on F1, the following files are needed: - -- Host application (exe) -- FPGA binary (xclbin) -- Amazon FPGA Image (awsxclbin) +**Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment).** +1. Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** +1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button +This will build the project under the **System** directory. The built project will include **gui\_flow\_example.exe** file along with **binary\_container\_1.xclbin** file. This step takes about two hours -The xclbin and the host applications are already generated by the System configuration step +**Once the full system is built, you can create an AFI by following the steps listed here** -**A.2.1.** Create a **xclbin** directory under the _GUI\_flow_ directory using the following commands: +## Appendix: Performing Application Timeline and Profile Analysis +**If you are using your own instance, follow below steps to perform the analysis after you have have build the full bitstream and created AFI** +1. Replace (after changing the extension) the **binary_container_1.xclbin** file with the registered **binary_container_1.awsxclbin** in the **System** directory +1. Before you can run the application in hardware on AWS F1, you will need to start the SDx program in the **su** mode +1. 
Run the following commands to setup the environment and start the SDx program ``` - cd /home/centos/aws-fpga/GUI_flow - mkdir xclbin - ``` -**A.2.2.** Copy the generated **xclbin** file ( **binary\_container\_1.xclbin** ) and the host application (gui\_kernel\_example.exe) from the **System** folder into the created **xclbin** directory, using the following commands - - ``` - cd xclbin - cp /home/centos/aws-fpga/GUI_flow/gui_flow_example/System/binary_container_1.xclbin . - cp /home/centos/aws-fpga/GUI_flow/gui_flow_example/System/gui_flow_example.exe . - ``` -### A.3. Create an AFI by running the create\_sdaccel\_afi.sh script and wait for the completion of the AFI creation process -**A.3.1.** Enter the following command to generate the AFI: - ``` - $SDACCEL_DIR/tools/create_sdaccel_afi.sh –xclbin=binary_container_1.xclbin –s3_bucket= -s3_dcp_key= -s3_logs_key= - ``` -In the above command, <bucket-name>, <dcp-folder-name>, and <logs-folder-name> are the names you would have given when running CLI script. In the workshop environment this was already done. - -The create\_sdaccel\_afi.sh script does the following: - -- Starts a background process to create the AFI -- Generates a \_afi\_id.txt which contains the FPGA Image Identifier (or AFI ID) and Global FPGA Image Identifier (or AGFI ID) of the generated AFIs -- Creates the \*.awsxclbin AWS FPGA binary file which will need to be read by the host application to determine which AFI should be loaded in the FPGA. - -**A.3.2.** Enter the following command to note the values of the AFI IDs by opening the *\_afi\_id.txt file. - - ``` - cat *afi_id.txt - ``` -**A.3.3.** Enter the **describe-fpga-images** API command to check the status of the AFI generation process: - - ``` - aws ec2 describe-fpga-images --fpga-image-ids - ``` - -Note: When AFI creation completes successfully, the output should contain: - - ``` - ... - "State": { - "Code": "available" - }, - - ... + sudo sh + source /opt/xilinx/xrt/setup.sh + /opt/Xilinx/SDx/2018.2.op2258646/bin/sdx& ``` -**A.3.4.** Wait until the AFI becomes available before proceeding to execute on the F1 instance. - +1. Click on the **Browse…** button of the _Workspace_ window, browse to **/home/centos/aws-fpga/GUI\_flow**, click **OK** +The _gui_flow_example_ project will open +1. In the **Assistant** tab, right-click on **System** and select **Run > Run Configurations** +1. In the _Main_ tab, select **System** as the _Run Configuration_ +

+ Figure: Selecting System as the active run configuration
+1. In the _Profile_ tab, make sure that Enable Profiling option is selected +1. Click on the drop-down button of the **Generate timeline trace report:** field and select **Yes** +

+ Figure: Selecting to generate timeline trace report
+1. Select **Internal Dataflow Stall** as an option of _Collect State Trace_ +

+ Figure: Selecting type of trace to collect
+1. In the **Arguments** tab, make sure that **../binary_container_1.xclbin** is selected +1. In the **Environment** tab make sure that the **LD_LIBRARY_PATH** is set to **/opt/xilinx/xrt/lib** +1. Click **Apply** and **Run** to execute the application through GUI +1. Once, the execution is completed, double-click on the **Profile Summary** entry under **System > gui\_flow\_example-Default** in the **Assistant** tab + The profile report will open. +1. Double-click on the **Application Timeline** entry in the **Assistant** tab, expand all entries in the timeline graph and see various activities in each fucntional units of the design diff --git a/Makefile_Flow_lab.md b/Makefile_Flow_lab.md index 6de4fab..a9cab97 100644 --- a/Makefile_Flow_lab.md +++ b/Makefile_Flow_lab.md @@ -2,120 +2,109 @@ ## Introduction -This lab guides you through the steps involved in using a Makefile flow to build and perform CPU and hardware emulations to verify the functionality. You will then use an AWS F1 instance to validate the design. +This lab guides you through the steps involved in using a Makefile flow to build and perform software and hardware emulations to verify the functionality. You will then use an AWS F1 instance to validate the design. ## Objectives After completing this lab, you will be able to: -- Run CPU Emulation to verify the functionality of a design using a Makefile flow +- Run SW Emulation to verify the functionality of a design using a Makefile flow - Run HW Emulation to verify the functionality including kernel hardware using a Makefile flow - Build the full system and verify functionality in hardware on an AWS F1 instance -## Procedure - -This lab is separated into steps that consist of general overview statements that provide information on the detailed instructions that follow. Follow these detailed instructions to progress through the lab. - -This lab comprises three primary steps: You will source the environment settings, then you will build and run emulation flows using Makefile, you will next test the application on the F1 instance. The Appendix section lists steps involved in building a full system using Makefile flow including creating an Amazon FPGA Image (AFI). - -## Step 1: Source Environment Settings - -### 1.1. Open a Terminal window. Source the SDAccel settings and create a directory called Makefile\_flow under _~/aws-fpga_. Change the directory to the newly created directory. Copy the helloworld\_ocl directory from /home/centos/sources to ~/aws-fpga/Makefile\_flow directory. - -**1.1.1.** Right-click on the Centos desktop and select **Open Terminal** - -**1.1.2.** Execute the following commands in the terminal window to source the Xilinx tools +## Steps +### Source Environment Settings +1. Right-click on the Centos desktop and select **Open Terminal** +1. Execute the following commands in the terminal window to source the Xilinx tools ``` cd ~/aws-fpga source sdaccel_setup.sh source $XILINX_SDX/settings64.sh ``` - -**1.1.3.** Create a **Makefile\_flow** directory and change the working directory into it using the following commands: - +1. Create a **Makefile\_flow** directory and change the working directory into it using the following commands: ``` mkdir Makefile_flow cd Makefile_flow ``` - -**1.1.4.** Copy the provided helloworld\_ocl project directory into the current working directory using the following command: - +1. 
Copy the provided helloworld\_ocl project directory into the current working directory using the following command: ``` cp -r /home/centos/sources/helloworld_ocl/ . ``` -## Step 2: Build and Run Emulation Flows using Makefile Flow - -The SDAccel emulation flows allow testing, profiling and debugging of the application before deploying on F1. - -Software emulation allow functionality of the software application to be verified. +### Build and Run Emulation Flows using Makefile Flow +The SDAccel emulation flows allow testing, profiling and debugging of the application before deploying on F1. +Software emulation allow functionality of the software application to be verified. Hardware emulation allow the verification of the functionality of the generated logic generated for the FPGA and the application working together. -### 2.1. Using the command line, run a makefile flow to perform CPU Emulation. - -**2.1.1.** Execute the following commands in the terminal to build and run the *SW (CPU) emulation* flow for the SDAccel 'hello world' example: - - ``` +1. Execute the following commands in the terminal to build and run the *SW emulation* flow for the SDAccel 'hello world' example: + ``` cd helloworld_ocl make clean make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all - ``` - -The application will be compiled, the xclbin and the helloworld.exe files will be generated. The application will be executed on CPU in software emulation mode showing output like: - -![alt tag](./images/makefile_lab/FigMakefileLab-1.png) -#### Figure 1. Executing the application in software emulation mode - -**2.1.2.** Go to the _src_ folder and open the **host.cpp** file to see its content. Observe that the _DATA\_SIZE_ is defined as **256** (line 34), one operand (source\_a) is defined as constant **10** (line 46) and another operand (source\_b) as constant **32** (line 47), providing a result of 42 - -![alt tag](./images/makefile_lab/FigMakefileLab-2.png) -#### Figure 2. Program snippet - -**2.1.3.** Change the _DATA\_SIZE_ to **64** , _source\_a_ to **4** and _source\_b_ to **15,** save the file, and run the last two commands - -![alt tag](./images/makefile_lab/FigMakefileLab-3.png) -#### Figure 3. Output after modifying the source file - -**2.1.4.** Execute the following commands to build and run the *HW (hardware) emulation* flow for the 'hello world' example: - - ``` + ``` + The application will be compiled, the xclbin and the helloworld.exe files will be generated. The application will be executed on CPU in software emulation mode showing output like: +

+ Figure: Executing the application in software emulation mode
+1. Go to the _src_ folder and open the **host.cpp** file to see its content. Observe that the _DATA\_SIZE_ is defined as **256** (line 34), one operand (source\_a) is defined as constant **10** (line 46) and another operand (source\_b) as constant **32** (line 47), providing a result of 42 +

+ Figure: Program snippet
+1. Change the _DATA\_SIZE_ to **64**, _source\_a_ to **4**, and _source\_b_ to **15**; save the file and run the last two commands +

+ Figure: Output after modifying the source file
+1. Execute the following commands to build and run the *HW (hardware) emulation* flow for the 'hello world' example: + ``` make clean make check TARGETS=hw_emu DEVICES=$AWS_PLATFORM all - ``` - -The kernel called vector\_add will be created by calling Vivado High-Level Synthesis (HLS) tool, which will try to pipeline the kernel and try to achieve initiation interval of 1. At the end of the HLS compilation an xo file is generated. - -![alt tag](./images/makefile_lab/FigMakefileLab-4.png) -#### Figure 4. HLS being used to compile the kernel - -The host application will then be compiled, the xclbin and the helloworld.exe files will be generated. The application will be executed on the CPU in the hardware emulation mode showing the output and transfer rate like: - -![alt tag](./images/makefile_lab/FigMakefileLab-5.png) -#### Figure 5. Execution output - -## Step 3: Run the Application on F1 -### 3.1. Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality. -**3.1.1.** Change to the solution directory by executing the following command - + ``` + The kernel called vector\_add will be created by calling Vivado High-Level Synthesis (HLS) tool, which will try to pipeline the kernel and try to achieve initiation interval of 1. At the end of the HLS compilation an xo file is generated. +

+ Figure: HLS being used to compile the kernel
+ The host application will then be compiled, the xclbin and the helloworld.exe files will be generated. The application will be executed on the host CPU in the hardware emulation mode showing the output and transfer rate like: +

+ Figure: Execution output

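+ Note that the same Makefile drives all three flows; only the TARGETS value changes (sw_emu, hw_emu, or hw) and DEVICES selects the platform. For example, to go back and rerun the software-emulation check after the hardware-emulation run, with no other changes:
+ ```
+ make clean
+ make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all
+ ```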
+### Run the Application on F1 +1. Change to the solution directory by executing the following command ``` cd /home/centos/sources/makefile_flow_solution ``` -**3.1.2.** Execute the following commands to load the AFI and execute the application to verify the functionality - +1. Execute the following commands to load the AFI and execute the application to verify the functionality ``` sudo sh - source /opt/Xilinx/SDx/2017.4.rte.dyn/setup.sh - ./helloworld xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_dynamic_5_0.awsxclbin - ``` -**3.1.3.** The FPGA bitstream will be downloaded and the host application will be executed showing output something like: - -![alt tag](./images/makefile_lab/FigMakefileLab-6.png) -#### Figure 6. Execution output - + source /opt/xilinx/xrt/setup.sh + ./helloworld xclbin/vector_addition.hw.xilinx_aws-vu9p-f1-04261818_dynamic_5_0.awsxclbin + ``` +1. The FPGA bitstream will be downloaded and the host application will be executed showing output something like: +

+ Figure: Execution output
+1. Enter **exit** in the teminal window to exit out of _sudo shell_ ## Conclusion -In this lab, you used a Makefile flow to perform CPU and HW emulations. You then ran the application on F1 and validated the functionality. +In this lab, you used a Makefile flow to perform SW and HW emulations. You then ran the application on F1 and validated the functionality. --------------------------------------- @@ -128,78 +117,12 @@ Start the next lab: 3. GUI Flow lab ## Appendix: Build System Hardware using Makefile Flow -### A.1. Build the system hardware using the Makefile flow (Note that since the building of the project takes over two hours skip this step in the workshop environment and move to next step). -**A.1.1.** Execute the following commands to build the system hardware for the 'hello world' example: - +**Build the system hardware using the Makefile flow (Note that since the building of the project takes over two hours skip this step in the workshop environment)** +1. Execute the following commands to build the system hardware for the 'hello world' example: ``` make clean make check TARGETS=hw DEVICES=$AWS_PLATFORM all - ``` - -This will build the project under the **System** directory. The built project will include helloword.exe file along with binary\_container\_1.xclbin file - -This step takes about two hours - -### A.2. Create an Amazon FPGA Image (AFI) - -To execute the application on F1, the following files are needed: - -- Host application (exe) -- FPGA binary (xclbin) -- Amazon FPGA Image (awsxclbin) - -The xclbin and the host applications are already generated by the makefile flow - -**A.2.1.** Create a **xclbin** directory under the _Makefile\_flow_ directory using the following commands - - ``` - cd /home/centos/aws-fpga/Makefile_flow - mkdir xclbin - ``` - -**A.2.2.** Copy the generated **xclbin** file ( **binary\_container\_1.xclbin** ) and the host application (helloworld) from the **helloworld\_ocl** folder into the created **xclbin** directory, using the following commands - - ``` - cd xclbin - cp /home/centos/aws-fpga/Makefile_flow/helloworld_ocl/xclbin/vector_addition.hw.xilinx_aws-vu9p-f1_dynamic_5_0.xclbin . - cp /home/centos/aws-fpga/Makefile_flow/helloworld_ocl/helloworld . - ``` - -### A.3. Create an AFI by running the create\_sdaccel\_afi.sh script and wait for the completion of the AFI creation process -**A.3.1.** Enter the following command to generate the AFI: - - ``` - $SDACCEL_DIR/tools/create_sdaccel_afi.sh –xclbin=vector_addition.hw.xilinx_aws-vu9p-f1_dynamic_5_0.xclbin –s3_bucket= -s3_dcp_key= -s3_logs_key= - ``` - -In the above command, <bucket-name>, <dcp-folder-name>, and <logs-folder-name> are the names you would have given when running CLI script. In the workshop environment this was already done. - -The create\_sdaccel\_afi.sh script does the following: - -- Starts a background process to create the AFI -- Generates a \_afi\_id.txt which contains the FPGA Image Identifier (or AFI ID) and Global FPGA Image Identifier (or AGFI ID) of the generated AFIs -- Creates the \*.awsxclbin AWS FPGA binary file which will need to be read by the host application to determine which AFI should be loaded in the FPGA. - -**A.3.2.** Enter the following command to note the values of the AFI IDs by opening the *\_afi\_id.txt file. 
- - ``` - cat *afi_id.txt - ``` - -**A.3.3.** Enter the **describe-fpga-images** API command to check the status of the AFI generation process: - - ``` - aws ec2 describe-fpga-images --fpga-image-ids - ``` - -Note: When AFI creation completes successfully, the output should contain: - - ``` - ... - "State": { - "Code": "available" - }, - - ... - ``` -**A.3.4.** Wait until the AFI becomes available before proceeding to execute on the F1 instance. + ``` +This will build the project under the **helloworld\_ocl** directory. The built project will include executable helloworld file along with vector_addition.hw.xilinx_aws-vu9p-f1-04261818_dynamic_5_0.xclbin file under the sub-folder **xclbin** +This step takes about two hours. +**Once the full system is built, you can create an AFI by following the steps listed here** diff --git a/Optimization_lab.md b/Optimization_lab.md index dd4ae96..cd34195 100644 --- a/Optimization_lab.md +++ b/Optimization_lab.md @@ -2,7 +2,7 @@ ## Introduction -This lab guides you through the steps involved in creating a project and adding a kernel function. After creating a project you will run CPU and hardware emulations to verify the functionality, analyze various generated reports and then apply optimization techniques both on host and kernel side to improve throughput and data transfer rate. +This lab guides you through the steps involved in creating a project and adding a kernel function. After creating a project you will run software and hardware emulations to verify the functionality, analyze various generated reports and then apply optimization techniques both on host and kernel side to improve throughput and data transfer rate. ## Objectives @@ -14,426 +14,381 @@ After completing this lab, you will be able to: - Optimize host code to improve data transfer rate - Verify functionality in hardware on F1 -## Procedure - -This lab is separated into steps that consist of general overview statements that provide information on the detailed instructions that follow. Follow these detailed instructions to progress through the lab. - -This lab comprises six primary steps: You will create an SDAccel project, add a kernel function and perform CPU emulation to validate application, then perform HW emulation to see how much acceleration is possible. You will then optimize the kernel code to reduce the latency and improve the initiation interval followed by optimizing the code to improve data transfer rate. Next you will download the bitstream on F1 and validate application execution. The Appendix section lists steps involved in building the full hardware. - -## Step 1: Create an SDAccel Project -### 1.1. Source the SDAccel settings and create a directory called optimization\_lab under _~/aws-fpga_. Change the directory to the newly created directory. -**1.1.1.** Execute the following commands in a terminal window to source the required Xilinx tools: - +## Steps +### Create an SDAccel Project +1. Execute the following commands, if it is not already done, in a terminal window to source the required Xilinx tools: ``` cd ~/aws-fpga source sdaccel_setup.sh source $XILINX_SDX/settings64.sh ``` -**1.1.2.** Execute the following commands in a terminal window to create a working directory: - +1. Execute the following commands in a terminal window to create a working directory: ``` mkdir optimization_flow cd optimization_flow ``` - -### 1.2. 
Launch SDx, create a workspace in the current directory and create a project, called _optimization\_lab\_example_, using the _Empty Application_ template. -**1.2.1.** Launch SDAccel by executing **sdx** in the terminal window - -An Eclipse launcher widow will appear asking to select a directory as workspace - -**1.2.2.** Click on the **Browse…** button, browse to **/home/centos/src/project_data/aws-fpga/optimization\_lab** , click **OK** twice -![alt tag](./images/optimization_lab/FigOptimizationLab-1.png) -#### Figure 1. Selecting a workspace - -The Xilinx SDx IDE window will be displayed - -![alt tag](./images/FigSDXIDE.png) -#### Figure 2. The SDx IDE window - -**1.2.3.** Click on the **Add Custom Platform** link on the _Welcome_ page - -**1.2.4.** Click on the **Add Custom Platform** button, browse to **/home/centos/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1\_dynamic\_5\_0** , and click **OK** - -![alt tag](./images/FigPlatform.png) -#### Figure 3. Hardware platform selected - -**1.2.5.** Click **Apply** and then click **OK** - -**1.2.6.** Click on the **Create SDx Project** link on the _Welcome_ page - -**1.2.7.** In the _New Project_'s page enter **optimization\_lab\_example** in the _Project name:_ field and click **Next** - -Note the AWS-VU9P-F1 board is displayed as the hardware platform - -**1.2.8.** Click **Next** - -**1.2.9.** Click **Next** with Linux on x86 as the System Configuration and OpenCL as the Runtime options - -**1.2.10.** Select **Empty Application** from the _Available Templates_ pane and click **Finish** - -![alt tag](./images/optimization_lab/FigOptimizationLab-4.png) -#### Figure 4. Selecting an application template - +1. Launch SDAccel by executing **sdx** in the terminal window +An Eclipse launcher window will appear asking to select a directory as workspace +1. Click on the **Browse…** button, browse to **/home/centos/src/project_data/aws-fpga/optimization\_lab**, click **OK** twice +

+*Figure: Selecting a workspace*
+ The Xilinx SDx IDE window will be displayed +

+*Figure: The SDx IDE window*
+1. Click on the **Add Custom Platform** link on the _Welcome_ page +1. Click on the **Add Custom Platform** button, browse to **/home/centos/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1-04261818\_dynamic\_5\_0**, and click **OK** +

+*Figure: Hardware platform selected*
+1. Click **Apply** and then click **OK** +1. Click on the **Create SDx Project** link on the _Welcome_ page +1. Click **Next** +1. In the _New Project_'s page enter **optimization\_lab\_example** in the _Project name:_ field and click **Next** +Note the aws-vu9p-f1-04261818 board is displayed as the hardware platform +1. Click **Next** +1. Click **Next** with Linux on x86 as the System Configuration and OpenCL as the Runtime options +1. Select **Empty Application** from the _Available Templates_ pane and click **Finish** +

+*Figure: Selecting an application template*
The project IDE will be displayed with six main windows: Project Explorer, Project Settings, Reports, Outline, multi-tab console, and Emulation Console. - -![alt tag](./images/optimization_lab/FigOptimizationLab-5.png) -#### Figure 5. Project IDE - -### 1.3. Import the provided two source files from the /home/centos/sources/optimization\_lab folder to the project -**1.3.1.** Right-click on the **src** folder in the _Project Explorer_ and select **Import…** - -**1.3.2.** Select **General > File System** , click **Next** , and browse to the source directory at **/home/centos/sources/optimization\_lab** and click **OK** - -**1.3.3.** Select the **idct.cpp** and **krnl\_idct.cpp** files - -**1.3.4.** Click **Finish** - -**1.3.5.** Expand the **src** folder in the _Project Explorer_ and note the two added files -## Step 2: Add Kernel and Perform CPU Emulation -### 2.1. Select the function(s) that needs to be accelerated. - -**2.1.1.** Click on the _Add Hardware Function_ button icon (![alt tag](./images/Fig-hw_button.png)) in the **Hardware Functions** tab to see possible functions which may be accelerated - -**2.1.2.** Select _kernl\_idct_ function and click **OK** - -![alt tag](./images/optimization_lab/FigOptimizationLab-6.png) -#### Figure 6. Selecting a kernel function - -**2.1.3.** Notice the **binary\_container\_1** folder is created under which _kml\_idct_ function is added -### 2.2. Analyze the source files -**2.2.1.** Open the **krnl\_idct.cpp** file - -**2.2.2.** Locate the **Outline** viewer corresponds to a function in the selected source file. This view provides a convenient way of looking up and navigating the code hierarchy. Each green dot in the **Outline** viewer corresponds to a function in the selected source file - -![alt tag](./images/optimization_lab/FigOptimizationLab-7.png) -#### Figure 7. Outline view - -**2.2.3.** In the _Outline_ viewer, click **idct** to look up the function - -The idct function is the core algorithm implemented in the custom hardware accelerator. It is a computationally intensive function that can be highly parallelized on the FPGA, providing significant acceleration over a CPU-based implementation - -Review other functions of the accelerator. - -- **krnl\_idct** : This is the top-level for the custom hardware accelerator. 
Interface properties for the accelerator are specified in this function -- **krnl\_idct\_dataflow** : This function is called by the **krnl\_idct** function and encapsulates the main functions of the accelerator -- **read\_blocks** : This function reads from global memory values sent by the host application and streams them to the **execute** function -- **execute** : This function receives the streaming data and, for each 8x8 block received, calls the **idct** function to perform the actual computation and streams the results back out -- **write\_blocks** : This function receives the streaming results from the **execute** function and writes them back to global memory for the host application - -**2.2.4.** Open the **idct.cpp** file - -**2.2.5.** Again use the _Outline_ viewer to quickly look up and inspect the important functions of the host application: - -- **main** : Initializes the test vectors, sets-up OpenCL resources, runs the reference model, runs the hardware accelerator, releases the OpenCL resources, and compares the results of the reference IDCT model with the accelerator implementation -- **runFPGA** : This function takes in a vector of inputs and, for each 8x8 block, calls the hardware accelerated IDCT using the **write** , **run** , **read** , and **finish** helper functions. These function use OpenCL API calls to communicate with the FPGA -- **runCPU** : This function takes in a vector of inputs and, for each 8x8 block, calls **idctSoft** , a reference implementation of the IDCT -- **idctSoft** : This function is the reference software implementation of the IDCT algorithm, used in this example to check the results coming back from the FPGA -- **oclDct** : This class is used to encapsulate the OpenCL runtime calls to interact with the kernel in the FPGA -- **aligned\_allocator** , **smalloc** , **load\_file\_to\_memory** , **getBinaryName** : These are small helper functions used during test vector generation and OpenCL setup - -**2.2.6.** Go to line near line no. 580 of the **idct.cpp** file by pressing Ctrl+l (small L) and entering 580 - -This section of code is where the OpenCL environment is setup in the host application. It is typical of most SDAccel application and will look very familiar to developers with prior OpenCL experience. This body of code can often be reused as-is from project to project. - -To setup the OpenCL environment, the following API calls are made: - -- **clGetPlatformIDs** : This function queries the system to identify the any available OpenCL platforms. It is called twice as it first extracts the number of platforms before extracting the actual supported platforms -- **clGetPlatformInfo** : Get specific information about the OpenCL platform, such as vendor name and platform name -- **clGetDeviceIDs** : Obtain list of devices available on a platform -- **clCreateContext** : Creates an OpenCL context, which manages the runtime objects -- **clGetDeviceInfo** : Get information about an OpenCL device like the device name -- **clCreateProgramWithBinary** : Creates a program object for a context, and loads specified binary data into the program object. The actual program is obtained before this call through the load\_file\_to memory function -- **clCreateKernel** : Creates a kernel object -- **clCreateCommandQueue** : Create a command-queue on a specific device - +

+*Figure: The Project IDE*
+
+### Import the provided two source files from the /home/centos/sources/optimization\_lab folder to the project
+1. Right-click on the **src** folder in the _Project Explorer_ and select **Import…**
+1. Select **General > File System**, click **Next**, browse to the source directory at **/home/centos/sources/optimization\_lab** and click **OK**
+1. Select the **idct.cpp** and **krnl\_idct.cpp** files
+1. Click **Finish**
+1. Expand the **src** folder in the _Project Explorer_ and note the two added files
+### Select the function(s) that need to be accelerated
+1. Click on the _Add Hardware Function_ button icon (![alt tag](./images/Fig-hw_button.png)) in the **Hardware Functions** tab to see possible functions which may be accelerated
+1. Select the _krnl\_idct_ function and click **OK**

+*Figure: Selecting a kernel function*
+1. Notice the **binary\_container\_1** folder is created under which the _krnl\_idct_ function is added
+### Analyze the source files
+1. Open the **krnl\_idct.cpp** file
+1. Locate the **Outline** viewer. This view provides a convenient way of looking up and navigating the code hierarchy. Each green dot in the **Outline** viewer corresponds to a function in the selected source file

+*Figure: Outline view*
+1. In the _Outline_ viewer, click **idct** to look up the function +The idct function is the core algorithm implemented in the custom hardware accelerator. It is a computationally intensive function that can be highly paralleled on the FPGA, providing significant acceleration over a CPU-based implementation. + +1. Review other functions of the accelerator + + - **krnl\_idct** : This is the top-level for the custom hardware accelerator. Interface properties for the accelerator are specified in this function + - **krnl\_idct\_dataflow** : This function is called by the **krnl\_idct** function and encapsulates the main functions of the accelerator + - **read\_blocks** : This function reads from global memory values sent by the host application and streams them to the **execute** function + - **execute** : This function receives the streaming data and, for each 8x8 block received, calls the **idct** function to perform the actual computation and streams the results back out + - **write\_blocks** : This function receives the streaming results from the **execute** function and writes them back to global memory for the host application +5. Open the **idct.cpp** file. Again use the _Outline_ viewer to quickly look up and inspect the important functions of the host application: + - **main** : Initializes the test vectors, sets-up OpenCL resources, runs the reference model, runs the hardware accelerator, releases the OpenCL resources, and compares the results of the reference IDCT model with the accelerator implementation + - **runFPGA** : This function takes in a vector of inputs and, for each 8x8 block, calls the hardware accelerated IDCT using the **write** , **run** , **read** , and **finish** helper functions. These function use OpenCL API calls to communicate with the FPGA + - **runCPU** : This function takes in a vector of inputs and, for each 8x8 block, calls **idctSoft** , a reference implementation of the IDCT + - **idctSoft** : This function is the reference software implementation of the IDCT algorithm, used in this example to check the results coming back from the FPGA + - **oclDct** : This class is used to encapsulate the OpenCL runtime calls to interact with the kernel in the FPGA + - **aligned\_allocator** , **smalloc** , **load\_file\_to\_memory** , **getBinaryName** : These are small helper functions used during test vector generation and OpenCL setup +6. Go to line near line no. 580 of the **idct.cpp** file by pressing Ctrl+l (small L) and entering 580 +This section of code is where the OpenCL environment is setup in the host application. It is typical of most SDAccel application and will look very familiar to developers with prior OpenCL experience. This body of code can often be reused as-is from project to project. +To setup the OpenCL environment, the following API calls are made: + - **clGetPlatformIDs** : This function queries the system to identify the any available OpenCL platforms. 
It is called twice as it first extracts the number of platforms before extracting the actual supported platforms + - **clGetPlatformInfo** : Get specific information about the OpenCL platform, such as vendor name and platform name + - **clGetDeviceIDs** : Obtain list of devices available on a platform + - **clCreateContext** : Creates an OpenCL context, which manages the runtime objects + - **clGetDeviceInfo** : Get information about an OpenCL device like the device name + - **clCreateProgramWithBinary** : Creates a program object for a context, and loads specified binary data into the program object. The actual program is obtained before this call through the load\_file\_to memory function + - **clCreateKernel** : Creates a kernel object + - **clCreateCommandQueue** : Create a command-queue on a specific device Note: all objects accessed through a **clCreate..**. function call should be released before terminating the program by calling **clRelease...**. This avoids memory leakage and clears the locks on the device -### 2.3. Set the XOCC Kernel Linker flags - -#### In the idct.cpp file, locate lines 308-310 and note that there are two DDR banks (BANK0 and BANK1) are being used. By default, the compiler will connect all m\_axi ports to DDR BANK0. In order to instruct the compiler that BANK1 is available, the XOCC Kernel Linker flag has to be added. Add --sp krnl_idct_1.m_axi_gmem:bank0 --sp krnl_idct_1.m_axi_gmem1:bank0 --sp krnl_idct_1.m_axi_gmem2:bank1 in the linker flag field - -**2.3.1.** In the Project Explorer pane, right-click the project **optimization\_lab\_example** and select the **C/C++ Settings** - -**2.3.2.** Select **C/C++ Build** > **Settings** in the left pane - -**2.3.3.** Select the **Miscellaneous** under **SDx XOCC Kernel Linker** - -**2.3.4.** Using the gedit editor, open the file **xocc\_linker\_flag.txt** from the **/home/centos/sources/optimization\_lab/** directory, copy all the text and paste it in the **Other flags** field - -![alt tag](./images/optimization_lab/FigOptimizationLab-8.png) -#### Figure 8. Adding the XOCC Kernel Linker flag - -**2.3.5.** Click **OK** -### 2.4. Build and run software emulation (Emulation-CPU) -**2.4.1.** Make sure the **project.sdx** under _Optimization\_lab\_example_ in the **Project Explorer** tab is selected - -**2.4.2.** Either select **Project > Build Configurations > Set Active > Emulation-CPU** or click on the drop-down button of _Active build configuration_ and select **Emulation-CPU** - -![alt tag](./images/optimization_lab/FigOptimizationLab-9.png) -#### Figure 9. Selecting CPU emulation build configuration - -**2.4.3.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - -This will build the project including Optimization\_lab\_example.exe file under the Emulation-CPU directory - -**2.4.4.** In the Project Explorer pane, right-click the project **optimization\_lab\_example** and select **Run As** > **Run Configurations…** - -**2.4.5.** Select the **Arguments** tab - -**2.4.6.** Click on the **Automatically add binary container(s) to arguments** check box - +### Set the XOCC Kernel Linker flags + +**In the idct.cpp file, locate lines 308-310 and note that there are two DDR banks (BANK0 and BANK1) are being used. By default, the compiler will connect all m\_axi ports to DDR BANK0. In order to instruct the compiler that BANK1 is available, the XOCC Kernel Linker flag has to be added. 
Add --sp krnl\_idct\_1.m\_axi\_gmem:bank0 --sp krnl\_idct\_1.m\_axi\_gmem1:bank0 --sp krnl\_idct\_1.m\_axi\_gmem2:bank1 in the linker flag field**
+
+1. In the Project Explorer pane, right-click the project **optimization\_lab\_example** and select the **C/C++ Settings**
+1. Select **C/C++ Build** > **Settings** in the left pane
+1. Select the **Miscellaneous** under **SDx XOCC Kernel Linker**
+1. Using the gedit editor, open the file **xocc\_linker\_flag.txt** from the **/home/centos/sources/optimization\_lab/** directory, copy --sp krnl\_idct\_1.m\_axi\_gmem:bank0 --sp krnl\_idct\_1.m\_axi\_gmem1:bank0 --sp krnl\_idct\_1.m\_axi\_gmem2:bank1 and paste it in the **Other flags** field. Make sure that there are no control characters at the end of the string. A sketch of where these port names come from follows the figure below

+*Figure: Adding the XOCC Kernel Linker flag*
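For reference, the `--sp` switches name the kernel's AXI master bundles. The sketch below shows the kind of interface pragmas that give rise to the `m_axi_gmem`, `m_axi_gmem1` and `m_axi_gmem2` ports; the argument names and types are illustrative and may not match the actual krnl_idct.cpp.

```
// Illustrative only -- argument names/types may differ from krnl_idct.cpp.
// Each m_axi bundle becomes a port such as krnl_idct_1.m_axi_gmem, which is
// what the --sp linker switches then map onto DDR bank0 or bank1.
extern "C" void krnl_idct(const short* block, const unsigned short* q,
                          short* voutp, int ignore_dc, unsigned int blocks) {
#pragma HLS INTERFACE m_axi     port=block offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=q     offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=voutp offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=block     bundle=control
#pragma HLS INTERFACE s_axilite port=q         bundle=control
#pragma HLS INTERFACE s_axilite port=voutp     bundle=control
#pragma HLS INTERFACE s_axilite port=ignore_dc bundle=control
#pragma HLS INTERFACE s_axilite port=blocks    bundle=control
#pragma HLS INTERFACE s_axilite port=return    bundle=control
    // Body omitted; the top level simply calls krnl_idct_dataflow(...)
}
```

With the flags used in this lab, `gmem` and `gmem1` end up on DDR bank0 while `gmem2` is placed on bank1, so the results travel over a separate bank from the inputs.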
+1. Click **OK** +### Build and run software emulation (Emulation-SW) +1. Make sure the **project.sdx** under _Optimization\_lab\_example_ in the **Project Explorer** tab is selected +1. Either select **Project > Build Configurations > Set Active > Emulation-SW** or click on the drop-down button of _Active build configuration_ and select **Emulation-SW** +1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button +This will build the project including Optimization\_lab\_example.exe file under the Emulation-SW directory +1. In the Project Explorer pane, right-click the project **optimization\_lab\_example** and select **Run As** > **Run Configurations…** +1. Select the **Arguments** tab +1. Click on the **Automatically add binary container(s) to arguments** check box This will add **../binary\_container\_1.xclbin** - -**2.4.7.** Click **Apply** and then **Run** - +1. Click on the **Environment** tab and change the _LD\_LIBRARY\_PATH_ to **/opt/xilinx/xrt/lib** and click **OK** +1. Click **Apply** and then click **Run** The application will be run and the output will be displayed in the Console tab - -![alt tag](./images/optimization_lab/FigOptimizationLab-10.png) -#### Figure 10. CPU Emulation run output - -### 2.5. Review the software emulation reports -**2.5.1.** In the **Reports** tab, expand **optimization\_lab\_example** > **Emulation-CPU (sw\_emu)** > **optimization\_lab\_example-Default** - +

+*Figure: SW Emulation run output*
+ +### Review the software emulation reports +1. In the **Assistant** tab, expand **optimization\_lab\_example** > **Emulation-SW** > **optimization\_lab\_example-Default** There will be two files generated by the tool after running the software emulation: Profile Summary and Application Timeline - -![alt tag](./images/optimization_lab/FigOptimizationLab-11.png) -#### Figure 11. Generated reports - -**2.5.2.** Double-click the **Profile Summary** report and review it - -This report provides data related to how the application runs. Notice that the report has four tabs at the top: **Top Operations** , **Kernels & Compute Units** , **Data Transfers** , and **OpenCL APIs**. - -![alt tag](./images/optimization_lab/FigOptimizationLab-12.png) -#### Figure 12. The Profile Summary report - -Click the each tab and review the report: - -- **Top Operations** : Shows all the major top operations of memory transfer between the host and kernel to global memory, and kernel execution. This allows you to identify throughput bottlenecks when transferring data. Efficient transfer of data to the kernel/host allows for faster execution times -- **Kernels & Compute Units** : Shows the number of times the kernel was executed. Includes the total, minimum, average, and maximum run times. If the design has multiple compute units, it will show each compute unit's utilization. When accelerating an algorithm, the faster the kernel executes, the higher the throughput which can be achieved. It is best to optimize the kernel to be as fast as it can be with the data it requires -- **Data Transfers** : This tab has no bearing in software emulation as no actual data transfers are emulated across the host to the platform. In hardware emulation, this shows the throughput and bandwidth of the read/writes to the global memory that the host and kernel share -- **OpenCL APIs** : Shows all the OpenCL API command executions, how many time each was executed, and how long they take to execute - -**2.5.3.** Double-click the **Application Timeline** report and review it - -![alt tag](./images/optimization_lab/FigOptimizationLab-13.png) -#### Figure 13. The Application Timeline - +

+*Figure: Generated reports*
+1. Double-click the **Profile Summary** report and review it +This report provides data related to how the application runs. Notice that the report has four tabs at the top: **Top Operations**, **Kernels & Compute Units**, **Data Transfers**, and **OpenCL APIs**. +

+*Figure: The Profile Summary report*
+1. Click on each of tabs and review the report: + - **Top Operations** : Shows all the major top operations of memory transfer between the host and kernel to global memory, and kernel execution. This allows you to identify throughput bottlenecks when transferring data. Efficient transfer of data to the kernel/host allows for faster execution times + - **Kernels & Compute Units** : Shows the number of times the kernel was executed. Includes the total, minimum, average, and maximum run times. If the design has multiple compute units, it will show each compute unit's utilization. When accelerating an algorithm, the faster the kernel executes, the higher the throughput which can be achieved. It is best to optimize the kernel to be as fast as it can be with the data it requires + - **Data Transfers** : This tab has no bearing in software emulation as no actual data transfers are emulated across the host to the platform. In hardware emulation, this shows the throughput and bandwidth of the read/writes to the global memory that the host and kernel share + - **OpenCL APIs** : Shows all the OpenCL API command executions, how many time each was executed, and how long they take to execute +1. Double-click the **Application Timeline** report and review it +

+*Figure: The Application Timeline*
The **Application Timeline** collects and displays host and device events on a common timeline to help you understand and visualize the overall health and performance of your systems. These events include OpenCL API calls from the host code: when they happen and how long each of them takes. -## Step 3: Perform HW Emulation -### 3.1. Select the Emulation-HW build configuration, and build the project. -**3.1.1.** Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** - -![alt tag](./images/optimization_lab/FigOptimizationLab-14.png) -#### Figure 14. Selecting HW emulation build configuration - -**3.1.2.** Set the XOCC Kernel Linker flag as done in Step 2-3 above - -**3.1.3.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - +### Perform HW Emulation +1. Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** +1. Set the XOCC Kernel Linker flag as done earlier for the Emulation-SW mode +1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button This will build the project including Optimization\_lab\_example.exe file under the Emulation-HW directory - -**3.1.4.** Select **Run > Run Configurations…** to open the configurations window - -**3.1.5.** Click on the **Arguments** tab and notice that the _binary\_container\_1.xclbin_ is already assigned - +1. Select **Run > Run Configurations…** to open the configurations window +1. In the **Main** tab, click on the _Use waveform for kernel debugging_ and then _Launch live waveform_ options +

+*Figure: Setting up for live waveform*
+1. Click on the **Arguments** tab and notice that the _binary\_container\_1.xclbin_ is already assigned
+If no argument was assigned then you would have to explicitly assign the **xclbin** by clicking on the _Automatically add binary container(s) to arguments_, and click **Apply**
+1. Click **Run** to run the application
+1. The Console tab shows that the test was completed successfully along with the data transfer rate

+*Figure: Hardware emulation run output*
+Notice that Vivado was started and the simulation waveform window is updated.
+1. Click on the Zoom full button and scroll down the waveform window to see the activities taking place in the kernel
+Since no optimization has been done, observe that the execution is sequential

+*Figure: Vivado simulator output*
+ +### Understand the HLS Report, profile summary, and Application Timeline +1. In the **Assistant** tab, expand **optimization\_lab\_example** > **Emulation-HW** > **optimization\_lab\_example-Default** +1. Double-click the **Profile Summary** report and review it +

+*Figure: HW-Emulation Profile Summary report*
+1. Click on the **Kernels & Compute Units** tab of the Profile Summary report
+1. Review the Kernel **Total Time (ms)**
+This number will serve as a baseline (reference point) to compare against after optimization.

+*Figure: HW-Emulation Kernels & Compute Units report*
+1. In the **Assistant** tab, expand **optimization\_lab\_example** > **Emulation-HW** > **binary\_container\_1** > **krnl\_idct** +1. Double-click the **HLS Report** and review it +

+*Figure: HLS report before optimization*
+1. In the **Performance Estimates** section, expand the **Latency (clock cycles)** > **Summary** and note the following numbers: + - Latency (min/max): 6185 + - Interval (min/max): 6185 The numbers will serve as a baseline for comparison against optimized versions of the kernel - -**3.2.9.** In the HLS report, expand **Latency (clock cycles)** > **Detail** > **Instance** - -- Note that the 3 sub-functions read, execute and write have roughly the same latency and that their sum total is equivalent to the total Interval reported in the Summary table -- This indicates that the three sub-functions are executing sequentially, hinting to an optimization opportunity - -**3.2.10.** Close all the reports - -## Step 4. Optimizing the Kernel Code -### 4.1. Analyze the kernel code and apply the DATAFLOW directive. -**4.1.1.** Open the **src > krnl\_idct.cpp** file - -**4.1.2.** Using the **Outline** viewer, navigate to the **krnl\_idct\_dataflow** function - +1. In the HLS report, expand **Latency (clock cycles)** > **Detail** > **Instance** + - Note that the 3 sub-functions read, execute and write have roughly the same latency and that their sum total is equivalent to the total Interval reported in the Summary table + - This indicates that the three sub-functions are executing sequentially, hinting to an optimization opportunity +1. Close all the reports + +### Analyze the kernel code and apply the DATAFLOW directive +1. Open the **src > krnl\_idct.cpp** file +1. Using the **Outline** viewer, navigate to the **krnl\_idct\_dataflow** function Observe that the three functions are communicating using **hls::streams** objects. These objects model a FIFO-based communication scheme. This is the recommended coding style which should be used whenever possible to exhibit streaming behavior and allow **DATAFLOW** optimization - -**4.1.3.** Enable the DATAFLOW optimization by uncommenting the **#pragma HLS DATAFLOW** present in the krnl\_idct\_dataflow function (line 322) - -- The DATAFLOW optimization allows each of the subsequent functions to execute as independent processes -- This results in overlapping and pipelined execution of the read, execute and write functions instead of sequential execution -- The FIFO channels between the different processes do not need to buffer the complete dataset anymore but can directly stream the data to the next block - -**4.1.4.** Comment the three **#pragma HLS stream** statements on lines 327, 328 and 329 - -**4.1.5.** Save the file -### 4.2. Build the project in Hardware emulation configuration. -**4.2.1.** Make sure the active configuration is **Emulation-HW** - -**4.2.2.** Click on the Build button (![alt tag](./images/Fig-build.png)) to build the project -### 4.3. Analyze the HLS report. -**4.3.1.** In the **Reports** tab, expand **optimization\_lab\_example** > **Emulation-HW (hw\_emu)** > **binary\_container\_1** > **krnl\_idct** - -**4.3.2.** Click the **HLS Report** and review it - -![alt tag](./images/optimization_lab/FigOptimizationLab-19.png) -#### Figure 19. HLS report after applying pragma DATAFLOW - -**4.3.3.** In the **Performance Estimates** section, expand the **Latency (clock cycles)** > **Summary** and note the following numbers: - -- Latency (min/max): -- Interval (min/max): - -### 4.4. Run the Hardware Emulation. 
- -**4.4.1.** Run the application by clicking the Run button (![alt tag](./images/Fig-run.png)) - -Wait for the run to finish with RUN COMPLETE message - -**4.4.2.** In the **Reports** tab, expand **optimization\_lab\_example** > **Emulation-HW (hw\_emu)** > **optimization\_lab\_example** and double-click the **Profile Summary** report - -**4.4.3.** Select the **Kernels & Compute Units** tab. - +1. Enable the DATAFLOW optimization by uncommenting the **#pragma HLS DATAFLOW** present in the krnl\_idct\_dataflow function (line 322) - The DATAFLOW optimization allows each of the subsequent functions to execute as independent processes + - This results in overlapping and pipelined execution of the read, execute and write functions instead of sequential execution + - The FIFO channels between the different processes do not need to buffer the complete dataset anymore but can directly stream the data to the next block +1. Comment the three **#pragma HLS stream** statements on lines 327, 328 and 329 +1. Save the file +### Build the project in Hardware emulation configuration and analyze the HLS report +1. Make sure the active configuration is **Emulation-HW** +1. Click on the Build button (![alt tag](./images/Fig-build.png)) to build the project +1. In the **Assistant** tab, expand **optimization\_lab\_example** > **Emulation-HW** > **binary\_container\_1** > **krnl\_idct** +1. Double-click the **HLS Report** and review it +

+*Figure: HLS report after applying pragma DATAFLOW*
+1. In the **Performance Estimates** section, expand the **Latency (clock cycles)** > **Summary** and note the following numbers:
+   - Latency (min/max): 2085
+   - Interval (min/max): 2069
+### Run the Hardware Emulation
+1. Run the application by clicking the Run button (![alt tag](./images/Fig-run.png))
+Wait for the run to finish with the RUN COMPLETE message
+Notice that the Vivado simulator starts, displaying various activities
+Since the DATAFLOW optimization has been applied, observe the concurrent execution of reading, writing, pipelining and kernel running (a sketch of this coding pattern follows the figure below)

+*Figure: Vivado simulator output of the DATAFLOW optimized kernel*
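The concurrent read/execute/write activity seen in the waveform is the effect of the dataflow coding style described earlier. Below is a minimal sketch of that pattern, using plain `int` streams instead of the actual krnl_idct data types.

```
// Minimal dataflow sketch (illustrative, not the actual krnl_idct_dataflow).
// With DATAFLOW enabled, read/execute/write run as concurrent processes that
// communicate through FIFOs instead of executing one after the other.
#include <hls_stream.h>

static void read_blocks(const int* in, hls::stream<int>& s, int n) {
    for (int i = 0; i < n; i++) s.write(in[i]);
}
static void execute(hls::stream<int>& in, hls::stream<int>& out, int n) {
    for (int i = 0; i < n; i++) out.write(in.read() * 2); // stand-in for idct()
}
static void write_blocks(hls::stream<int>& s, int* out, int n) {
    for (int i = 0; i < n; i++) out[i] = s.read();
}

void krnl_dataflow(const int* in, int* out, int n) {
#pragma HLS DATAFLOW
    hls::stream<int> s0, s1;
    read_blocks(in, s0, n);
    execute(s0, s1, n);
    write_blocks(s1, out, n);
}
```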
+ +1. In the **Assistant** tab, expand **optimization\_lab\_example > Emulation-HW > optimization\_lab\_example-Default** and double-click the **Profile Summary** report +1. Select the **Kernels & Compute Units** tab. Compare the **Kernel Total Time (ms)** with the results from the un-optimized run - -![alt tag](./images/optimization_lab/FigOptimizationLab-20.png) -#### Figure 20. Total execution time of 0.029 compared to 0.046 of un-optimized - -## Step 5: Optimizing the Host Code -### 5.1. Analyze the host code. -**5.1.1.** Open the **src > idct.cpp** file - -**5.1.2.** Using the **Outline** viewer, navigate to the **runFPGA** function - -For each block of 8x8 values, the **runFPGA** function writes data to the FPGA, runs the kernel, and reads results back - -Communication with the FPGA is handled by the OpenCL API calls made within the cu.write, cu.run and cu.read functions - -- **clEnqueueMigrateMemObjects** schedules the transfer of data to or from the FPGA -- **clEnqueueTask** schedules the executing of the kernel - -These OpenCL functions use events to signal their completion and synchronize execution - -**5.1.3.** Open the **Application Timeline** of the _Emulation-HW_ run - +

+*Figure: Total execution time of 0.015 compared to 0.029 of un-optimized*
+
+### Analyze the host code
+1. Open the **src > idct.cpp** file
+1. Using the **Outline** viewer, navigate to the **runFPGA** function
+For each block of 8x8 values, the **runFPGA** function writes data to the FPGA, runs the kernel, and reads results back. Communication with the FPGA is handled by the OpenCL API calls made within the cu.write, cu.run and cu.read functions (see the sketch after the figure below)
+   - **clEnqueueMigrateMemObjects** schedules the transfer of data to or from the FPGA
+   - **clEnqueueTask** schedules the execution of the kernel
+These OpenCL functions use events to signal their completion and synchronize execution
+1. Open the **Application Timeline** of the _Emulation-HW_ run
+The green segments at the bottom indicate when the IDCT kernel is running

+*Figure: Application Timeline before host code optimization*
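The `cu.write`, `cu.run` and `cu.read` helpers mentioned above reduce to the enqueue calls sketched here; the function and variable names are illustrative rather than the actual oclDct code.

```
// Illustrative sketch of the write/run/read pattern used per 8x8 block
// (not the actual oclDct code). Each call returns an event; chaining the
// events, and only waiting on an event from an earlier iteration (the
// NUM_SCHED depth discussed below), is what allows iterations to overlap.
#include <CL/cl.h>

void run_block(cl_command_queue q, cl_kernel krnl,
               cl_mem in_buf, cl_mem out_buf,
               cl_event& wr_ev, cl_event& run_ev, cl_event& rd_ev) {
    // write: move the input block to device memory
    clEnqueueMigrateMemObjects(q, 1, &in_buf, 0 /*to device*/, 0, nullptr, &wr_ev);
    // run: start the kernel once the write has completed
    clEnqueueTask(q, krnl, 1, &wr_ev, &run_ev);
    // read: bring the results back once the kernel has completed
    clEnqueueMigrateMemObjects(q, 1, &out_buf, CL_MIGRATE_MEM_OBJECT_HOST,
                               1, &run_ev, &rd_ev);
}

// The caller keeps NUM_SCHED outstanding iterations in flight; with
// NUM_SCHED == 1 every iteration waits for the previous read, which
// serializes the timeline, e.g.
//   if (i >= NUM_SCHED) clWaitForEvents(1, &rd_events[i % NUM_SCHED]);
```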
+1. Notice that there are gaps between each of the green segments indicating that the operations are not overlapping +1. Zoom in by performing a left mouse drag across one of these gaps to get a more detailed view + - The two green segments correspond to two consecutive invocations of the IDCT kernel + - The gap between the two segments is indicative of the kernel idle time between these two invocations + - The **Data Transfer** section of the timeline shows that **Read** and **Write** operations are happening when the kernel is idle + - The Read operation is to retrieve the results from the execution which just finished and the Write operation is to send inputs for the next execution + - This represents a sequential execution flow of each iteration +1. Close the **Application Timeline** +1. In the **idct.cpp** file, go to the **oclDct::write** function + - Observe that on line 353, the function synchronizes on the **outEvVec** event through a call to **clWaitForEvents** + - This event is generated by the completion of the **clEnqueueMigrateMemObjects** call in the **oclDct::read** function (line 429) + - Effectively the next execution of the **oclDct::write** function is gated by the completion of the previous **oclDct::read** function, resulting in the sequential behavior observed in the **Application Timeline** +1. Use the **Outline** viewer to locate the definition of the **NUM\_SCHED** macro in the **idct.cpp** file + - This macro defines the depth of the event queue + - The value of 1 explains the observed behavior: new tasks (write, run, read) are only enqueued when the previous has completed effectively synchronizing each loop iteration + - By increasing the value of the **NUM\_SCHED** macro, we increase the depth of the event queue and enable more blocks to be enqueued for processing, which may result in the write, run and read tasks to overlap and allow the kernel to execute continuously or at least more frequently + - This technique is called software pipelining +1. Modify line 213 to increase the value of **NUM\_SCHED** to 6 as follows **#define NUM\_SCHED 6** - -**5.1.10.** Save the file -### 5.2. Run the Hardware Emulation. -**5.2.1.** Run the application by clicking the Run button (![alt tag](./images/Fig-run.png)) - -- Since only the idct.cpp file was changed, the incremental makefile rebuilds only the host code before running emulation -- This results in a much faster iteration loop since it is usually the compilation of the kernel to hardware which takes the most time - -**5.2.2.** In the **Reports** tab, expand **optimization\_lab\_example** > **Emulation-HW (hw\_emu)** > **optimization\_lab\_example-Default** - -**5.2.3.** Double-click the **Application Timeline** report +1. Save the file +### Run the Hardware Emulation. +1. Change the run configuration by unchecking the **Use waveform for kernel debugging** option, click **Apply**, and then click **Close** +1. Run the application by clicking the Run button (![alt tag](./images/Fig-run.png)) + - Since only the idct.cpp file was changed, the incremental makefile rebuilds only the host code before running emulation + - This results in a much faster iteration loop since it is usually the compilation of the kernel to hardware which takes the most time +1. In the **Assistant** tab, expand **optimization\_lab\_example > Emulation-HW > optimization\_lab\_example-Default** +1. Double-click the **Application Timeline** report Observe how **software pipelining** enables overlapping of data transfers and kernel execution. 
- -![alt tag](./images/optimization_lab/FigOptimizationLab-22.png) -#### Figure 22. Application Timeline after the host code optimization +

+*Figure: Application Timeline after the host code optimization*
Note: system tasks might slow down communication between the application and the hardware simulation, impacting on the measured performance results. The effect of software pipelining is considerably higher when running on the actual hardware -## Step 6: Run the Application on F1 -### 6.1. Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality. -**6.1.1.** Change to the solution directory by executing the following command +### Run the Application on F1 +**Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality** +1. Open a new terminal window, source sdaccel environment settings, and change to the solution directory by executing the following commands ``` - cd /home/centos/sources/optimization_lab_solution - ``` -**6.1.2.** Run the following commands to load the AFI and execute the application to verify the functionality - + cd ~/aws-fpga + source sdaccel_setup.sh + source $XILINX_SDX/settings64.sh + cd /home/centos/sources/optimization_lab_solution + ``` +1. Run the following commands to load the AFI and execute the application to verify the functionality ``` sudo sh - source /opt/Xilinx/SDx/2017.4.rte.dyn/setup.sh + source /opt/xilinx/xrt/setup.sh ./optimization_lab_example.exe xclbin/binary_container_1.awsxclbin ``` -**6.1.3.** The FPGA bitstream will be downloaded and the host application will be executed showing output something like: - -![alt tag](./images/optimization_lab/FigOptimizationLab-23.png) -#### Figure 23. Execution output - -**6.1.4.** Enter **exit** in the teminal window to exit out of the sudo shell - -**6.1.5.** Close the SDx by selecting **File > Exit** +1. The FPGA bitstream will be downloaded and the host application will be executed showing output something like: +

+*Figure: Execution output*
+1. Enter **exit** in the terminal window to exit out of the _sudo shell_ +1. Close the SDx by selecting **File > Exit** ## Conclusion -In this lab, you used SDAccel IDE to create a project and added a kernel function. After identifying the kernel you performed CPU and hardware emulations. You analyzed various generated reports and then you optimized kernel code using DATAFLOW and host code by increasing the number of read, write, and run tasks to improve throughput and data transfer rates. You then validated the functionality on F1. +In this lab, you used SDAccel IDE to create a project and added a kernel function. After identifying the kernel you performed software and hardware emulations. You analyzed various generated reports and then you optimized kernel code using DATAFLOW and host code by increasing the number of read, write, and run tasks to improve throughput and data transfer rates. You then validated the functionality on F1. --------------------------------------- @@ -445,74 +400,11 @@ Start the next lab: 5. RTL-Kernel Wizard Lab< ## Appendix Build Full Hardware -### A.1. Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment and move to next step). -**A.1.1.** Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** - -**A.1.2.** Set the XOCC Kernel Linker flag as done in Step 2-3 above - -**A.1.3.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - -This will build the project under the **System** directory. The built project will include optimization\_lab\_example.exe file along with binary\_container\_1.xclbin file - -This step takes about two hours - -### A.2. Create an Amazon FPGA Image (AFI) - -To execute the application on F1, the following files are needed: - -- Host application -- FPGA binary (xclbin) -- Amazon FPGA Image (awsxclbin) - -The xclbin and the host applications are already generated by the System configuration step +**Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment).** -**A.2.1.** Create a **xclbin** directory under the _optimization\_lab_ directory using the following commands +1. Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** +1. Set the XOCC Kernel Linker flag as done before +1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button +This will build the project under the **System** directory. The built project will include **optimization\_lab\_example.exe** file along with **binary\_container\_1.xclbin** file. This step takes about two hours - ``` - cd /home/centos/aws-fpga/optimization_lab - mkdir xclbin - ``` - -**A.2.2.** Copy the generated **xclbin** file ( **binary\_container\_1.xclbin** ) and the host application (gui\_kernel\_example.exe) from the **System** folder into the created **xclbin** directory, using the following commands - - ``` - cd xclbin - cp /home/centos/aws-fpga/optimization_lab/optimization_lab_example/System/binary_container_1.xclbin . - cp /home/centos/aws-fpga/optimization_lab/optimization_lab_example/System/optimization_lab_example.exe . - ``` - -### A.3. 
Create an AFI by running the create\_sdaccel\_afi.sh script and wait for the completion of the AFI creation process -**A.3.1.** Enter the following command to generate the AFI: - - ``` - $SDACCEL_DIR/tools/create_sdaccel_afi.sh –xclbin=binary_container_1.xclbin –s3_bucket= -s3_dcp_key= -s3_logs_key= - ``` - -In the above command, <bucket-name>, <dcp-folder-name>, and <logs-folder-name> are the names you would have given when running CLI script. In the workshop environment this was already done. - -The create\_sdaccel\_afi.sh script does the following: - -- Starts a background process to create the AFI -- Generates a \_afi\_id.txt which contains the FPGA Image Identifier (or AFI ID) and Global FPGA Image Identifier (or AGFI ID) of the generated AFIs -- Creates the \*.awsxclbin AWS FPGA binary file which will need to be read by the host application to determine which AFI should be loaded in the FPGA. - -**A.3.2.** Enter the following command to note the values of the AFI IDs by opening the *\_afi\_id.txt file. - ``` - cat *afi_id.txt - ``` -**A.3.3.** Enter the **describe-fpga-images** API command to check the status of the AFI generation process: - - ``` - aws ec2 describe-fpga-images --fpga-image-ids - ``` -Note: When AFI creation completes successfully, the output should contain: - - ``` - ... - "State": { - "Code": "available" - }, - - ... - ``` -**A.3.4.** Wait until the AFI becomes available before proceeding to execute on the F1 instance. +**Once the full system is built, you can create an AFI by following the steps listed here** diff --git a/README.md b/README.md index abc584d..f1782f5 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,15 @@ +# awslabs_2018_2 - - - - - + + + + + +

XUP AWS F1 Labs

-1. Connecting to your F1 instance | 2. Makefile Flow Lab | 3. GUI Flow Lab | 4. Optimization Lab | 5. RTL-Kernel Wizard Lab
+1. Connecting to your F1 instance | 2. Makefile Flow Lab | 3. GUI Flow Lab | 4. Optimization Lab | 5. RTL-Kernel Wizard Lab | 6. Debug Lab
@@ -50,4 +52,3 @@ Since building FPGA binaries is not instantaneous, all the labs will use precomp

Start the first lab: 1. Connecting to your F1 instance

- diff --git a/debug_lab.md b/debug_lab.md new file mode 100644 index 0000000..1d1248a --- /dev/null +++ b/debug_lab.md @@ -0,0 +1,240 @@ +# Hardware/Software Debugging + +## Introduction + +This lab is continuation of the previous (**RTL-Kernel Wizard Lab**) lab. You will add ChipScope cores to monitor the acitivities taking place at the kernel interface level and perform software debugging using SDx debug capabilities. + +## Objectives + +After completing this lab, you will be able to: + +- Add ChipScope cores at the kernel interface level +- Debug software application +- Verify functionality in hardware on F1 + +## Steps +### Open an SDAccel Project +1. Execute the following commands, if not already done, in a terminal window to source the required environment settings: + ``` + cd ~/aws-fpga + source sdaccel_setup.sh + source $XILINX_SDX/settings64.sh + ``` +1. Execute the following commands in a terminal window to change to a working directory where the pre-compiled project is provided: + ``` + cd /home/centos/sources/debug_lab + ``` +1. Since we will be executing application in System configuration mode, we need to start the SDx program as being **su**. Execute the following commands to launch **sdx** + ``` + sudo sh + source /opt/xilinx/xrt/setup.sh + /opt/Xilinx/SDx/2018.2.op2258646/bin/sdx + ``` +1. An Eclipse launcher window will appear asking you to select a directory as workspace. Click on the **Browse…** button, browse to **/home/centos/sources/debug\_lab**, click **OK** twice + +### Hardware Debugging +#### Review the Appendix section to understand how to add ChipScope Debug bridge core. It is already added in the pre-compiled design +#### Run the application +1. In the **Assistant** tab, expand **System > Run** and select **Run Configuration** +1. Make sure that the Arguments tab shows **../binary_container_1.xclbin** entry +1. Make sure that the Environment tab shows **/opt/xilinx/xrt/lib** in the _LD\_LIBRARY\_PATH_ field +1. Click **Run** +The host application will start executing, loading bitstream, and pausing for the user input as coded on line 246 +

+*Figure: Paused execution*
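The pause at line 246 is simply a blocking wait for keyboard input placed before the kernel is started, so that the ILA trigger can be armed first. A minimal sketch of such a gate (the message text is an assumption) looks like this:

```
// Minimal sketch of a "press Enter to continue" gate (assumed wording; the
// actual host source may differ). Placing it before clEnqueueTask gives time
// to arm the ChipScope/ILA trigger in Vivado Hardware Manager.
#include <iostream>
#include <limits>

void wait_for_user() {
    std::cout << "Arm the ILA trigger, then press Enter to continue..." << std::endl;
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
```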
+### Start Vivado Hardware Manager
+1. In another terminal window, start a virtual JTAG connection using the following two commands. The Virtual JTAG XVC Server will start listening on TCP port 10201
+   ```
+   source $XILINX_SDX/settings64.sh
+   sudo fpga-start-virtual-jtag -P 10201 -S 0
+   ```

+*Figure: Paused execution*
+1. Start Vivado in another terminal window from the debug_lab directory
+1. Click on the **Open Hardware Manager** link
+1. Click **Open Target > Open New Target**
+1. Click **Next**
+1. Click **Next**, keeping the default _Local Server_ option
+1. Click on the **Add Xilinx Virtual Cable (XVC)** button
+1. Enter **localhost** in the _Host name_ field and **10201** in the _Port_ field, and click **OK**

+*Figure: Adding Virtual JTAG cable*
+ The Open New Hardware Target form with scanned debug_bridge will appear +

+*Figure: Scanned debug bridge*
+1. Click **Next** and then **Finish** +The Vivado Hardware Manager will open showing _Hardware_, _Waveform_, _Settings-hw_, _Trigger-Setup_ windows. The _Hardware_ window also shows two ILA cores inserted in the design +

+*Figure: Hardware Manager*
+1. In the _Hardware Device Properties_ view, click on the browse button of the **Probes file** field, browse to the **/home/centos/sources/debug_lab/rtl_kernel_example/System** folder, select the **top_sp.ltx** entry and click **OK**
+Notice that four probes (Slot_0 to Slot_3) are filled in the Waveform window
+1. Click on the _Run Trigger immediate_ button and observe that the waveform window is filled, indicating that the four channels are in the _Inactive_ state

+*Figure: Forced triggered waveform window*
+1. Expand **slot_1 : KVAdd_1_m01_axi : W Channel** in the Waveform window, select the **WVALID** signal and drag it to the Trigger Setup - hw window +

+*Figure: Adding a probe in the Trigger Setup window*
+1. Click on drop-down button of the Value field and select trigger condition value as 1 +

+*Figure: Setting trigger condition*
+1. Click on the Run button and observe hw_ila_1 probe is waiting for the trigger condition to occur +

+*Figure: Waiting for the trigger condition to occur*
+1. Switch to the SDx window and hit the Enter key in the Console window for the program to continue execution
+Observe that the program finishes execution, displaying the **INFO: Test completed successfully** message in the Console window
+1. Switch back to Vivado and observe that, since the trigger condition has been met, the waveform window is displaying activities

+ *Figure: Triggered waveform*

+1. Expand the **Slot_0, slot_1,** and **slot_2** groups, zoom into the region of about _450 to 1000_, and observe the data transfers taking place on each channel. Also note the addresses from which the data are read and to which the results are written

+ *Figure: Zoomed view showing various activities*

+1. Zoom in on one of the data beats and hover your mouse at each successive samples and notice the data content changing +1. Close Vivado by selecting **File > Exit** +1. Close the jtag probe by switching to its terminal window and pressing _Ctrl-C_ + +### Perform Software Debugging +1. Switch to the SDx GUI +1. Comment out lines 246 and 247 +1. Save the file by typing **Ctrl-S** +1. In the **Assistant** tab, right-click on **System > Debug** and select **Debug Configuration** +1. Make sure that the **Arguments** tab shows **../binary_container_1.xclbin** entry +1. Make sure that the Environment tab shows **/opt/xilinx/xrt/lib** in the _LD\_LIBRARY\_PATH_ field +1. Click **Debug** +The host application will compile since we have modified it and a window will pop-up asking to switch to _Debug perspective_ +1. Click **Yes** +The program will be downloaded and execution will begin, halting at **main()** entry point +1. In the _main.c_ view scroll down to line 280 and double-click on the left border to set a breakpoint +At this point, three buffers would have been created +

+ *Figure: Setting a breakpoint*

+1. Click on the **Resume** button or press **F8** +The execution will resume and stop at the breakpoint +At this point you can go to various tabs and see the contents in the current scope +Two of the important features of SDx debugging is examining command queues and memory buffers as the program execution progresses +1. Click on the **Step Over** button or press **F6** +The execution will progress one statement at a time +1. Continue pressing **F6** until you reach line number _344_ at which point kernel would have finished execution +1. Click on the _Suspend : Step_ entry in the **Debug** tab and then select **Memory Buffers** tab +Notice that three buffers are allocated, their IDs, DDR memory address, and sizes +

+ *Figure: Memory buffers allocated*
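+
+ For reference, a minimal sketch (not the lab's exact `main.c`) of how three device buffers like the ones listed in the Memory Buffers tab are typically created with the OpenCL C API; the variable names, flags, and error handling below are illustrative assumptions:
+   ```
+   #include <CL/cl.h>
+
+   #define NUM_WORDS 4096   /* transfer size mentioned in the main.c walk-through */
+
+   /* ctx is an existing cl_context; each returned cl_mem handle is what the
+      SDx debugger lists as a "Memory Buffer" with an ID, DDR address, and size */
+   static cl_int create_io_buffers(cl_context ctx, cl_mem *d_A, cl_mem *d_B, cl_mem *d_Res)
+   {
+       cl_int err = CL_SUCCESS;
+       const size_t bytes = NUM_WORDS * sizeof(unsigned int);
+
+       *d_A   = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);  /* operand A */
+       *d_B   = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  bytes, NULL, &err);  /* operand B */
+       *d_Res = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, &err);  /* result    */
+       return err;
+   }
+   ```
+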

+1. Select the **Command Queue** tab and notice that there are no commands enqueued. Lines 344-348 create the commands that read back the data and results

+ *Figure: Setting a breakpoint*
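+
+ The next few steps single-step through those read commands. A hedged sketch of the pattern (variable names are assumed, not the lab's exact code):
+   ```
+   #include <CL/cl.h>
+
+   #define NUM_WORDS 4096
+
+   /* Each clEnqueueReadBuffer call with CL_FALSE only *enqueues* a command, which is
+      why the Command Queue tab grows by one entry per call when stepping with F6. */
+   static cl_int read_back(cl_command_queue q, cl_mem d_A, cl_mem d_B, cl_mem d_Res,
+                           unsigned int *h_A, unsigned int *h_B, unsigned int *h_Res)
+   {
+       const size_t bytes = NUM_WORDS * sizeof(unsigned int);
+       clEnqueueReadBuffer(q, d_A,   CL_FALSE, 0, bytes, h_A,   0, NULL, NULL);
+       clEnqueueReadBuffer(q, d_B,   CL_FALSE, 0, bytes, h_B,   0, NULL, NULL);
+       clEnqueueReadBuffer(q, d_Res, CL_FALSE, 0, bytes, h_Res, 0, NULL, NULL);
+       return clFinish(q);   /* drain the queue before the host uses the data */
+   }
+   ```
+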

+1. Press **F6** to execute _clEnqueueReadBuffer_ command to create a read buffer command for reading operand _d\_A_ +Notice the Command Queue tab shows one command submitted +

+ *Figure: Setting a breakpoint*

+1. Press **F6** to execute _clEnqueueReadBuffer_ command for reading operand _d\_B_ +Notice the Command Queue tab shows two commands submitted +

+ *Figure: Setting a breakpoint*
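+
+ The remaining steps run the cleanup calls; a minimal sketch of the usual teardown order (names assumed). Once the buffers and kernel have been released, the Memory Buffers tab shows up empty:
+   ```
+   #include <CL/cl.h>
+
+   static void release_all(cl_mem d_A, cl_mem d_B, cl_mem d_Res,
+                           cl_kernel krnl, cl_program prog,
+                           cl_command_queue q, cl_context ctx)
+   {
+       clReleaseMemObject(d_A);      /* buffers disappear from the Memory Buffers tab */
+       clReleaseMemObject(d_B);
+       clReleaseMemObject(d_Res);
+       clReleaseKernel(krnl);
+       clReleaseProgram(prog);
+       clReleaseCommandQueue(q);
+       clReleaseContext(ctx);
+   }
+   ```
+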

+1. Set a breakpoint at line _397_ and press **F8** to resume the execution +Notice that the Command Queue tab still shows entries +1. Press **F6** to execute _clReleaseKernel_ command +Notice the Memory Buffers tab is empty as all memories are released +1. Click **F8** to complete the execution +1. Close the SDx program + +## Conclusion + +In this lab, you used ChipScope Debug bridge and cores to perform hardware debugging. You also performed software debugging using SDx debug perspective. + +## Appendix +### Steps to Add ChipScope Debug core +1. In the **Assistant** tab, expand **System > binary_container_1 > KVadd** +1. Select **KVAdd**, right-click and select **Settings...** +1. In the **Hardware Function Settings** window, click on the _ChipScope Debug_ option for the _KVAdd_ kernel +

+ *Figure: Adding ChipScope Debug module*

+1. Click **Apply** and **OK** +1. In the **Project Explorer** tab, expand **src > sdx_rtl_kernel > KVAdd** and double-click on the **main.c** to open it in the editor window +1. Go to line 246 and enter the following lines of code which will pause the host software execution after creating kernel but before allocating buffer + ``` + printf("\nPress ENTER to continue after setting up ILA trigger..."); + getc(stdin); + ``` +

+ *Figure: Modifying the code to pause execution before the kernel runs, so Vivado Hardware Manager can be started*
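+
+ A hedged sketch of where those two lines sit in `main.c`: the surrounding calls and names are assumed context, only the two pause lines are the lab's actual code. By this point the xclbin has been loaded and the kernel object created, so the ILA trigger can be armed before any buffer is allocated:
+   ```
+   #include <CL/cl.h>
+   #include <stdio.h>
+
+   void pause_for_ila(cl_program program, cl_context context, size_t bytes, cl_int *err)
+   {
+       cl_kernel kernel = clCreateKernel(program, "KVAdd", err);  /* kernel created */
+
+       printf("\nPress ENTER to continue after setting up ILA trigger...");
+       getc(stdin);   /* host blocks here -- the "Paused execution" state seen earlier */
+
+       /* buffers are allocated (and the kernel later launched) only after ENTER */
+       cl_mem d_A = clCreateBuffer(context, CL_MEM_READ_ONLY, bytes, NULL, err);
+       (void)d_A; (void)kernel;
+   }
+   ```
+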

+ + + + + + + + + + diff --git a/images/FigEmptyProjectTemplate.png b/images/FigEmptyProjectTemplate.png new file mode 100644 index 0000000..b05bc4a Binary files /dev/null and b/images/FigEmptyProjectTemplate.png differ diff --git a/images/FigPlatform.png b/images/FigPlatform.png index f2da40e..62621a2 100644 Binary files a/images/FigPlatform.png and b/images/FigPlatform.png differ diff --git a/images/FigPlatform_1.png b/images/FigPlatform_1.png new file mode 100644 index 0000000..f2da40e Binary files /dev/null and b/images/FigPlatform_1.png differ diff --git a/images/SDX_IDE.png b/images/SDX_IDE.png new file mode 100644 index 0000000..5995799 Binary files /dev/null and b/images/SDX_IDE.png differ diff --git a/images/Templates.png b/images/Templates.png new file mode 100644 index 0000000..44d5965 Binary files /dev/null and b/images/Templates.png differ diff --git a/images/debug_lab/FigDebugLab-1.png b/images/debug_lab/FigDebugLab-1.png new file mode 100644 index 0000000..298438f Binary files /dev/null and b/images/debug_lab/FigDebugLab-1.png differ diff --git a/images/debug_lab/FigDebugLab-10.png b/images/debug_lab/FigDebugLab-10.png new file mode 100644 index 0000000..4aecaf8 Binary files /dev/null and b/images/debug_lab/FigDebugLab-10.png differ diff --git a/images/debug_lab/FigDebugLab-11.png b/images/debug_lab/FigDebugLab-11.png new file mode 100644 index 0000000..95371b3 Binary files /dev/null and b/images/debug_lab/FigDebugLab-11.png differ diff --git a/images/debug_lab/FigDebugLab-12.png b/images/debug_lab/FigDebugLab-12.png new file mode 100644 index 0000000..8470903 Binary files /dev/null and b/images/debug_lab/FigDebugLab-12.png differ diff --git a/images/debug_lab/FigDebugLab-13.png b/images/debug_lab/FigDebugLab-13.png new file mode 100644 index 0000000..b8fda9d Binary files /dev/null and b/images/debug_lab/FigDebugLab-13.png differ diff --git a/images/debug_lab/FigDebugLab-14.png b/images/debug_lab/FigDebugLab-14.png new file mode 100644 index 0000000..d568b42 Binary files /dev/null and b/images/debug_lab/FigDebugLab-14.png differ diff --git a/images/debug_lab/FigDebugLab-15.png b/images/debug_lab/FigDebugLab-15.png new file mode 100644 index 0000000..b11aa83 Binary files /dev/null and b/images/debug_lab/FigDebugLab-15.png differ diff --git a/images/debug_lab/FigDebugLab-16.png b/images/debug_lab/FigDebugLab-16.png new file mode 100644 index 0000000..264bdc7 Binary files /dev/null and b/images/debug_lab/FigDebugLab-16.png differ diff --git a/images/debug_lab/FigDebugLab-17.png b/images/debug_lab/FigDebugLab-17.png new file mode 100644 index 0000000..fb15a5f Binary files /dev/null and b/images/debug_lab/FigDebugLab-17.png differ diff --git a/images/debug_lab/FigDebugLab-18.png b/images/debug_lab/FigDebugLab-18.png new file mode 100644 index 0000000..0ca0a79 Binary files /dev/null and b/images/debug_lab/FigDebugLab-18.png differ diff --git a/images/debug_lab/FigDebugLab-2.png b/images/debug_lab/FigDebugLab-2.png new file mode 100644 index 0000000..5432154 Binary files /dev/null and b/images/debug_lab/FigDebugLab-2.png differ diff --git a/images/debug_lab/FigDebugLab-3.png b/images/debug_lab/FigDebugLab-3.png new file mode 100644 index 0000000..8d9a762 Binary files /dev/null and b/images/debug_lab/FigDebugLab-3.png differ diff --git a/images/debug_lab/FigDebugLab-4.png b/images/debug_lab/FigDebugLab-4.png new file mode 100644 index 0000000..e704195 Binary files /dev/null and b/images/debug_lab/FigDebugLab-4.png differ diff --git a/images/debug_lab/FigDebugLab-5.png 
b/images/debug_lab/FigDebugLab-5.png new file mode 100644 index 0000000..3d8dc81 Binary files /dev/null and b/images/debug_lab/FigDebugLab-5.png differ diff --git a/images/debug_lab/FigDebugLab-6.png b/images/debug_lab/FigDebugLab-6.png new file mode 100644 index 0000000..55896f4 Binary files /dev/null and b/images/debug_lab/FigDebugLab-6.png differ diff --git a/images/debug_lab/FigDebugLab-7.png b/images/debug_lab/FigDebugLab-7.png new file mode 100644 index 0000000..5939d05 Binary files /dev/null and b/images/debug_lab/FigDebugLab-7.png differ diff --git a/images/debug_lab/FigDebugLab-8.png b/images/debug_lab/FigDebugLab-8.png new file mode 100644 index 0000000..c9595ae Binary files /dev/null and b/images/debug_lab/FigDebugLab-8.png differ diff --git a/images/debug_lab/FigDebugLab-9.png b/images/debug_lab/FigDebugLab-9.png new file mode 100644 index 0000000..c98be76 Binary files /dev/null and b/images/debug_lab/FigDebugLab-9.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-1.png b/images/guiflow_lab/FigGUIflowLab-1.png deleted file mode 100644 index 2e0feb8..0000000 Binary files a/images/guiflow_lab/FigGUIflowLab-1.png and /dev/null differ diff --git a/images/guiflow_lab/FigGUIflowLab-10.png b/images/guiflow_lab/FigGUIflowLab-10.png index 38cfec1..5dc4465 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-10.png and b/images/guiflow_lab/FigGUIflowLab-10.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-11.png b/images/guiflow_lab/FigGUIflowLab-11.png index 4892377..010a679 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-11.png and b/images/guiflow_lab/FigGUIflowLab-11.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-12.png b/images/guiflow_lab/FigGUIflowLab-12.png index d310429..34b705b 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-12.png and b/images/guiflow_lab/FigGUIflowLab-12.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-13.png b/images/guiflow_lab/FigGUIflowLab-13.png index a9685b8..4af5066 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-13.png and b/images/guiflow_lab/FigGUIflowLab-13.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-14.png b/images/guiflow_lab/FigGUIflowLab-14.png index 6c55cdd..3a9eae9 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-14.png and b/images/guiflow_lab/FigGUIflowLab-14.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-15-1.png b/images/guiflow_lab/FigGUIflowLab-15-1.png index d8a85fa..56ec4fc 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-15-1.png and b/images/guiflow_lab/FigGUIflowLab-15-1.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-15-2.png b/images/guiflow_lab/FigGUIflowLab-15-2.png index aa1f79a..f702186 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-15-2.png and b/images/guiflow_lab/FigGUIflowLab-15-2.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-15-3.png b/images/guiflow_lab/FigGUIflowLab-15-3.png index e1635e8..9344718 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-15-3.png and b/images/guiflow_lab/FigGUIflowLab-15-3.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-16.png b/images/guiflow_lab/FigGUIflowLab-16.png index 570c819..b6fae8b 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-16.png and b/images/guiflow_lab/FigGUIflowLab-16.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-17.png b/images/guiflow_lab/FigGUIflowLab-17.png index b790e9b..392144c 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-17.png and b/images/guiflow_lab/FigGUIflowLab-17.png differ diff --git 
a/images/guiflow_lab/FigGUIflowLab-18.png b/images/guiflow_lab/FigGUIflowLab-18.png index a080331..e481e97 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-18.png and b/images/guiflow_lab/FigGUIflowLab-18.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-19.png b/images/guiflow_lab/FigGUIflowLab-19.png index f963106..c370f63 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-19.png and b/images/guiflow_lab/FigGUIflowLab-19.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-20.png b/images/guiflow_lab/FigGUIflowLab-20.png index be38db8..d063928 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-20.png and b/images/guiflow_lab/FigGUIflowLab-20.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-21-1.png b/images/guiflow_lab/FigGUIflowLab-21-1.png new file mode 100644 index 0000000..5235d9d Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-21-1.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-21.png b/images/guiflow_lab/FigGUIflowLab-21.png index 3ef6f82..6ac8a2a 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-21.png and b/images/guiflow_lab/FigGUIflowLab-21.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-22.png b/images/guiflow_lab/FigGUIflowLab-22.png new file mode 100644 index 0000000..3ddd9bd Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-22.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-23.png b/images/guiflow_lab/FigGUIflowLab-23.png new file mode 100644 index 0000000..ceb90ca Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-23.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-24.png b/images/guiflow_lab/FigGUIflowLab-24.png new file mode 100644 index 0000000..9644752 Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-24.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-25.png b/images/guiflow_lab/FigGUIflowLab-25.png new file mode 100644 index 0000000..1f58017 Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-25.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-26.png b/images/guiflow_lab/FigGUIflowLab-26.png new file mode 100644 index 0000000..7303b19 Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-26.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-27.png b/images/guiflow_lab/FigGUIflowLab-27.png new file mode 100644 index 0000000..60a8dbf Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-27.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-28.png b/images/guiflow_lab/FigGUIflowLab-28.png new file mode 100644 index 0000000..8c568c8 Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-28.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-29.png b/images/guiflow_lab/FigGUIflowLab-29.png new file mode 100644 index 0000000..7227802 Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-29.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-30.png b/images/guiflow_lab/FigGUIflowLab-30.png new file mode 100644 index 0000000..83f4a2a Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-30.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-31.png b/images/guiflow_lab/FigGUIflowLab-31.png new file mode 100644 index 0000000..aab4d7c Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-31.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-4.png b/images/guiflow_lab/FigGUIflowLab-4.png deleted file mode 100644 index b6a6676..0000000 Binary files a/images/guiflow_lab/FigGUIflowLab-4.png and /dev/null differ diff --git 
a/images/guiflow_lab/FigGUIflowLab-5.png b/images/guiflow_lab/FigGUIflowLab-5.png index 2739a83..bccb1c3 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-5.png and b/images/guiflow_lab/FigGUIflowLab-5.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-6.png b/images/guiflow_lab/FigGUIflowLab-6.png index 7be6bee..199c408 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-6.png and b/images/guiflow_lab/FigGUIflowLab-6.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-7.png b/images/guiflow_lab/FigGUIflowLab-7.png index d24ceaf..c8be966 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-7.png and b/images/guiflow_lab/FigGUIflowLab-7.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-8.png b/images/guiflow_lab/FigGUIflowLab-8.png index 9b93163..9326e98 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-8.png and b/images/guiflow_lab/FigGUIflowLab-8.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-9-1.png b/images/guiflow_lab/FigGUIflowLab-9-1.png new file mode 100644 index 0000000..6c1e03e Binary files /dev/null and b/images/guiflow_lab/FigGUIflowLab-9-1.png differ diff --git a/images/guiflow_lab/FigGUIflowLab-9.png b/images/guiflow_lab/FigGUIflowLab-9.png index f724bff..331985a 100644 Binary files a/images/guiflow_lab/FigGUIflowLab-9.png and b/images/guiflow_lab/FigGUIflowLab-9.png differ diff --git a/images/makefile_lab/FigMakefileLab-1.png b/images/makefile_lab/FigMakefileLab-1.png index 782bd82..e2e1a2b 100644 Binary files a/images/makefile_lab/FigMakefileLab-1.png and b/images/makefile_lab/FigMakefileLab-1.png differ diff --git a/images/makefile_lab/FigMakefileLab-3.png b/images/makefile_lab/FigMakefileLab-3.png index f0836b5..f6fdaad 100644 Binary files a/images/makefile_lab/FigMakefileLab-3.png and b/images/makefile_lab/FigMakefileLab-3.png differ diff --git a/images/makefile_lab/FigMakefileLab-4.png b/images/makefile_lab/FigMakefileLab-4.png index f870801..8359681 100644 Binary files a/images/makefile_lab/FigMakefileLab-4.png and b/images/makefile_lab/FigMakefileLab-4.png differ diff --git a/images/makefile_lab/FigMakefileLab-5.png b/images/makefile_lab/FigMakefileLab-5.png index b754039..b3a1c4e 100644 Binary files a/images/makefile_lab/FigMakefileLab-5.png and b/images/makefile_lab/FigMakefileLab-5.png differ diff --git a/images/makefile_lab/FigMakefileLab-6.png b/images/makefile_lab/FigMakefileLab-6.png index 158ff01..e9d839d 100644 Binary files a/images/makefile_lab/FigMakefileLab-6.png and b/images/makefile_lab/FigMakefileLab-6.png differ diff --git a/images/optimization_lab/FigOptimizationLab-1.png b/images/optimization_lab/FigOptimizationLab-1.png deleted file mode 100644 index bfddb98..0000000 Binary files a/images/optimization_lab/FigOptimizationLab-1.png and /dev/null differ diff --git a/images/optimization_lab/FigOptimizationLab-10.png b/images/optimization_lab/FigOptimizationLab-10.png index f396414..7eb63b1 100644 Binary files a/images/optimization_lab/FigOptimizationLab-10.png and b/images/optimization_lab/FigOptimizationLab-10.png differ diff --git a/images/optimization_lab/FigOptimizationLab-11.png b/images/optimization_lab/FigOptimizationLab-11.png index 171d974..05ac1f5 100644 Binary files a/images/optimization_lab/FigOptimizationLab-11.png and b/images/optimization_lab/FigOptimizationLab-11.png differ diff --git a/images/optimization_lab/FigOptimizationLab-12.png b/images/optimization_lab/FigOptimizationLab-12.png index 107de75..53e2579 100644 Binary files a/images/optimization_lab/FigOptimizationLab-12.png and 
b/images/optimization_lab/FigOptimizationLab-12.png differ diff --git a/images/optimization_lab/FigOptimizationLab-13.png b/images/optimization_lab/FigOptimizationLab-13.png index 334829d..b92ec4b 100644 Binary files a/images/optimization_lab/FigOptimizationLab-13.png and b/images/optimization_lab/FigOptimizationLab-13.png differ diff --git a/images/optimization_lab/FigOptimizationLab-14.png b/images/optimization_lab/FigOptimizationLab-14.png index b7051ec..a20e6da 100644 Binary files a/images/optimization_lab/FigOptimizationLab-14.png and b/images/optimization_lab/FigOptimizationLab-14.png differ diff --git a/images/optimization_lab/FigOptimizationLab-15-1.png b/images/optimization_lab/FigOptimizationLab-15-1.png new file mode 100644 index 0000000..d408b1d Binary files /dev/null and b/images/optimization_lab/FigOptimizationLab-15-1.png differ diff --git a/images/optimization_lab/FigOptimizationLab-15.png b/images/optimization_lab/FigOptimizationLab-15.png index 07a6b03..fd6cd6e 100644 Binary files a/images/optimization_lab/FigOptimizationLab-15.png and b/images/optimization_lab/FigOptimizationLab-15.png differ diff --git a/images/optimization_lab/FigOptimizationLab-16.png b/images/optimization_lab/FigOptimizationLab-16.png index 8c8ceb3..d97123f 100644 Binary files a/images/optimization_lab/FigOptimizationLab-16.png and b/images/optimization_lab/FigOptimizationLab-16.png differ diff --git a/images/optimization_lab/FigOptimizationLab-17.png b/images/optimization_lab/FigOptimizationLab-17.png index b7bf46c..984e52f 100644 Binary files a/images/optimization_lab/FigOptimizationLab-17.png and b/images/optimization_lab/FigOptimizationLab-17.png differ diff --git a/images/optimization_lab/FigOptimizationLab-18.png b/images/optimization_lab/FigOptimizationLab-18.png index 4609a2f..72c8f84 100644 Binary files a/images/optimization_lab/FigOptimizationLab-18.png and b/images/optimization_lab/FigOptimizationLab-18.png differ diff --git a/images/optimization_lab/FigOptimizationLab-19.png b/images/optimization_lab/FigOptimizationLab-19.png index a43ec15..5e19144 100644 Binary files a/images/optimization_lab/FigOptimizationLab-19.png and b/images/optimization_lab/FigOptimizationLab-19.png differ diff --git a/images/optimization_lab/FigOptimizationLab-2.png b/images/optimization_lab/FigOptimizationLab-2.png deleted file mode 100644 index f351b41..0000000 Binary files a/images/optimization_lab/FigOptimizationLab-2.png and /dev/null differ diff --git a/images/optimization_lab/FigOptimizationLab-20-1.png b/images/optimization_lab/FigOptimizationLab-20-1.png new file mode 100644 index 0000000..2bd0d80 Binary files /dev/null and b/images/optimization_lab/FigOptimizationLab-20-1.png differ diff --git a/images/optimization_lab/FigOptimizationLab-20.png b/images/optimization_lab/FigOptimizationLab-20.png index 76ef30c..496a27e 100644 Binary files a/images/optimization_lab/FigOptimizationLab-20.png and b/images/optimization_lab/FigOptimizationLab-20.png differ diff --git a/images/optimization_lab/FigOptimizationLab-21.png b/images/optimization_lab/FigOptimizationLab-21.png index 936c681..79a9a58 100644 Binary files a/images/optimization_lab/FigOptimizationLab-21.png and b/images/optimization_lab/FigOptimizationLab-21.png differ diff --git a/images/optimization_lab/FigOptimizationLab-22-1.png b/images/optimization_lab/FigOptimizationLab-22-1.png new file mode 100644 index 0000000..503048b Binary files /dev/null and b/images/optimization_lab/FigOptimizationLab-22-1.png differ diff --git 
a/images/optimization_lab/FigOptimizationLab-22.png b/images/optimization_lab/FigOptimizationLab-22.png index b1e9c08..44a83ff 100644 Binary files a/images/optimization_lab/FigOptimizationLab-22.png and b/images/optimization_lab/FigOptimizationLab-22.png differ diff --git a/images/optimization_lab/FigOptimizationLab-23.png b/images/optimization_lab/FigOptimizationLab-23.png index 5cb6c50..e0505fb 100644 Binary files a/images/optimization_lab/FigOptimizationLab-23.png and b/images/optimization_lab/FigOptimizationLab-23.png differ diff --git a/images/optimization_lab/FigOptimizationLab-3.png b/images/optimization_lab/FigOptimizationLab-3.png deleted file mode 100644 index f040007..0000000 Binary files a/images/optimization_lab/FigOptimizationLab-3.png and /dev/null differ diff --git a/images/optimization_lab/FigOptimizationLab-4.png b/images/optimization_lab/FigOptimizationLab-4.png deleted file mode 100644 index 0f92c7a..0000000 Binary files a/images/optimization_lab/FigOptimizationLab-4.png and /dev/null differ diff --git a/images/optimization_lab/FigOptimizationLab-5.png b/images/optimization_lab/FigOptimizationLab-5.png index 6013d7a..eba9386 100644 Binary files a/images/optimization_lab/FigOptimizationLab-5.png and b/images/optimization_lab/FigOptimizationLab-5.png differ diff --git a/images/optimization_lab/FigOptimizationLab-6.png b/images/optimization_lab/FigOptimizationLab-6.png index 5e4b3d1..313d9dd 100644 Binary files a/images/optimization_lab/FigOptimizationLab-6.png and b/images/optimization_lab/FigOptimizationLab-6.png differ diff --git a/images/optimization_lab/FigOptimizationLab-7.png b/images/optimization_lab/FigOptimizationLab-7.png index b82f8db..fb987de 100644 Binary files a/images/optimization_lab/FigOptimizationLab-7.png and b/images/optimization_lab/FigOptimizationLab-7.png differ diff --git a/images/optimization_lab/FigOptimizationLab-9.png b/images/optimization_lab/FigOptimizationLab-9.png deleted file mode 100644 index 7be6bee..0000000 Binary files a/images/optimization_lab/FigOptimizationLab-9.png and /dev/null differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-1.png b/images/rtlkernel_lab/FigRTLKernelLab-1.png deleted file mode 100644 index 1ca2e48..0000000 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-1.png and /dev/null differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-10.png b/images/rtlkernel_lab/FigRTLKernelLab-10.png index 5e97608..0584473 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-10.png and b/images/rtlkernel_lab/FigRTLKernelLab-10.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-11.png b/images/rtlkernel_lab/FigRTLKernelLab-11.png index 61a2e36..e6770dc 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-11.png and b/images/rtlkernel_lab/FigRTLKernelLab-11.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-17.png b/images/rtlkernel_lab/FigRTLKernelLab-17.png index 0e809c3..89b0118 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-17.png and b/images/rtlkernel_lab/FigRTLKernelLab-17.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-18.png b/images/rtlkernel_lab/FigRTLKernelLab-18.png index 22289ec..2529090 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-18.png and b/images/rtlkernel_lab/FigRTLKernelLab-18.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-19.png b/images/rtlkernel_lab/FigRTLKernelLab-19.png index 0ebfa2a..553b004 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-19.png and b/images/rtlkernel_lab/FigRTLKernelLab-19.png differ 
diff --git a/images/rtlkernel_lab/FigRTLKernelLab-2.png b/images/rtlkernel_lab/FigRTLKernelLab-2.png deleted file mode 100644 index f040007..0000000 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-2.png and /dev/null differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-20.png b/images/rtlkernel_lab/FigRTLKernelLab-20.png index e25a731..558a559 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-20.png and b/images/rtlkernel_lab/FigRTLKernelLab-20.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-21.png b/images/rtlkernel_lab/FigRTLKernelLab-21.png index 6ab40fa..65ba1dd 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-21.png and b/images/rtlkernel_lab/FigRTLKernelLab-21.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-22.png b/images/rtlkernel_lab/FigRTLKernelLab-22.png index cc98bf7..eb07c4c 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-22.png and b/images/rtlkernel_lab/FigRTLKernelLab-22.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-3.png b/images/rtlkernel_lab/FigRTLKernelLab-3.png deleted file mode 100644 index 3836405..0000000 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-3.png and /dev/null differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-4.png b/images/rtlkernel_lab/FigRTLKernelLab-4.png index 18961a7..db46204 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-4.png and b/images/rtlkernel_lab/FigRTLKernelLab-4.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-5.png b/images/rtlkernel_lab/FigRTLKernelLab-5.png index f3eb89e..4080dae 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-5.png and b/images/rtlkernel_lab/FigRTLKernelLab-5.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-6.png b/images/rtlkernel_lab/FigRTLKernelLab-6.png index 8da95b4..c179f3f 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-6.png and b/images/rtlkernel_lab/FigRTLKernelLab-6.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-7.png b/images/rtlkernel_lab/FigRTLKernelLab-7.png index f70355e..935c534 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-7.png and b/images/rtlkernel_lab/FigRTLKernelLab-7.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-8.png b/images/rtlkernel_lab/FigRTLKernelLab-8.png index 7d0594e..1f6f6c6 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-8.png and b/images/rtlkernel_lab/FigRTLKernelLab-8.png differ diff --git a/images/rtlkernel_lab/FigRTLKernelLab-9.png b/images/rtlkernel_lab/FigRTLKernelLab-9.png index ccee6b6..7fe59d7 100644 Binary files a/images/rtlkernel_lab/FigRTLKernelLab-9.png and b/images/rtlkernel_lab/FigRTLKernelLab-9.png differ diff --git a/images/workspace.png b/images/workspace.png new file mode 100644 index 0000000..8ec196b Binary files /dev/null and b/images/workspace.png differ diff --git a/rtl_kernel_wizard_lab.md b/rtl_kernel_wizard_lab.md index d0eb9b3..cbac3a2 100644 --- a/rtl_kernel_wizard_lab.md +++ b/rtl_kernel_wizard_lab.md @@ -12,295 +12,261 @@ After completing this lab, you will be able to: - Add the created IP in an application - Verify functionality in hardware on F1 -## Procedure - -This lab is separated into steps that consist of general overview statements that provide information on the detailed instructions that follow. Follow these detailed instructions to progress through the lab. - -This lab comprises three primary steps: You will create an SDAccel project. Use RTL Kernel project creation wizard, perform hardware emulation, and verify the functionality on F1. 
The Appendix section lists steps involved in building the full hardware. - -## Step 1: Create an SDAccel Project -### 1.1. Source the SDAccel settings and create a directory called rtl\_kernel under _~/aws-fpga_. Change the directory to the newly created directory. -**1.1.1.** Execute the following commands in a terminal window to source the required environment settings: - +## Steps +### Create an SDAccel Project +1. Execute the following commands in a terminal window to source the required environment settings: ``` cd ~/aws-fpga source sdaccel_setup.sh source $XILINX_SDX/settings64.sh ``` -**1.1.2.** Execute the following commands in a terminal window to create a working directory: - +1. Execute the following commands in a terminal window to create a working directory: ``` mkdir rtl_kernel cd rtl_kernel ``` - -### 1.2. Launch SDx, create a workspace and create a project, called _rtl\_kernel_, using the _Empty Application_ template. -**1.2.1.** Launch SDAccel by executing **sdx** in the terminal window - -An Eclipse launcher widow will appear asking to select a directory as workspace - -**1.2.2.** Click on the **Browse…** button, browse to **/home/centos/aws-fpga/rtl\_kernel** , click **OK** twice - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-1.png) -#### Figure 1. Selecting a workspace - -**1.2.3.** Click on the **Add Custom Platform** link on the _Welcome_ page - -**1.2.4.** Click on the **Add Custom Platform** button, browse to **/home/centos/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1\_dynamic\_5\_0**, and click **OK** - -![alt tag](./images/FigPlatform.png) -#### Figure 2. Hardware platform selected - -**1.2.5.** Click **Apply** and then click **OK** - -**1.2.6.** Click on the **Create SDx Project** link on the _Welcome_ page - -**1.2.7.** In the _New Project_'s page enter **rtl\_kernel\_example** in the _Project name:_ field and click **Next** - -Note the AWS-VU9P-F1 board is displayed as the hardware platform - -**1.2.8.** Click **Next** - -**1.2.9.** Click **Next** with Linux on x86 as the System Configuration and OpenCL as the Runtime options - -**1.2.10.** Select **Empty Application** from the _Available Templates_ pane and click **Finish** - -![alt tag](./images/FigTemplate.png) -#### Figure 3. Selecting an application template - -## Step 2: Create RTL\_Kernel Project and Perform HW Emulation -### 2.1. Run the RTL Kernel wizard from the SDAccel project. -**2.1.1.** Make sure the **project.sdx** under _rtl\_kernel\_example_ in the **Project Explorer** tab is selected - -**2.1.2.** Select **Xilinx > Create RTL Kernel…** - +1. Launch SDAccel by executing **sdx** in the terminal window +An Eclipse launcher window will appear asking to select a directory as workspace +1. Click on the **Browse…** button, browse to **/home/centos/aws-fpga/rtl\_kernel**, click **OK** twice +

+ *Figure: Selecting a workspace*

+ The Xilinx SDx IDE window will be displayed +

+ *Figure: The SDx IDE window*

+1. Click on the **Add Custom Platform** link on the _Welcome_ page +1. Click on the **Add Custom Platform** button, browse to **/home/centos/aws-fpga/SDAccel/aws\_platfom/xilinx\_aws-vu9p-f1-04261818\_dynamic\_5\_0**, and click **OK** +

+ *Figure: Hardware platform selected*

+1. Click **Apply** and then click **OK** +1. Click on the **Create SDx Project** link on the _Welcome_ page +1. Click **Next** +1. In the _New Project_'s page enter **rtl\_kernel\_example** in the _Project name:_ field and click **Next** +Note the aws-vu9p-f1-04261818 board is displayed as the hardware platform +1. Click **Next** +1. Click **Next** with Linux on x86 as the System Configuration and OpenCL as the Runtime options +1. Select **Empty Application** from the _Available Templates_ pane and click **Finish** +

+ *Figure: Selecting an application template*

+ +### Create RTL\_Kernel Project using RTL Kernel Wizard +1. Make sure the **project.sdx** under _rtl\_kernel\_example_ in the **Project Explorer** tab is selected +1. Select **Xilinx > RTL Kernel Wizard…** Note that the Create RTL Kernel Wizard will be invoked displaying the Welcome screen - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-4.png) -#### Figure 4. Welcome screen of the RTL Kernel Wizard - -**2.1.3.** Click **Next** - -**2.1.4.** Change _Kernel_ name to **KVAdd** , (for Kernel Vector Addition), _Kernel vendor_ to **Xilinx** leaving the _Kernel library_ and _Number of clocks_ to the default values - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-5.png) -#### Figure 5. Setting general settings including name and number of clocks - -**2.1.5.** Click **Next** - -**2.1.6.** Click **Next** with _Number of scalar kernel input arguments_ default value being **1** and the _Argument type_ as **unit** - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-6.png) -#### Figure 6. Selecting number of scalar arguments - -**2.1.7.** We will have three arguments to the kernel (2 input and 1 output) which will be passed through Global Memory. Set _Number of AXI master interfaces_ to be **3** - -**2.1.8.** Keep the width of each AXI master data width to **64** (note this is specified in bytes so this will give a width of 512 bits), name **A** as the argument name to m00\_axi, **B** to m01\_axi, and **Res** to m02\_axi - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-7.png) -#### Figure 7. Selecting number of AXI master interfaces, their widths, and naming them - -**2.1.9.** Click **Next** and the summary page will be displayed showing a function prototype and register map - +

+ *Figure: Welcome screen of the RTL Kernel Wizard*

+1. Click **Next** +1. Change _Kernel_ name to **KVAdd**, (for Kernel Vector Addition), _Kernel vendor_ to **Xilinx** leaving the _Kernel library_ and _Number of clocks_ to the default values +

+ *Figure: Setting general settings including name and number of clocks*

+1. Click **Next**
+1. Click **Next** with the _Number of scalar kernel input arguments_ left at the default value of **1** and the _Argument type_ as **uint**

+ *Figure: Selecting number of scalar arguments*

+1. We will have three arguments to the kernel (2 input and 1 output) which will be passed through Global Memory. Set _Number of AXI master interfaces_ to be **3** +1. Keep the width of each AXI master data width to **64** (note this is specified in bytes so this will give a width of 512 bits), name **A** as the argument name to m00\_axi, **B** to m01\_axi, and **Res** to m02\_axi +

+ *Figure: Selecting number of AXI master interfaces, their widths, and naming them*

+1. Click **Next** and the summary page will be displayed showing a function prototype and register map Note the control register is accessed via S\_AXI\_CONTROL interface and is at offset 0 and the scalar operand is at offset 0x10. There are three master AXI interfaces being used. - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-8.png) -#### Figure 8. Summary of the design interface that will be created by the wizard - -**2.1.10.** Click **OK** to close the wizard - +

+ *Figure: Summary of the design interface that will be created by the wizard*
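+
+ A small sketch of that register map as host-side C constants. The 0x00 control-register and 0x10 scalar offsets come from the wizard summary page; the individual control bits shown are the usual ap_ctrl handshake layout and are an assumption here:
+   ```
+   /* s_axi_control register map for the generated KVAdd kernel (sketch) */
+   #define KVADD_CTRL_OFFSET    0x00u       /* control register                */
+   #define KVADD_CTRL_AP_START  (1u << 0)   /* set by host to start the kernel */
+   #define KVADD_CTRL_AP_DONE   (1u << 1)   /* set by kernel when finished     */
+   #define KVADD_CTRL_AP_IDLE   (1u << 2)   /* kernel is idle                  */
+   #define KVADD_SCALAR_OFFSET  0x10u       /* first scalar kernel argument    */
+   ```
+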

+1. Click **OK** to close the wizard Notice that a Vivado Project will be created and opened - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-9.png) -#### Figure 9. Vivado project created by the wizard - -### 2.2. Analyze the design built by the RTL kernel wizard -**2.2.1.** Expand the hierarchy of the Design Sources in the Sources window and notice all the design sources, constraint file, and the basic testbench generated by the wizard - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-10.png) -#### Figure 10. Design hierarchy along with constraints and testbench files - -There is one module to handle the control signals ap\_start, ap\_done, and ap\_idle; and three master AXI channels to read source operands from and write the result back to DRAM. - -Expanded m02\_axi module shows read, write, fifo\_valid\_pipeline, and an adder modules. - -**2.2.2.** Select **Flow Navigator > RTL ANALYSIS > Open Elaborated Design** which will analyze the design and open a schematic view. Click **OK** - -**2.2.3.** You should see two top-level blocks: example and control as seen below - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-11.png) -#### Figure 11. Top-level modules - -**2.2.4.** Double-click on the example block and observe the three hierarchical Master AXI blocks - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-12.png) -#### Figure 12. Three master axi modules - -**2.2.5.** Zoom in into the top section and see the control logic the wizard has generated to provide ap\_start, ap\_idle, and ap\_done signals - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-13.png) -#### Figure 13. Control logic generation - -**2.2.6.** Traverse through one of the AXI interface blocks (m02) and observe that the example code it has generated consists of Read Master, Write Master, Read FIFO, Write FIFO, Read FIFO valid pipeline and an Adder - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-14.png) -#### Figure 14. A typical master axi hierarchical design generated by the wizard - -**2.2.7.** Close the elaborated view by selecting **File > Close Elaborated Design** - -**2.2.8.** Click **OK** - -**2.2.9.** Select **Flow > Generate RTL Kernel** - -**2.2.10.** Click **OK** using the default option (Sources-only kernel) - -The packager will be run, generating the xo file which will be used in the design. The Vivado will close after generating the design - -### 2.3. Analyze the created files and added to the SDAccel project after the RTL kernel has been generated. -**2.3.1.** Expand the _src_ folder under the **rtl\_kernel\_example** - +

+ *Figure: Vivado project created by the wizard*

+ +### Analyze the design built by the RTL Kernel wizard +1. Expand the hierarchy of the Design Sources in the Sources window and notice all the design sources, constraint file, and the basic testbench generated by the wizard +

+ *Figure: Design hierarchy along with constraints and testbench files*

+ There is one module to handle the control signals ap\_start, ap\_done, and ap\_idle; and three master AXI channels to read the source operands and write the result back to DDR. The expanded m02\_axi module shows read, write, fifo\_valid\_pipeline, and adder modules.
+1. Select **Flow Navigator > RTL ANALYSIS > Open Elaborated Design** which will analyze the design and open a schematic view. Click **OK**
+1. You should see two top-level blocks: example and control, as seen below

+ *Figure: Top-level modules*

+1. Double-click on the example block and observe the three hierarchical Master AXI blocks +

+ *Figure: Three master axi modules*

+1. Zoom in into the top section and see the control logic the wizard has generated to provide ap\_start, ap\_idle, and ap\_done signals +

+ *Figure: Control logic generation*

+1. Traverse through one of the AXI interface blocks (m02) and observe that the example code it has generated consists of Read Master, Write Master, Read FIFO, Write FIFO, Read FIFO valid pipeline and an Adder +

+ *Figure: A typical master axi hierarchical design generated by the wizard*

+1. Close the elaborated view by selecting **File > Close Elaborated Design** +1. Click **OK** +1. Select **Flow > Generate RTL Kernel** +1. Click **OK** using the default option (Sources-only kernel) +The packager will be run, generating the xo file which will be used in the design. +1. Click **Yes** to close the Vivado + +### Analyze the created and added files to the SDAccel project after the RTL kernel has been generated +1. Expand the _src_ folder under the **rtl\_kernel\_example** Notice that _sdx\_rtl\_kernel\_wizard_ folder and its hierarchy has been added as the source, under which a Vivado project related folders/files are included (name starting with sdx\_rtl\_kernel…) and with wizard created KVAdd\_ex folder. - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-15.png) -#### Figure 15. The rtl kernel related files added to the src folder - -**2.3.2.** Double-click on the **main.c** and look at its content - +

+ *Figure: The rtl kernel related files added to the src folder*

+1. Double-click on the **main.c** and look at its content The _main_ function is defined on line 60. The number of words it transfers is 4096. Lines 96 to 107 fills the source operands and expected result. Lines 200-244 deals with loading xclbin and creating kernel. Lines 248-266 creates the buffers in the device memory. Lines 319-333 sets arguments, executes the kernel, and waits for it to finish. Lines 346-385 reads the data and compares them. Lines 391-401 releases the memory, program, and kernel. -### 2.4. Add the binary container, select the Emulation-HW build configuration, and build the project. -**2.4.1.** Click on the **Add Binary Container** button (![alt tag](./images/Fig-binary_container.png)) - +### Add binary container and kernel, select the Emulation-HW build configuration, and build the project +1. Select **project.sdx** in the _Project Explorer_ tab to see the project settings page +1. Click on the **Add Binary Container** button (![alt tag](./images/Fig-binary_container.png)) Notice the _binary\_container\_1_ is added to the project. Since the design has RTL IP, the binary container does not have further hierarchy - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-16.png) -#### Figure 16. Adding binary container to the project - -**2.4.2.** Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-17.png) -#### Figure 17. Selecting HW emulation build configuration - -**2.4.3.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - +

+ *Figure: Adding binary container to the project*
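+
+ The main.c walk-through above notes that lines 319-333 set the kernel arguments, launch the kernel, and wait for it to finish; a hedged sketch of that host-side pattern (the argument order and variable names are assumptions):
+   ```
+   #include <CL/cl.h>
+
+   #define NUM_WORDS 4096
+
+   static cl_int run_kvadd(cl_command_queue q, cl_kernel kvadd,
+                           cl_mem d_A, cl_mem d_B, cl_mem d_Res)
+   {
+       cl_uint num_words = NUM_WORDS;
+       clSetKernelArg(kvadd, 0, sizeof(cl_uint), &num_words);  /* scalar operand */
+       clSetKernelArg(kvadd, 1, sizeof(cl_mem),  &d_A);        /* A   -> m00_axi */
+       clSetKernelArg(kvadd, 2, sizeof(cl_mem),  &d_B);        /* B   -> m01_axi */
+       clSetKernelArg(kvadd, 3, sizeof(cl_mem),  &d_Res);      /* Res -> m02_axi */
+
+       clEnqueueTask(q, kvadd, 0, NULL, NULL);                 /* launch the RTL kernel */
+       return clFinish(q);                                     /* wait for completion   */
+   }
+   ```
+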

+1. Click on the **Add Hardware Function button** (![alt tag](./images/Fig-hw_button.png)) and select _KVAdd_ +1. Either select **Project > Build Configurations > Set Active > Emulation-HW** or click on the drop-down button of _Active build configuration_ and select **Emulation-HW** +

+ *Figure: Selecting HW emulation build configuration*

+1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button This will build the project including rtl\_kernel\_example.exe file under the Emulation-HW directory - -**2.4.4.** Select **Run > Run Configurations…** to open the configurations window - -**2.4.5.** Click on the **Arguments** tab and notice that no binary container is assigned - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-18.png) -#### Figure 18. Unpopulated Arguments tab - -**2.4.6.** Click on the **Automatically add binary container(s) to arguments** check box, click **Apply** , and the click **Run** to run the application - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-19.png) -#### Figure 19. Program argument assigned - -**2.4.7.** The Console tab shows that the test was completed successfully along with the data transfer rate - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-20.png) -#### Figure 20. Hardware emulation run output - -**2.4.8.** Double-click on the **Application Timeline** entry in the _Reports_ tab, expand all entries in the timeline graph, zoom appropriately and observe the transactions - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-21.png) -#### Figure 21. Timeline graph showing various activities in various region of the system - -## Step 3: Run the Application on F1 -### 3.1. Since the System build and AFI availability takes considerable amount of time, a precompiled version is provided. Use the precompiled solution directory to verify the functionality. -**3.1.1.** Change to the solution directory by executing the following command - +1. Select **Run > Run Configurations…** to open the configurations window +1. Click on the **Arguments** tab and notice that no binary container is assigned +

+ *Figure: Unpopulated Arguments tab*

+1. Click on the **Automatically add binary container(s) to arguments** check box +1. Click on the **Environment** tab, and change the _LD\_LIBRARY\_PATH_ variable setting to **/opt/xilinx/xrt/lib** and click **OK** +1. click **Apply**, and then click **Run** to run the application +

+ *Figure: Program argument assigned*

+1. The Console tab shows that the test was completed successfully along with the data transfer rate +

+ *Figure: Hardware emulation run output*

+1. In the **Assistant** tab, expand **Emulation-HW > rtl_kernel_example-Default**, and double-click on the **Application Timeline** entry, expand all entries in the timeline graph, zoom appropriately and observe the transactions +

+ *Figure: Timeline graph showing various activities in various regions of the system*

+ +### Run the Application on F1 +**Since the System build and AFI availability takes considerable amount of time, a pre-compiled version is provided. Use the precompiled solution directory to verify the functionality** + +1. Change to the solution directory by executing the following command ``` cd /home/centos/sources/rtl_kernel_solution - ``` -**3.1.2.** Run the following commands to load the AFI and execute the application to verify the functionality - + ``` +1. Run the following commands to load the AFI and execute the application to verify the functionality ``` sudo sh - source /opt/Xilinx/SDx/2017.4.rte.dyn/setup.sh + source /opt/xilinx/xrt/setup.sh ./rtl_kernel_example.exe xclbin/binary_container_1.awsxclbin ``` +1. The FPGA bitstream will be downloaded and the host application will be executed showing output something like: +

+ *Figure: Execution output*

-**3.1.3.** The FPGA bitstream will be downloaded and the host application will be executed showing output something like: - -![alt tag](./images/rtlkernel_lab/FigRTLKernelLab-22.png) -#### Figure 22. Execution output - -**3.1.4.** Enter **exit** in the teminal window to exit out of sudo shell - -**3.1.5.** Close the SDx by selecting **File > Exit** +1. Enter **exit** in the terminal window to exit out of _sudo shell_ +1. Close the SDx by selecting **File > Exit** ## Conclusion In this lab, you used the RTL Kernel wizard to create a sample adder application. You saw that the wizard creates an RTL IP with the specified number of AXI master ports. You performed HW emulation and analyzed the application timeline. You finally ran the application on an AWS F1 instance and validated the functionality. -## Appendix: Build Full Hardware -### A.1. Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment and move to next step). -**A.1.1.** Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** - -**A.1.2.** Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button - -This will build the project under the **System** directory. The built project will include rtl\_kernel\_example.exe file along with binary\_container\_1.xclbin file - -This step takes about two hours - -### A.2. Create an Amazon FPGA Image (AFI) - -To execute the application on F1, the following files are needed: +--------------------------------------- -- Host application (exe) -- FPGA binary (xclbin) -- Amazon FPGA Image (awsxclbin) +

+Start the next lab: 6. Debug Lab +

-The xclbin and the host applications are already generated by the System configuration step - -**A.2.1.** Create a **xclbin** directory under the _rtl\_kernel\_example_ directory using the following commands: - - ``` - cd /home/centos/aws-fpga/rtl_kernel - mkdir xclbin - ``` -**A.2.2.** Copy the generated **xclbin** file ( **binary\_container\_1.xclbin** ) and the host application (rtl\_kernel\_example.exe) from the **System** folder into the created **xclbin** directory, using the following commands - - ``` - cd xclbin - cp /home/centos/aws-fpga/rtl_kernel/rtl_kernel_example/System/binary_container_1.xclbin . - cp /home/centos/aws-fpga/rtl_kernel/rtl_kernel_example/System/rtl_kernel_example.exe . - ``` -### A.3. Create an AFI by running the create\_sdaccel\_afi.sh script and wait for the completion of the AFI creation process -**A.3.1.** Enter the following command to generate the AFI: +--------------------------------------- - ``` - $SDACCEL_DIR/tools/create_sdaccel_afi.sh –xclbin=binary_container_1.xclbin –s3_bucket= -s3_dcp_key= -s3_logs_key= - ``` -In the above command, <bucket-name>, <dcp-folder-name>, and <logs-folder-name> are the names you would have given when running CLI script. In the workshop environment this was already done. - -The create\_sdaccel\_afi.sh script does the following: - -- Starts a background process to create the AFI -- Generates a \_afi\_id.txt which contains the FPGA Image Identifier (or AFI ID) and Global FPGA Image Identifier (or AGFI ID) of the generated AFIs -- Creates the \*.awsxclbin AWS FPGA binary file which will need to be read by the host application to determine which AFI should be loaded in the FPGA. - -**A.3.2.** Enter the following command to note the values of the AFI IDs by opening the *\_afi\_id.txt file. - - ``` - cat *afi_id.txt - ``` -**A.3.3.** Enter the **describe-fpga-images** API command to check the status of the AFI generation process: - -**aws ec2 describe-fpga-images --fpga-image-ids <AFI ID>** - -Note: When AFI creation completes successfully, the output should contain: +## Appendix: Build Full Hardware +**Set the build configuration to System and build the system (Note that since the building of the project takes over two hours skip this step in the workshop environment).** - ``` - ... - "State": { - "Code": "available" - }, - - ... - ``` +1. Either select **Project > Build Configurations > Set Active > System** or click on the drop-down button of _Active build configuration_ and select **System** +1. Set the XOCC Kernel Linker flag as done before +1. Either select **Project > Build Project** or click on the (![alt tag](./images/Fig-build.png)) button +This will build the project under the **System** directory. The built project will include **rtl\_kernel\_example.exe** file along with **binary\_container\_1.xclbin** file. This step takes about two hours -**A.3.4.** Wait until the AFI becomes available before proceeding to execute on the F1 instance. +**Once the full system is built, you can create an AFI by following the steps listed here**