How to understand why LUT resources are so extreme in HLS estimations? #1106
There are several techniques to reduce the area footprint of your model. First, quantise the model if you can. At the very least, do post-training quantisation: most applications don't require 32-bit fixed point, and 16 bits usually has no impact on accuracy. Ideally, though, do quantisation-aware training as shown here: https://github.com/fastmachinelearning/hls4ml-tutorial/blob/main/part4_quantization.ipynb Secondly, consider using io_stream instead of io_parallel for CNNs; it scales better for computationally heavier models. See the tutorial on CNNs here: https://github.com/fastmachinelearning/hls4ml-tutorial/blob/main/part6_cnns.ipynb
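As a concrete sketch of the post-training route (the precision string, part number, and output directory below are placeholders for illustration, not recommendations for this particular model):

```python
import hls4ml

# Generate a config from the trained Keras model; 'name' granularity
# exposes per-layer settings in addition to the model-wide defaults.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Post-training quantisation: drop the default precision from 32-bit
# fixed point to something narrower (re-check accuracy after this change).
config['Model']['Precision'] = 'ap_fixed<16,6>'

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',          # streaming I/O scales better for CNNs
    output_dir='my_hls_prj',      # placeholder output directory
    part='xczu9eg-ffvb1156-2-e',  # ZCU102 device; check the exact part string for your board
)
hls_model.compile()
```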
Hello,
I am trying to implement the following network using the hls4ml flow with Vivado HLS 2020.1 and TensorFlow:
I am not sure whether these numbers should be a surprise or not, so I have also been searching for estimates of the hardware utilization needed to synthesize such a network on a ZCU102 at 200 MHz.
I am applying 32-bit fixed point for all layers, weights, results, and biases, and the Resource strategy for every layer as well as for the network as a whole (by the way, I'm not sure what the difference is between specifying the model strategy as Resource and manually setting it to Resource for each layer; I am doing both to save as many resources as I can). Moreover, I am setting a high ReuseFactor equal to 4096, which the HLS tool modifies to different values based on the supported reuse factors of each operation/layer (see the configuration sketch after the log below).

The HLS flow takes a long time, ~4 hours, and it gets stuck at the following warning for the softmax activation:
INFO: [HLS 200-42] -- Implementing module 'init_exp_table_ap_fixed_32_16_4_0_0_softmax_config10_s'
INFO: [HLS 200-10] ----------------------------------------------------------------
INFO: [SCHED 204-11] Starting scheduling ...
INFO: [SCHED 204-61] Pipelining function 'init_exp_table<ap_fixed<32, 16, 4, 0, 0>, softmax_config10>'.
WARNING: [SCHED 204-69] Unable to schedule 'store' operation ('table_out_1_V_addr_18_write_ln160', firmware/nnet_utils/nnet_activation.h:160) of variable 'select_ln340_1090', firmware/nnet_utils/nnet_activation.h:159 on array 'table_out_1_V' due to limited memory ports. Please consider using a memory core with more ports or partitioning the array 'table_out_1_V'.
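For reference, a minimal sketch of how this strategy/reuse configuration might look in the hls4ml Python API (key names follow the hls4ml tutorials; as far as I understand, a per-layer entry simply overrides the model-level default, so setting both to the same value is redundant rather than additive):

```python
import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Model-level defaults, applied to every layer without an override.
config['Model']['Strategy'] = 'Resource'
config['Model']['ReuseFactor'] = 4096

# Per-layer overrides; with granularity='name' each layer has its own entry,
# so repeating the same values here does not save anything extra.
for layer in config['LayerName']:
    config['LayerName'][layer]['Strategy'] = 'Resource'
    config['LayerName'][layer]['ReuseFactor'] = 4096
```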
I have used a complete array-partitioning pragma on the mentioned array in the HLS code, which resulted in very high resource estimates for this function alone (see the attached results). It looks to me like this array partitioning makes the softmax function consume so many resources that the design would only fit on a much larger FPGA. I have also tried cyclic partitioning for the same array, but it also did not pass the HLS scheduling phase. I would like to hear your thoughts on my progress, what else I can try, and whether this is normal for this type of network.
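One thing I could try at the hls4ml level, instead of partitioning the table array in the generated HLS by hand, is to shrink the softmax layer's precision through the config, since the exp/inv lookup tables built by init_exp_table inherit it. This is only a sketch: the layer key 'softmax' and the precisions below are assumptions, and the table-specific keys depend on the hls4ml version.

```python
import hls4ml

config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Narrower precision for the softmax layer shrinks the lookup tables
# that show up in the scheduling warning above. 'softmax' is assumed
# to be the Keras layer name; check the printed config for the real key.
config['LayerName']['softmax']['Precision'] = 'ap_fixed<18,8>'

# Some hls4ml versions also expose the table types and size directly;
# these keys are assumptions, verify them against your installed version.
# config['LayerName']['softmax']['exp_table_t'] = 'ap_fixed<18,8>'
# config['LayerName']['softmax']['inv_table_t'] = 'ap_fixed<18,8>'
# config['LayerName']['softmax']['table_size'] = 1024
```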
Are there more straightforward ways, either directly from hls4ml or in HLS, to save a large number of LUTs for such a design? I think quantization may reduce the resources, but the final layer will probably still show a large jump in LUT usage compared to the other layers in the design.
I am attaching the current HLS code for Softmax and its estimated area and timing results.
softmax_hls_and_results.zip