# **NetFPGA Summer Course**



Presented by:

Andrew W Moore, Noa Zilberman, Gianni Antichi, Stephen Ibanez, Marcin Wojcik, Jong Hun Han, Salvator Galea, Murali Ramanujam, Jingyun Zhang, Yuta Tokusashi

> University of Cambridge July 24 – July 28, 2017

> > http://NetFPGA.org



Finally, after hours of work, you managed to finalise your HDL code and get it working in simulation!!!!



Once the bitfile is created, you need to:

- Check whether your design meets timing.
- Debug your hardware design if the regression tests do not pass.












YAY.



Natalie Dee Machine.com



A design netlist is an interconnected set of ports, cells and nets


- The functionality of a design is determined by its RTL code (Verilog, VHDL, etc.) and verified by simulation tools
- The *quality of your RTL* determines how easily timing can be met
- The performance of a design is determined by the delays of the cells that comprise it, as analysed by static timing analysis (STA)
- Static timing analysis does not check the functionality of the components, only their timing performance


### **STA Goals**

### Many FPGA processes are timing driven:

- Synthesis for circuit construction
- Placer for optimal cell locations
- Router for choosing routing elements

Constraints are used to define the desired performance goals

STA reports whether the design meets these performance goals






## **Component delays**

Each component has delays associated with performing its function:

- A LUT has a propagation delay from its inputs to its outputs
- A net has a delay from its driver to its receiver
- A flip-flop (FF) requires stable data for a certain time around the sampling point (setup and hold times)

Delays also depend on environmental factors. These are determined and characterized by Xilinx during device design.

Timing is extracted over the operating range of the device:

- Process (different speed grades)
- Voltage (min  $\rightarrow$  max)
- Temperature (min  $\rightarrow$  max)

## **Static Timing Path**

- A static timing path starts at a clocked element
- Propagates through any number of combinational elements and nets
- Ends at a clocked element

**Source clock delay** – from the top-level clock port to the launching FF

**Data path delay** – from the launching FF to the capturing FF

**Destination clock delay** – from the top-level clock port to the capturing FF; it may differ from the source clock delay (clock skew)
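Combining the three delays gives the arrival and required times whose difference is the slack that STA reports. A simplified sketch for a setup check within one clock domain (clock pessimism removal and other refinements are ignored), using the symbols of the setup check that follows: T(D1\_CLK) is the source clock delay, T(FF1(Clk->Q)) + T(Comb) is the data path delay, and T(D2\_CLK) is the destination clock delay.

$$
\begin{aligned}
T_{arrival} &= T(D1\_CLK) + T(FF1_{(Clk->Q)}) + T(Comb) \\
T_{required} &= T(CLK_{period}) + T(D2\_CLK) - T(FF2_{(setup)}) - T(SU) \\
Slack &= T_{required} - T_{arrival}
\end{aligned}
$$

Positive slack means the path meets timing; negative slack means it violates.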





### Setup check

The setup check verifies that data arrives early enough

Checks that a change at one clocked element has time to propagate to other clocked elements before the next clock event

Simple case – same domain & only data path is considered: $T(D1\_CLK) + T(FF1_{(Clk->Q)}) + T(Comb) < T(CLK_{period}) - T(FF2_{(setup)}) - T(SU) + T(D2\_CLK)$
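A worked example with purely hypothetical numbers: a 5 ns clock period (200 MHz), T(D1\_CLK) = 1.2 ns, T(FF1(Clk->Q)) = 0.4 ns, T(Comb) = 2.6 ns, T(FF2(setup)) = 0.1 ns, T(SU) = 0.05 ns and T(D2\_CLK) = 1.1 ns. Plugging these into the setup inequality:

$$
\begin{aligned}
\text{LHS} &= 1.2 + 0.4 + 2.6 = 4.2\ \text{ns} \\
\text{RHS} &= 5.0 - 0.1 - 0.05 + 1.1 = 5.95\ \text{ns} \\
\text{Slack} &= 5.95 - 4.2 = 1.75\ \text{ns} > 0
\end{aligned}
$$

The inequality holds with 1.75 ns to spare, so this path passes the setup check.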

### Hold check

The hold check verifies that data does not arrive too early

Checks that data launched at the source FF is not captured at the destination FF by the same clock edge that launched it

Simple case – same domain & only data path is considered:  $T(D1\_CLK) + T(FF1_{(Clk->Q)}) + T(Comb) > T(FF2_{(hold)}) + T(D2\_CLK) + T(HU)$ 
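A hypothetical hold example: a direct FF-to-FF connection (T(Comb) = 0) with T(D1\_CLK) = 1.2 ns, T(FF1(Clk->Q)) = 0.4 ns, T(FF2(hold)) = 0.2 ns, T(HU) = 0.05 ns, and a destination clock that arrives later, T(D2\_CLK) = 1.5 ns:

$$
\begin{aligned}
\text{LHS} &= 1.2 + 0.4 + 0 = 1.6\ \text{ns} \\
\text{RHS} &= 0.2 + 1.5 + 0.05 = 1.75\ \text{ns} \\
\text{Slack} &= 1.6 - 1.75 = -0.15\ \text{ns} < 0
\end{aligned}
$$

The inequality fails, so this is a hold violation, caused mainly by the 0.3 ns clock skew between the two FFs. The clock period does not appear in the hold check, so lowering the clock frequency would not fix it; the router typically fixes it by adding delay to the data path.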

Design constraints define the requirements that the compilation flow must meet for the design to be functional on the board

- Both over-constraining and under-constraining are harmful, so use reasonable constraints that correspond to your actual requirements
- Vivado uses the Xilinx Design Constraints (XDC) format, which is quite different from the User Constraints File (UCF) format used previously in ISE
- A design may use a single XDC file or several, each serving a different purpose (e.g., timing constraints vs. pin assignments)
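As a sketch of how one design might split its constraints across two files, a timing XDC and a pins XDC could look like this. The port names, pin locations, I/O standards and delay values below are hypothetical placeholders, not taken from any NetFPGA board file.

```tcl
## ---- timing XDC (e.g. a file like wave_gen_timing.xdc) ----
# 200 MHz clock on the hypothetical top-level port clk_in
create_clock -period 5.000 -name sys_clk [get_ports clk_in]

# Board-level delay budget for the hypothetical input uart_rxd (placeholder values)
set_input_delay -clock sys_clk -max 2.000 [get_ports uart_rxd]
set_input_delay -clock sys_clk -min 0.500 [get_ports uart_rxd]

## ---- pins XDC (e.g. a file like wave_gen_pins.xdc) ----
# Placeholder package pins and I/O standards: take the real ones from your board documentation
set_property PACKAGE_PIN E3       [get_ports clk_in]
set_property IOSTANDARD  LVCMOS33 [get_ports clk_in]
set_property PACKAGE_PIN C4       [get_ports uart_rxd]
set_property IOSTANDARD  LVCMOS33 [get_ports uart_rxd]
```

Keeping timing and physical constraints in separate files makes it easy to use each file only where it is needed (see the `used_in_synthesis`/`used_in_implementation` properties below).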





## **Xilinx Design Constraint file**

*[Screenshot: Vivado Sources window of project_wave_gen showing two constraint sets: constrs_1 (active), containing wave_gen_timing.xdc (target) and wave_gen_pins.xdc, and constrs_2, containing wave_gen.xdc. The Source File Properties pane for wave_gen_pins.xdc shows its location, size, and the Used In options for Synthesis and Implementation.]*

### **XDC** constraints are a combination of:

- Synopsys Design Constraints format (SDC)
- Xilinx centric extensions
- Tcl-compatible for advanced scripting

### XDC constraints have the following properties:

- they follow Tcl semantics,
- they are interpreted like any other Tcl command,
- they are read in and parsed sequentially.

### You can use constraints for:

Synthesis and/or Implementation

### Options are specified in the file properties or via Tcl:

`set_property used_in_synthesis false [get_files wave_gen_pins.xdc]`

`set_property used_in_implementation true [get_files wave_gen_pins.xdc]`

### **XDC File Order**

Constraint files are loaded in **the same order** in which they are listed

### To change order either drag and drop or reorder using:

`reorder_files -fileset constrs_1 -before [get_files wave_gen_timing.xdc] [get_files wave_gen_pins.xdc]`

### IPs:

If you use the native IPs, their XDC files are loaded after your files

You cannot change the IP XDC files order, but you can disable them and re-apply constraints in your XDC files
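A minimal sketch of disabling an IP's XDC file and taking over its constraints yourself. It assumes the IP's constraint file is named `my_fifo_ip.xdc` (hypothetical) and that your Vivado version lets you set the `IS_ENABLED` file property on it; the commented-out constraint is likewise only an illustration.

```tcl
# List the constraint files (user and IP) in the order Vivado will process them
report_compile_order -constraints

# Disable the (hypothetical) IP constraint file so its constraints are no longer applied
set_property IS_ENABLED false [get_files my_fifo_ip.xdc]

# Then re-apply the constraints you still need in your own XDC file, e.g. a
# hypothetical false path into the IP's synchronizer registers:
# set_false_path -to [get_cells -hierarchical -filter {NAME =~ *my_fifo_ip*sync_reg*}]
```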



## **Common pitfalls**

### **Missing constraints:**

- The corresponding paths are not optimized for timing
- No violation will be reported but design may not work on HW

### **Incorrect constraints:**

- Runtime and optimization efforts will be spent on the wrong paths
- Reported timing violations may not result in any issues on HW

### **Unreasonable hold requirements:**

- May result in long runtime and SETUP violations
- P&R fixes HOLD violations as its #1 priority, because:
  - Designs with HOLD violations won't work on HW
  - Designs with SETUP violations may still work, but only at a lower clock frequency
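A classic source of an unreasonable hold requirement is a multicycle path that relaxes setup without adjusting hold. A sketch, assuming hypothetical register names `slow_src_reg*` and `slow_dst_reg*` on a path whose data is only sampled every second clock cycle:

```tcl
# Relax the setup check to 2 clock cycles (hypothetical cell names)
set_multicycle_path 2 -setup -from [get_cells slow_src_reg*] -to [get_cells slow_dst_reg*]

# Without the next line the hold check moves along with setup, to the edge one
# period after launch: the data path would then have to be longer than a full
# clock period, which P&R tries to achieve by inserting delay (long runtime,
# SETUP violations elsewhere). Moving hold back by 1 cycle restores the usual edge.
set_multicycle_path 1 -hold -from [get_cells slow_src_reg*] -to [get_cells slow_dst_reg*]
```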

## **Timing report**



### Report Summary

Contains information about the design, device, tool version, and the date and time of the report

### Path summary

Summarizes timing information for the path: whether timing is met (slack), source and destination, clock used, setup and hold checks (requirements), number of logic levels, skew and uncertainty

### **Timing command summary**

| Setup | | Hold | | Pulse Width | |
|---|---|---|---|---|---|
| Worst Negative Slack (WNS) | 0.317 ns | Worst Hold Slack (WHS) | 0.023 ns | Worst Pulse Width Slack (WPWS) | 0.000 ns |
| Total Negative Slack (TNS) | 0.000 ns | Total Hold Slack (THS) | 0.000 ns | Total Pulse Width Negative Slack (TPWS) | 0.000 ns |
| Number of Failing Endpoints | 0 | Number of Failing Endpoints | 0 | Number of Failing Endpoints | 0 |
| Total Number of Endpoints | 164580 | Total Number of Endpoints | 164580 | Total Number of Endpoints | 73283 |

All user specified timing constraints are met.
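The summary above is what `report_timing_summary` prints. A minimal sketch of regenerating it from the Tcl console and reading the worst setup slack programmatically, assuming the implemented design is open (for example after `open_run impl_1`) and using a hypothetical report file name:

```tcl
# Full summary (setup and hold) plus unconstrained paths, written to a file
report_timing_summary -delay_type min_max -max_paths 10 -report_unconstrained -file timing_summary.rpt

# Worst setup (max delay) path; its SLACK property corresponds to the WNS entry
set worst [get_timing_paths -delay_type max -max_paths 1 -nworst 1]
set wns [get_property SLACK $worst]
puts "WNS = $wns ns"
if {$wns < 0} {
    # Show the detailed path (source clock, data path, destination clock) for inspection
    report_timing -of_objects $worst
}
```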

### Create and validate clocks:

- check\_timing: for missing clocks and IO constraints
- report\_clocks: check frequency and phase
- report\_clock\_networks: identify the clock roots

### Validate clock groups:

- report\_clock\_interaction
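These clock checks can be run from the Tcl console in a few commands. A minimal sketch, run on the synthesized or implemented design (the report file name is a hypothetical choice):

```tcl
# Unconstrained endpoints, missing clocks, missing input/output delays, etc.
check_timing -file check_timing.rpt
# Every clock with its period and waveform (rising/falling edge times, i.e. phase)
report_clocks
# Clock roots and the clock networks they drive
report_clock_networks
# One entry per pair of clocks, showing how (or whether) paths between them are timed
report_clock_interaction
```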

### Validate I/O delays

- report\_timing -from [input\_port] -setup/-hold
- report\_timing -to [output\_port] -setup/-hold

### Add exceptions if necessary

Validate using report\_timing
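A sketch of validating I/O delays and then adding and checking an exception, assuming hypothetical ports `uart_rxd`/`uart_txd` and two unrelated hypothetical clocks `sys_clk` and `eth_clk`. The `set_clock_groups` line is a constraint and belongs in your timing XDC file.

```tcl
# Validate I/O delays: max = setup analysis, min = hold analysis
report_timing -from [get_ports uart_rxd] -delay_type max
report_timing -from [get_ports uart_rxd] -delay_type min
report_timing -to   [get_ports uart_txd] -delay_type max
report_timing -to   [get_ports uart_txd] -delay_type min

# Exception: the two clocks are asynchronous, so crossings between them must use a
# synchronizer and should not be timed by STA (put this line in the timing XDC)
set_clock_groups -asynchronous -group [get_clocks sys_clk] -group [get_clocks eth_clk]

# Validate: the clock pair should now be reported as asynchronous, with no
# timed paths remaining between the two clocks
report_clock_interaction
report_timing -from [get_clocks sys_clk] -to [get_clocks eth_clk]
```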










# **Debugging the design**

- RTL-level design simulation
  - ✓ Visibility of the entire design; ability to quickly iterate through the debug cycle
  - ✗ Difficulty of simulating larger designs in a reasonable amount of time
- Post-implementation design simulation
  - ✓ Debugging the post-implementation, timing-accurate model of the design
  - ✗ Long run times; accuracy limited by the system model
- In-system debugging
  - ✓ **Debugging of the post-implementation design on an FPGA device**
  - ✓ **Debugging** in the actual system environment at system speed
  - ✗ Lower visibility of debug signals
  - ✗ Longer design/implementation/debug iterations; harder timing closure

## **Integrated Logic Analyzer**

1. Probing phase: identifying which signals in your design you want to probe and how you want to probe them
   - Identifying which signals or nets you want to probe
   - Deciding how you want to add debug cores to your design
2. Implementation phase: implementing the design, including the additional debug IP attached to the probed nets
   - The debug core hub must be implemented before running placement and routing (P&R)
3. Analysis phase: interacting with the debug IP in the design to debug and verify functional issues
   - **Connecting to the hardware target and programming the FPGA device**
   - Setting up the ILA core to take a measurement
   - Viewing ILA cores in the Debug Probes window
   - **Using Basic Trigger Mode**
   - Viewing ILA probe data in the Waveform Viewer

## **Inserting ILA cores**

- Either *Manually* add the debug IP component instances through the source code, or
- Allow Vivado tool to *automatically insert* the debug cores into your post-synthesis netlist

### The second (netlist insertion) approach is more straightforward:

- Start by identifying signals for debugging at the HDL source level, prior to synthesis
- `(* mark_debug = "true" *) wire [7:0] char_fifo_dout; // Verilog example`
- Once the design is synthesized, use the Set Up Debug wizard for core assignment and configuration
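The same marking can be applied without editing the HDL, by setting the `MARK_DEBUG` property on nets of the synthesized netlist from the Tcl console or an XDC file. A minimal sketch for the `char_fifo_dout` bus from the Verilog example; synthesis may rename or optimize away nets, so check that the net still exists in the synthesized design.

```tcl
# Mark every bit of char_fifo_dout, at any level of hierarchy, for debug.
# The marked nets then show up in the Set Up Debug wizard.
set_property MARK_DEBUG true [get_nets -hierarchical {char_fifo_dout[*]}]
```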

### **Inserting ILA cores (cont.)**



You can also insert the debug cores from the GUI:

- Synthesize your design first
- Open the synthesized design
- Run Set Up Debug
- The inserted core can be seen in the Netlist window

## **Inserting Debug Cores**

Open the synthesized design and insert debug cores from the list of unassigned debug nets.

*[Screenshot: Set Up Debug wizard, "Specify Nets to Debug" page. It lists the 32-bit counters counterA, counterB and counterD (driven by FDRE cells), the combinational nets A_or_B, C_or_D and D_or_A (driven by LUT6 cells) and further nets, all in the clk domain and each enabled for TRIG and DATA (133 nets to debug in total). The Cell Properties pane of u_ila_0 shows C_DATA_DEPTH, the number of data samples to store, set to 1024.]*

The Set Up Debug wizard automatically selects the clock domain of each net

The properties of each **core can be customized** using the GUI or manually

The corresponding constraints are **inserted automatically** into the XDC file

### **Inserting Debug Cores (cont.)**

> XDC commands can also be used to insert debug cores:

`create_debug_core u_ila_0 ila`

`set_property C_DATA_DEPTH 1024 [get_debug_cores u_ila_0]`

`set_property C_TRIGIN_EN false [get_debug_cores u_ila_0]`

`set_property C_TRIGOUT_EN false [get_debug_cores u_ila_0]`

`set_property C_ADV_TRIGGER false [get_debug_cores u_ila_0]`

...
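The command excerpt above ends before any nets are attached to the core. A sketch of how the remaining commands typically connect the ILA clock and one probe, assuming a hypothetical clock net `clk_IBUF_BUFG` and the 8-bit `char_fifo_dout` bus marked for debug earlier:

```tcl
# Clock input of the ILA: the (hypothetical) net that clocks the probed logic
set_property port_width 1 [get_debug_ports u_ila_0/clk]
connect_debug_port u_ila_0/clk [get_nets clk_IBUF_BUFG]

# probe0: one 8-bit probe, connected bit by bit in LSB-first order
set_property port_width 8 [get_debug_ports u_ila_0/probe0]
connect_debug_port u_ila_0/probe0 [get_nets [list \
    {char_fifo_dout[0]} {char_fifo_dout[1]} {char_fifo_dout[2]} {char_fifo_dout[3]} \
    {char_fifo_dout[4]} {char_fifo_dout[5]} {char_fifo_dout[6]} {char_fifo_dout[7]}]]

# Further probes are added with: create_debug_port u_ila_0 probe
```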

- Saving the constraints may cause synthesis and implementation to go out-of-date;
- however, you do not need to re-synthesize the design, since the debug XDC constraints are only used during implementation
- See Xilinx UG908 for advanced debugging capabilities and IBERT

# **Debugging Logic Designs in Hardware**

1. Connect to the hardware target and program the FPGA with the .bit file
2. Set up the ILA debug core trigger and capture controls
3. Arm the ILA debug core trigger
4. View the captured data from the ILA debug core in the **Waveform** window
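Step 1 can also be scripted from the Vivado Tcl console; the console output on this slide sets the same `PROGRAM.FILE` and `PROBES.FILE` properties. A minimal sketch, assuming a single JTAG target on the local hardware server and hypothetical bitstream and probe-file paths (more recent Vivado releases may name the first command `open_hw_manager`):

```tcl
open_hw                         ;# start the Hardware Manager
connect_hw_server               ;# local hw_server (localhost:3121 by default)
open_hw_target                  ;# open the first JTAG target found

set dev [lindex [get_hw_devices] 0]
current_hw_device $dev

# Hypothetical paths: point these at your own implementation run outputs
set_property PROGRAM.FILE {project_1/project_1.runs/impl_1/top.bit}        $dev
set_property PROBES.FILE  {project_1/project_1.runs/impl_1/debug_nets.ltx} $dev

program_hw_devices $dev
refresh_hw_device  $dev         ;# discover the ILA/VIO cores in the programmed design
puts [get_hw_ilas]              ;# e.g. hw_ila_1 hw_ila_2
```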

*[Screenshot: Vivado Hardware Manager connected to localhost with a programmed xc7k325t device containing two ILA cores (hw_ila_1, hw_ila_2), a VIO core (hw_vio_1) and a JTAG-to-AXI core (hw_axi_1). The ILA dashboard shows the trigger mode, capture mode, number of windows, window data depth (1024) and trigger position settings, and the Tcl console shows the set_property PROGRAM.FILE and set_property PROBES.FILE commands used to program the device.]*

### **Taking measurements**

- Add probes to the Waveform window
- Add probes to Basic Trigger Setup
- Add probes to Basic Capture Setup
- Specify capture conditions
- Arm the core and analyse the received data
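The same measurement flow can be driven from Tcl. A minimal sketch, assuming the core `hw_ila_1` contains a hypothetical 8-bit probe named `char_fifo_dout`; the compare value, data depth and trigger position are placeholder choices.

```tcl
set ila [get_hw_ilas hw_ila_1]

# Basic trigger: fire when char_fifo_dout equals 0x41
set_property TRIGGER_COMPARE_VALUE eq8'h41 [get_hw_probes char_fifo_dout -of_objects $ila]

# Capture settings: 1024 samples per window, trigger in the middle of the window
set_property CONTROL.DATA_DEPTH 1024 $ila
set_property CONTROL.TRIGGER_POSITION 512 $ila

run_hw_ila $ila                               ;# arm the core
wait_on_hw_ila $ila                           ;# block until the capture completes
display_hw_ila_data [upload_hw_ila_data $ila] ;# show the samples in the waveform viewer
```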





# Acknowledgments (I)

### NetFPGA Team at University of Cambridge (Past and Present):

Andrew Moore, David Miller, Muhammad Shahbaz, Martin Zadnik, Matthew Grosvenor, Yury Audzevich, Neelakandan Manihatty-Bojan, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman, Gianni Antichi, Charalampos Rotsos, Marco Forconesi, Jinyun Zhang, Bjoern Zeeb

### NetFPGA Team at Stanford University (Past and Present):

Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan, Eric Lo

### All Community members (including but not limited to):

Paul Rodman, Kumar Sanghvi, Wojciech A. Koszek, Yashar Ganjali, Martin Labrecque, Jeff Shafer, Eric Keller, Tatsuya Yabe, Bilal Anwer, Lisa Donatini, Sergio Lopez-Buedo

Kees Vissers, Michaela Blott, Shep Siegel, Cathal McCabe

### **Acknowledgements (II)**



<u>Disclaimer:</u> Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project.

This effort is also sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249. This material is approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

Summer Course Technion, Haifa, IL 2015