# **NetFPGA Summer Course**



Presented by: Noa Zilberman Yury Audzevich

Technion August 2 – August 6, 2015

http://NetFPGA.org



### **Section I: General Overview**



### **Constraints Methodology**

Design constraints define the requirements that must be met by the compilation flow in order for the design to be functional on the board

- Over-constraining and under-constraining is bad, so use reasonable constraints that correspond to your requirements
- Xilinx provides new Xilinx Design Constraint (XDC) file -- quite different from previously used User Constraints File (UCF)
- Single or multiple XDC files in a design might serve a different purpose



## **Xilinx Design Constraint file**

|                               | Project Manager - project_wave_gen               |  |  |  |  |  |  |
|-------------------------------|--------------------------------------------------|--|--|--|--|--|--|
|                               | Sources _ 🗆 🖻 😕                                  |  |  |  |  |  |  |
|                               | 🔍 🔀 🖨 📷 🛃 📓 📕                                    |  |  |  |  |  |  |
| Messages: () <u>1 warning</u> |                                                  |  |  |  |  |  |  |
|                               | 🖃 📅 Design Sources (2)                           |  |  |  |  |  |  |
|                               | ⊞ 🛅 Verilog Header (1)                           |  |  |  |  |  |  |
|                               | 🗄 🔞 📲 wave gen (wave gen.v) (14)                 |  |  |  |  |  |  |
|                               | E-Constraints (2)                                |  |  |  |  |  |  |
|                               | constrs 1 (active)                               |  |  |  |  |  |  |
|                               | wave_gen_timing.xdc (target)                     |  |  |  |  |  |  |
|                               | wave_gen_pins.xdc                                |  |  |  |  |  |  |
|                               | Constrs 2                                        |  |  |  |  |  |  |
|                               | wave gen.xdc                                     |  |  |  |  |  |  |
|                               |                                                  |  |  |  |  |  |  |
|                               |                                                  |  |  |  |  |  |  |
|                               | Hierarchy IP Sources   Libraries   Compile Order |  |  |  |  |  |  |
| 2                             | & Sources 🛛 🖓 Templates                          |  |  |  |  |  |  |

| 🛃 Apply 🛛 🙀        | Cancel                                           |
|--------------------|--------------------------------------------------|
| wave_gen_pins.xdc* |                                                  |
| Location:          | C:/2012.2/project_wave_gen/project_wave_gen.s    |
| Type:              | XDC 👻                                            |
| Size:              | 2.7 KB                                           |
| Modified:          | Tuesday 07/03/12 04:18:02 AM                     |
| Copied to:         | project_wave_gen.srcs/constrs_1/imports/verilog  |
| Copied from:       | C:/Xilinx/14.2_P28xc/14.2/ISE_DS/Avrum/verilog/v |
| Copied on:         | Monday 05/14/12 10:33:58 AM                      |
| Read-only:         | No                                               |
| Enabled            |                                                  |
| Used In            |                                                  |
| Synthesis          |                                                  |
| Synulesis          |                                                  |
| Implementation     |                                                  |
|                    |                                                  |
| •                  | J 4 III                                          |

#### XDC constraints are a combination of:

- Synopsys Design Constraints format (SDC)
- Xilinx centric extensions
- Tcl-compatible for advanced scripting

# XDC constraints have the following properties:

- follow the Tcl semantic,
- interpreted like any other Tcl command,
- read in and parsed sequentially.

#### You can use constraints for:

Synthesis and/or Implementation

#### Options are specified in file properties or via tcl :

set\_property used\_in\_synthesis false [get\_files
wave\_gen\_pins.xdc]

set\_property used\_in\_implementation true [get\_files
wave\_gen\_pins.xdc]

### **XDC File Order**

The constraint files are loaded in the same sequence as the way they are listed

To change order either drag and drop or reorder using: reorder\_files -fileset constrs\_1 -before [get\_files wave\_gen\_timing.xdc] \ [get\_files wave\_gen\_pins.xdc]

#### IPs:

If you use the native IPs, their XDC files are loaded after your files

You cannot change the IP XDC files order, but you can disable them and re-apply constraints in your XDC files

### **Common pitfalls**

### **Missing constraints:**

- The corresponding paths are not optimized for timing
- No violation will be reported but design may not work on HW

### **Incorrect constraints:**

- Runtime and optimization efforts will be spent on the wrong paths
- Reported timing violations may not result in any issues on HW

### **Unreasonable hold requirements:**

- May result in long runtime and SETUP violations
- P&R fixes HOLD violations as #1 priority, because:
  - Designs with HOLD violations won't work on HW
  - Designs with SETUP violations will work, but slower

# Key to creating XDC constraints

2)

#### Organize your constraints in the following sequence:

#### 1) ## Timing Assertions Section

- # Primary clocks
- # Virtual clocks
- # Generated clocks
- # Clock Groups
- # Input and output delay constraints
- 3) ## Physical Constraints Section

#### **VALIDATE constraints at each step:**

- Constrain it, check reports
- Validate timing



## Timing Exceptions Section

# Max Delay / Min Delay

# Multicycle Paths

# Case Analysis

# Disable Timing

# False Paths

### **Constraining the design**



Constraints include: **Timing constraints**, Pin assignments, Placement constraints (floorplanning), Properties and Attributes.

#### Syntax of commonly used XDC commands can be checked through:

- Help pages in tcl command line
- XDC Templates (accessed through UI)

#### Start with an Elaborated Design:

fix timing at early stages -- debug and optimize your RTL

## **Synthesis Constraints**

Vivado IDE synthesis engine transforms the **RTL description into technology mapped netlist** 

With synth design net delay modelling is not very accurate; synth netlist should either meet timing or fail by a small amount before starting implementation.

There are three categories of constraints for synthesis:

- RTL Attributes

   directives written in the RTL files (MARK\_DEBUG, etc.)
  - Timing Constraints (XDC) - the following have real impact on synthesis create\_clock create\_generated\_clock set\_input\_delay set\_output\_delay set\_clock\_groups set\_false\_path set\_max\_delay

۲

- set\_multicycle\_path
- Physical and Configuration Constraints
  - ignored by synthesis algorithms

NetFPGA Summer Course Technion, Haifa, IL 2015



### **Implementation Constraints**

# Synthesized netlist allows running timing analysis:

- Correct the timing constraints and save them to an implementation-only XDC file.
- Add missing constraints, such as asynchronous and exclusive clock groups.
- Add timing exceptions, such as multicycle paths and max delay constraints.
- Identify large violations due to long paths in the design and correct the RTL description.



### **Section II: Static Timing Analysis**



# Static Timing Analysis (STA)

A design netlist is an interconnected set of ports, cells and nets

- The functionality of a design is determined by RTL code (verilog, vhdl, etc.) and *verified by simulation tools*
- The *quality of your RTL* determines how easy timing will be met
- The performance of a design is determined by the delays of cells that compromise the design (STA)
- Static timing analysis doesn't check the functionality of the components but rather performance of components

### **STA Goals**

#### Many FPGA processes are timing driven:

- Synthesis for circuit construction
- Placer for optimal cells locations
- Router for choosing routing elements

Constraints are used to determine the desired performance goals

STA reports whether the design will provide the desired performance through reports

> Have you heard of Setup/Hold requirements for a single FF?



... not quite the same as Setup and Hold path delays that STA is using



### **Component delays**

#### Each component has delays to perform it function:

- LUT has propagation delay from it's ins to outs
- Net has delay from driver to receiver
- FF required stable data for a certain time around sampling point

Delays are also dependent of environment factors. These are determined and characterized by Xilinx during device design.

#### Timing is extracted over the operating range of the device:

- Process (different speed grades)
- Voltage (min  $\rightarrow$  max)
- Temperature (min  $\rightarrow$  max)

#### Range delays are extracted at various process corners (STA):

- Slow process corner: slow process, lowest voltage, highest temperature
- Fast process corner: fastest process, highest voltage, lowest temperature

# **Static Timing Path**

- A static timing path is a path that starts at a clock element
- Propagates through any # combinatorial elements and nets
- Ends at clocking element

Vivado's synthesis, place and route tool does STA of all paths both fast and slow corners

**Source clock delay** – starting top level clock port and ending at the launch FF

**Data path delay** – delay to the capturing FF

**Destination clock delay** – there might be a difference bw these two FFs



### **Setup check**



Checks that change in a clocked element has time to propagate to other clocked elements before the next clock event

Simple case – same domain & only data path is considered:  $T(D1\_CLK) + T(FF1_{(Clk->Q)}) + T(Comb) < T(CLK_{period}) - T(FF2_{(setup)}) - T(SU) + T(D2\_CLK)$ 

Summer Course Technion, Haifa, IL 2015

### Hold check



# Checks DATA isn't caught at destination FF at the same clock as the clock that launched it at launch FF

Simple case – same domain & only data path is considered:  $T(D1\_CLK) + T(FF1_{(Clk->Q)}) + T(Comb) > T(FF2_{(hold)}) + T(D2\_CLK) + T(HU)$ 

# Section III: Timing constraints in Vivado



### Method to create good constraints

### Create clocks and define clock interactions:

- 4 step rule

### **Setup Input and Output delays**

- Try not creating wrong HOLD violations

### Set timing exceptions

- Less is more let Vivado do magic for you
- Try not creating wrong HOLD violations

### Use report commands to validate each step

# **Clocks in the design**

CLKs are periodic signals with:

- 1) period time from rising edge to the next rising edge
- 2) Duty cycle high to low ratio of the clock
- 3) Jitter variation of period from nominal
- 4) Phase position of the rising edge

Clocks are created with create\_clock Tcl command:

- create\_clock –name <name> -period <period> <objects>
- <objects> are the list of pins, ports, or nets to which attach the clock,



Clocks with phase offsets and different duty cycles can be created using "waveform" option:

- waveform <edges> - list of numbers representing times of successive edges

create\_clock -name sys\_clk1 -period 5.0 -waveform {1.0 4.0} \
[get\_ports clk\_in1]
1.0 4.0 6.0

### **Clock rules**

> Clock only exist when you create them

#### > Clocks propagate automatically through clocking modules

- MMCM/PLL/BUFR clock clocks are automatically generated
- Transceiver clocks are not supported create them manually



> Use create\_generated\_clocks for internal clocks (if needed)

Note that timing analysis will be performed using originating primary clock

#### > ALL inter-clock path are evaluated by default

# 4 Steps for creating clocks

# BEWARE: In Vivado all clocks are related unless you specifically say that they are not!

#### > Step 1

- Use create\_clock for all primary clocks on top level ports
- Run the synthesis or open netlist design

#### Step 2

- Run report\_clocks
- Study the report to verify period, phase and propagation
- Apply corrections to your constraints if needed

| Attributes<br>P: Propagate<br>G: Generated |        |               |            |                                           |
|--------------------------------------------|--------|---------------|------------|-------------------------------------------|
| Clock                                      | Period | Waveform      | Attributes | Sources                                   |
| sys_clk                                    | 10.000 | {0.000 5.000} | Р          | {sys_clk}                                 |
| pll0/clkfbout                              | 10.000 | {0.000 5.000} | P,G        | <pre>{pll0/plle2_adv_inst/CLKFBOUT}</pre> |
| pll0/clkout0                               | 2.500  | {0.000 1.250} | P,G        | <pre>{pll0/plle2_adv_inst/CLKOUT0}</pre>  |
| pll0/clkout1                               | 10.000 | {0.000 5.000} | P,G        | <pre>{pll0/plle2_adv_inst/CLKOUT1}</pre>  |

#### Output of report\_clocks



# 4 Steps for creating clocks (cont.)

> Step 3

- Evaluate the clock interaction using <a href="mailto:report\_clock\_interaction">report\_clock\_interaction</a> BEWARE: All inter-clock paths are constrained by default!

- Unconstraint inter-clock paths (Clock Domain Crossing) as needed:
  - Make sure you designed proper CDC synchronizers
  - Use set\_clock\_groups (preferred method to set\_false\_path)
  - use report\_cdc command in Vivado 2015
- Do you have unconstrained objects?
  - Find out with check\_timing

### Step 4

- Run report\_clock\_networks
- You want the design to have clean clock lines without logic
  - Tip: Use clock gating option in synthesis to remove LUTs on the clock line

# **Constraining clock crossing domains**

#### Use appropriate synchronizing techniques

- 2 or more register synchronizers, for single bit
- Asynchronous FIFOs for buses



### Maximize Mean Time Between Failures (MTBF)

- Use ASYNC\_REG to place synchronizing flops in the same slice

set\_property ASYNC\_REG TRUE \

[get\_cells [list sync0\_reg sync1\_reg]]

#### Set the tool to ignore timing paths between individual clocks

set\_clock\_groups -asynchronous -group {clk1} -group {clk2}
This is equivalent to:

set\_false\_path -from [get\_clocks clk1] -to [get\_clocks clk2]
set\_false\_path -from [get\_clocks clk2] -to [get\_clocks clk1]

### **Asynchronous CDC**

### Ignoring timing paths between groups of clocks

#### create\_clock for the two primary clocks

create\_clock -name clk\_oxo -period 10 [get\_ports clk\_oxo] create\_clock -name clk\_core -period 10 [get\_ports clk\_core]



#### Set Asynchronous Clock Groups

set\_clock\_groups -asynchronous -group [get\_clocks –include\_generated\_clocks
clk\_oxo] \

-group [get\_clocks -include\_generated\_clocks clk\_core} ]



# **Setting Input/Output delay**

Constraints should be developed in the following order:

- 1) Baseline constraints Optimize Internal Paths first
- 2) Add I/O constraints Optimize entire chip
- 3) Add timing exceptions and Floorplan Fine-tuning step

#### set\_input\_delay (check options):

a) Data propagation from external chip to input package pin of FPGA device, and b) Relative reference board clock

set\_output\_delay (setup requirement of external source):

a) Data propagating from the output package pin of FPGA device through the board to another device and, b) relative ref. board clock

- Use set\_input\_delay and set\_output\_delay for realistic delays
- Wrong delay value (e.g. 0 ns) can cause wrong HOLD violations

### **Timing exceptions**

- > are needed when the logic behaves in a way that is not timed correctly by default:
- **set\_multicycle\_path** # clock cycles required to propagate data from the start to the end of a path.
- **set\_false\_path** logic path in the design that should not be analysed.
- **set\_max\_delay**, **set\_min\_delay** overrides the default setup and hold constraints with user specified max & min delays.
- **set\_case\_analysis** restricts certain signals being propagated through the design.

### **Timing report**



#### Report Summary

Contains info about design, device, tool version, data and time of report

#### Path summary

Summarizes timing information for the path: timing is met (Slack), source and destination, clock used, setup and hold check (requirements), number of level of logic, skew and uncertainty

# **Timing report (cont.)**

#### Source clock delay

Delays of clock network: edge of the SRC clock, through clock network, until clk pin of launch FF

#### Data path delay

Delay: clock pin of launch FF, plus combinational delay until D input of the capturing FF

#### The above 2 are accumulated for slack calculation

#### Destination Clock delay

Propagation from destination clk to the clk pin of destination clocked element

#### Slack calculation

Subtracts the arrival time (end of Data Path section) from the required time (end of Destination Clock section)

### **Timing command summary**

| etup                                     | Hold                         |          | Pulse Width                              |          |       |
|------------------------------------------|------------------------------|----------|------------------------------------------|----------|-------|
| Worst Negative Slack (WNS): 0.317 ns     | Worst Hold Slack (WHS):      | 0.023 ns | Worst Pulse Width Slack (WPWS):          | 0.000 ns |       |
| Total Negative Slack (TNS): 0.000 ns     | Total Hold Slack (THS):      | 0.000 ns | Total Pulse Width Negative Slack (TPWS): | 0.000 ns |       |
| Number of Failing Endpoints: 0           | Number of Failing Endpoints: | 0        | Number of Failing Endpoints:             | 0        |       |
| Total Number of Endpoints: 164580        | Total Number of Endpoints:   | 164580   | Total Number of Endpoints:               | 73283    |       |
| ll user specified timing constraints are | met.                         |          |                                          |          |       |
|                                          |                              |          |                                          |          | 4 ▷ 🗉 |

#### Create and validate clocks:

- check\_timing: for missing clocks and IO constraints
- report\_clocks: check frequency and phase
- report\_clock\_networks: possible clock root

#### > Validate clock groups:

– report\_clock\_interaction

#### Validate I/O delays

- report\_timing –from [input\_port] –setup/-hold
- report\_timing -to [output\_port] -setup/-hold

#### > Add exceptions if necessary

Validate using report\_timing

### **Section III: Integrated Logic Analyzer**



# **Debugging the design**

#### RTL-level design simulation

- Visibility of the entire design; ability to quickly iterate through debug cycle
- x Difficulty of simulating larger designs in a reasonable amount of time

#### Post-implemented design simulation

- Debugging the post-implemented timing-accurate model for the design
- x Long run-times and system model accuracy

#### In-system debugging

- ✓ **Debugging of post-implemented design on an FPGA device**
- ✓ **Debugging** actual system environment at system speeds
- x Lower visibility of debug signals
- x Longer design/implementation/debug iterations & hard close timing

### **Integrated Logic Analyzer**

I. Probing phase: Identifying what signals in your design you want to probe and how you want to probe them

> Identifying what signals or nets you want to probe Deciding how you want to add debug cores to your design

2. Implementation phase: Implementing the design that includes the additional debug IP that is attached to the probed nets

The debug core hub must be implemented prior to running the PL & RT.

3. Analysis phase: Interacting with the debug IP contained in the design to debug and verify functional issues

Connecting to the Hardware Target and Programming the FPGA Device

Setting up the ILA Core to Take a Measurement

Viewing ILA Cores in the Debug Probes Window

Using Basic Trigger Mode

Viewing ILA Probe Data in the Waveform Viewer

# **Inserting ILA cores**

- Either *Manually* add the debug IP component instances through the source code, or
- Allow Vivado tool to *automatically insert* the debug cores into your post-synthesis netlist

### The first approach is more straight forward:

Start with Identifying signals for debugging at the HDL source level prior to synthesis

(\* mark\_debug = "true" \*) wire [7:0] char\_fifo\_dout; -- Verilog example

Once design is synthesized use Set up Debug wizard for core assignment and configuration

### **Inserting ILA cores (cont.)**



# **Inserting Debug Cores**

Open synthesized design and Insert Debug cores from the list of Unassigned nets.

| cify Nets to Debug<br>Specify Nets for debugging |              |             |              |                  | Unassigned Debug Nets (0)                       |
|--------------------------------------------------|--------------|-------------|--------------|------------------|-------------------------------------------------|
| Name                                             | Clock Domain | Driver Cell | TRIG         | DATA             | Cell Properties                                 |
| ⊕-√r t counterA (32)                             | clk          | FDRE        |              | V                |                                                 |
|                                                  | dk           | FDRE        | v            | ý.               | 2 u_ila_0                                       |
| ±                                                | dk           | FDRE        | v            | V.               | C_DATA_DEPTH 1024                               |
| ± frœ counterD (32)                              | dk           | FDRE        | V            | V                | C_TRIGIN_EN 1024                                |
| 🛱 A_or_B                                         | dk           | LUT6        | V            | $\checkmark$     | C_TRIGOUT_EN 2048                               |
| ∫☆ B_or_C                                        | dk           | LUT6        | V            | $\checkmark$     | C ADV TRIGGER 4096                              |
| C_or_D                                           | dk           | LUT6        | $\checkmark$ | $\checkmark$     | C_INPUT_PIPE_STAGES 8192<br>16384               |
| ∫ஜ D_or_A                                        | dk           | LUT6        | V            | $\checkmark$     | C_EN_STRG_QUAL 32768                            |
|                                                  | dk           | STARTUPE2   | $\checkmark$ | $\checkmark$     | ALL_PROBE_SAME_MU 65536                         |
|                                                  |              |             |              |                  | ALL_PROBE_SAME_MU_CNT 131072                    |
| dd/Remove Nets                                   |              |             |              | Nets to debug: 1 | C_DATA_DEPTH<br>Number of data samples to store |

The Set up Debug wizard automatically selects clock domains

The properties of each core can be customized using GUI or manually

The appropriate code will be inserted automatically into XDC file

# **Inserting Debug Cores (cont.)**

#### > XDC Commands can be also used to Insert Debug Cores

create\_debug\_core u\_ila\_0 ila set\_property C\_DATA\_DEPTH 1024 [get\_debug\_cores u\_ila\_0] set\_property C\_TRIGIN\_EN false [get\_debug\_cores u\_ila\_0] set\_property C\_TRIGOUT\_EN false [get\_debug\_cores u\_ila\_0] set\_property C\_ADV\_TRIGGER false [get\_debug\_cores u\_ila\_0] ...

- Saving constraints may cause the synthesis and implementation to go out-of-date;
- you do not need to re-synthesize the design since the debug XDC constraints are only used during implementation
- Check Xil UG908 for advanced debugging capabilities and IBERT

# **Debugging Logic Designs in Hardware**

- 1. Connect to the hardware target and program the FPGA with the .bit file
- 2. Set up the ILA debug core trigger and capture controls.
- 3. Arm the ILA debug core trigger.
- 4. View the captured data from the ILA debug core in the Waveform window



### **Taking measurements**

- Add Probes to Waveform
- Add Probes to Basic Trigger Setup
- Add Probes to Basic Capture Setup
- Specify capture conditions
- > Arm the core and analyse received data





Summer Course Technion, Haifa, IL 2015

### **Section IX: Conclusion**



# Acknowledgments (I)

#### NetFPGA Team at University of Cambridge (Past and Present):

Andrew Moore, David Miller, Muhammad Shahbaz, Martin Zadnik Matthew Grosvenor, Yury Audzevich, Neelakandan Manihatty-Bojan, Georgina Kalogeridou, Jong Hun Han, Noa Zilberman, Gianni Antichi, Charalampos Rotsos, Marco Forconesi, Jinyun Zhang, Bjoern Zeeb

#### NetFPGA Team at Stanford University (Past and Present):

Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan, Eric Lo

#### All Community members (including but not limited to):

Paul Rodman, Kumar Sanghvi, Wojciech A. Koszek, Yahsar Ganjali, Martin Labrecque, Jeff Shafer, Eric Keller, Tatsuya Yabe, Bilal Anwer, Yashar Ganjali, Martin Labrecque, Lisa Donatini, Sergio Lopez-Buedo

Kees Vissers, Michaela Blott, Shep Siegel, Cathal McCabe

### **Acknowledgements (II)**



<u>Disclaimer:</u> Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project.

This effort is also sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249. This material is approved for public release, distribution unlimited. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

**EFPGA** Summer Course Technion, Haifa, IL 2015