#### Intel's Hyperscale-Ready Infrastructure Processing Unit (IPU)

Brad Burres, Intel Fellow

Co-Authors: Dan Daly, Mark Debbage, Eliel Louzoun, Christine Severns-Williams, Naru Sundar, Nadav Turbovich, Barry Wolford, Yadong Li

#### Major Advantages of IPUs



#### Separation of Infrastructure & Tenant

The guest can fully control the CPU with its own software, while the CSP maintains control of the infrastructure and the Root of Trust



#### Infrastructure Offload

Accelerators help process infrastructure tasks efficiently, minimizing latency and jitter and maximizing revenue from the CPU



#### Diskless Server Architecture

Simplifies data center architecture while adding flexibility for the CSP

### Infrastructure Workloads Migrating to IPU



Hot Chips **2021** 

#### Intel's 200G IPU



| Area                  | Highlights                                                        |
|-----------------------|-------------------------------------------------------------------|
| Hyperscale Ready      | Co-designed with a top cloud provider                             |
|                       | Integrated learnings from multiple generations of FPGA sNIC/IPU   |
|                       | High performance under real-world load                            |
|                       | Security and isolation from the ground up                         |
| Technology Innovation | Best-in-class programmable packet processing engine               |
|                       | NVMe storage interface scaled up from Intel Optane technology     |
|                       | Next-generation reliable transport                                |
|                       | Advanced crypto and compression acceleration                      |
| Software              | SW/HW/accelerator co-design                                       |
|                       | P4 Studio based on Barefoot                                       |
|                       | Leverage and extend DPDK and SPDK                                 |
|                       | Enable broad adoption of IPUs                                     |

Architectural Breakdown


#### **Network subsystem**

- Support for up to 4 host Xeons with 200 Gb/s full duplex
- High-performance RDMA running over RoCEv2 and Reliable Transport Protocol
- NVMe device interface with inline AES-XTS and VM QoS for an efficient software backend
- Programmable packet pipeline with QoS and telemetry capabilities, supporting 200 Mpps
- Advanced transmit scheduling capabilities
- Inline IPsec for high-scale connections at wire speed
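To put the 200 Gb/s and 200 Mpps figures side by side, the following back-of-the-envelope sketch computes standard Ethernet line-rate arithmetic. It assumes the usual 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-packet gap) and treats the 200 Mpps figure as independent of packet size, which the slides do not state. The function names are illustrative only.

```python
# Back-of-the-envelope Ethernet packet-rate arithmetic (illustrative only).
# Assumption: 20 B of wire overhead per frame
# (7 B preamble + 1 B start-of-frame delimiter + 12 B inter-packet gap).
ETH_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to fill the link with frames of this size."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)


def smallest_line_rate_frame(link_bps: float, pipeline_pps: float) -> float:
    """Smallest frame size (bytes) at which pipeline_pps still fills the link."""
    return link_bps / pipeline_pps / 8 - ETH_OVERHEAD_BYTES


LINK_BPS = 200e9      # 200 Gb/s, from the slide
PIPELINE_PPS = 200e6  # 200 Mpps, from the slide

if __name__ == "__main__":
    mpps_64 = line_rate_pps(LINK_BPS, 64) / 1e6
    frame = smallest_line_rate_frame(LINK_BPS, PIPELINE_PPS)
    print(f"64 B frames at 200 Gb/s line rate: {mpps_64:.1f} Mpps")
    print(f"Smallest frame that 200 Mpps carries at line rate: {frame:.0f} B")
```

Under these assumptions, minimum-size 64 B frames at 200 Gb/s would require roughly 297.6 Mpps, while 200 Mpps fills the link for frames of about 105 B and larger.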



#### **Compute Complex**



# Mt. Evans - Packet Processing



#### Leadership P4 programmable pipeline

- Supports a complete vSwitch and beyond, fully in hardware
- Pipeline composition via recirculation and chained operations without sacrificing performance
- Programmable Parser, Exact Match, Wildcard Match, Range Match, LPM, Meters, Statistics, Modifier
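The match kinds listed above (exact, wildcard, range, LPM) differ only in how a key is compared against table entries. The sketch below is a plain software model of those semantics, not a description of how the hardware pipeline or any P4 target implements them; all function names, tables, and actions are hypothetical.

```python
import ipaddress


def exact_match(table: dict, key):
    """Exact match: the key must equal a table entry exactly."""
    return table.get(key)


def lpm_match(routes: list, addr: str):
    """Longest-prefix match: among prefixes containing addr, the longest wins."""
    ip = ipaddress.ip_address(addr)
    best = None  # (prefix length, action) of the best candidate so far
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, action)
    return best[1] if best else None


def wildcard_match(rules: list, value: int):
    """Wildcard (ternary) match: compare only bits set in mask; first hit wins."""
    for want, mask, action in rules:
        if value & mask == want & mask:
            return action
    return None


def range_match(rules: list, value: int):
    """Range match: first rule whose inclusive [lo, hi] range contains value."""
    for lo, hi, action in rules:
        if lo <= value <= hi:
            return action
    return None


# Hypothetical example tables.
ROUTES = [("0.0.0.0/0", "default"), ("10.0.0.0/8", "fabric"), ("10.1.0.0/16", "rack")]
ETHERTYPES = [(0x0800, 0xFFFF, "ipv4"), (0x0000, 0x0000, "other")]
PORTS = [(0, 1023, "well_known"), (1024, 65535, "ephemeral")]

if __name__ == "__main__":
    print(lpm_match(ROUTES, "10.1.2.3"))       # rack
    print(wildcard_match(ETHERTYPES, 0x86DD))  # other
    print(range_match(PORTS, 443))             # well_known
```

In this model a wildcard rule with an all-zero mask acts as a catch-all, which is why it is placed last in the rule list.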

#### **Packet Processing at scale**

- At-scale classification for more than 10M entries, backed by DDR
- Support for pipeline-driven operations such as flow auto-add and aging
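Flow auto-add and aging can be pictured with a small software model: the first packet of an unknown flow inserts an entry without control-plane involvement, and entries idle for longer than a timeout are removed. The slides do not describe the hardware mechanism, so the class below is only a conceptual sketch; `FlowTable`, its methods, and the timeout value are hypothetical.

```python
class FlowTable:
    """Toy flow table with auto-add on first packet and idle-timeout aging."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.flows = {}  # flow key (e.g. 5-tuple) -> last-seen timestamp

    def on_packet(self, flow_key, now: float) -> bool:
        """Refresh a known flow or auto-add a new one.

        Returns True if the flow was newly added by this packet.
        """
        is_new = flow_key not in self.flows
        self.flows[flow_key] = now
        return is_new

    def age(self, now: float) -> list:
        """Remove and return flows idle for longer than idle_timeout."""
        expired = [
            key for key, seen in self.flows.items()
            if now - seen > self.idle_timeout
        ]
        for key in expired:
            del self.flows[key]
        return expired


if __name__ == "__main__":
    table = FlowTable(idle_timeout=10.0)
    flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)  # src, dst, proto, sport, dport
    print(table.on_packet(flow, now=0.0))  # True: first packet auto-adds the flow
    print(table.on_packet(flow, now=5.0))  # False: existing flow is refreshed
    print(table.age(now=20.0))             # the flow has been idle 15 s and expires
```

A dictionary scan is enough for illustration; at the entry counts quoted above, a real design would need a far more efficient aging structure.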

#### **Tightly coupled with the Compute Complex**

- Large L1 caches, optionally backed in compute cache, designed to meet hyperscale performance challenges
- Multi-TB cross-sectional BW between the network subsystem and the compute complex
- Broad metadata capabilities, including handoff to software

#### Scale out Storage Architecture



[Block diagram; labeled components:]

- Host interface: PCIe SerDes; PCIe Gen 4 x16 (SR-IOV, S-IOV) exposing PFs and VFs
- Network subsystem: RDMA, NVMe, and LAN interfaces; Packet Processing Pipeline; Inline Crypto; Traffic Shaper; 200G Ethernet MAC; 56G Ethernet SerDes
- Compute complex: Arm Neoverse N1 cores; System Level Cache; 3x LPDDR4; Lookaside Crypto & Compression Engine; Management Complex

System Security





Isolation and Recovery














# Thank you!




