

# NEXT GENERATION "ZEN 3" CORE

MARK EVERS, LESLIE BARNES, MIKE CLARK



## CAUTIONARY STATEMENT

This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the features, functionality, availability, timing, expectations and expected benefits of AMD future products, including Ryzen™ 5000 Series CPUs and Socket AM4, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Quarterly Report on Form 10-Q from the quarter ending on June 26, 2021.



## **OUR "ZEN" JOURNEY**

2017 to 2021

## "ZEN" / "ZEN+"

- Up to 4.35GHz max boost<sup>4</sup>
- +52% IPC<sup>1</sup>
- 4-core complex
- Up to 8MB L3 per complex
- SMT enabled
- New boost algorithms
- 14nm/12nm

## "ZEN 2"

- Up to 4.7GHz max boost
- +15% IPC<sup>2</sup>
- 4-core complex
- Up to 16MB L3 per complex
- Chiplet design
- FP-256
- 7nm

## "ZEN 3"

- Up to 4.9GHz max boost
- +19% IPC<sup>3</sup>
- New 8-core complex
- Up to 32MB L3 per complex
- AMD 3D V-Cache support
- Doubled INT8 throughput
- 7nm
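The generational IPC figures above compound multiplicatively rather than additively. The following is a minimal sketch, assuming the quoted uplifts can be chained even though each was measured on a different workload selection (see the endnotes), so the result is indicative only. The function name `compound_uplift` is illustrative.

```python
# Chain the generational IPC uplifts quoted on the slide.
# Assumption: the percentages are treated as comparable, although each
# was measured with a different benchmark selection.
ZEN2_OVER_ZEN = 0.15   # "+15% IPC" for "Zen 2"
ZEN3_OVER_ZEN2 = 0.19  # "+19% IPC" for "Zen 3"


def compound_uplift(*uplifts: float) -> float:
    """Return the combined fractional uplift of successive gains."""
    total = 1.0
    for uplift in uplifts:
        total *= 1.0 + uplift
    return total - 1.0


zen3_over_zen = compound_uplift(ZEN2_OVER_ZEN, ZEN3_OVER_ZEN2)
print(f'"Zen 3" over "Zen": +{zen3_over_zen:.1%}')
```

Under these assumptions the two steps combine to roughly 37%, slightly more than the 34% a simple sum would suggest.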



## "ZEN 3" OBJECTIVES

#### **PERFORMANCE**

- Deliver another landmark increase in 1T performance through IPC and frequency
- Unify cores and cache in a contiguous 8-core complex to improve effective latency
- Provide scale-out performance for servers, datacenters and super-computers

#### **NEW CAPABILITIES**

- Introduce new ISA extensions
- Expanded security features
- Support for AMD 3D V-Cache integration

#### **PLATFORM**

- Support for scaling and energy efficiency
- Socket compatibility





## "ZEN 3" OVERVIEW

2 THREADS PER CORE (SMT)
STATE-OF-THE-ART BRANCH PREDICTOR

#### CACHES

- I-cache 32k, 8-way
- Op-cache, 4K instructions
- D-cache 32k, 8-way
- L2 cache 512k, 8-way

#### DECODE

- 4 instructions / cycle decode or 8 ops from Op-cache
- 6 ops / cycle dispatched to Integer or FP

#### **EXECUTION CAPABILITIES**

- 4 integer units
- Dedicated branch and store data units
- 3 address generations per cycle
- 2 256-bit FP multiply accumulate / cycle

#### 3 MEMORY OPS PER CYCLE

#### **TLBs**

- L1 64 entries I & D, all page sizes
- L2 512 I, 2K D, everything but 1G
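The FP figure above ("2 256-bit FP multiply accumulate / cycle") implies a peak per-core throughput that can be worked out directly. This is a rough sketch, assuming each multiply-accumulate counts as two floating-point operations per lane and that both pipes issue every cycle; sustained throughput on real code will be lower.

```python
FMA_PIPES = 2       # "2 256-bit FP multiply accumulate / cycle"
VECTOR_BITS = 256   # width of each FP pipe
FLOPS_PER_FMA = 2   # one multiply plus one add per lane (assumed convention)


def peak_flops_per_cycle(element_bits: int) -> int:
    """Peak floating-point operations per cycle per core for a given element size."""
    lanes = VECTOR_BITS // element_bits
    return FMA_PIPES * lanes * FLOPS_PER_FMA


print(peak_flops_per_cycle(32))  # single precision: 32
print(peak_flops_per_cycle(64))  # double precision: 16
```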





## DOUBLE DIGIT IPC GAIN - AGAIN

"ZEN 3" 19% IPC UPLIFT FOR PCs\*





## FETCH / DECODE

#### REDUCED LATENCIES

- Lower mispredict penalty
- No "bubble" on most taken branch predictions

#### IMPROVED BRANCH PREDICTION

- TAGE branch predictor
- Redistributed BTBs for better prediction latency
  - L1 BTB, 1024 entries
  - L2 BTB, 6.5K entries
- Larger 1.5K indirect target array (ITA)

#### OPTIMIZED 32KB, 8-WAY L1I CACHE

- Improved prefetching
- Improved utilization

#### STREAMLINED OP-CACHE

- Faster sequencing of Op-cache fetches
- Finer-grained switching of Op-cache / I-cache pipes

FASTER FETCH, ESPECIALLY FOR BRANCHY AND LARGE FOOTPRINT CODE





## INT EXECUTION

- New distributed scheduler organization
- Lower latencies for some instructions
- Larger out-of-order window
- 10 issue per cycle, up from 7

| RESOURCE              | "ZEN 2" | "ZEN 3" |
|-----------------------|---------|---------|
| Integer issue width   | 7       | 10      |
| Integer register file | 180     | 192     |
| Integer scheduler     | 92      | 96      |
| ROB                   | 224     | 256     |

LOWER LATENCIES AND LARGER STRUCTURES TO EXTRACT ILP FOR FEEDING THE EXECUTION ENGINES
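To put the integer resource table in relative terms, the sketch below computes the percentage growth of each structure from "Zen 2" to "Zen 3". The values are copied from the table; the helper name `growth` is illustrative.

```python
# (Zen 2, Zen 3) values taken from the resource table above.
RESOURCES = {
    "Integer issue width": (7, 10),
    "Integer register file": (180, 192),
    "Integer scheduler": (92, 96),
    "ROB": (224, 256),
}


def growth(zen2: int, zen3: int) -> float:
    """Fractional increase from the Zen 2 value to the Zen 3 value."""
    return (zen3 - zen2) / zen2


for name, (zen2, zen3) in RESOURCES.items():
    print(f"{name:<22} {zen2:>4} -> {zen3:<4} ({growth(zen2, zen3):+.1%})")
```

The issue width grows far more than the storage structures, which matches the slide's emphasis on wider pick bandwidth.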





## WIDER INTEGER EXECUTION

#### PICK BANDWIDTH IS INCREASED

- Still four "ALU" and three "AGU" execution units
  - But adds branch and store data capabilities
  - Up to 10 integer ops picked per cycle
- No increase in register file write ports or bypass network inputs
- Shared ALU/AGU schedulers allow for balanced use across workloads

DELIVERING WIDER EXECUTION RESOURCES
IN A POWER- AND AREA-EFFICIENT MANNER






## FP EXECUTION

- Increased Dispatch Bandwidth (6-wide)
- Larger Scheduler
- Separate F2I/Store Units
- Faster 4-cycle FMAC
- Doubled INT8 throughput

| RESOURCE                     | "ZEN 2" | "ZEN 3" |
|------------------------------|---------|---------|
| FP issue width               | 4       | 6       |
| FADD / FMUL / FMA<br>latency | 3/3/5   | 3/3/4   |
| FP scheduler                 | 36      | 64      |

LOWER LATENCIES AND LARGER STRUCTURES
TO EXTRACT ILP FOR FEEDING THE
EXECUTION ENGINES
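The FMA latency change in the table matters in two ways: a chain of dependent FMAs finishes sooner, and fewer independent accumulators are needed to keep the FP pipes busy. The sketch below illustrates both, assuming two fully pipelined FMA pipes per core (one new FMA per pipe per cycle) and ignoring every other bottleneck.

```python
FMA_PIPES = 2                            # two FMA pipes per core (overview slide)
FMA_LATENCY = {"Zen 2": 5, "Zen 3": 4}   # cycles, from the table above


def dependent_chain_cycles(n_fmas: int, latency: int) -> int:
    """Cycles for n FMAs where each one consumes the previous result."""
    return n_fmas * latency


def independent_chains_to_saturate(latency: int, pipes: int = FMA_PIPES) -> int:
    """Independent accumulators needed so every pipe can issue each cycle."""
    return latency * pipes


for core, latency in FMA_LATENCY.items():
    print(
        core,
        dependent_chain_cycles(100, latency),
        independent_chains_to_saturate(latency),
    )
```

In this simplified model a 100-FMA dependent chain drops from 500 to 400 cycles, and the number of independent accumulators needed falls from 10 to 8.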





## LOAD/STORE

LARGER STORE QUEUE (64, UP FROM 48)

#### PREFETCH IMPROVEMENTS

- More consistent prefetch on page crossing
- Better L1/L2 cache prefetch coordination
- MSR control of prefetch enablement (server)

#### MORE LOADS / STORES PER CYCLE

- 3 loads per cycle (max 2 if 256b)
- 2 stores per cycle (max 1 if 256b)
- Max 3 total memory ops
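The per-cycle limits listed above can be expressed as a small legality check. This is a simplified sketch: it assumes that either all memory ops in a cycle are 256-bit or none are (mixed widths are not modeled), and the function name `is_legal_mix` is illustrative.

```python
def is_legal_mix(loads: int, stores: int, wide_256b: bool = False) -> bool:
    """Check a per-cycle load/store mix against the limits on the slide.

    Simplifying assumption: wide_256b=True means every op in the cycle is
    256-bit; mixed 128-bit and 256-bit cycles are not modeled.
    """
    if loads < 0 or stores < 0:
        raise ValueError("counts must be non-negative")
    max_loads = 2 if wide_256b else 3    # "3 loads per cycle (max 2 if 256b)"
    max_stores = 1 if wide_256b else 2   # "2 stores per cycle (max 1 if 256b)"
    max_total = 3                        # "Max 3 total memory ops"
    return loads <= max_loads and stores <= max_stores and loads + stores <= max_total


# Enumerate the legal (loads, stores) combinations for narrow ops.
legal = [(l, s) for l in range(4) for s in range(3) if is_legal_mix(l, s)]
print(legal)
```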

#### 2K ENTRY L2 DTLB

6 page table walkers for misses

32KB, 8-WAY L1 DATA CACHE

**FASTER COPY OF SHORT STRINGS** 

BETTER PREDICTION OF STORE-TO-LOAD DEPENDENCIES

LARGER STRUCTURES AND BETTER PREFETCHING TO EXTRACT ILP FOR FEEDING WIDER EXECUTION





## MAJOR CHANGES VS. "ZEN 2"



#### FRONT-END ENHANCEMENTS

- 2X Larger L1 BTB (1024)
- Improved branch predictor bandwidth
- "No-bubble" branch prediction
- Faster recovery from mispredict
- Faster sequencing of Op-cache fetches
- Quicker switching of Op-cache pipes

#### **EXECUTION**

- Int: Dedicated Branch / St-data pickers
- Int: Larger windows (+32)
- FP/Int: Reduced latency for select ops
- FP: 6-wide dispatch and issue (+2)
- FP: Faster FMAC (-1 cycle)

#### LOAD / STORE

- Higher load bandwidth (+1)
- Higher store bandwidth (+1)
- More flexibility in load/store ops
- Improved memory dependence detection
- TLB: 6 table walkers (+4)





## "ZEN 3" IPC UPLIFT

GEOMEAN: +19% VS. "ZEN 2"
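A geometric-mean uplift is computed from per-workload performance ratios rather than by averaging percentages. The sketch below shows the calculation; the ratios in `example_ratios` are placeholders for illustration only and are not AMD's measured data (endnote R5K-003 describes the actual 25-workload selection).

```python
from statistics import geometric_mean


def geomean_uplift(ratios):
    """Geometric-mean uplift from per-workload ratios (new score / old score)."""
    return geometric_mean(ratios) - 1.0


# Placeholder ratios for illustration only; NOT measured results.
example_ratios = [1.10, 1.20, 1.30]
print(f"{geomean_uplift(example_ratios):+.1%}")
```

The geometric mean is the usual choice for ratios because a 2x gain on one workload and a 0.5x loss on another cancel out exactly.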







## AMD INFINITY GUARD

NEW LAYERS OF SECURITY FOR TENANTS IN THE CLOUD



SEV

SECURE ENCRYPTED VIRTUALIZATION

Encrypt Each VM with Unique Keys

**SEV-ES** 

**ENCRYPTED STATE** 

VM Integrity with Protected CPU Registers

**SEV-SNP** 

**SECURE NESTED PAGING** 

Hardware Protection Against Malicious Hypervisors



## ISA ENHANCEMENTS

| FEATURE                             | NOTES | CLIENT<br>SoCs | SERVER<br>SoCs |
|-------------------------------------|-------|----------------|----------------|
| 256-bit<br>VAES/VPCLMULQDQ          | 256-bit instruction extensions for accelerating encryption / decryption algorithms | ✓ | ✓ |
| Memory Protection<br>Keys for Users | Application control for access-disable and write-disable settings w/o TLB management | ✓ | ✓ |
| CET Shadow Stack                    | Helps protect against ROP (Return Oriented Programming) attacks by mirroring return addresses on a shadow stack; requires OS and/or hypervisor enablement | ✓ | ✓ |
| SEV-ES<br>Enhancements              | Interrupt injection restrictions: limit types of interrupts/exceptions that a (malicious) hypervisor may inject into an SEV-ES guest<br>Debug registers added to swapped state |   | ✓ |
| Secure Nested<br>Paging             | Builds on confidentiality established by encryption of VM memory and VM registers in SEV/SEV-ES to add integrity protection features to help protect against malicious hypervisors, including protection against replay/corruption/remapping attacks |   | ✓ |
| INVLPGB                             | New instruction; use instead of inter-core interrupts to broadcast page invalidates; requires OS and/or hypervisor enablement |   | ✓ |
| Process Context ID (PCID)           | Process tags in TLB to reduce flush requirements |   | ✓ |
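On Linux, one way to see which of these features a given system exposes is to inspect the `flags` line of `/proc/cpuinfo`. The sketch below is an assumption-laden illustration: the mapping `FEATURE_FLAGS` from slide features to kernel flag names is my own and covers only a subset, and whether a flag appears also depends on kernel version and on OS or hypervisor enablement.

```python
# Assumed mapping from slide features to Linux /proc/cpuinfo flag names.
# This mapping is illustrative and not taken from the presentation.
FEATURE_FLAGS = {
    "256-bit VAES": "vaes",
    "256-bit VPCLMULQDQ": "vpclmulqdq",
    "Memory Protection Keys for Users": "pku",
    "Process Context ID": "pcid",
    "SEV-ES": "sev_es",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the set of CPU flags from the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str) -> dict:
    """Map each feature name to True/False depending on flag presence."""
    flags = parse_flags(cpuinfo_text)
    return {feature: flag in flags for feature, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or file unreadable
    for feature, present in report(text).items():
        print(f"{feature:<34} {'yes' if present else 'no'}")
```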



## BEYOND THE CORE

AMD RYZEN™ 5000 SERIES SOC ARCHITECTURE CHANGES





2X L3 Cache Directly Accessible Per Core

Accelerates Core and Cache Communication for Gaming

Reduction in Effective Memory Latency







New Bi-Directional Ring Bus

High Bandwidth Low Latency

32 Bytes Each Direction



## "ZEN 3" CACHE HIERARCHY

### (8-CORE CCD)

- Fast private 512K L2 cache
- High bandwidth interfaces at all levels
- L3 is filled from L2 victims (i.e., mostly exclusive)
- L2 tags duplicated in L3 for probe filtering and fast cache transfer
- 64 outstanding misses supported from L2 to L3 per core
- 192 outstanding misses supported from L3 to memory
- L3 shared among all 8 cores in the complex
- Support for AMD 3D V-Cache
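The outstanding-miss counts above bound how much data can be in flight, and by Little's law that in turn bounds sustained bandwidth for a given latency. The sketch below assumes a 64-byte cache line; the latency passed in the last call is a hypothetical placeholder, since the slide gives no latency figures.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def bytes_in_flight(outstanding_misses: int) -> int:
    """Data that can be outstanding at once for a given miss count."""
    return outstanding_misses * CACHE_LINE_BYTES


def bandwidth_bound_gb_per_s(outstanding_misses: int, latency_ns: float) -> float:
    """Little's law upper bound: bytes in flight / latency (bytes per ns == GB/s)."""
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return bytes_in_flight(outstanding_misses) / latency_ns


print(bytes_in_flight(64))    # L2 to L3, per core: 4096 bytes
print(bytes_in_flight(192))   # L3 to memory: 12288 bytes
# latency_ns=80.0 is a hypothetical value chosen only to show the formula.
print(bandwidth_bound_gb_per_s(192, latency_ns=80.0))
```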





## AMD 3D V-CACHE 192M L3 PROTOTYPE

- Zen 3 base CCD design includes 32M L3 cache
- Increased to 96M per CCD with 64M AMD 3D V-Cache
- Enabled by Through Silicon Vias on CCD
- Direct copper-to-copper bond
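The capacity figures on this slide follow from simple addition. The sketch below reproduces them, assuming the 12-core prototype uses two CCDs, which is how the 192M total in the slide title is reached.

```python
BASE_L3_MB = 32   # "Zen 3" base CCD L3
VCACHE_MB = 64    # stacked AMD 3D V-Cache per CCD


def l3_per_ccd(stacked: bool = True) -> int:
    """L3 capacity of one CCD in MB, with or without the stacked cache."""
    return BASE_L3_MB + (VCACHE_MB if stacked else 0)


def total_l3(ccds: int, stacked: bool = True) -> int:
    """Total L3 capacity in MB across all CCDs in the package."""
    return ccds * l3_per_ccd(stacked)


# Assumption: two CCDs in the prototype package.
print(l3_per_ccd())   # 96
print(total_l3(2))    # 192
```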



## 15% FASTER GAMING ON AVERAGE\*







## "ZEN 3" PRODUCTS



AMD RYZEN™ 5000 SERIES MOBILE PROCESSORS

Unprecedented performance and battery life with "Zen 3" core architecture<sup>3</sup>



AMD RYZEN™ PRO 5000 SERIES MOBILE PROCESSORS

Multi-layered security features help provide protection at every level, from silicon to OS



AMD RYZEN™ 5000 SERIES DESKTOP PROCESSORS

Up to 26% gaming performance generational uplift<sup>1</sup>



3<sup>RD</sup> GEN AMD EPYC™ PROCESSORS

World record performance<sup>2</sup> and advanced security features with AMD Infinity Guard



#### AMD EPYC

## THE BEST GETS BETTER

## 200+ WORLD RECORDS AND COUNTING

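| CATEGORY              | AREA                              | WORLD RECORDS |
|-----------------------|-----------------------------------|---------------|
| DATABASES & ANALYTICS | Relational                        | 30            |
| DATABASES & ANALYTICS | Big Data                          | 31            |
| HCI/SDI/CLOUD         | Cloud and Virtualization          | 13            |
| HCI/SDI/CLOUD         | Integer Performance               | 16            |
| ENTERPRISE            | ERP Business Apps                 | 6             |
| ENTERPRISE            | Java® Based Performance           | 46            |
| ENTERPRISE            | Energy Efficiency                 | 26            |
| HPC                   | High Performance Computing Apps   | 59            |
| HPC                   | Floating Point Performance        | 15            |
| HPC                   | Floating Point Energy Efficiency  | 12            |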

## LEADERSHIP POWER EFFICIENCY

## "ZEN 3" STRENGTHENS OUR LEAD





## MAJOR GAMING UPLIFTS WITH "ZEN 3"<sup>1</sup>

1920x1080 Resolution / High Image Quality Preset



19% IPC Uplift<sup>2</sup>

2X Direct Access L3 Cache Per Core

**Higher Frequencies** Across the Stack

**Unified 8-Core** Complex

Average ~26% Gaming Improvement at 1080p<sup>1</sup>



## "ZEN 3"

## WE ACCOMPLISHED OUR DESIGN GOALS FOR LEADERSHIP GAMING AND SERVER PERFORMANCE<sup>1,5</sup>

#### HISTORIC IPC UPLIFT

+19% improvement in instructions per cycle versus the "Zen 2" architecture in client workloads like PC gaming<sup>2</sup>

#### LOWER EFFECTIVE LATENCY

Unified 8-core complex with 32MB direct L3 cache accelerates core and cache communication

#### AMD 3D V-CACHE

Industry-first prototype demo of Cu-to-Cu die stacked memory

#### HIGHER FREQUENCIES

AMD design methodologies enable higher frequencies across the Ryzen™ 5000 Series desktop processor family

#### LEADERSHIP EFFICIENCY

7nm "Zen 3" processors up to 2.8X more efficient than competing solutions for PC enthusiasts<sup>3</sup>

#### **SERVER IMPROVEMENTS**

New server world records<sup>5</sup> and next-generation AMD Infinity Guard security



## HIGH PERFORMANCE MOMENTUM



\* Roadmap subject to change.


## THANK YOU

TO THE WORLDWIDE AMD CORES TEAM
AND ALL THE OTHER AMD TEAMS THAT MADE "ZEN 3" POSSIBLE

THEIR DEDICATION AND HARD WORK ARE WHAT TRULY BREATHE LIFE INTO ALL OF OUR PRODUCTS



## **ENDNOTES**

GD-122: "Zen" is a codename for AMD architecture and is not a product name.

GD-150: Max boost for AMD Ryzen processors is the maximum frequency achievable by a single core on the processor running a bursty single-threaded workload. Max boost will vary based on several factors, including, but not limited to: thermal paste; system cooling; motherboard design and BIOS; the latest AMD chipset driver; and the latest OS updates.

R5K-002: Testing by AMD performance labs as of 9/2/2020 based on the average FPS of 40 PC games at 1920x1080 with the High image quality preset.

R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7 3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600. Results may vary.

R5K-009: Testing by AMD performance labs as of 09/01/2020 measuring gaming performance of a Ryzen 9 5900X desktop processor vs. a Ryzen 9 3900XT in 11 popular titles at 1920x1080, the High image quality preset, and the newest graphics API available for each title (e.g., DirectX® 12 or Vulkan™ or DirectX® 11). Results may vary.

R5K-007: Testing by AMD Performance Labs as of 09/01/2020 using Cinebench R20 nT versus system wall power during full load CPU test using a Core i9-10900K, Ryzen 9 3900XT, Ryzen 9 3950X, and a Ryzen 9 5950X configured with: 2x8GB DDR4-3600, GeForce RTX 2080 Ti, Samsung 860 Pro SSD, Noctua NH-D15s cooler, and an open-air test bench with no additional power draw sources. Results may vary.

R5K-078: Testing by AMD performance labs as of April 28, 2021 based on the average FPS of 32 PC games at 1920x1080 with the High image quality preset using an AMD Ryzen™ 9 5900X processor vs. a 12-Core 3D Chiplet Prototype. Results may vary.

RZ3-24: Based on AMD Labs testing in May 2019, an AMD "Zen 2"-based system configured with a "Matisse" B0 sample, AMD Reference Mobo, AMD Reference Cooler, 4x8GB DDR4-2667 RAM, Ubuntu O/S, and GeForce GTX 1080 GPU vs. a similarly configured "Summit Ridge" B2 sample, scored an estimated 15% higher using estimated SPECint®\_base2006 results. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org.

EPYC-22: For a complete list of world records see http://amd.com/worldrecords.

GD-108: Generational IPC uplift for the "Zen" architecture vs. "Piledriver" architecture is +52% with an estimated SPECint\_base2006 score compiled with GCC 4.6 -O2 at a fixed 3.4GHz. Generational IPC uplift for the "Zen" architecture vs. "Excavator" architecture is +64% as measured with Cinebench R15 1T, and also +64% with an estimated SPECint\_base2006 score compiled with GCC 4.6 -O2, at a fixed 3.4GHz. System configs: AMD reference motherboard(s), AMD Radeon™ R9 290X GPU, 8GB DDR4-2667 ("Zen")/8GB DDR3-2133 ("Excavator")/8GB DDR3-1866 ("Piledriver"), Ubuntu Linux 16.x (SPECint\_base2006 estimate) and Windows® 10 x64 RS1 (Cinebench R15). SPECint\_base2006 estimates: "Zen" vs. "Piledriver" (31.5 vs. 20.7 | +52%), "Zen" vs. "Excavator" (31.5 vs. 19.2 | +64%). Cinebench R15 1T scores: "Zen" vs. "Piledriver" (139 vs. 79 both at 3.4G | +76%), "Zen" vs. "Excavator" (160 vs. 97.5 both at 4.0G | +64%).
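The uplift percentages in GD-108 can be re-derived from the quoted scores. The sketch below does that arithmetic, assuming the percentages are rounded to the nearest whole number; the names `COMPARISONS` and `uplift_pct` are illustrative.

```python
# (benchmark, baseline, "Zen" score, baseline score, quoted uplift in percent),
# all copied from endnote GD-108.
COMPARISONS = [
    ("SPECint_base2006 est.", "Piledriver", 31.5, 20.7, 52),
    ("SPECint_base2006 est.", "Excavator", 31.5, 19.2, 64),
    ("Cinebench R15 1T", "Piledriver", 139, 79, 76),
    ("Cinebench R15 1T", "Excavator", 160, 97.5, 64),
]


def uplift_pct(new_score: float, old_score: float) -> int:
    """Percentage uplift of new over old, rounded to the nearest integer."""
    return round((new_score / old_score - 1.0) * 100)


for bench, baseline, zen, old, quoted in COMPARISONS:
    computed = uplift_pct(zen, old)
    status = "matches" if computed == quoted else "differs from"
    print(f"{bench} vs. {baseline}: +{computed}% ({status} quoted +{quoted}%)")
```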

CZM-1: 'Best Mobile Processors' is defined as having the highest multi-thread processing performance in each of four (4) classes of Ryzen 5000 series processors. Testing by AMD engineering using the Cinebench R20 nT benchmark, measuring multithreaded performance of a Ryzen 9 5900HX processor engineering sample vs Core i9-10980HK, Ryzen 7 5800U processor engineering sample vs Core i7-1185G7 processor, the Ryzen 5 5600U processor engineering sample vs Core i5-1135G7 processor, and a Ryzen 3 5400U processor engineering sample vs Core i3-1115G4 processor. Performance may vary.

CZM-12: Testing by AMD Performance Labs as of 09/02/2020 utilizing an engineering platform configured with a Ryzen 9 5900H processor, 32GB RAM, 512MB SSD, Radeon™ Graphics, and Win 10, vs. a similarly configured ASUS ROG Zephyrus G15 laptop with a Ryzen™ 9 4900H processor and NVIDIA GTX 1660Ti graphics in the following benchmarks: Cinebench R20 nT, Cinebench R20 1T and 3DMark Physics for gaming performance. PC manufacturers may vary configurations yielding different results. Performance may vary.

CZM-34: Performance based on MobileMark 2018 published test results posted at https://results.bapco.com/results/benchmark/MobileMark\_2018 using an AMD Ryzen 7 5800U-equipped HP Probook Aero 8 laptop with a 53 WHr battery and power slider set to 'better battery' vs. a similarly configured Ryzen 7 4700U-equipped HP Probook 635 Aero G7 notebook. Results may vary.

## COPYRIGHT AND DISCLAIMER

©2021 Advanced Micro Devices, Inc. All rights reserved.

AMD, the AMD Arrow logo, EPYC, Ryzen, Infinity Fabric, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

This information is provided "as is." AMD makes no representations or warranties with respect to the contents hereof and assumes no responsibility for any inaccuracies, errors, or omissions that may appear in this information. AMD specifically disclaims any implied warranties of non-infringement, merchantability, or fitness for any particular purpose. In no event will AMD be liable to any person for any reliance, direct, indirect, special, or other consequential damages arising from the use of any information contained herein, even if AMD is expressly advised of the possibility of such damages.

