
HiGame

New leak hints at AMD Zen’s architecture, organization - We know a bit more about the FPU

05-07-2015 at 02:40 AM

AMD’s Analyst Day is set for next week and it now appears certain that the company will spend at least some time discussing its Zen architecture. This new slide is designed in AMD’s colors and style — if it’s a fake, it’s a good one. In this case, however, I don’t think it’s fake. The architectural details make sense and are reasonable based on what we might expect AMD to build.




Let’s start with comparing and contrasting Zen with Excavator. If these slides are accurate, what we’ve got here is a CPU core that eschews the shared resource model that characterized the entire Bulldozer family. Note that the Excavator diagram shows a separate arrow for the two integer pipeline blocks, reflecting the fact that Steamroller added dual decoder blocks to the CPU family. The floating point scheduler can be addressed by either block.

Zen, in contrast, has a unified decode block for both its integer and its floating-point schedulers. The dual arrows, in this case, likely signify Simultaneous Multi-Threading — what Intel calls Hyper-Threading. An SMT design would allow AMD to execute components of two different threads simultaneously.

As for the integer pipelines, we need to know more before we can say anything. The Bulldozer family’s four lanes per core may look impressive, but two of those lanes are Address Generation Units, or AGUs. They aren’t used for integer arithmetic, but for calculating the addresses the CPU uses to access memory. In other words, simply showing us six pipelines on a slide doesn’t tell us what the pipelines do, or how efficient they are. AMD’s older K10 architecture had six integer pipelines, with three ALUs and three AGUs per core.

We know a bit more about the FPU: it supports 256-bit registers, which puts it on par with Haswell, at least as far as register sizes go. Interestingly, this was a feature previously forecast for Excavator. That CPU apparently packs AVX2 support, but it may have kept 128-bit register sizes.

Core architecture, cache design


A second leaked slide puts Zen in more context.



Last week, I said that the leaked Zen slide might be inaccurate because it looked more like a wish list than a functional processor. This is much more along the lines of what I’d expect AMD to be building. We see four Zen cores per “unit” (AMD is dropping the module terminology) with an L3 cache for every group of cores. AMD retains the ability to build multi-core systems using Multi-Chip Modules (MCMs), which means it can physically link groups of four Zen cores on one package. It may also build these chips as so-called “native” designs, with 8-16 cores per die, but that’s not something we’re privy to at this point.

The slide notes that AMD may be using a “fully inclusive” cache design for high performance and lower latency. This deserves a bit of additional explanation. There are (generally speaking) three types of cache design: strictly inclusive (data in the L1 is always also stored in the L2 cache), strictly exclusive (data is stored in either the L1 or the L2 cache, but never in both), and mainly inclusive, where data stored in L1 can be evicted from L2, but typically isn’t.

Before Bulldozer, AMD historically used an exclusive cache design, which made certain memory operations slower, but offered more effective space, since you aren’t duplicating data between the L1 and L2. This made particular sense in the K7 days when the chip’s large L1 cache (128KB) would have been extremely expensive to duplicate in L2 cache. Inclusive cache designs typically have lower latency for certain operations and can simplify coherency checks.
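To make the capacity trade-off concrete, here is a rough sketch in Python. The 128KB L1 figure is the K7 number quoted above; the 256KB L2 size and the function name are assumptions chosen purely for illustration, and the model ignores everything except how much unique data the two levels can hold.

```python
def effective_capacity_kb(l1_kb, l2_kb, policy):
    """Upper bound on unique data held across L1 and L2, in KB.

    Simplified model: a strictly exclusive hierarchy never duplicates a
    line, so the capacities add up; a strictly inclusive hierarchy keeps
    a copy of every L1 line in L2, so L2 alone bounds the unique data.
    """
    if policy == "exclusive":
        return l1_kb + l2_kb
    if policy == "inclusive":
        if l2_kb < l1_kb:
            raise ValueError("an inclusive L2 must be at least as large as L1")
        return l2_kb
    raise ValueError(f"unknown policy: {policy!r}")


L1_KB = 128  # K7-era L1 size mentioned in the article
L2_KB = 256  # assumed L2 size, for illustration only

for policy in ("exclusive", "inclusive"):
    total = effective_capacity_kb(L1_KB, L2_KB, policy)
    print(f"{policy}: {total}KB of unique data")
```

With a large L1 relative to L2, the inclusive design gives up a big share of its total capacity to duplicates, which is why the exclusive approach made sense for that generation.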

The rest of the implications of this slide are straightforward. AMD has designed a chip that’s clearly meant to be deployed in an MCM: the reference to high-speed interconnects between units appears to refer to on-package modules, rather than between-socket interconnects in a 2P system. The 512KB of L2 and 8MB L3 per group of four cores also makes sense as an organizational principle.

None of these details, it must be noted, tell us anything about how Zen will perform, or how Keller and the rest of the AMD team chose their power and performance targets. There are design elements here that could echo Phenom II, but without more information we can’t conclude that yet.

May 6 is going to be very, very interesting. As always, take these posts with a grain of salt — nothing is official until it’s announced.
