The AMD RDNA 3 architecture has finally been officially unwrapped, alongside the new $999 Radeon RX 7900 XTX and $899 Radeon RX 7900 XT graphics cards. These are set to go head-to-head with the best graphics cards, and AMD seems like it might have a legitimate shot at the top of the GPU benchmarks hierarchy. Here’s what we know.

First, most of the details align with what was already expected and covered in our AMD RDNA 3 architecture and RX 7000-series GPUs coverage. RDNA 3 will use chiplets, with a main GCD (Graphics Compute Die) and up to six MCDs (Memory Cache Dies). In addition, there are a lot of under-the-hood changes to the architecture, including more Compute Units and a lot more GPU shaders compared to the previous generation.

Fundamentally, AMD continues to focus on power and energy efficiency and has targeted a 50% improvement in performance per watt with RDNA 3 compared to RDNA 2. We know Nvidia's RTX 4090 and Ada Lovelace pushed far up the voltage and frequency curve, and as we showed in our RTX 4090 efficiency scaling tests, power limiting the card to 70% greatly boosted Nvidia's efficiency. However, AMD apparently feels no need to dial the power use up to 11 at default.

Let's start with a quick overview of the core specifications, comparing AMD's upcoming GPUs with the top previous generation RDNA 2 and Nvidia's RTX 4090.


AMD has two variants of the Navi 31 GPU coming out. The higher-spec RX 7900 XTX card uses the fully enabled GCD and six MCDs, while the RX 7900 XT has 84 of the 96 Compute Units enabled and only uses five MCDs. The sixth MCD is technically still present on the cards, but it's either a non-functional die or potentially even a dummy die. Either way, it will be fused off, and it isn't connected to the extra 4GB of GDDR6 memory, so there won't be a way to re-enable it.

Compared to the competition, the RX 7900 XTX still technically comes in behind the RTX 4090 in raw compute, and Nvidia has a lot more AI processing power with its tensor cores. But we also have to remember that the RX 6950 XT managed to keep up with the RTX 3090 Ti at 1080p and 1440p and was only about 5% behind at 4K, despite having roughly 40% less theoretical compute. So, while the RX 7900 XTX on paper has 32% less compute than the RTX 4090, we don't actually know what that will mean in real-world performance benchmarks.
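For reference, here's a quick sketch of the arithmetic behind those percentages, using rounded theoretical FP32 throughput figures (the 7900 XTX number uses the Game Clock, which we explain below):

```python
# Rough theoretical FP32 compute (TFLOPS), rounded
rx_6950_xt  = 23.7   # 5,120 shaders at ~2.31 GHz boost
rtx_3090_ti = 40.0   # 10,752 shaders at ~1.86 GHz boost
rx_7900_xtx = 56.5   # 12,288 shaders at the 2.3 GHz Game Clock
rtx_4090    = 82.6   # 16,384 shaders at ~2.52 GHz boost

print(f"6950 XT vs 3090 Ti: {1 - rx_6950_xt / rtx_3090_ti:.0%} less compute")  # ~41%, i.e. the "40% less" above
print(f"7900 XTX vs 4090:   {1 - rx_7900_xtx / rtx_4090:.0%} less compute")    # ~32%
```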

Also, note that AMD's presentation says 61 teraflops while our figure is 56.5 teraflops. That's because AMD's RDNA 3 has a split clock domain for efficiency purposes. The front end (render outputs and texturing units, perhaps) runs at 2.5 GHz, while the shaders run at 2.3 GHz. We used the 2.3 GHz value since the teraflops come from the shaders. Of course, these are "Game Clocks," which, at least with RDNA 2, were a conservative estimate of real-world clocks while running actual games. (That's the same for Nvidia's Ada Lovelace and Intel's Arc Alchemist, which both tend to run 150–250 MHz higher than the stated boost clock values in our testing.)
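For those wanting to check the math, the teraflops figures fall out of a simple formula: shader count times two FLOPS per clock (one fused multiply-add) times clock speed. Here's a minimal sketch in Python, using the shader count and clocks AMD has published:

```python
# Theoretical FP32 throughput = shaders * 2 FLOPS per clock (one FMA) * clock in GHz
def teraflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz / 1000  # GFLOPS -> TFLOPS

shaders_7900xtx = 12288  # 96 CUs, counted as 128 ALUs each with dual-issue

print(teraflops(shaders_7900xtx, 2.3))  # ~56.5 TFLOPS at the 2.3 GHz Game Clock
print(teraflops(shaders_7900xtx, 2.5))  # ~61.4 TFLOPS at the 2.5 GHz boost clock AMD quotes
```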

AMD also has a higher boost clock relative to the Game Clock, which is where it gets the 61 teraflops figure; the boost clock on the RX 7900 XTX is 2.5 GHz. But, again, we'll need to test the hardware in a variety of games to see where the actual clocks land. With RDNA 2, we found the boost clocks were pretty consistently what we saw in games, maybe even a bit low, so consider the 56.5 teraflops figure a very conservative estimate.

Of course, the bigger deal isn't how the RX 7900 XTX stacks up against the RTX 4090 but rather how it will compete with the RTX 4080. It has more memory and memory bandwidth, plus 16% more compute. So even if the performance per clock on the RDNA 3 shaders dropped a bit (more on this in a second), AMD looks like it should be very competitive with Nvidia's penultimate RTX 40-series part, especially since it costs $200 less.
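Here's the quick comparison behind that claim. The 7900 XTX bandwidth figure assumes 20 Gbps GDDR6 on the 384-bit bus, which AMD hadn't fully confirmed at the time of writing, so treat it as approximate:

```python
# RX 7900 XTX vs RTX 4080 (16GB), rough spec-sheet comparison
xtx      = {"tflops": 56.5, "vram_gb": 24, "bandwidth_gbs": 960, "price": 999}   # bandwidth assumes 20 Gbps GDDR6
rtx_4080 = {"tflops": 48.7, "vram_gb": 16, "bandwidth_gbs": 717, "price": 1199}

print(f"Compute advantage:   {xtx['tflops'] / rtx_4080['tflops'] - 1:.0%}")                # ~16%
print(f"Bandwidth advantage: {xtx['bandwidth_gbs'] / rtx_4080['bandwidth_gbs'] - 1:.0%}")  # ~34%
print(f"Price difference:    ${rtx_4080['price'] - xtx['price']}")                         # $200
```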

With the high-level overview out of the way, let's dig into some architectural details. Unfortunately, AMD is keeping some things under wraps, so we're not entirely sure about the memory clocks right now, and we've asked for more information on other parts of the architecture. We'll fill in the details as we get them, but some things might remain unconfirmed until the RDNA 3 launch date on December 13.


AMD has said a lot about energy efficiency with the past two generations of RDNA architectures, and RDNA 3 continues that focus. AMD claims up to a 54% performance per watt improvement compared to RDNA 2, which in turn was 54% better perf per watt than the original RDNA. Over the past three generations, AMD's efficiency has skyrocketed, and that's not just marketing speak.

If you look at the RX 6900 XT as an example, it's basically double the performance of the previous generation RX 5700 XT at 1440p ultra. Meanwhile, it consumes 308W in our testing compared to 214W on the 5700 XT. So that's a 38% improvement in efficiency, just picking the two fastest RDNA 2 and RDNA offerings at the time of launch.
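The arithmetic behind that figure is straightforward: divide the performance ratio by the power ratio. A quick sketch using our measured numbers (the performance figure is approximate, which is why we land near 38%):

```python
# Perf/watt change: RX 6900 XT vs RX 5700 XT at 1440p ultra (our measured numbers)
perf_ratio   = 2.0   # "basically double" the performance, so treat as approximate
power_6900xt = 308   # watts
power_5700xt = 214   # watts

efficiency_gain = perf_ratio / (power_6900xt / power_5700xt) - 1
print(f"Perf/watt improvement: {efficiency_gain:.0%}")  # ~39% with an even 2x; ~38% with our exact results
```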

How does AMD continue to improve efficiency? Of course, a big part of the latest jump comes thanks to the move from TSMC N7 to N5 (7nm to 5nm), but the architectural updates also help.

The new RDNA 3 unified Compute Unit has 64 dual-issue Stream Processors (GPU shaders), which effectively doubles the ALU count per Compute Unit compared to RDNA 2. AMD can send different workloads to each SIMD unit, or it can have both working on the same type of instruction. It's interesting to note that the latest AMD, Intel, and Nvidia GPUs now all use 128 shaders for each major building block: Compute Units (CUs) for AMD, Streaming Multiprocessors (SMs) for Nvidia, and Xe-Cores for Intel.

[Note: AMD has some details saying it's still 64 Stream Processors per Compute Unit. We're trying to get clarification on why the number of ALUs has doubled while the SP count has stayed the same, and we'll update when we get a good answer. Either way, the math requires 12,288 GPU shaders to reach the 61 teraflops figure.]

Along with doubling the GPU shaders per CU, AMD has increased the total number of CUs from 80 to 96. Gen over gen, AMD's Navi 31 has 2.4 times as many shaders as Navi 21, and the power draw only increased by 18%.
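The gen-over-gen scaling works out like this, using the listed board powers of 355W for the RX 7900 XTX and 300W for the RX 6900 XT (our assumed reference point for the 18% figure):

```python
# Gen-over-gen scaling: Navi 31 (RX 7900 XTX) vs Navi 21 (RX 6900 XT)
shaders_navi31 = 96 * 128  # 96 CUs, 128 ALUs each with dual-issue
shaders_navi21 = 80 * 64   # 80 CUs, 64 stream processors each

print(shaders_navi31 / shaders_navi21)  # 2.4x the shader count
print(355 / 300 - 1)                    # ~0.18, i.e. roughly 18% more board power
```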

AMD also increased the performance of its AI Accelerators, which it hasn't really talked about much. We're not sure about the raw compute power, but we do know that the AI accelerators support both INT8 and BF16 (brain-float 16-bit) operations. So they're probably at least partially similar to Nvidia's tensor cores, though the set of supported instructions isn't the same. Regardless, AMD says the new AI accelerators provide up to a 2.7x improvement; double the number of units, more CUs, and slightly higher throughput combined would get there.
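Multiplying out those factors gets you close to AMD's number. Note that the per-unit throughput gain below is our back-calculated assumption, not a confirmed AMD figure:

```python
# Rough decomposition of AMD's "up to 2.7x" AI uplift (assumed factors, not official)
per_cu_units = 2.0      # double the AI accelerator count per CU
cu_scaling   = 96 / 80  # 1.2x more Compute Units
throughput   = 1.125    # assumed small per-unit throughput gain needed to reach 2.7x

print(per_cu_units * cu_scaling * throughput)  # ~2.7
```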

Finally, AMD says it has optimized its Ray Accelerators and that the RDNA 3 versions can handle 1.5x as many rays, with new dedicated instructions and improved BVH (ray/box) sorting and traversal. What that means in the real world still isn't totally clear, but we definitely expect a large leap in ray tracing performance along with improved rasterization performance. Will it be enough to catch Nvidia? We'll have to wait and see.