Intel Slips Out New Gen11 Graphics Architecture Details
In a stunningly unceremonious reveal, Intel posted documentation for its new Gen11 graphics architecture to its website. The document shines a spotlight on some fine-grained details of Intel's new graphics engine, which will debut with the company's forthcoming 10nm Ice Lake processors, but it came with no advance notice to the press or public.
Intel announced the new Gen11 graphics at its recent Architecture Day, telling us that the Gen11 engineering team focused heavily on delivering a dramatic performance improvement over the previous-gen graphics engine. The stated goal was to cram one teraflop of 32-bit and two teraflops of 16-bit floating point performance into a low power envelope. Early indications are that the new Gen11 graphics provide a substantial boost to real-world performance.
Given the facts and figures the company presented, we can reasonably spitball the raw performance of these new integrated graphics as landing in the range of the Radeon Vega 8 graphics that come with AMD's Ryzen 3 2200G. That could set the stage for a radical improvement to the default graphics engine that ships with nearly every mainstream Intel processor, dealing a blow to Nvidia's and AMD's low-end graphics cards in the process.
The Gen11 Graphics Architecture
Intel's documentation states that the graphics engine is built on the company's 10nm process with third-gen FinFET technology. As expected, it supports all of the major APIs.
Intel's Gen9 graphics employed the familiar modular arrangement of sub-slices that each house eight execution units (EUs). Intel brought the Gen11 design up to eight sub-slices, or 64 EUs, in the most common variants, though that count may be adjusted for some designs. That's a big improvement over Gen9's 24 EUs, working out to a 2.67x increase in compute capability. The revamped engine also processes two pixels per clock.
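To put those figures in perspective, here is a minimal back-of-the-envelope sketch of peak throughput. It assumes each EU retires eight FP32 FMA operations per clock (two 4-wide SIMD ALUs, with an FMA counted as two flops) and that FP16 runs at double rate; the ~1 GHz clock is our placeholder, not an Intel spec.

```python
# Back-of-the-envelope peak-throughput estimate for Intel Gen graphics.
# Assumptions (ours, not quoted from Intel's document): each EU has two
# 4-wide SIMD ALUs, each lane retiring one FMA (2 flops) per clock, and
# FP16 executes at twice the FP32 rate. The clock speed is a placeholder.

def peak_tflops(eus: int, clock_ghz: float, fp16: bool = False) -> float:
    lanes = 8                  # 2 ALUs x 4-wide SIMD per EU (assumed)
    flops_per_lane = 2         # one fused multiply-add = 2 flops
    rate = 2 if fp16 else 1    # FP16 assumed to run at double rate
    return eus * lanes * flops_per_lane * rate * clock_ghz / 1000

gen9  = peak_tflops(eus=24, clock_ghz=1.0)   # ~0.38 TFLOPS FP32
gen11 = peak_tflops(eus=64, clock_ghz=1.0)   # ~1.02 TFLOPS FP32
print(f"Gen9  FP32: {gen9:.2f} TFLOPS")
print(f"Gen11 FP32: {gen11:.2f} TFLOPS ({gen11 / gen9:.2f}x Gen9)")
print(f"Gen11 FP16: {peak_tflops(64, 1.0, fp16=True):.2f} TFLOPS")
```

At roughly 1 GHz, 64 EUs land right on Intel's stated one-teraflop FP32 / two-teraflop FP16 goal, and the 64-versus-24 EU ratio is where the 2.67x figure comes from.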
As the company's diagrams show, Intel's SoC (System-on-a-Chip) design, which is used in its Core series of processors, relies on a ring interconnect that ties together the CPU cores, GPU, LLC (Last Level Cache), and the system agent functions (PCIe, memory, and display controllers).
Notably, the last level cache is shared between the CPU cores and the graphics. The SoC also features numerous clock domains, split into per-CPU-core, processor graphics, and ring interconnect domains.
The engine supports tile-based rendering in addition to immediate mode rendering, which helps reduce memory bandwidth demands during some rendering workloads.
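To illustrate the idea (this is a conceptual sketch, not Intel's actual implementation), here is the binning pass at the heart of tile-based rendering: geometry is sorted into screen-space tiles up front, so each tile's color and depth traffic can stay in on-chip memory instead of spilling to DRAM. The 32-pixel tile size is an arbitrary choice for illustration.

```python
# Minimal sketch of the binning pass used by tile-based renderers.
# Illustrative only; tile size and data layout are our assumptions.

TILE = 32  # tile size in pixels (illustrative)

def bin_triangles(triangles, width, height):
    """Map each triangle to the screen tiles its bounding box overlaps."""
    bins = {}  # (tile_x, tile_y) -> list of triangle indices
    for i, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
        y0, y1 = max(min(ys), 0), min(max(ys), height - 1)
        for ty in range(int(y0) // TILE, int(y1) // TILE + 1):
            for tx in range(int(x0) // TILE, int(x1) // TILE + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins

# Each tile is then shaded entirely on-chip and written out once, which
# is where the bandwidth savings over immediate-mode rendering come from.
tris = [((5, 5), (60, 10), (20, 40)), ((100, 100), (120, 130), (90, 140))]
for tile, ids in sorted(bin_triangles(tris, 256, 256).items()):
    print(f"tile {tile}: triangles {ids}")
```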
Each slice houses the 3D fixed-function geometry hardware, eight sub-slices containing the EUs, and a "slice common" that holds various fixed-function blocks and the L3 cache. Intel improved the memory subsystem by quadrupling the L3 cache to 3MB and by separating the shared local memory from the L3 to promote parallelism. The new design also features enhanced memory compression algorithms.
Other improvements include a new HEVC Quick Sync Video engine that provides up to a 30% bitrate reduction over Gen9 (at the same or better visual quality), support for multiple 4K and 8K video streams within a lower power envelope, and support for Adaptive Sync technology. VP9 decode bit depth increases from 8 to 10 bits to support HDR video.
Diving deeper into the slice, we can see that each slice houses eight sub-slices, each with eight EUs. Each sub-slice contains a local thread dispatcher unit and its own instruction caches to keep those EUs fed. A shared local memory (SLM), a 3D texture sampler, a media sampler, and a dataport round out each sub-slice, as the summary sketch below models.
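As a quick structural recap, the sketch below tallies up the hierarchy described above. The figure of seven hardware threads per EU is our assumption carried over from prior Gen designs, not a number quoted here.

```python
# Structural summary of a Gen11 slice as described above.
# THREADS_PER_EU = 7 is an assumption based on prior Gen designs,
# not a figure quoted in this article.

SUBSLICES_PER_SLICE = 8   # each with its own thread dispatcher,
EUS_PER_SUBSLICE = 8      # instruction cache, SLM, samplers, dataport
THREADS_PER_EU = 7        # assumed hardware threads per EU

eus = SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
print(f"EUs per slice:          {eus}")                   # 64
print(f"Hardware threads/slice: {eus * THREADS_PER_EU}")  # 448 (assumed)
```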