Intel's new heads of silicon development, Raja Koduri (Senior Vice President of Core and Visual Computing) and Jim Keller (Senior Vice President of Silicon Engineering), hosted the company's Architecture Day here in Santa Clara to outline its broad new vision for the future. Dr. Murthy Renduchintala, Intel's chief engineering officer and group president of the Technology, Systems Architecture & Client Group (TSCG), also presented at the event, which was held in the former home of Intel co-founder Robert Noyce.
Highlights included the unveiling of the company's new Sunny Cove CPU microarchitecture, its new Gen11 integrated graphics, 'Foveros' 3D chip-stacking technology, a teaser of the company's new Xe line of discrete graphics cards, and a new "One API" software initiative designed to simplify programming across Intel's entire product stack. We also caught a glimpse of the first 10nm Ice Lake processor for the data center.
Intel has amassed a treasure trove of new technologies over the last several years as it has diversified into new areas like AI, autonomous driving, 5G, FPGAs, and IoT. It's even added GPUs to the list. Intel's process technology touches every one of those segments, as well as the chips that power them, but its delayed 10nm process has slowed the company's progress.
To help get back on track, Intel brought in Raja Koduri and Jim Keller to outline a new cohesive vision that spans all facets of its operations. Together with the company's leadership, the pair identified six key building blocks that the company will focus on over the coming years. Those pillars include process technology, architectures, memory, interconnects, security, and software. The company hopes that focusing on these key areas will accelerate its pace of innovation and help it regain its competitive footing.
The event was a wide-ranging affair with an almost overwhelming amount of information and insight into the company's plans for the future, but a few new key technologies stood out as particularly promising. Let's take a look at some of the most interesting new technologies Intel is working on.
3D Chip Stacking With Foveros
Foveros (Greek for "awesome") is a new 3D packaging technology that Intel plans to use to build processors from dies stacked atop one another. 3D chip stacking is a well-traveled concept that has been under development for decades, but the industry hasn't been able to overcome the power and thermal challenges, not to mention poor yields, well enough to bring the technology to high-volume manufacturing.
Intel says it built Foveros upon the lessons it learned with its innovative EMIB (Embedded Multi-Die Interconnect Bridge) technology, which is a complicated name for a technique that provides high-speed communication between several chips. That technique allowed the company to connect multiple dies together with a high-speed pathway that provides nearly the same performance as a single large processor. Now Intel has expanded on the concept to allow for stacking die atop each other, thus improving density.
The key idea behind chip stacking is to mix and match different types of dies, such as CPUs, GPUs, and AI processors, to build custom SOCs (System-On-Chip). It also allows Intel to combine several different components with different processes onto the same package. That lets the company use larger nodes for the harder-to-shrink or purpose-built components. That's a key advantage as shrinking chips becomes more difficult.
Intel had a fully functioning Foveros chip on display at the event that it built for an unnamed customer. The package consists of a 10nm CPU and an I/O chip. The two chips mate with TSVs (Through-Silicon Vias) that connect the dies through vertical electrical connections in the center of the die. The channels then mate with microbumps on an underlying package. Intel also added a memory chip to the top of the stack using a conventional PoP (Package on Package) implementation. The company envisions even more complex implementations in the future that include radios, sensors, photonics, and memory chiplets.
The current design consists of two dies. The lower die houses all of the typical southbridge features, like I/O connections, and is fabbed on the 22FFL process. The upper die is a 10nm CPU that features one large compute core and four smaller 'efficiency' cores, similar to an ARM big.LITTLE processor. Intel calls this a "hybrid x86 architecture," and it could denote a fundamental shift in the company's strategy. The company later confirmed that it is working on building a new line of products based on the new hybrid x86 architecture, which could be the company's response to the Qualcomm Snapdragon processors that power Always Connected laptops. Intel representatives did confirm the first product draws less than 7 watts (2 mW standby) and is destined for fanless devices, but wouldn't elaborate further.
The package measures 12x12x1mm, but Intel isn't disclosing the measurements of the dies. Stacking small dies should be relatively simple compared to stacking larger dies, but Intel seems confident in its ability to bring the technology to larger processors. Ravishankar Kuppuswamy, Vice President & General Manager of Intel's Programmable Solutions Group, announced that the company is already developing a new FPGA using the Foveros technology. Kuppuswamy claims the Foveros technology will enable up to a 2x performance improvement over the Falcon Mesa FPGAs.
Intel Xe Discrete Graphics and Gen11 Graphics
Intel also unveiled its new Gen11 integrated graphics engine and presented a demo of Tekken 7 playing amazingly well on the new graphics architecture. The demo ran on a 10nm processor, marking the company's first public demonstration of the new graphics engine on 10nm silicon.
Intel cautioned that the diagrams it used for the presentation aren't entirely to scale, but they do give us a close look under the hood of the new graphics engine. The Gen11 engineering team focused heavily on creating a dramatic performance improvement over its previous-gen graphics engine, stating that the goal was to cram 1 teraflop of 32-bit and 2 teraflops of 16-bit floating point performance into a low power envelope.
Intel employed the familiar modular arrangement of sub-slices that each house eight execution units (EUs). Intel brought the design up to eight sub-slices, for a total of 64 EUs, a big improvement over Gen9's 24 EUs. The revamped engine also processes two pixels per clock.
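As a sanity check on those targets, here's a back-of-the-envelope calculation. It assumes each EU sustains 16 FP32 FLOPs per clock (two 4-wide FMA units, with a fused multiply-add counted as two operations, the same per-EU rate as prior Gen architectures) and a roughly 1 GHz graphics clock; both figures are our assumptions, not Intel's disclosures.

```python
# Rough FLOPS estimate for the 64-EU Gen11 design.
# Per-EU rate and clock speed are assumptions for illustration.

EUS = 8 * 8                 # eight sub-slices of eight EUs each
FLOPS_PER_EU_PER_CLK = 16   # FP32; FP16 doubles this rate
CLOCK_HZ = 1.0e9            # assumed ~1 GHz graphics clock

fp32_tflops = EUS * FLOPS_PER_EU_PER_CLK * CLOCK_HZ / 1e12
fp16_tflops = fp32_tflops * 2

print(f"FP32: {fp32_tflops:.2f} TFLOPS, FP16: {fp16_tflops:.2f} TFLOPS")
```

At that assumed clock, the math lands right on the stated goal of 1 teraflop of FP32 and 2 teraflops of FP16 throughput.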
The new design features support for tile-based rendering in addition to immediate mode rendering, which helps to reduce memory demands during some rendering workloads. The engineers also improved the memory subsystem by quadrupling the L3 cache to 3MB and separated the shared local memory to promote parallelism. The new design also has enhanced memory compression algorithms.
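To illustrate why tile-based rendering eases memory pressure, here's a minimal binning sketch (our own illustration, not Intel's implementation): triangles are sorted into screen-space tiles up front, so each tile's slice of the framebuffer can stay in on-chip memory while it is shaded instead of streaming the whole framebuffer through DRAM.

```python
# Minimal tile-binning sketch. Tile size and the bounding-box test
# are illustrative choices, not details of the Gen11 hardware.

TILE = 32  # assumed tile edge in pixels

def bin_triangles(triangles, width, height):
    """Map each tile (tx, ty) to the triangles whose bounding box touches it."""
    bins = {}
    for tri in triangles:
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
        y0, y1 = max(min(ys), 0), min(max(ys), height - 1)
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins

tris = [[(5, 5), (20, 8), (10, 25)],      # fits entirely in tile (0, 0)
        [(30, 30), (90, 40), (60, 100)]]  # spans several tiles
bins = bin_triangles(tris, 128, 128)
print(sorted(bins))
```

Once binning is done, the renderer walks the tiles one at a time and shades only the triangles in each tile's list.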
Other improvements include a new HEVC Quick Sync Video engine that provides up to a 30% bitrate reduction over Gen9 (at the same or better visual quality), support for multiple 4K and 8K video streams within a lower power envelope, and support for Adaptive Sync technology.
Intel Xe Graphics Technology, Now Including a Discrete GPU
Intel shocked the enthusiast community earlier this year when it announced that it would enter the discrete graphics market. Intel is quick to remind us that it "lights up quintillions of pixels across the planet every day," which is a true statement based on the fact that, courtesy of its integrated graphics chips in its CPUs, Intel is the world's largest GPU producer. Now the company is bringing that experience to the discrete GPU market, and yes, that means it is bringing gaming-focused GPUs to market.
Translating that experience in integrated graphics to its new lineup of discrete GPUs isn't going to be an easy task: its last entry into the discrete GPU space came roughly two decades ago. But Intel has an IP war chest (at one point it owned more graphics patents than the other vendors combined) and has been on a full-court press recruiting the right talent for the task.
Intel presented a slide outlining its new Xe architecture that will come after the Gen11 graphics engine. Intel says the next generation of its graphics architecture will denote a transition from the "Gen" naming convention and will scale from integrated on-chip graphics up to discrete GPUs that will span the mid-range, enthusiast, and data center markets. That means it will scale from teraflops of performance integrated into a standard processor up to petaflops of performance with discrete cards.
This announcement certainly hints that both the integrated graphics and discrete cards will share the same underlying architecture, but Intel wouldn't answer further questions. Intel is also on track to deliver on its previously-announced timeline, saying the Xe graphics cards will debut in 2020.
CPU Core Roadmap, Sunny Cove Microarchitecture, 10nm Ice Lake
Intel also took the wraps off its new CPU core roadmap and "Sunny Cove" microarchitecture at the event. The company's dominance in the chip market has long been predicated on process and microarchitecture leadership, but Intel's approach to designing new processor cores has been inextricably tied to its onward march to smaller process nodes. That means that its new CPU core designs (microarchitectures) have traditionally required a move to new, smaller manufacturing processes.
That approach became a liability as Intel encountered massive delays with its 10nm process. Instead of bringing out new core designs, the company was mired on the 14nm process for four years as it constantly refined the process through a cadence of "+" iterations. Each new iteration of the 14nm process brought higher frequencies, and thus more performance, as the company marched from 4.2 GHz up to 5.1 GHz. These improvements delivered up to 70% more performance since 14nm's debut in 2014, but the lack of a new microarchitecture, which typically improves the processor's instructions per clock (IPC) throughput, slowed its progress.
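The dynamic described above follows from the rough identity performance ≈ IPC × clock frequency: with IPC flat, gains had to come from the clock. A quick sketch shows how much the frequency march alone contributes, with the remainder of the cumulative improvement coming from other refinements such as added cores and platform tuning.

```python
# Performance scales roughly as IPC x clock frequency. With no new
# microarchitecture, IPC stays flat, so only the frequency term moves.

def relative_perf(ipc_ratio, freq_ratio):
    return ipc_ratio * freq_ratio

# 4.2 GHz -> 5.1 GHz with unchanged IPC:
freq_gain = relative_perf(ipc_ratio=1.0, freq_ratio=5.1 / 4.2)
print(f"Frequency-only gain: {(freq_gain - 1) * 100:.1f}%")
```

The clock march works out to roughly a 21% gain by itself, which is why a new IPC-lifting core design matters so much.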
After learning a hard lesson exacerbated by the resurgent AMD nipping at its heels, Intel tells us that the company will now design new microarchitectures to be portable between nodes. That will allow the company to move forward even if it encounters roadblocks on its path to smaller transistors.
The Sunny Cove microarchitecture is the first new design that can be used on multiple nodes, and even though Intel has stated the new core will debut on the 10nm node, it hasn't confirmed that it will come with the Ice Lake chips. In line with its new design ethos, Intel also tells us that it will select different nodes for different products based on the needs of the segment. That's similar to the approach taken by third-party fabs like TSMC and GlobalFoundries, and it means Intel could choose to pair Sunny Cove with 14nm processors as well.
Intel's CPU Core Roadmap
Intel has typically used the same naming convention for its microarchitectures as it does for its processors. Hence, Skylake processors came with the Skylake architecture, and Kaby Lake processors came packing the Kaby Lake architecture. That old paradigm changes now that Intel has decoupled its architectures from the end products, so the company debuted a new roadmap specifically for CPU cores.
Intel presented its new roadmap for both its Core and Atom lineups. As usual, the Core series addresses the company's bread-and-butter high-performance chips, while the Atom chips serve the low power segment.
Intel's Sunny Cove will debut in 2019, bringing with it higher performance in single-threaded applications, a new instruction set architecture (ISA), and a design geared for scalability. Willow Cove will follow with an improved cache hierarchy, security features, and transistor optimizations. The Golden Cove microarchitecture will debut in 2021 with a focus on yet more single-threaded performance, AI performance, networking improvements, and new security features. Atom will receive a slower cadence of improvements, with Tremont debuting in 2019 and Gracemont in 2021, and a 'Next Mont' design to follow.
Intel plans for general performance improvements through three key design tenets of going deeper, wider, and smarter, but it is also improving what it calls 'special purpose' use cases, like AI, cryptography, and compression/decompression workloads.
Intel gave us a quick refresher on the Skylake architecture that underpins its Skylake, Kaby Lake, Coffee Lake, and Cascade Lake processors. The design processes operations through two reservation stations (RS). It can process seven operations simultaneously, dispatching them to the integer (INT), vector (VEC), store data, and address generation units (AGU).
Deeper
The new Sunny Cove design features improvements at every level of the pipeline. Key improvements include larger reorder, load, and store buffers, along with larger reservation stations. This allows the processor to look deeper into the stream of incoming instructions to find operations that are independent of each other and can run simultaneously. Those operations are then executed in parallel to improve IPC.
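A toy sketch (purely illustrative, far simpler than a real out-of-order scheduler) shows why deeper buffers help: with a larger instruction window, the scheduler can look past a stalled operation and find later, independent operations that are ready to issue.

```python
# Toy model of an out-of-order instruction window.
# An instruction is (destination_register, source_registers).

def ready_ops(instrs, window):
    """Return indices of instructions within the window whose sources
    are not produced by an earlier, not-yet-executed instruction."""
    pending = set()
    ready = []
    for i, (dest, srcs) in enumerate(instrs[:window]):
        if not (set(srcs) & pending):
            ready.append(i)
        pending.add(dest)   # dest is in flight until this op executes
    return ready

prog = [("r1", ["r0"]),   # 0
        ("r2", ["r1"]),   # 1: depends on 0
        ("r3", ["r0"]),   # 2: independent
        ("r4", ["r3"]),   # 3: depends on 2
        ("r5", ["r0"])]   # 4: independent

print(ready_ops(prog, window=2))  # shallow window finds only op 0
print(ready_ops(prog, window=5))  # deeper window also finds ops 2 and 4
```

The shallow window sees only the dependent pair and can issue one operation; the deeper window exposes two more independent operations to run in parallel.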
Intel increased the L1 data cache from 32KB, the same capacity it has used in its chips for a decade, to 48KB. The L2 cache is also larger, but the capacity is dependent upon each specific type of product, such as chips designed for either the desktop or server market. Intel also expanded the micro-op cache (UOP) and the second-level translation lookaside buffer (TLB).
Wider
A key facet of improving performance is to increase parallelism. That starts with the deeper buffers and reservation stations we covered above, but it also requires more execution units to process the operations.
Intel moved from a four-wide allocation to five-wide to allow the in-order portion of the pipeline (front end) to feed the out-of-order (back end) portion faster. Intel also increased the number of execution units to handle ten operations per cycle (up from eight with Skylake). The Store Data unit can now process two store data operations for every cycle (up from one). The address generation units (AGU) can now also handle two loads and two stores every cycle. These improvements are necessary to match the increased bandwidth from the larger L1 data cache, which now does two reads and two writes every cycle. Intel also tweaked the design of the sub-blocks in the execution units to enable data shuffles within the registers.
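The effect of widening can be sketched with a simple ceiling calculation (a simplification: real sustained IPC also depends on the workload's mix of operations and which ports they can use), since sustained throughput is bounded by the narrowest pipeline stage.

```python
# Sustained IPC cannot exceed the narrowest stage of the pipeline.
# Widths below are the figures from the article; the model is ours.

def ipc_ceiling(alloc_width, exec_ops_per_cycle):
    return min(alloc_width, exec_ops_per_cycle)

skylake = ipc_ceiling(alloc_width=4, exec_ops_per_cycle=8)
sunny_cove = ipc_ceiling(alloc_width=5, exec_ops_per_cycle=10)
print(skylake, sunny_cove)
```

In both designs allocation is the binding stage, which is why moving from four-wide to five-wide allocation matters even though the execution units could already handle more operations per cycle.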