In Theory: Can next-gen Nvidia tech offer Titan X power for GTX 970 money?

Early spec analysis on the next-gen Pascal architecture - and how it could translate to mainstream GPUs.

Feature by Richard Leadbetter, Technology Editor, Digital Foundry

Updated on 28 Apr 2016

If high-performance graphics cards like Titan X, Fury X and GTX 980 Ti aren't enough to satisfy your lust for top-tier PC hardware, this year will see the arrival of new hardware with the potential to take gaming visuals and performance to the next level. Historically, AMD and Nvidia have worked hard to push PC graphics year after year, but the arrival of 14nm and 16nm chip fabrication technology using 3D FinFET transistors offers GPU vendors the first real innovation in manufacturing technology for five years. And recent data released by Nvidia suggests we're in for something really special with its upcoming Pascal architecture.

All the signs suggest that it's Nvidia that will take point with the arrival of new graphics hardware based on the 16nm process provided by long-time partner TSMC, with rumours strongly suggesting that product will be seen at Taiwan's Computex show at the end of May. There's been a range of leaks and rumours reported by the Far Eastern press in recent weeks, but the best indication we have of Pascal's make-up comes from the reveal of the Tesla P100 accelerator at Nvidia's GTC conference earlier this month, complete with an exhaustive list of specs.

The new product is aimed at large datacenters and other consumers of so-called super-computer technology, but crucially, the new Tesla is built on Pascal technology and the specs strongly suggest that this processor will eventually end up as the next generation Titan, or equivalent. The chip's name is GP100, echoing the GM200 of Titan X and the GK110 of the original Titan - and some of the raw stats buried within the data are absolutely remarkable.

What about AMD?

It's not just Nvidia that has access to a new fabrication process. AMD does too, specifically a 14nm FinFET process provided by GlobalFoundries, Samsung - or perhaps even both. In theory, a 14nm process should be preferable to 16nm, but as we understand it, they are two different implementations of much the same technology. Indeed, for iPhone 6S, Apple actually sources chips made on both processes.

AMD's new architecture is codenamed Polaris, with Polaris 10 and 11 chips in development. We would expect Polaris 11 to be the red team's answer to the mooted GP104, using GDDR5 or GDDR5X. Next-gen HBM products are expected to ship at a later date, and they have their own product codename: Vega. A now-deleted LinkedIn profile from AMD R&D manager Yu Zheng suggests that one Vega product will have 4096 shaders, just like Fury X.

Otherwise, aside from an architectural overview, there's little more we can discern right now about Polaris. AMD itself has only offered one tiny glimpse of its capabilities, suggesting that a Polaris product (presumably Polaris 10) can run Star Wars Battlefront at medium settings at 1080p60, with a 61 per cent reduction in power consumption vs Nvidia's GTX 950, which requires 140W.

It's a curious comparison, because recent, compelling leaks suggest that Polaris 10 features 2304 shaders, making it a more expensive product, and a highly capable replacement to the existing R9 380. In fact, it's this piece of silicon that's hotly tipped to be a key component in the make-up of the PlayStation Neo's processor.

It's not a great comparison to be honest - GTX 950 is a die-harvested part built on 28nm, and the test uses a game that favours AMD hardware. Hopefully more meaningful data will be released at Computex at the tail-end of next month.

First up, check out the size of the chip itself. There have been concerns that the 16nm process may require time to mature, and that larger, more difficult-to-make processors may take years to appear. However, GP100 is actually larger than GM200 - 610mm² vs 601mm². 16nm's manufacturing advantage is also confirmed by a 15.3bn transistor count - up from 8bn in today's top-tier product. Perhaps most surprising of all is the boost clock - the peak speed of the chip. It's rated for 1480MHz, which is actually higher than what you can reasonably expect to achieve from a Titan X pushed to its absolute limits. And this is for an industrial product, which usually has quite conservative clocks compared to consumer graphics cards.

Video: Rich offers an overview of Pascal's advantages in the Tesla P100 accelerator, and how they may translate to consumer graphics cards.

	Tesla M40	Tesla P100
GPU	GM200 Maxwell	GP100 Pascal
SMs	24	56
Base Clock	948MHz	1328MHz
Boost Clock	1114MHz	1480MHz
Texture Units	192	224
Memory Interface	384-bit GDDR5	4096-bit HBM2
L2 Cache	3072KB	4096KB
Transistor Count	8bn	15.3bn
Die Size	601mm²	610mm²
Process	28nm	16nmFF
TDP	250W	300W

On paper, GP100's leap over GM200 is absolutely remarkable. Processing power typically scales with transistor count. Not only has the 16nm process delivered this, but the overall speed of the processor has increased too. And there are other reasons to believe we're in for a huge boost in performance - many believed that the Pascal architecture would be a die-shrunk version of Maxwell. That isn't the case, with a restructuring of the CUDA cores along with another big boost in L2 cache. How that translates into enhanced performance remains to be seen, of course.
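To put rough numbers on that leap, the figures in the spec table can be reduced to a handful of ratios. The Python below is a back-of-envelope sketch rather than anything from Nvidia: the dictionary layout and the helper names (`SPECS`, `ratio`, `density_mtr_per_mm2`) are our own, and the only inputs are the Tesla M40 and Tesla P100 values quoted above.

```python
# Back-of-envelope ratios from the Tesla M40 vs Tesla P100 spec table.
# All inputs are the figures quoted in the article; the names are illustrative.

SPECS = {
    "Tesla M40": {
        "boost_mhz": 1114,
        "l2_kb": 3072,
        "transistors_bn": 8.0,
        "die_mm2": 601,
        "tdp_w": 250,
    },
    "Tesla P100": {
        "boost_mhz": 1480,
        "l2_kb": 4096,
        "transistors_bn": 15.3,
        "die_mm2": 610,
        "tdp_w": 300,
    },
}


def ratio(field, new="Tesla P100", old="Tesla M40"):
    """Return how many times larger `field` is on the newer part."""
    return SPECS[new][field] / SPECS[old][field]


def density_mtr_per_mm2(card):
    """Transistor density in millions of transistors per square millimetre."""
    spec = SPECS[card]
    return spec["transistors_bn"] * 1000 / spec["die_mm2"]


if __name__ == "__main__":
    for field in ("transistors_bn", "die_mm2", "boost_mhz", "l2_kb", "tdp_w"):
        print(f"{field}: {ratio(field):.2f}x")
    for card in SPECS:
        print(f"{card}: {density_mtr_per_mm2(card):.1f}M transistors per mm2")
```

On these inputs, transistor count rises by roughly 1.9x while die area barely moves, so density almost doubles, and the boost clock climbs by about a third. SM counts are deliberately left out: with the CUDA cores restructured, they are not directly comparable between the two architectures.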

The Tesla P100 uses 16GB of HBM2 memory too, accessed via an ultra-wide 4096-bit bus - a vast improvement in memory bandwidth compared to the 384-bit GDDR5 utilised in Titan X. We'd expect a next-gen Titan to retain the HBM2 (it's already confirmed for the AMD competitor, codenamed Vega), but the question is how much VRAM we will see in the inevitable cut-down version of the card aimed at the gaming audience - today's equivalent to the GTX 980 Ti.

What's fascinating about Nvidia's GTC announcement is just how much the firm shared, to the point where we are seemingly getting an extremely early preview of a top-tier consumer GPU we're unlikely to see until well into 2017 at the earliest. It's unlikely we'll see a GeForce GP100-based product this year, so what will we be getting instead? It's at this point where the rumours from the Far Eastern press come into focus.

Recent leaks have even extended to showing the casing for the Pascal reference cooler. In a world of 3D-printed fakery, such pictures should be treated with caution. That said, the background suggests that the photo was actually taken on the production line itself.

The leaks suggest that we'll see Pascal gaming cards in July this year, showcased at Computex in Taipei the month before. At least two cards are mooted - seemingly called GTX 1070 and GTX 1080 - designed to replace their Maxwell equivalents. The naming may seem rather odd, but another leak - showing 1070 and 1080 casing on the actual production line - does seem to be compelling. Now, here's the thing - each of these products is said to be derived from another, smaller Pascal chip: GP104.

Nvidia has demonstrated that its next-gen 'smaller' chip can outperform its last-gen 'big' chip - exactly what we saw when the GTX 980 outperformed GTX 780 Ti (the ultimate iteration of the original Titan). The real question is just how small GP104 actually is. Another leak, purporting to show the actual die, suggests that it's actually smaller than the GTX 980 equivalent, GM204 - anything from around 317mm² to 330mm², compared to the older chip's 398mm².
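As a quick sanity check on how much smaller that would be, here is a minimal Python sketch. The areas are the figures quoted above, the GP104 values are leak-based estimates rather than confirmed specifications, and the function name `shrink_percent` is simply our own label.

```python
# Rough die-area comparison between the leaked GP104 estimates and GM204.
# The GP104 values are leak-based estimates, not confirmed specifications.

GM204_MM2 = 398
GP104_ESTIMATES_MM2 = (317, 330)  # low and high ends of the leaked range


def shrink_percent(new_mm2, old_mm2):
    """Percentage reduction in die area going from old_mm2 to new_mm2."""
    return (1 - new_mm2 / old_mm2) * 100


if __name__ == "__main__":
    for area in GP104_ESTIMATES_MM2:
        pct = shrink_percent(area, GM204_MM2)
        print(f"{area}mm2 is {pct:.1f}% smaller than GM204")
```

If the leaked measurements hold, that works out to a die somewhere between roughly 17 and 20 per cent smaller than GM204.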

But it's the GTX 1070 that is almost certain to be the volume card in the line-up. The question is, just how daring will Nvidia be with it? When the GTX 970 was released, the green team redefined the high-end GPU market. It could overclock up to and beyond stock GTX 980 performance. It handily beat everything AMD had to offer - products that were over £200 more expensive at the time. The gambit paid off with phenomenal sales success, to the point where at its peak, the GTX 970 commanded over five per cent of the entire Steam userbase. Indeed, the March 2016 hardware survey still has it at 4.93 per cent overall. Bearing in mind the vast number of GPUs on the market both old and new, that's a remarkable statistic. Will Nvidia aim to pull off the same trick a second time? Could a factory overclocked GTX 1070 outperform GTX 980 Ti in the same way that the 970 could beat the 780 Ti - the ultimate iteration of the first-gen Titan?

The leaks have been coming thick and fast - this shot purports to show the GP104 chip we'll be seeing soon in consumer-level graphics cards. Those memory chips surrounding the processor may well be our first look at Micron's new GDDR5X modules.

We'd like to think that Nvidia would aim to be just as audacious this time around. As phenomenal as GTX 970 has been, AMD's Radeon R9 390 has made a big comeback for the red corner. Dark Souls 3 aside, it's run most of the big games released this year on par with or faster than the GTX 970, with titles like Quantum Break and Far Cry Primal in particular posting some highly significant increases in performance. And there remain question marks over Nvidia's DX12 performance too - we've seen big gains on AMD, but Nvidia's DX12 showing in titles like Hitman and Ashes of the Singularity hasn't exactly been overwhelming.

There are plenty of other question marks we hope to see addressed soon too. For example, we know that GP100 - the 'big Pascal' chip - is designed for next-gen HBM2 memory, but what's the score with the upcoming consumer cards? Titan X and GTX 980 Ti took GDDR5 memory pretty much to its limits with a 384-bit bus paired with 7Gbps modules. Will Nvidia stick with tried and tested technology, or will it go for Micron's new, higher bandwidth GDDR5X? The recent leaked PCB shots of the GP104, which show it paired with currently unidentifiable Micron chips, strongly suggest that at least one of the consumer-level Pascal cards will ship with the upgraded RAM [UPDATE 28/4/16 11:33am: Looks like the Micron chips have indeed been identified as GDDR5X].
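For a sense of why the memory choice matters, peak theoretical bandwidth is simply bus width multiplied by the per-pin data rate, divided by eight to convert bits to bytes. The Python sketch below applies that to the Titan X and GTX 980 Ti figures quoted above. The second configuration is a purely hypothetical example of a narrower bus with faster modules; its numbers are our assumption and do not come from any leak or confirmed spec. The function name `peak_bandwidth_gbs` is our own.

```python
# Peak theoretical memory bandwidth from bus width and per-pin data rate.
# bandwidth (GB/s) = bus width (bits) * data rate (Gbps per pin) / 8


def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak theoretical bandwidth in gigabytes per second."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # Titan X / GTX 980 Ti as quoted in the article: 384-bit bus, 7Gbps GDDR5.
    print(peak_bandwidth_gbs(384, 7.0))  # 336.0

    # ASSUMPTION: a hypothetical 256-bit card with 10Gbps modules.
    # Neither figure comes from the article or from a confirmed spec.
    print(peak_bandwidth_gbs(256, 10.0))  # 320.0
```

The second line is only there to show that faster modules could let a narrower bus get close to what Titan X manages today. The real GP104 memory configuration remains unconfirmed.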

Only time will tell, but assuming the rumours and leaks of a June Computex reveal and a July release turn out to be true, we shouldn't have that long to wait - and we'll be on hand to review any and all Pascal products that come our way, with a revised benchmark suite encompassing newer titles running across both DirectX 11 and DX12.
