GTX 1080 seems like such an odd product name, since it brings up the specter of gaming on a 1080p display. The GTX 1080 kills 1080p gaming dead, makes 1440p gaming the new normal, and finally puts 4K gaming within reach of a single GPU. While the GTX 1080 offers great performance, other attributes make the new GPU attractive for gamers. Let's be clear: the GTX 1080 represents the fastest single GPU graphics card you can buy, but performance may not be the primary reason to buy this card.
By the Numbers
Let's first touch on base specifications. The GTX 1080 is based on Nvidia's latest Pascal GPU architecture and built on a 16nm FinFET process at Taiwan's TSMC fab. This is the first process shrink for an Nvidia GPU in two architectural generations, since the original Kepler-based GTX 680 moved to 28nm. FinFET technology incorporates transistors that extend vertically (the "fin"), which reduces current leakage and enables greater power efficiency. This allows Nvidia to build monster GPU chips without creating space heaters, if you will.
That process technology allows Nvidia to create a 7.2-billion-transistor GPU on a 314mm² die, considerably smaller than the GTX 980's die while packing in an additional two billion transistors. This smaller, denser chip runs at a 1.6GHz base clock and 1.73GHz in boost mode, and the GPU looks like it offers substantial overclocking headroom, if that floats your boat.
In addition to all the process technology goodness, the GTX 1080 uses Micron's shiny new GDDR5X memory technology, which transfers data at 10 gigatransfers per second, boosting memory bandwidth well beyond the GTX 980's and within striking distance of the massive GTX Titan X's memory bandwidth while using a narrower, 256-bit memory bus. Pascal also improves on Maxwell's memory compression with its fourth-generation delta color compression. Depending on the game title, the new color compression techniques improve effective bandwidth by 15-30%.
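The idea behind delta color compression is simple: pixels in a block are usually close in value, so the GPU can store one anchor value plus small per-pixel deltas that need fewer bits. Here's a conceptual sketch in Python, assuming a simple anchor-plus-deltas scheme; this is an illustration of the general technique, not Nvidia's actual algorithm.

```python
# Conceptual sketch of delta color compression (illustrative only,
# not Nvidia's actual hardware algorithm). Store one anchor value per
# block plus per-pixel deltas, which usually need far fewer bits than
# a full 8-bit channel when neighboring pixels are similar.
def delta_compress(block):
    anchor = block[0]
    deltas = [p - anchor for p in block]
    # Bits needed to encode the widest delta (sign bit + magnitude).
    widest = max(abs(d) for d in deltas)
    bits = widest.bit_length() + 1 if widest else 1
    return anchor, deltas, bits

# A smooth gradient (common in real frames) compresses well:
block = [120, 121, 122, 122, 123, 124, 125, 126]
anchor, deltas, bits = delta_compress(block)
print(bits)  # 4 bits per delta instead of 8 bits per pixel
```

Smooth regions like skies and shadows compress dramatically; noisy, high-contrast regions don't, which is why the real-world bandwidth gain varies so much by title.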
The bottom line: the GTX 1080 has almost as many shader cores as the GTX 980 Ti, runs them 60% faster, and can move data almost as quickly. Based on these numbers alone, we'd expect a serious performance uptick.
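You can sanity-check that claim with back-of-envelope math: peak FP32 throughput is shader cores times two operations per clock (a fused multiply-add) times clock speed. Using the published core counts and reference boost clocks (2,560 cores at 1.73GHz for the GTX 1080, 2,816 cores at roughly 1.08GHz for the GTX 980 Ti):

```python
# Back-of-envelope peak FP32 throughput: cores x 2 ops (FMA) x clock.
def tflops(cores, boost_ghz):
    return cores * 2 * boost_ghz / 1000.0

gtx_1080 = tflops(2560, 1.733)   # GTX 1080 reference boost clock
gtx_980ti = tflops(2816, 1.075)  # GTX 980 Ti reference boost clock
print(f"GTX 1080:   {gtx_1080:.1f} TFLOPS")
print(f"GTX 980 Ti: {gtx_980ti:.1f} TFLOPS")
```

That works out to roughly 8.9 TFLOPS versus 6.1 TFLOPS, a paper advantage of nearly 50% before accounting for memory bandwidth or architectural tweaks.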
Nvidia also announced the GTX 1070, a feature-reduced version of Pascal. Most GTX 1070 specs were not disclosed, however. What we currently know is that the GTX 1070 costs $379 and ships June 10. The card still has 8GB of video memory, but uses the lower-cost GDDR5 at 7Gbps rather than the more exotic GDDR5X. As for performance, Nvidia only stated that it should perform roughly as well as the $1K Titan X.
| Specs | GTX 1080 | GTX 980 | GTX 980 Ti | GTX Titan X | Radeon Fury Nano |
|---|---|---|---|---|---|
| Power Connectors | 1x 8-pin | 2x 6-pin | 1x 8-pin, 1x 6-pin | 1x 8-pin, 1x 6-pin | 1x 8-pin |
| Process Tech | 16nm FinFET | 28nm | 28nm | 28nm | 28nm |
| Transistor Count | 7.2 billion | 5.2 billion | 8 billion | 8 billion | 8.9 billion |
The New Shiny
Every new GPU generation arrives with some shiny new features which make the new hotness seem like the best thing ever. Many of these features never see common use. For example, the GTX 980 included a very cool lighting feature called Voxel Global Illumination (VXGI), which never really saw the light of day in any shipping games.
What makes the GTX 1080 compelling isn't raw speed. It's some useful new features which allow it to surpass archrival AMD in DirectX 12, improve performance in VR applications, and enable some cool new user-oriented features (some of which may also show up in drivers for older GPUs). Unlike the shiny but little-used features of previous releases, these look like they could be genuinely useful in certain applications, improving both image quality and performance.
Microsoft's Direct3D 12 aims to improve the efficiency of 3D graphics. One key technique is asynchronous compute. Today's games include many different processes overlapping in time, so waiting around for the GPU to get to some task that could be running creates bottlenecks. Asynchronous compute allows the GPU to run some independent workloads simultaneously. These include GPU-based physics, deferred postprocessing, and asynchronous timewarp (used in VR applications).
Asynchronous compute works best when hardware fully supports it, as AMD's Radeon Fury GPUs do. Nvidia's Maxwell GPUs did offer some support for asynchronous compute, but in a more limited way. Maxwell could pre-allocate resources for specific compute tasks, but if a task completed early, those resources remained unavailable for other work until the full time interval completed.
The GTX 1080 adds several features to improve asynchronous compute. Dynamic load balancing releases a task's resources (usually a combination of shader ALUs and memory) as soon as it completes, allowing them to be reallocated to other tasks. Two related features, pixel-level graphics preemption and thread-level compute preemption, give Pascal the ability to rapidly switch workloads. The combination of these features significantly boosts the GPU's asynchronous compute ability.
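The difference between Maxwell's static partitioning and Pascal's dynamic load balancing can be sketched with a toy model. Assume a frame contains 10ms of graphics work and 4ms of compute work, split across two equal resource partitions; the timing numbers are purely illustrative, and this is a conceptual model, not how the hardware actually schedules.

```python
# Toy model of GPU partition scheduling (illustrative numbers, not a
# simulation of real hardware).
def frame_time_static(graphics_ms, compute_ms):
    # Maxwell-style: each workload keeps its partition for the whole
    # frame, so the frame takes as long as the slowest workload and
    # the other partition sits idle once its task finishes.
    return max(graphics_ms, compute_ms)

def frame_time_dynamic(graphics_ms, compute_ms):
    # Pascal-style: when the shorter task finishes, its partition is
    # released and joins in on the remaining work of the longer task.
    shorter, longer = sorted((graphics_ms, compute_ms))
    remaining = longer - shorter
    return shorter + remaining / 2  # two partitions now share the rest

print(frame_time_static(10.0, 4.0))   # 10.0 ms per frame
print(frame_time_dynamic(10.0, 4.0))  # 7.0 ms per frame
```

Even in this crude model, reclaiming the idle partition cuts the frame time from 10ms to 7ms; the real win scales with how unbalanced the graphics and compute workloads are.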
Games render 3D worlds in, well, three dimensions. The GPU then creates a 2D view into the 3D world and projects it onto your display. This is known as viewport projection; you can think of it logically as looking through a window (your display) onto a 3D world. When you run with a single display, viewport projection pretty much works as intended on all 3D graphics cards. It's when you run multiple displays that you may encounter problems.
I've never been a big fan of surround gaming, where three monitors create a wrap-around view of the virtual world. One big reason is the distortion generated in the outside views. The distortion exists because of the need to project the viewport to your eye. Three monitors arranged in a flat plane would look right, albeit with an enormously wide field of view. If you angle the outside displays to create a more immersive view, the 3D world is still being projected into a very wide, flat field of view, distorting the surround image. You can correct the perspective, but that traditionally requires three rendering passes, one per viewport.
Simultaneous multi-projection (SMP) allows rendering to multiple viewports, each of which may have a different eyepoint, in a single rendering pass. The GTX 1080 supports up to 16 simultaneous viewports in one pass. Nvidia builds SMP into the PolyMorph engine found in every streaming multiprocessor (SM), Nvidia's modular core building block. A superset of the older viewport transform engine, the SMP engine handles standard viewport transforms as well.
SMP probably won't create a rush toward surround gaming among most players, but the technology may be more important for an emerging technology: virtual reality. Pascal potentially improves VR rendering in two ways: single-pass stereo and lens matched shading.
Single-pass stereo works similarly to the triple-display perspective correction mentioned above. The GTX 1080 can render two different viewports, with two different eyepoints. Since head-mounted displays such as the HTC Vive and Oculus Rift use two separate, tiny displays, stereo today requires two passes. The ability to render stereo in a single pass essentially doubles performance.
VR displays incorporate a lens between the integrated displays and your eye, in order to help you focus on these tiny, close-up monitors. Compensating for these lenses requires two steps. The rendering pass generates the normal, undistorted image. The second step squeezes and distorts the image so that it looks correct after passing through the lens.
The left-hand image shows what the GPU initially renders; the right-hand image represents what gets projected through the lens. The GPU needs to first render a much larger image – nearly double the number of pixels – then distort the image down for proper projection through the lens. Since you want to hit the magic 90fps for a good VR experience, something has to give – usually image quality.
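Some quick arithmetic shows why that pre-distortion step hurts. Both the Rift and Vive use 1080 x 1200 panels per eye; a commonly cited pre-distortion oversampling factor is about 1.4x per axis (that factor is an assumption for illustration here, not a published GTX 1080 spec):

```python
# Rough pixel-budget arithmetic for VR pre-distortion rendering.
panel = 1080 * 1200        # panel pixels per eye (Rift/Vive class HMDs)
oversample = 1.4           # assumed pre-distortion scale per axis
rendered = round(panel * oversample ** 2)  # pixels actually rendered

print(f"panel pixels/eye:    {panel:,}")      # 1,296,000
print(f"rendered pixels/eye: {rendered:,}")   # ~2.5M, nearly double

# At 90 fps across two eyes, the per-second pixel budget balloons:
print(f"pixels/sec, both eyes at 90fps: {rendered * 2 * 90:,}")
```

Rendering roughly 2.5 million pixels per eye, 90 times a second, only to throw a large fraction away in the distortion step, is exactly the waste lens matched shading targets.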
Using SMP, Pascal can use a technique called lens matched shading, using four viewports for each eye, with one eyepoint per eye. The GPU needs to render far fewer pixels, increasing overall performance.
The GTX 1080 now renders far fewer pixels, and the compute step for the final pre-distortion step takes less time as well. This can double performance in VR, allowing VR developers to use more graphics eye candy while still exceeding the magical 90fps number.
Another technique potentially boosted by SMP is foveated rendering. We're now seeing ultra-wide, 21:9 displays, some of which are getting quite large. This raises the question: does your eye see all the detail over the entire screen? In fact, you're mostly looking at the center of the screen, so one technique for improving performance with ultra-wide screens is to dial down visual settings around the periphery – particularly anti-aliasing samples, which consume a lot of memory bandwidth and other GPU resources. The trick is to use different viewports across the display, with the primary, central viewport rendered in full graphics glory while the peripheral viewports use lower sample rates.
Improved Display Pipeline
High dynamic range rendering has garnered a bad rep among many gamers, mostly because HDR rendering usually looks like excessive light bloom, rather than the wider color gamut you'd see with actual HDR. HDR-capable displays offer wider color gamuts, higher color saturation, and improved contrast ratios. An HDR display showing off true HDR content looks substantially better than simply adding more pixels.
The display controller in the GTX 1080 can handle 12-bits-per-pixel color and supports advanced color modes, including BT.2020 and SMPTE 2084 color quantization. Standards for Ultra HD displays that use a combination of BT.2020, SMPTE 2084, and 10-bit color – including PQ10 from the Ultra HD Forum – already exist, and Ultra HD TVs using the new HDR standards are already in the pipeline. PC monitors utilizing these standards will likely arrive by early 2017. Pascal also supports HEVC 10/12-bit encode and decode.
You need the right connections to support these emerging HDR standards. The GTX 1080 delivers HDMI 2.0b as well as DisplayPort 1.4, which became an official standard in March. The GTX 1080 also includes a single, dual-link DVI-D connector, which means that analog video output, aka the VGA connector, is finally dead.
Trading off between V-SYNC on and V-SYNC off has always been problematic. Enabling V-SYNC yields better image quality, but increases latency because the rendering pipeline slows down to the refresh rate. V-SYNC off allows the game to run as fast as possible, but image tearing on the display looks terrible.
Nvidia's answer is Fast Sync, which decouples the rendering pipeline from the display engine. Fast Sync disables monitor V-SYNC, but will only display completely rendered frames. This works best if the game already runs at high frame rates, which is often the case in competitive gaming, many indie games which aren't highly graphics intensive, and older games. Fast Sync is independent of variable refresh rate technologies like Nvidia's own G-Sync.
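The core idea is easy to model: the game renders into a pool of buffers as fast as it can, and at each display refresh the display engine scans out the most recently completed frame, discarding the rest. Here's a toy Python model of that selection logic (a conceptual illustration, not Nvidia's driver implementation):

```python
# Toy model of Fast Sync scanout: at each refresh, show the newest
# frame that finished rendering before the refresh deadline.
def fast_sync_scanout(frame_done_times, refresh_interval, refreshes):
    shown = []
    for i in range(1, refreshes + 1):
        deadline = i * refresh_interval
        # Frames fully rendered before this refresh fires.
        ready = [f for f, t in enumerate(frame_done_times) if t <= deadline]
        shown.append(ready[-1] if ready else None)
    return shown

# Game renders a frame every 4 ms (250 fps); display refreshes every
# 16.7 ms (60 Hz). Most frames are rendered but never displayed.
done_times = [4 * (n + 1) for n in range(50)]
print(fast_sync_scanout(done_times, 16.7, 4))  # → [3, 7, 11, 15]
```

Every displayed frame is complete (no tearing), and because the newest finished frame wins, the image on screen is much fresher than V-SYNC's queued frames, which is where the latency benefit comes from.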
Just for Fun: Ansel
Nvidia also showed off a specialized screen capture application called Ansel, named after famous landscape photographer Ansel Adams. Ansel's goal is to give complete control over the screen capture process to the player, rather than forcing players to hit the screen capture button at exactly the right time. Some well-known artists even get special game builds which free the 3D camera, allowing them to capture frames wherever they want.
Ansel requires game developers to modify their code, but it's the equivalent of a plug-in, typically requiring only a few lines of code to implement. Ansel does all the heavy lifting, taking over the game's camera to allow positioning and capture. Ansel adds a bunch of effects, fully controllable by the user, and also enables insanely high capture resolutions for people who may want large-format prints. Ansel can also capture a full 360-degree surround image for use in VR.
Now that we've seen the new bells and whistles, let's talk performance. Bear in mind that games shipping today don't incorporate most of these new features, so the reason to buy a GTX 1080 today would be improved performance. I looked at 3DMark, a pair of DirectX 11 games, and a pair of DirectX 12-enabled games to give some guidance on performance. I also fired up Steam's VR Performance Test to see just how VR games might fare.
I ran all the tests on a mainstream system that resembles hardware most users might own, rather than an extreme high-end system:
Intel Core i7 6700K
Gigabyte Z170X Gaming 7 motherboard
32GB DDR4 2600 running at 2133MHz
Samsung EVO 850 SSD
750W Seasonic PSU
Windows 10 Pro
The system lives inside a Corsair Obsidian 550D case instead of an open-air setup, also more representative of actual systems. Since I wanted to test 4K performance, I used an LG 31MU97. Although the LG display is 4K DCI (4,096 x 2,160 pixels), the 4K benchmarks all ran at 4K UHD (3,840 x 2,160 pixels).
Each game ran at four resolution / setting combinations: 2,560 x 1,440 high and maxed out, as well as 4K UHD high and maxed out. "High" means I used the in-game high preset, which is usually one below "ultra" for most games. If a game supported more than one setting above the high preset, I stuck with high for consistency. "Max" means I started with the highest game preset, then hunted down every graphics slider and pushed it up to eleven – even at the highest preset, most games don't really max out everything.
I benchmarked four different GPUs: Nvidia's GTX 980, GTX Titan X, GTX 1080, plus AMD's Radeon Fury Nano. The Nano may be marginally slower than AMD's higher-end Fury X, since it runs at a slightly lower clock frequency, but it uses the same GPU and 4GB of HBM memory. Inside a big case with good airflow, the Nano shouldn't throttle too badly.
The latest iteration of 3DMark now adds 4K support; I ran 3DMark (Standard), 3DMark Extreme, and 3DMark Ultra. 3DMark still uses only DirectX 11, however, so that generally gives Nvidia an edge.
I use 3DMark more as a sanity check than a predictor. As I expected, most of the performance difference seems to lie in the higher clock frequencies and improved memory bandwidth. Running at 1.6GHz allows the GTX 1080's 2,560 shader cores to outperform the GTX Titan X, with its 3,072 shader ALUs.
DirectX 11 Games
First up is a slightly older game. Since Dragon Age: Inquisition runs on EA's Frostbite Engine, it's predictive of how Frostbite-based games might fare.
At 2,560 x 1,440 with everything maxed out, the GTX 1080 beats the original GTX 980 by 70% -- not a bad showing. Note how even in DX11, AMD's Fury Nano edges out the GTX 980.
Ubisoft's Tom Clancy's The Division represents the cutting edge of DirectX 11 titles, incorporating a raft of features which improve image quality, all at the cost of frame rate. I decided to make this test completely unfair, so I turned on all the Nvidia-based eye candy, such as HFTS soft shadows and HBAO+ ambient occlusion, when running at max settings. This no doubt puts the AMD card at a slight disadvantage. But will it make a difference?
Once you add in all the extra Nvidia graphics goodness, the Radeon Nano even outperforms the GTX Titan X. However, it still falls behind the GTX 1080 by 30% at 2,560 x 1,440 and almost 26% at 4K UHD. If you dial down the settings a bit, The Division becomes quite playable at 4K with a single card.
DirectX 12 Games
I used two very different games to test Direct3D 12 on the 1080: Ashes of the Singularity, a classic, old-school RTS that throws hundreds of units and lots of real-time effects on-screen, and the DX12 update of Rise of the Tomb Raider.
Ashes generated some notoriety as one of the first benchmark-capable DirectX 12 games. AMD scored some serious points by substantially outperforming Nvidia in Ashes of the Singularity running DX 12 when the betas first appeared. How does the GTX 1080 fare?
At 1440p, the Radeon Nano actually does pretty well, even edging out the GTX 1080 by a small margin. At first, I thought I might have made a rookie mistake, but multiple retests remained consistent across AMD and Nvidia GPUs. When you push the resolution up to 4K UHD, the picture changes. The GTX 1080 pretty much crushes everything at 4K UHD, though the Nano still does pretty well against Maxwell-based GPUs. At higher resolutions, the Nano's 4GB of video memory may be its Achilles' heel.
Turning to Rise of the Tomb Raider, it really does feel like we've entered an era where 4GB may no longer be enough if you want to run at the highest possible settings. As with The Division, I cranked up everything, even Nvidia-specific settings, for the "max" settings.
Now that we have more than one DX12 benchmark available, it's worth noting that the Nano still fares reasonably well against the GTX 980. Rise of the Tomb Raider turns out to be pretty demanding, but the GTX 1080 sweeps at all resolutions and settings. When you look at the 4K benchmarks, the Nano seems to really fall off the pace. It could be throttling due to thermal issues, but the GTX 980 fares worse than the Nano at max settings. Most of the issue may simply be the 4GB of video RAM limiting performance. The GTX Titan X and GTX 1080 both outperform the 4GB cards across the board.
AMD once again finds itself playing catch-up. The Radeon Fury cards look pretty good against Maxwell, but can't really compete on performance alone against the GTX 1080. The GTX 1070, when it ships, will only make matters worse. AMD can play the pricing game, but the high cost of HBM may make that difficult.
I expected AMD to have announced GPUs using the company's next-generation Polaris architecture by now, though Polaris may be better described as "GCN 1.5". Rumors suggest Polaris may finally launch in June, targeting the midrange market ($300 or less). That's where the volume is, so if Polaris performs reasonably well, AMD can stay in the game, even if its high-end products can't really compete. Nvidia has yet to hint at midrange Pascal-based GPUs, but those are likely not far behind.
AMD's true next-gen GPU, code-named Vega, likely won't ship until 2017, so the company will need to remain content with more volume-oriented products. If AMD can get Polaris-based cards out in June, then AMD should remain relevant, even though the company will lack a flagship product.
In the end, the GTX 1080 is pretty damned fast in today's generation of games, and will likely only improve over time. Most of the performance improvement in the current title crop results from the higher core and memory clocks of the GTX 1080, but features such as simultaneous multi-projection will likely make a major impact as VR titles come to the fore. Nifty toys like Ansel also make Nvidia's new card more attractive. What's surprising is just how large the gap is between the GTX 1080 and the GTX 980.
On the other hand, if you're running on mainstream hardware and don't have an infinite budget, you might wait and see how the $379 GTX 1070 performs. Titan X performance at $379 looks pretty attractive. As usual, prices will vary depending on factory overclocks – the 1080 seems to have a ton of headroom for overclocking. Should you drop $599 ($699 for Nvidia's "Founders Edition" reference card) for a GTX 1080? What you get is essentially GTX 980 SLI performance in a single card, so that may be worth the price of admission.