AMD’s Cayman architecture is part of the Northern Islands family of chips. The new chip is built on TSMC’s 40nm process and has a die size of 389mm², compared to the 520mm² of the GPU found on NVIDIA’s GeForce GTX 580/570 boards. It has 2.64 billion transistors, which is a nice jump up from the 2.17 billion found on the HD 5800 series and the 1.7 billion on the HD 6800 series.
AMD has included two graphics engines on a single chip with the 69xx series, with dual rasterizers, tessellators, Hierarchical Z units, vertex and geometry assemblers, and Ultra-Threaded Dispatch Processors. AMD has also moved from the VLIW5 architecture of previous cards (including the recently released HD 6870) to a VLIW4 architecture with up to 1536 Stream Processors per card.
If you remember the HD 5800 series that preceded this architecture, the chip had 1600 SPs split into 20 SIMDs of 16 blocks each. Each block could do four simple operations plus one special function per clock, so AMD counted five SPs per block and called the architecture 1600 SPs (20 x 16 x 5). With the HD 6970 the chip has a maximum of 24 SIMDs, with 16 ALUs per SIMD.
Each ALU (a VLIW4 thread processor) can do up to four 32-bit FMA, MAD, MUL or ADD floating-point operations per clock, two 64-bit ADDs, one 64-bit FMA or MUL, or one special function (transcendental) operation per clock. In essence, there are 64 more ALUs than on Cypress (24 x 16 = 384 versus 20 x 16 = 320), and since AMD counts SPs by issue slots, the HD 6970 has 24 x 16 x 4 (24 SIMDs x 16 ALUs x 4 SPs per ALU), or 1536 SPs.
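To make the SP counting concrete, here is a minimal Python sketch of the arithmetic described above. The function name `stream_processors` is made up for illustration; the SIMD, block and slot counts are the ones quoted in the text.

```python
def stream_processors(simds: int, units_per_simd: int, slots_per_unit: int) -> int:
    """SP count as AMD markets it: SIMDs x units per SIMD x issue slots per unit."""
    return simds * units_per_simd * slots_per_unit


# HD 5870 (Cypress): 20 SIMDs, 16 units each, 4 simple ops + 1 special function
cypress_sps = stream_processors(20, 16, 5)

# HD 6970 (Cayman): 24 SIMDs, 16 units each, 4 equal issue slots
cayman_sps = stream_processors(24, 16, 4)

# Difference in the number of units, ignoring how wide each one is
extra_units = 24 * 16 - 20 * 16

print(cypress_sps, cayman_sps, extra_units)  # 1600 1536 64
```

So Cayman has more units than Cypress but a lower headline SP count, simply because each unit is now four slots wide instead of five.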
The VLIW4 thread processor is a 4-way co-issue processor in which all stream processing units have equal capabilities. A special function operation now occupies 3 of the 4 issue slots. This design allows for a 10% improvement in performance/mm² over the previous generation, along with simplified scheduling and register management and extensive logic re-use, making for a more efficient architecture than Cypress.
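As a rough illustration of what "3 of 4 issue slots" means for packing, the toy Python model below fills 4-slot bundles in program order. It is only a sketch under simplifying assumptions: the ops are treated as independent, the names `SLOT_COST` and `pack_bundles` are hypothetical, and AMD's real shader compiler is far more sophisticated than this greedy loop.

```python
SLOTS_PER_BUNDLE = 4                      # VLIW4: four issue slots per clock
SLOT_COST = {"simple": 1, "special": 3}   # a transcendental takes 3 of the 4 slots


def pack_bundles(ops):
    """Greedily pack independent ops, in order, into 4-slot bundles (toy model)."""
    bundles, current, used = [], [], 0
    for op in ops:
        cost = SLOT_COST[op]
        if used + cost > SLOTS_PER_BUNDLE:
            # The op does not fit: close this bundle and start a new one.
            bundles.append(current)
            current, used = [], 0
        current.append(op)
        used += cost
    if current:
        bundles.append(current)
    return bundles


ops = ["simple", "simple", "special", "simple", "simple", "simple", "simple"]
bundles = pack_bundles(ops)
for clock, bundle in enumerate(bundles):
    print(clock, bundle)
```

In this model a special function can still share a clock with one simple operation, which is the practical upshot of it taking three slots rather than all four.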
AMD has upgraded the render back-ends of the new Cayman architecture with coalescing of write operations. 16-bit integer operations are now 2x faster than on Cypress, and 32-bit FP (single/double component) operations are 2x to 4x faster depending on the operation. The HD 6970 has 128 Z/Stencil units, 96 texture units and 32 color ROP units.
- VLIW4 architecture
- 24 SIMDs
- 1536 SPs
- Dual Graphics engines
- 96 texture units
- upgraded back ends
- 256-bit memory interface
- 5.5Gbps memory speed
- VLIW4 thread processors
- 4-way co-issue
- All stream processing units have equal capabilities
- Special functions (transcendental) occupy 3 of 4 issue slots
- Dual-rate geometry setup
- off-chip geometry buffering
- 8th generation Tessellator
- Morphological AA
- Enhanced Quality AA
- Enhanced AF and Texture filtering
- AMD PowerTune Technology
- 40nm process
- 2.64 billion transistors
- 880MHz Core clock
- 2.7 TeraFLOPS compute performance
- 84.5 Gtexels/second texture fill rate
- 28.2 Gpixels/second pixel fill rate
- 128 Z/Stencil ROPs
- 256-bit GDDR5 memory
- 176GB/s memory bandwidth
- 250W PowerTune maximum power
- 190W Typical gaming power
- 20W typical idle power
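Most of the throughput figures in the list above follow directly from the unit counts and clocks. The Python sketch below reproduces them. It assumes the usual convention that one FMA counts as two floating-point operations, and it takes the 32 color ROPs from the render back-end description rather than from the list. The variable names are just for illustration.

```python
CORE_CLOCK_GHZ = 0.880       # 880MHz core clock
STREAM_PROCESSORS = 1536
TEXTURE_UNITS = 96
COLOR_ROPS = 32              # from the render back-end description
BUS_WIDTH_BITS = 256
MEMORY_RATE_GBPS = 5.5       # per-pin GDDR5 data rate

# One FMA per SP per clock, counted as 2 floating-point ops (assumed convention)
sp_tflops = STREAM_PROCESSORS * 2 * CORE_CLOCK_GHZ / 1000

texel_rate = TEXTURE_UNITS * CORE_CLOCK_GHZ        # Gtexels/second
pixel_rate = COLOR_ROPS * CORE_CLOCK_GHZ           # Gpixels/second
bandwidth = BUS_WIDTH_BITS / 8 * MEMORY_RATE_GBPS  # GB/s

print(f"{sp_tflops:.2f} TFLOPS")     # 2.70 TFLOPS
print(f"{texel_rate:.2f} Gtexels/s") # 84.48 Gtexels/s
print(f"{pixel_rate:.2f} Gpixels/s") # 28.16 Gpixels/s
print(f"{bandwidth:.0f} GB/s")       # 176 GB/s
```

The results line up with the quoted 2.7 TeraFLOPS, 84.5 Gtexels/second, 28.2 Gpixels/second and 176GB/s once rounded.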
One of the issues that all modern graphics architectures face is the need for computational efficiency. Cayman has asynchronous dispatch capabilities, meaning it can execute multiple compute kernels simultaneously, and each kernel has its own command queue and protected virtual address domain. The dual graphics engines also translate into dual bidirectional DMA engines for faster system memory reads and writes, along with direct fetching to LDS and faster double-precision ops (1/4 of the SP rate versus 1/5 on the HD 5870).
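For a sense of scale, the 1/4 double-precision rate can be turned into a peak number with a couple of lines of Python. This is a back-of-the-envelope sketch: it assumes one FMA per SP per clock counted as two operations, and uses the 1536 SPs and 880MHz core clock of the HD 6970.

```python
STREAM_PROCESSORS = 1536
CORE_CLOCK_GHZ = 0.880
DP_RATE = 1 / 4  # double precision runs at a quarter of the single-precision rate

sp_gflops = STREAM_PROCESSORS * 2 * CORE_CLOCK_GHZ  # FMA counted as 2 ops (assumption)
dp_gflops = sp_gflops * DP_RATE

print(f"SP peak: {sp_gflops:.0f} GFLOPS")  # SP peak: 2703 GFLOPS
print(f"DP peak: {dp_gflops:.0f} GFLOPS")  # DP peak: 676 GFLOPS
```

These are theoretical peaks; real compute kernels will land below them.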
One of the things NVIDIA has been pushing with their GF architecture is that its tessellation performance is much higher than AMD’s. AMD addresses this in Cayman with dual-rate geometry setup, off-chip geometry buffering and its 8th generation tessellation engine. As each graphics engine has its own tessellation unit, Cayman can process two primitives per clock and has 2x the transform and backface cull rate. Cayman has up to 3x the tessellation performance of the HD 5870 and also has two rasterizers with up to 32 pixels per clock.
Cayman introduces a new form of anti-aliasing to AMD chips: Enhanced Quality Anti-Aliasing (EQAA). EQAA uses the 8 color samples from MSAA together with up to 16 coverage samples per pixel. You might remember that NVIDIA introduced Coverage Sample Anti-Aliasing with their cards a while ago. The number of color and coverage samples can be independently controlled with custom sample patterns and filters. It’s compatible with Adaptive AA, Super-Sample AA and Morphological AA, which is a post-process filtering technique accelerated with DirectCompute.
OK, now for the big one: PowerTune. Previous cards delivered either full power or idle power depending on the load. While this worked, green initiatives and the need to cut power costs have pushed companies like NVIDIA and AMD to come up with power-saving technologies. AMD has included an integrated control processor that monitors GPU activity in real time and dynamically adjusts the clock to enforce the TDP maximum. It provides direct control over GPU power draw instead of relying on clock/voltage tweaks. The Overdrive utility allows you to set the maximum power limit of the HD 6970 or HD 6950 anywhere from 20% lower than default to 20% higher than default. Typical gaming power draw on the HD 6970 is 190W, with a PowerTune maximum of 250W. Idle power draw is 20W.
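The idea of capping power by pulling back the clock can be sketched in a few lines of Python. This is a toy model, not AMD's actual algorithm, which is not public in this level of detail. It assumes the 250W PowerTune maximum from the spec list is the default limit, that power scales linearly with clock, and the names `power_limit` and `throttled_clock` are invented for the example.

```python
DEFAULT_LIMIT_W = 250.0   # assumed default: the 250W PowerTune maximum from the spec list
MAX_CLOCK_MHZ = 880.0     # HD 6970 core clock


def power_limit(offset_percent: float) -> float:
    """Power cap after moving the Overdrive slider between -20% and +20%."""
    if not -20.0 <= offset_percent <= 20.0:
        raise ValueError("offset must be within -20% to +20%")
    return DEFAULT_LIMIT_W * (1 + offset_percent / 100)


def throttled_clock(estimated_power_w: float, limit_w: float) -> float:
    """Clock needed to stay under the cap, assuming power scales linearly with clock."""
    if estimated_power_w <= limit_w:
        return MAX_CLOCK_MHZ  # under the cap: run at full speed
    return MAX_CLOCK_MHZ * limit_w / estimated_power_w


print(power_limit(-20), power_limit(0), power_limit(20))
print(throttled_clock(190.0, power_limit(0)))  # typical gaming load, no throttling
print(throttled_clock(275.0, power_limit(0)))  # heavy load: clock is pulled back
```

Under these assumptions a typical 190W gaming load never touches the cap, while a hypothetical 275W load would be slowed to 800MHz at the default setting.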