GPU Computing 101: Harnessing the Power of the GPU to Accelerate a Variety of Applications
In evaluating computer performance, a lot of attention is paid to the CPU, memory size, storage system, and various caches. Not everyone is aware of the impact the graphics card can have on a computer system’s performance, or the details about how and why the graphics card matters beyond 3D gaming applications. To that end, we’re going to take an introductory look at graphics cards which support GPU computing.
A GPU, or Graphics Processing Unit, is present in all but the most entry-level video cards. “The GPU enables software to offload computational work (typically graphics-oriented) from the CPU to the graphics card,” describes Hector Guevarez, portfolio manager for Lenovo workstations.
GPUs are specialized computational processors designed to work with large arrays or matrices of data, where the processing to be performed is parallel in nature—meaning, the same calculation is performed on the entire block of data. “CPUs are very serial—one task at a time,” states Sean Kilbride, technical marketing manager in NVIDIA’s professional solutions group. “A GPU is very parallel in nature. Whereas a CPU may be six or eight cores, the GPU may have hundreds or thousands of cores, more typically referred to as stream processors, also referred to more generically as compute cores, or for NVIDIA, CUDA cores.” At the current high end of the market, the AMD FirePro W9000 boasts 2,048 stream processors, and NVIDIA’s Quadro K6000 GPU sports 2,880 CUDA cores.
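That serial-versus-parallel distinction can be sketched in plain Python (a conceptual illustration only, not real GPU code): a CPU-style loop touches one element per step, while a GPU conceptually applies the same operation to every element of the block at once.

```python
# Conceptual sketch: the same brightness adjustment done CPU-style
# (one element per step) and GPU-style (same operation applied to the
# whole block of data). This is ordinary Python, not real GPU code.

pixels = [10, 52, 97, 200, 31, 144]

# CPU-style: a single core walks the data serially.
cpu_result = []
for p in pixels:
    cpu_result.append(min(p + 20, 255))

# GPU-style (conceptually): each element is handled by its own core in
# the same step; map() expresses that "same calculation on the entire
# block of data" pattern.
gpu_result = list(map(lambda p: min(p + 20, 255), pixels))

assert cpu_result == gpu_result
print(gpu_result)  # [30, 72, 117, 220, 51, 164]
```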
AMD also has CPU/GPU combinations on one chip (called APUs), which provide faster interaction between the CPU and GPU. Instead of a separate video card, the video circuitry is built into the same chip as the CPU.
GPU cards have access to memory shared with the host computer’s CPU, and also have internal memory. The amount of shared memory that the card can use, the amount of internal memory, the number of compute cores and the clock speeds of the various components all interact to dictate the processing throughput of the GPU system. And as you might expect, the quantity of each of these will dictate the cost of the cards.
“There are several different classes of graphics card,” adds Guevarez, “ranging from ones designed to handle 2D and entry-level 3D, through mid- and high-end 3D. The higher end you go, the more memory and processing cores you get.”
While GPU graphics cards can be had for under $100, the processing power of these cards is very limited (around 40 cores) and best for 2D graphics acceleration. The aforementioned cards from AMD and NVIDIA both carry price tags of over $3,000. But there are a lot of options available between these two extremes.
Any video-oriented application is likely to benefit greatly from the use of GPUs. Video games, 2D and 3D modeling, ray tracing and animation, video production and editing all have data that can be processed in parallel, making them ideal for GPU integration.
“When doing real-time editing and compositing,” comments Andrew Baum, senior strategic alliance manager for AMD, “GPU processing enables the computer system to see the end result quickly and smoothly. Without a GPU, your video may not play back accurately frame-for-frame in your editing program, or your 3D model may take significantly longer to render. A GPU lets the technical user make good decisions quickly and proceed with their work, and gamers enjoy higher-resolution video and faster frame rates.”
How They Work
At a simplistic level, the cores of a GPU are designed to perform one multiply-accumulate operation per clock cycle. In that cycle, the stream processor multiplies two values together and adds the result into the value in an accumulation register.
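As a sketch, here is that multiply-accumulate step in plain Python (this shows only the arithmetic each core performs; a GPU runs many such operations in parallel):

```python
# One multiply-accumulate (MAC): acc += a * b.
# A dot product is just this step repeated; on a GPU, many such
# accumulations run concurrently across the compute cores.

def mac(acc, a, b):
    """One multiply-accumulate: multiply a by b, add into acc."""
    return acc + a * b

# A 4-element dot product as a chain of MAC steps.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 6.0, 7.0, 8.0]

acc = 0.0
for a, b in zip(xs, ys):
    acc = mac(acc, a, b)  # one MAC per iteration ("per clock")

print(acc)  # 1*5 + 2*6 + 3*7 + 4*8 = 70.0
```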
The workflow of a GPU system follows the same basic path for any GPU-enabled application, with some minor variations on the theme. (Note that this article isn’t intended to get into the nitty-gritty details of GPU systems, but to give you an overview of the concepts behind GPU computing.) Data is read into the computer’s main memory from a source—for instance, a frame of video from an MP4 file. Any non-GPU initial processing is performed by the CPU, and then the program copies the data into memory shared with the graphics card.
The GPU code to be executed is loaded into the graphics card by the host program, and the GPU is turned loose on the data. The GPU program reads the data from shared system memory (the slower memory for the GPU to access), and intermediate results are stored in the graphics card’s built-in VRAM (a much faster memory system for the GPU to access). When all the processing on the data is complete, the final results are written back out to shared memory for the host program to access, and the host system is informed that processing is complete on that block of data.
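The round trip described above can be sketched as follows (a simulation in plain Python with illustrative function names, not a real GPU API; actual code would use CUDA or OpenCL calls for the copies and kernel launch):

```python
# Sketch of the host-side GPU workflow described above, with the GPU
# stages simulated in plain Python. Function and variable names here
# are illustrative, not a real GPU API.

def read_frame():
    """Stand-in for reading a frame of video into main memory."""
    return [0.1, 0.5, 0.9, 0.3]

def gpu_kernel(shared_in):
    """Simulated GPU program: reads from (slow) shared memory, works
    in (fast) on-card VRAM, and returns the result to shared memory."""
    vram = list(shared_in)          # copy into fast built-in memory
    vram = [v * 2.0 for v in vram]  # intermediate work stays in VRAM
    return vram                     # final result heads back out

# 1. Read data into main memory; the CPU does any non-GPU preprocessing.
frame = read_frame()

# 2. Copy the data into memory shared with the graphics card.
shared_memory = list(frame)

# 3. Run the GPU program; it writes results back to shared memory.
shared_memory = gpu_kernel(shared_memory)

# 4. The host is notified and picks up the finished block.
print(shared_memory)  # [0.2, 1.0, 1.8, 0.6]
```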
To make use of this processing paradigm, the programmer needs to design their program with GPU parallel processing in mind, and choose what types of API (application programming interface) they will use to communicate with the GPU.
APIs: Programming for GPU Compute
While the programmable hardware was ready and available for programmers to utilize, there was no standard software API defined to allow programmers to make use of the flexible power of the first programmable GPU chips. While the OpenGL (Open Graphics Library) standard existed, it was not sophisticated enough to allow the user to define their own program to run on the GPU. And while updates to OpenGL to support user programmability were in the formative stages, it would be some time before a standard was agreed upon by all interested parties, and NVIDIA had vendors like Adobe eager to harness the power of the GPU in their applications.
“When we first introduced the concept of programmable pixel shaders,” states Kilbride, “it enabled game developers to program complex operations on pixels. Some programmers decided to use this to do mathematics on other data. This was the beginning of general-purpose computing on GPUs. NVIDIA hired some of the people who did the pioneering work in this field, and they developed NVIDIA’s CUDA programming language.” CUDA is based on the C programming language.
CUDA, however, is NVIDIA proprietary, and programs written using CUDA will only make use of NVIDIA GPUs.
A few years after the development of CUDA, the updates to the OpenGL standard were ratified, allowing programmers to develop hardware-agnostic programs that would run on any GPU system supporting OpenGL. This opened the door for users to select AMD cards, as well as cards from other vendors, for running programs developed with OpenGL.
More recently, Apple developed an open API called OpenCL (Open Computing Language), which was handed off to the non-profit Khronos Group for administration and maintenance. OpenCL is a computing API developed to more generically support parallel computing that may have nothing to do with graphics. Intel, AMD, NVIDIA and others have adopted support of OpenCL. Fluid mechanics, weather pattern modeling, and seismic data processing are all non-graphics related applications that would make excellent use of GPU processing regardless of the specific API interface used, and both CUDA and OpenCL are being adopted to offer significant performance gains in these types of operations.
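To see why simulations like these map well onto GPUs, consider one step of a toy one-dimensional diffusion model (a sketch in plain Python; a real implementation would express the per-point update as a CUDA or OpenCL kernel). Each point’s new value depends only on its neighbors’ old values, so every point can be computed in parallel by its own core.

```python
# Toy 1-D diffusion step: each new value depends only on neighboring
# old values, so every grid point could be updated by a separate GPU
# core in parallel. Plain Python stands in for the kernel here.

def diffuse_step(grid, alpha=0.25):
    """One explicit diffusion update over a 1-D grid (fixed endpoints)."""
    new = list(grid)
    for i in range(1, len(grid) - 1):
        # This body is the "kernel": independent for each point i.
        new[i] = grid[i] + alpha * (grid[i - 1] - 2 * grid[i] + grid[i + 1])
    return new

temps = [0.0, 0.0, 100.0, 0.0, 0.0]
print(diffuse_step(temps))  # [0.0, 25.0, 50.0, 25.0, 0.0]
```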
You don’t have to be a nerd or a programmer to benefit from GPU computing. As mentioned, gamers are a huge market for consumer-level GPU systems, which provide power similar to workstation-class video cards, but typically with a larger form factor, louder cooling, and a shorter warranty period. Today’s intensive first-person 3D games would be pretty much unplayable due to low frame rates and low video resolution without the power of the GPU.
The SETI (search for extraterrestrial intelligence) at Home project, which enables individuals to lend their computers to the SETI program for data processing during low-usage times, now makes use of GPUs to increase processing speed. Bitcoin, a form of online currency, saw a major boost thanks to significant use of multi-GPU systems (particularly AMD GPUs, which have more integer operations available).
GPUs have also increased performance in big-data processing such as image detection and the implementation of artificial neural networks. Basically, any application with a large amount of data that can be processed in parallel will benefit—medical imaging, bioinformatics (the processing of biological data), fluid dynamics, finance, seismic exploration, and defense all make great use of the power that GPUs can bring to bear on processing.
Titan, an NVIDIA-based supercomputer at Oak Ridge National Laboratory, uses 18,688 GPUs coupled with 299,008 AMD Opteron CPU cores to achieve over 20 petaflops of processing power.
The other huge winner of GPU computing is today’s video and graphic-design professionals.
“Adobe has really picked up the torch with seeing what can be done with GPUs—how can we make the entire process more efficient,” states Kilbride. “Once it starts getting displayed on the screen, processing happens in the GPU if one is available. Color correction, blend mode, and many transitions and effects are all accelerated.”
“Adobe is rewriting code to take maximum advantage,” adds Michael Kaplan, manager of business development efforts in entertainment for NVIDIA. “Adobe applications have become very GPU-centric. While the initial GPU utilization in Adobe Premiere was written in CUDA, as of the CC release, all applications utilizing the GPU use OpenGL and OpenCL, making them GPU-vendor agnostic.” Cards with at least 1 GB of RAM can be used by Adobe CC.
While GPUs can help greatly with performance, they also aren’t the panacea that some may think.
“There’s a big misconception about GPUs,” states Dave Helmly, senior manager for pro video solution consulting, Americas for Adobe. “It doesn’t help everything. The CODECs (encoding and decoding software) do not take advantage of GPUs, nor does just basic playback. However, intrinsic effects like scaling, compositing mode, transparency, and more than 75 effects are all GPU accelerated.”
Helmly continues to describe which parts of Adobe Creative Suite make use of the GPU. Premiere is the largest beneficiary, with many bundled effects written to take advantage of GPUs with OpenCL support. Premiere’s color-coded timeline gives you a good idea of the GPU advantage. Sections of the timeline that were color-coded red (indicating a need for high CPU utilization and potential risk of lower frame-rate playback) prior to a GPU installation will now show a lot more yellow and green (rendered video), indicating lower expected CPU loads and smoother playback.
Today, After Effects makes little use of the GPU, as the current program architecture does not lend itself to GPU integration. Its new 3D ray-tracing functions, however, are GPU-enabled, so those using After Effects to do 3D ray-tracing work will see significant performance improvement. Adobe is keenly aware that After Effects users are looking for more processing power from all available system resources, so watch for announcements from Adobe in the future. Photoshop also makes use of the GPU, implemented using OpenCL and OpenGL. Therefore, Photoshop users will benefit from both NVIDIA and AMD GPU products.
With other companies’ applications, the benefit varies as well.
“Avid Media Composer is about seven times faster if a graphics card is available,” states Kaplan. “Sony Vegas makes use of the GPU, and Final Cut X has implemented some degree of OpenCL integration. Of all of these, Final Cut takes the least advantage of the GPU at the moment.”
Live video platforms also take advantage of GPU systems. Miranda multi-viewers, which show multiple windows of video on one display, are based on NVIDIA GPU systems to get the needed video processing throughput, as are live video switchers and production workstations.
Your computing platform also can make a difference.
“On the Mac side, there is very limited choice,” states Helmly. “Apple controls the device drivers, and Apple doesn’t provide CUDA on the system—you have to install it separately. Apple puts in safeguards to make their systems safer from a stability standpoint, but this also limits how the GPU can be utilized. Apple users don’t get 100% of the benefit from the same hardware as a PC user, and so Apple users are at a disadvantage. Apple makes it safer, but less useful.”
Helmly adds that you can’t simply take an Apple Thunderbolt chassis, add an NVIDIA Maximus solution (which pairs a Quadro graphics card with a Tesla compute card), and get the same performance improvement a PC would see. Thunderbolt 1 offers the equivalent of four PCI Express lanes of bandwidth and Thunderbolt 2 eight, whereas an internal PCI Express slot offers 16 lanes. Therefore, access to system resources like memory is slower.
“If you need raw computing power, buy a PC,” concludes Helmly. “A high-end PC with an NVIDIA 690 card will annihilate anything around it.”
Putting GPU Computing to the Test - Observations
NVIDIA sent me a Quadro K4000 card for evaluation, and I was excited to see what sort of performance improvement I’d experience from my primary application: Adobe Premiere. As video production is my primary occupation, I spend a lot of time in Premiere and After Effects.
My base system is an AMD Phenom II six-core machine with 16 GB of memory and two 1TB RAID0 arrays as internal hard-disk storage. Until now, I had two graphics cards installed to support my three video monitors, with one card being an old NVIDIA card with 32 processing streams.
Playing back straight video, even when scaled, has never been a problem on my system. However, whenever I’d start compositing things, using transparency to overlay a color on top of video, or adding effects, my frame rate would definitely drop below the project frame rate – sometimes lower than 1 frame per second depending on the complexity of the project.
With the K4000 card replacing my two low-end cards (the K4000 supports three monitors directly), I made the following observations.
The first thing I noticed right off the bat was the sound – or lack of it, to be more precise. This card is quiet! Even under heavy GPU workloads, the cooling system makes very little noise. A couple of years ago, I tried a 192-core consumer-class graphics card manufactured by Gigabyte to see how the Sony Vegas video editing application would make use of it (unfortunately, Vegas at that time gave so many wrong results with GPU utilization enabled that I couldn’t use it), and that card sounded like a jet flying through my office. I was expecting some noise with the K4000, and was blown away by the quietness of the card.
Rerunning the Windows Experience Index caused a doubling of the index for graphics on my system, which was nice to see. 3DMark11 (run with the default settings) scored an overall P4098, with a graphics score of 4003 and a physics score of 5222.
For practical evaluation, Adobe Premiere CS6 ran noticeably faster with the K4000. For one project that I have underway, the 25 minute timeline used to be color-coded almost 100% red, with stilted playback. Many parts of the timeline have a semi-transparent red slide composited over H.264 video, with color correction added to the video, and another slide with drop-shadowed text composited on top. When playing the timeline with my old graphics cards, playback was very stilted, and all six cores were maxed out at 100%. With the K4000, playback was smooth, with all six cores under 50% utilization due to offloading the work to the GPU.
Will a serious GPU card help your system perform better? That depends. First, is the software you run written to take advantage of a GPU? If not, there’s no point in making the investment. For example, if you use Adobe After Effects all day with little use of the new 3D features, don’t waste your money on a GPU card. Second, what other bottlenecks impact your performance? You may benefit more from an upgrade to SSD storage over the addition of a GPU, if access to your data is your bottleneck. And if you’re doing basic 2D work that would benefit from a GPU, the $3,000 card may not give you any more performance boost than a $400 card if the processing needs of the application are modest.
Gamers don’t need much convincing that a graphics card upgrade will generally improve performance, but few non-gamers realize how an investment in a good graphics processor can improve so many other kinds of applications. As in the case of Adobe’s video editing programs, the GPU can make a huge difference.