NVIDIA GeForce GTX480 Video Card Review :: Streaming Multiprocessor

03-26-2010 · Category: Hardware - Video Cards

By Benjamin Sun

The GT200 chip, which preceded the GF100, had 240 Stream Processors grouped into ten texture processing clusters (TPCs), with three Shader multiprocessors per TPC and eight Stream processors per Shader multiprocessor. The GF100 chip is a different beast altogether. I've already described the GPC, which is the GF100's equivalent of the TPC on the GT200. Now it's time to describe the Streaming multiprocessor, or SM.

Each SM comprises 32 CUDA cores, which is four times the number of processors found on an SM of the GT200 chip. There are also 16 load/store units, which allow source and destination addresses to be calculated for a total of 16 threads per clock. Above the cores are 32,768 32-bit registers, or 128KB of register memory per SM. There are also two Warp schedulers and two Dispatch Units per SM. Each SM has 64KB of Shared memory/L1 cache, which can be configured as 16KB of L1 Cache and 48KB of Shared memory or the reverse.
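The per-SM figures can be sanity-checked with a little arithmetic. The short Python sketch below is only an illustration: the variable names are my own, nothing is queried from real hardware, and the cores-per-SM value is derived from the 512-core, 16-SM totals of the full GF100 design described in this review.

```python
# Back-of-the-envelope check of the SM figures quoted in this review.
# All names are illustrative; nothing here talks to a real GPU.

# GT200: 10 TPCs x 3 SMs per TPC x 8 stream processors per SM
gt200_cores = 10 * 3 * 8

# GF100: a full chip has 512 CUDA cores spread over 16 SMs
gf100_total_cores = 512
gf100_sms = 16
gf100_cores_per_sm = gf100_total_cores // gf100_sms

# Register file: 32,768 registers, each 32 bits (4 bytes) wide
register_file_kb = 32_768 * 4 // 1024

# The 64KB of on-chip memory can be split two ways: (shared KB, L1 KB)
shared_l1_splits = [(48, 16), (16, 48)]

print("GT200 stream processors:", gt200_cores)
print("GF100 CUDA cores per SM:", gf100_cores_per_sm)
print("Register file per SM (KB):", register_file_kb)
for shared_kb, l1_kb in shared_l1_splits:
    print(f"Shared {shared_kb}KB + L1 {l1_kb}KB = {shared_kb + l1_kb}KB")
```

Either split of the on-chip memory adds up to the same 64KB per SM; only the balance between Shared memory and L1 cache changes.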

Each SM has four Texture units, meaning that with 16 SMs there are 64 texture units on a theoretical 512-core GF100. Each texture unit can output four samples per clock, meaning that the GF100 can produce 256 samples in a single clock. This compares to the 80 texture units/80 samples on the GT200. The texture units are clocked at ½ of the Shader clock, not at the Core clock as was the case with the GT200 chip. The texture units also support the DX11 texture compression formats, meaning that texture filtering on the GF100 should be much more efficient than on GT200.
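The texture numbers follow from the same kind of multiplication. The Python sketch below is a rough illustration under stated assumptions: the function and variable names are hypothetical, and the 1400 MHz shader clock is a placeholder value chosen for the example, not a figure taken from this page. The samples-per-second result is a theoretical upper bound that assumes every texture unit delivers four samples on every texture clock.

```python
# Rough texture throughput arithmetic for a full 16-SM GF100.
# Names are illustrative; the shader clock below is a placeholder.

sms = 16
texture_units_per_sm = 4
samples_per_unit_per_clock = 4

gf100_texture_units = sms * texture_units_per_sm
gf100_samples_per_clock = gf100_texture_units * samples_per_unit_per_clock


def texture_clock_mhz(shader_clock_mhz):
    """Texture units run at half of the shader clock on GF100."""
    return shader_clock_mhz / 2


def peak_gigasamples_per_second(shader_clock_mhz):
    """Theoretical peak: samples per clock times texture clock."""
    samples_per_second = gf100_samples_per_clock * texture_clock_mhz(shader_clock_mhz) * 1e6
    return samples_per_second / 1e9


example_shader_clock_mhz = 1400.0  # hypothetical value for illustration only

print("Texture units:", gf100_texture_units)
print("Samples per clock:", gf100_samples_per_clock)
print("Texture clock (MHz):", texture_clock_mhz(example_shader_clock_mhz))
print("Peak Gsamples/s:", peak_gigasamples_per_second(example_shader_clock_mhz))
```

Real-world texture rates will land below this ceiling, since it ignores memory bandwidth, cache misses and filtering modes that take more than one clock.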