NVIDIA GeForce GTX480 Video Card Review :: PolyMorph Engine and Raster Engine

03-26-2010 · Category: Hardware - Video Cards

By Benjamin Sun

The PolyMorph Engine has five stages: Vertex Fetch, Tessellation, Viewport Transform, Attribute Setup, and Stream Output. In the first stage, vertices are fetched from a global vertex buffer and sent to the SM for vertex shading and hull shading. In these two shader stages the vertices are transformed from object space to world space, and the tessellation factors, which control how finely each patch is subdivided, are calculated. The tessellation factors are then sent to the Tessellator. In the second stage, the PolyMorph Engine reads the tessellation factors, and the Tessellator dices the patch and outputs a mesh of vertices.
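To make the "dicing" step concrete, here is a minimal Python sketch that subdivides a square patch into a uniform grid of domain points. It is only an illustration: the function name dice_quad_patch and the single integer factor are assumptions made for this example, and the real Tessellator also handles separate per-edge and fractional factors that are not modeled here.

```python
def dice_quad_patch(factor):
    """Return the (u, v) domain points of a quad patch diced uniformly.

    Hypothetical helper for illustration: one integer factor is applied
    to both directions, giving (factor + 1) ** 2 points.
    """
    if factor < 1:
        raise ValueError("tessellation factor must be at least 1")
    # Evenly spaced parameter values from 0.0 to 1.0 inclusive.
    steps = [i / factor for i in range(factor + 1)]
    # Row-major order: v varies slowest, u varies fastest.
    return [(u, v) for v in steps for u in steps]


if __name__ == "__main__":
    points = dice_quad_patch(4)
    print(len(points), "domain points generated")
```

Each (u, v) pair is what the Domain Shader would later turn into an actual vertex position, so raising the factor directly raises the amount of geometry the rest of the pipeline has to process.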

The new vertices are sent to the SM, where the Domain Shader and Geometry Shader are executed. The Domain Shader calculates the position of each vertex based upon input from the Hull Shader and the Tessellator. At this stage a displacement map is usually applied to add detailed features to the patch. The Geometry Shader conducts post-processing, adding and removing vertices and primitives where needed. The results are sent back to the Tessellator for the final pass.

In the third stage, the PolyMorph Engine performs viewport transformation and perspective correction. Attribute setup follows, transforming post-viewport vertex attributes into plane equations for efficient shader evaluation. Finally, vertices are optionally “streamed out” to memory, making them available for additional processing.
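The viewport transform and attribute setup steps come down to a little arithmetic, sketched below in Python. This is a simplified illustration under stated assumptions, not the hardware's actual method: the function names are made up for this example, the viewport is assumed to cover the whole render target with y pointing down, and the plane equation is fitted in screen space only. A perspective-correct version would fit planes to attribute/w and 1/w instead.

```python
def viewport_transform(clip, width, height):
    """Map a clip-space vertex (x, y, z, w) to screen coordinates.

    Assumes a full-target viewport with the origin at the top-left.
    """
    x, y, z, w = clip
    # Perspective divide: clip space -> normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Scale and bias from [-1, 1] to pixel coordinates (y is flipped).
    screen_x = (ndc_x + 1.0) * 0.5 * width
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y, ndc_z


def attribute_plane(p0, p1, p2, a0, a1, a2):
    """Fit a(x, y) = A*x + B*y + C through three screen-space vertices."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle has no unique plane")
    # Cramer's rule on the two edge-difference equations.
    A = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    B = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    C = a0 - A * x0 - B * y0
    return A, B, C


if __name__ == "__main__":
    print(viewport_transform((0.0, 0.0, 0.5, 1.0), 640, 480))
    A, B, C = attribute_plane((0, 0), (4, 0), (0, 4), 1.0, 5.0, 9.0)
    # Evaluate the attribute at pixel position (2, 2).
    print(A * 2 + B * 2 + C)
```

The appeal of the plane-equation form is that once A, B, and C are known, a shader can evaluate the attribute at any pixel with two multiplies and two adds, without going back to the vertices.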

Raster Engine

The Raster Engine is composed of three pipeline stages: edge setup, rasterization, and Z-cull. In the edge setup stage, vertex positions are fetched and triangle edge equations are computed. Triangles facing away from the viewer are removed via back-face culling. Each edge setup unit processes up to one point, line, or triangle per clock. With four Raster Engines on the chip, one per GPC, GF100 can set up a maximum of four triangles per clock.
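Edge equations and back-face culling are closely related, as the short Python sketch below shows. It is an illustration only: the function names are hypothetical, and it assumes a y-up coordinate system in which counter-clockwise triangles are front-facing. Real APIs let the application choose the winding convention.

```python
def edge_function(a, b, p):
    """Evaluate the edge equation of edge a->b at point p.

    With y pointing up, the result is positive when p lies to the left
    of the edge, zero on it, and negative to the right.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_back_facing(v0, v1, v2):
    """Cull test, assuming counter-clockwise winding means front-facing.

    Evaluating one edge at the opposite vertex gives twice the signed
    area of the triangle. Zero-area triangles are treated as culled too.
    """
    return edge_function(v0, v1, v2) <= 0


if __name__ == "__main__":
    front = ((0, 0), (4, 0), (0, 4))  # counter-clockwise
    back = ((0, 0), (0, 4), (4, 0))   # same triangle, clockwise
    print(is_back_facing(*front), is_back_facing(*back))
```

Because the sign of the area falls out of the same edge computation the setup unit needs anyway, the cull decision costs almost nothing extra.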

The Rasterizer takes the edge equations for each primitive and computes pixel coverage. If anti-aliasing is enabled, coverage is computed for each multisample and coverage sample. Each Rasterizer outputs eight pixels per clock, for a total of 32 rasterized pixels per clock across the chip's four Raster Engines. Finally, the Raster Engine performs Z-cull, which discards hidden surfaces before they are rendered, saving bandwidth.
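The coverage test, the peak-rate arithmetic, and the idea behind Z-cull can be sketched in a few lines of Python. Treat this as a rough model under stated assumptions rather than a description of the hardware: the function names are hypothetical, coverage is sampled once per pixel center with no proper fill rule for shared edges, and the Z-cull test assumes a "less-than" depth comparison in which smaller values are closer to the viewer.

```python
RASTER_ENGINES = 4            # one per GPC on GF100
PIXELS_PER_RASTERIZER = 8     # per clock, from the figures above
PEAK_PIXELS_PER_CLOCK = RASTER_ENGINES * PIXELS_PER_RASTERIZER  # 32


def edge_function(a, b, p):
    """Edge equation of a->b evaluated at p (positive on the left, y up)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_pixels(v0, v1, v2, width, height):
    """List pixels whose centers fall inside a counter-clockwise triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            center = (x + 0.5, y + 0.5)
            # A point is inside when it is on the inner side of all edges.
            if (edge_function(v1, v2, center) >= 0
                    and edge_function(v2, v0, center) >= 0
                    and edge_function(v0, v1, center) >= 0):
                covered.append((x, y))
    return covered


def tile_is_culled(tile_depths, framebuffer_depths):
    """True when every incoming pixel is at or behind the stored depth."""
    return all(new >= old for new, old in zip(tile_depths, framebuffer_depths))


if __name__ == "__main__":
    pixels = covered_pixels((0, 0), (4, 0), (0, 4), 4, 4)
    print(len(pixels), "pixels covered;", PEAK_PIXELS_PER_CLOCK, "peak per clock")
    print(tile_is_culled([0.8, 0.9], [0.5, 0.5]))
```

With anti-aliasing enabled, the same inside test would simply be repeated at several sample positions within each pixel instead of only at its center.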