X2
05-19-2005, 21:56
http://www.anandtech.com/tradeshows/showdoc.aspx?i=2423
This year's E3 has been, overall, a pretty big letdown. The show itself hasn't been very interesting simply because it's mostly current-gen titles and hardware. For the E3 just before the launch of Microsoft's Xbox 360, we were a bit disappointed not to see any working hardware at the show outside of the ATI booth.
With a relatively light schedule thanks to the small size of the show, we were able to spend quite a bit of time digging deeper on the two highlights of this year's E3 - ATI's Xbox 360 GPU, and NVIDIA's RSX, the GPU powering the PlayStation 3.
Given that both of the aforementioned GPU designs are very closely tied to their console manufacturers, information flow control was dictated by the console makers, not the GPU makers. And unfortunately, neither Microsoft nor Sony was interested in giving away more information than their ridiculously light press releases.
Never being satisfied with the norm, we've done some digging and this article is what we've managed to put together. Before we get started, we should mention a few things:
1) Despite our best efforts, information will still be light because of the strict NDAs imposed by Microsoft and Sony on the GPU makers.
2) Information on NVIDIA's RSX will be even lighter because it is the more PC-like of the two solutions and as such, a lot of its technology overlaps with the upcoming G70 GPU, an item we currently can't talk about in great detail.
With those items out of the way, let's get started, first with what has already been announced.
The Xbox 360 GPU, manufactured by ATI, is the less PC-like of the two GPUs for a number of reasons, the most obvious being its 10MB of embedded DRAM. Microsoft announced that the 10MB of embedded DRAM has 256GB/s of bandwidth available to it; keep this figure in mind, as its meaning isn't as clear cut as it may sound.
The GPU operates at 500MHz and has a 256-bit memory interface to 512MB of 700MHz GDDR3 system memory (that is also shared with the CPU).
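To put the memory interface figures in perspective, here is a rough back-of-the-envelope sketch of the peak bandwidth they imply. It takes the 256-bit width and 700MHz clock quoted above at face value and assumes GDDR3 moves data twice per clock; the function name and parameters are made up for illustration, and real-world throughput would be lower than this theoretical peak.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption: GDDR3 is a double data rate
    memory, so it transfers data on both clock edges.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Figures quoted in the article: 256-bit interface, 700MHz GDDR3.
xbox360_bw = peak_bandwidth_gb_s(256, 700)
print(f"Peak system memory bandwidth: {xbox360_bw:.1f} GB/s")
```

Under those assumptions the shared system memory peaks at well under a fifth of the 256GB/s quoted for the embedded DRAM, which is part of why that on-package memory matters so much.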
Another very prominent feature of the GPU is that it implements ATI's first Unified Shader Architecture, meaning that there are no longer any discrete pixel and vertex shader units; they are instead combined into a set of universal execution units that can operate on either pixel shader or vertex shader instructions. ATI is characterizing the width of the Xbox 360 GPU as being 48 shader pipelines; we should caution you that these 48 pipelines aren't directly comparable to current 16-pipeline GPUs, but rest assured that the 360 GPU should be able to shade and texture more pixels per clock than ATI's fastest present-day GPU.
Now let's move on to NVIDIA's RSX; the RSX is very similar to a PC GPU in that it features a 256-bit connection to 256MB of local GDDR3 memory (operating at 700MHz). Much like NVIDIA's Turbo Cache products, the RSX can also render to any location in system memory, giving it access to the full 256MB of system memory on the PS3 as well.
The RSX is connected to the PlayStation 3's Cell CPU by a 35GB/s FlexIO interface and it also supports FP32 throughout the pipeline.
The RSX will be built on a 90nm process and features over 300 million transistors running at 550MHz.
Between the two GPUs there's barely any information contained within Microsoft's and Sony's press releases, so let's see if we can fill in some blanks.
ATI has been working on the Xbox 360 GPU for approximately two years, and it has been developed independently of any PC GPU. So despite what you may have heard elsewhere, the Xbox 360 GPU is not based on ATI's R5xx architecture.
Unlike any of their current-gen desktop GPUs, the 360 GPU supports FP32 from start to finish (as opposed to the current FP24 spec that ATI has implemented). Full FP32 support puts this aspect of the 360 GPU on par with NVIDIA's RSX.
ATI was very light on details of their pipeline implementation on the 360's GPU, but we were able to get some more clarification on some items. Each of the 48 shader pipelines is able to process two shader operations per cycle (one scalar and one vector), offering a total of 96 shader ops per cycle across the entire array. Remember that because the GPU implements a Unified Shader Architecture, each of these pipelines features execution units that can operate on either pixel or vertex shader instructions.
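The arithmetic behind the 96 ops per cycle figure is simple enough to sketch. The constants below come straight from the article (48 pipelines, two operations per pipeline per cycle, 500MHz core clock); the variable names are illustrative only, and the per-second number is a theoretical peak that assumes every pipeline issues both operations on every cycle.

```python
# Figures quoted in the article.
PIPELINES = 48                    # unified shader pipelines
OPS_PER_PIPELINE_PER_CYCLE = 2    # one scalar op + one vector op
CLOCK_HZ = 500_000_000            # 500MHz GPU clock

# Total shader operations issued across the whole array each cycle.
ops_per_cycle = PIPELINES * OPS_PER_PIPELINE_PER_CYCLE

# Theoretical peak rate, assuming full utilization on every cycle.
ops_per_second = ops_per_cycle * CLOCK_HZ

print(f"{ops_per_cycle} shader ops per cycle")
print(f"{ops_per_second / 1e9:.0f} billion shader ops per second (peak)")
```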
Both consoles are built on a 90nm process, and thus ATI's GPU is also built on a 90nm process at TSMC. ATI isn't talking transistor counts just yet, but given that the chip has a full 10MB of DRAM on it, we'd expect the chip to be fairly large.
One thing that ATI did shed some light on is that the Xbox 360 GPU is actually a multi-die design, which ATI refers to as a parent-daughter die relationship. Because the GPU's die is so big, ATI had to split it into two separate dies on the same package - connected by a "very wide" bus operating at 2GHz.
The daughter die is where the 10MB of embedded DRAM resides, but there is also a great deal of logic on the daughter die alongside the memory. The daughter die features 192 floating point units that are responsible for a lot of the work in sampling for AA among other things.
Remember the 256GB/s bandwidth figure from earlier? It turns out that that's not how much bandwidth is between the parent and daughter die, but rather the bandwidth available to this array of 192 floating point units on the daughter die itself. Clever use of words, no?
Because of the extremely large amount of bandwidth available both between the parent and daughter die as well as between the embedded DRAM and its FPUs, multi-sample AA is essentially free at 720p and 1080p in the Xbox 360. If you're wondering why Microsoft is insisting that all games will have AA enabled, this is why.
ATI did clarify that although Microsoft isn't targeting 1080p (1920 x 1080) as a resolution for games, their GPU would be able to handle the resolution with 4X AA enabled at no performance penalty.
ATI has also implemented a number of intelligent algorithms on the daughter die to handle situations where you need more memory than the 10MB of DRAM on-die. The daughter die has the ability to split the frame into two sections if the frame itself can't fit into the embedded memory. A z-pass is done to determine the location of all of the pixels of the screen and the daughter die then fetches only what is going to be a part of the scene that is being drawn at that particular time.
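To see why a frame might not fit in 10MB, here is a rough sizing sketch. The per-sample storage (4 bytes of color plus 4 bytes of depth/stencil), the reading of 10MB as 10 x 1024 x 1024 bytes, and the function names are all assumptions made for illustration; ATI has not said how the daughter die actually lays out or splits the frame, so the section counts below are only what these assumptions imply.

```python
import math

# Assumption: "10MB" is taken as 10 * 1024 * 1024 bytes.
EDRAM_BYTES = 10 * 1024 * 1024


def framebuffer_bytes(width: int, height: int, samples: int,
                      bytes_color: int = 4, bytes_depth: int = 4) -> int:
    """Size of a multisampled color + depth buffer.

    The 4-byte color and 4-byte depth/stencil per sample are assumptions,
    not figures from ATI or Microsoft.
    """
    return width * height * samples * (bytes_color + bytes_depth)


def sections_needed(width: int, height: int, samples: int) -> int:
    """How many pieces the frame must be split into to fit in the eDRAM."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for samples in (1, 2, 4):
    size_mb = framebuffer_bytes(1280, 720, samples) / (1024 * 1024)
    print(f"720p, {samples}x AA: {size_mb:.1f} MB, "
          f"{sections_needed(1280, 720, samples)} section(s)")
```

Under these assumptions a 720p frame fits in the embedded memory only without AA, and has to be split as soon as multi-sampling is turned on, which is the situation the splitting logic described above is meant to handle.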
On the physical side, unlike ATI's Flipper GPU in the Gamecube, the 360 GPU does not use 1T-SRAM for its on-die memory. The memory on-die is actually DRAM. By using regular DRAM on-die, latencies are higher than SRAM or 1T-SRAM but costs should be kept to a minimum thanks to a smaller die than either of the aforementioned technologies.
Remember that in addition to functioning as a GPU, ATI's chip must also function as a memory controller for the 3-core PPC CPU in the Xbox 360. The memory controller services both the GPU's and the CPU's needs, and as we mentioned before, the controller is 256 bits wide and interfaces to 512MB of unified GDDR3 memory running at 700MHz. The memory controller resides on the parent die.
It feels so good to be a gamer right now. It sounds good on paper. I can't wait to get it in my hands.