Hello, I haven't posted here for a long, long time. In fact, I posted most during the pre- and post-launch period of the PS3. Although I love the PS3 for what it is, a games machine, its actual hardware specs really interested me, and I spent hours reading everything I could find on the Cell and anything else I could get my hands on, even detailed spec sheets and discussions in the engineers' section of the IBM forums. I then discussed my thoughts and impressions and made predictions about what the Cell could mean for games on the PS3 (it would be very interesting to read those posts again some time to see how much I got right or wrong).

The time is with us again, and I have taken a keen interest in what Sony is doing with the PS4. It is much harder to gauge this time, as there isn't much in the way of technical documentation or CPU benchmarks, as there was with the Cell, to base my theories or predictions on. So a lot of my theories this time are supposition rather than hard fact, based on what I have managed to read about the various aspects of the hardware and software from AMD and Cerny (Sony), some of the more reputable Beyond3D information on things like hUMA, and tidbits from devs on forums and Twitter.

Why am I posting this after all this time? Well, I have just read lots of forums, including NeoGAF, and it's getting to me that people are writing off the PS4 as only as powerful as a mid-range PC (or even less powerful in some cases), with the often-repeated adage that, because it is based on an x86-64 processor and a Southern Islands-class GPU, the hardware is out of date and old tech (even before the PS4 hits the shelves).

Now, if you're PC-centric and you look at the equivalent AMD parts (the 8-core Jaguar CPU and the GPU) and their pure GFLOP performance, this conclusion seems on the surface to be a reasonable one. However, this is a huge mistake. This is not a separate CPU, GPU and sound processor; it is an APU. That means every piece of hardware is linked together on the same die, so while the individual components can be compared to their PC equivalents, the fact that every single component is on the same die means that communication between its parts can be far more efficient, potentially by an order of magnitude.

When you're building a PC you can fit any of a wide range of CPUs onto your motherboard, and similarly any number of memory sticks and many types and strengths of GPU. The motherboard is therefore designed to support many different configurations, and the software, from drivers to the OS, must also support all these configurations to make all the pieces work together to achieve the desired goal of playing a game.

Both Sony and MS had design goals. They decided what they could achieve, taking into account their aims (e.g. media, Kinect and games for MS; games, media and video streaming for Sony; these are just examples) and the amount of power they could use, taking into consideration heat dissipation and cost.

So once Sony knew how much power and cost they could budget for, they went about building (with AMD) the best piece of kit they could, keeping in mind the demands of both internal and external studios. Based on the project constraints (power, cost, heat dissipation, developer demands, project goals), they arrived at their solution with AMD.

They also took what they learnt about what went wrong or right with the PS3 and factored it into the PS4 too.

What they liked about the PS3 was the sheer power and flexibility of the Cell to do remarkable things (Beyond: Two Souls, TLOU, God of War etc.); what they didn't like was that developers outside of Sony found it so hard to use and program for. The split memory pools also caused developers issues, and together with the weaker GPU (compared to the Xbox 360) this meant most multiplats could not leverage the PS3's potential, so most multiplats were better on the Xbox 360.

In short: a strong processor with the flexibility to handle graphics, compute and physics, good; split memory pools, a weaker GPU and difficult-to-master hardware, bad.

Sorry for rambling, but the insights Cerny has given us into the design of the PS4 tell you what Sony is trying to achieve with it, and that background gives you some insight into what the PS4 is all about.

OK, we have already covered that the PS4 is an APU, which means all the components are on one die. What does this mean exactly?

Well, what it means is that the degree of separation between the components is much smaller. High-bandwidth buses connect the components together, and those buses are designed around the fact that the components are fixed, so they can be efficient. The PS4 has three such buses, all read/write:

Onion: CPU to GPU via the caches (about 10 GB/s)
Onion+: CPU to GPU bypassing the GPU cache (the same 10 GB/s)
Garlic: CPU and GPU to the GDDR5 (about 20 GB/s for the CPU, 176 GB/s for the GPU)
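To make the three buses easier to keep apart, here's a toy Python model. It's a rough sketch, not Sony's API: the Bus class and the pick_bus rule are my own hypothetical simplification, and the bandwidths are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    name: str
    route: str
    gb_per_s: float  # bandwidth as quoted in the post (approximate)


# Figures from the post; the CPU's share of Garlic (about 20 GB/s) is left out for simplicity.
ONION = Bus("Onion", "CPU <-> GPU, through the caches", 10)
ONION_PLUS = Bus("Onion+", "CPU <-> GPU, bypassing the GPU cache", 10)
GARLIC = Bus("Garlic", "GPU <-> GDDR5", 176)


def pick_bus(shared_with_cpu: bool, bypass_gpu_cache: bool) -> Bus:
    """Hypothetical rule of thumb for illustration, not Sony's actual logic."""
    if not shared_with_cpu:
        return GARLIC      # GPU-only data (textures, vertex buffers): widest path
    if bypass_gpu_cache:
        return ONION_PLUS  # shared data the GPU should not hold in its own cache
    return ONION           # shared data that can go through the caches


for shared, bypass in [(False, False), (True, True), (True, False)]:
    bus = pick_bus(shared, bypass)
    print(f"shared={shared}, bypass={bypass}: {bus.name} ({bus.gb_per_s} GB/s)")
```

The point of the split is that GPU-only data never has to squeeze through the narrow coherent paths.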

In addition, they have added a "volatile" tag to the GPU's cache lines, so that the CPU and GPU can share data for asynchronous compute through system memory without having to flush the whole GPU cache.
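Here's a toy Python model of why a per-line tag helps, as I understand it. ToyGpuCache, its methods and the addresses are all made up for illustration; the real hardware does this inside the GPU's cache, not in software. Without the tag you have to write back and drop everything; with it, only the compute lines the CPU cares about are written back, and the graphics data stays cached.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLine:
    data: int
    volatile: bool  # hypothetical tag: line holds compute data shared with the CPU
    dirty: bool = True


@dataclass
class ToyGpuCache:
    lines: dict = field(default_factory=dict)  # address -> CacheLine

    def write(self, address: int, data: int, volatile: bool = False) -> None:
        self.lines[address] = CacheLine(data, volatile)

    def _evict(self, addresses, memory: dict) -> int:
        for address in addresses:
            line = self.lines.pop(address)
            if line.dirty:
                memory[address] = line.data  # write back before dropping the line
        return len(addresses)

    def flush_all(self, memory: dict) -> int:
        """Without the tag: everything is written back and dropped."""
        return self._evict(list(self.lines), memory)

    def flush_volatile(self, memory: dict) -> int:
        """With the tag: only the compute lines go; graphics data stays cached."""
        return self._evict([a for a, l in self.lines.items() if l.volatile], memory)


memory = {}
cache = ToyGpuCache()
cache.write(0x100, 1)                  # e.g. texture data
cache.write(0x200, 2)                  # e.g. vertex data
cache.write(0x300, 42, volatile=True)  # compute result the CPU wants to read
print(cache.flush_volatile(memory), memory, len(cache.lines))
```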

What does this mean exactly?

With the PC, the components are further apart and the path between the GPU and CPU is narrow, and for the CPU and GPU to communicate they have to go through caches and memory. The CPU and GPU also speak different languages (different instruction sets), so there is a layer, at both the hardware and software level, that has to do its work before the CPU and GPU can cooperate to produce the desired result. Cache misses are the bane of the PC world, and they are why people scoff at TFLOP or GFLOP numbers: most processors rely on their caches, and a cache miss is very costly in terms of the time it takes to process the task. To get around this, the CPU, for example, uses OOE (out-of-order execution), which means it can work on other instructions while it waits for the data it needs to be fetched from memory.

Without going into too much detail, the steps between the CPU sending a job to the GPU and getting the results back are, in computer terms, very time-consuming: limited bandwidth, cache misses and cache flushes, converting the job from what the CPU understands into what the GPU understands, doing the job, then going back to the CPU and repeating the process.
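A toy Python model of one round trip makes the point: when the job is small, fixed overhead swamps the actual work. round_trip_us is a hypothetical helper, and the overhead and work figures are made-up placeholders, not measurements.

```python
def round_trip_us(payload_bytes: int, bandwidth_gb_s: float,
                  fixed_overhead_us: float, gpu_work_us: float) -> float:
    """Toy model of one CPU -> GPU -> CPU round trip, in microseconds.

    fixed_overhead_us lumps together driver/API calls, synchronisation and
    cache flushes; the value you pass in is an assumption, not a measurement.
    """
    one_way_us = payload_bytes / (bandwidth_gb_s * 1e9) * 1e6
    return one_way_us + gpu_work_us + one_way_us + fixed_overhead_us


# Made-up numbers purely for illustration: a 64 KB job, 16 GB/s link,
# 50 us of fixed overhead and 5 us of actual GPU work.
total = round_trip_us(64 * 1024, 16, 50, 5)
print(f"round trip: {total:.1f} us, of which useful work: {5 / total:.0%}")
```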

The way the PC mitigates this is through OOE and through "brute force": the CPU and GPU are so quick and powerful that they process the work very quickly once they have the instruction or job to do. So the time wasted on cache misses and cache flushes is hidden, and the faster and more powerful the CPU and GPU are, the more they "hide" it. However, this is why people scoff at theoretical performance. Due to inefficiencies in the way this works, and the fact that a job rarely uses 100 percent of the silicon real estate of the CPU/GPU, the theoretical maximum of a PC is very rarely reached.
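To show why theoretical numbers are rarely reached, here's a small Python sketch that weights each job by how much of the frame it takes and how much of the chip it keeps busy. The function name and the example frame are hypothetical.

```python
def effective_gflops(peak_gflops: float, jobs: list[tuple[float, float]]) -> float:
    """jobs: (share of frame time, share of the chip's ALUs actually busy) pairs."""
    if sum(share for share, _ in jobs) > 1.0 + 1e-9:
        raise ValueError("jobs take more than the whole frame")
    busy = sum(share * utilisation for share, utilisation in jobs)
    return peak_gflops * busy


# Hypothetical frame on a 1000 GFLOP chip: half the frame on a job that keeps
# only half the ALUs busy, the other half on a job that keeps 90% busy.
print(effective_gflops(1000, [(0.5, 0.5), (0.5, 0.9)]))
```

Even with a fairly well-behaved second job, the chip delivers about 70 percent of its paper rating in this example.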

It's obviously more complicated than that, because you also have to take the PC's software layers into account. For example, Windows has to cater for all possible components and have the drivers to tell those components what to do. Because the software works at such a high level, it only tells the hardware how to proceed with a job in a general way, not how to do it in the best (most efficient) way.

It is speculated, on the other hand, that the PS4 supports hUMA across its APU. This means that not only are the components close to each other, they can work in conjunction with each other efficiently. Say you give the PS4 a collision detection job: it goes to the CPU, which does its part (the part the CPU is good at), then passes the rest of the job to the GPU, either directly or through unified memory flagged for that job. The GPU understands what it needs to do without any translation, does its part, and either writes the result to memory or sends it straight to the CPU over the Onion/Onion+ buses. It can even write the result to memory flagged for the CPU; the CPU does extra work, writes and flags its own result, and the GPU can pick that up again. This is the kind of fine-grained cooperation hUMA is meant to enable: the CPU and GPU working together on a problem in a way that current PCs just aren't able to.
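Here's a rough Python analogy of that hand-off. Two threads stand in for the CPU and GPU, a shared dict stands in for unified memory, and threading.Event objects stand in for the flags. It illustrates the idea only; it is not how you would actually program the PS4, and splitting the job into a CPU "broad phase" and a GPU "narrow phase" is my own example.

```python
import threading
from itertools import combinations


def detect_collisions(objects):
    """objects: list of circles (x, y, radius). Returns index pairs that overlap."""
    shared = {}                       # stands in for memory both sides can see
    job_ready = threading.Event()     # "flag" the CPU sets for the GPU
    result_ready = threading.Event()  # "flag" the GPU sets for the CPU

    def gpu_narrow_phase():
        job_ready.wait()
        hits = []
        for i, j in shared["pairs"]:  # a real GPU would test each pair in parallel
            (x1, y1, r1), (x2, y2, r2) = objects[i], objects[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= (r1 + r2) ** 2:
                hits.append((i, j))
        shared["hits"] = hits         # write the result into shared memory...
        result_ready.set()            # ...and flag it instead of copying it back

    gpu = threading.Thread(target=gpu_narrow_phase)
    gpu.start()

    # CPU side: cheap, branchy broad phase that keeps pairs whose x-extents overlap.
    shared["pairs"] = [
        (i, j) for i, j in combinations(range(len(objects)), 2)
        if abs(objects[i][0] - objects[j][0]) <= objects[i][2] + objects[j][2]
    ]
    job_ready.set()

    result_ready.wait()
    gpu.join()
    return shared["hits"]


circles = [(0, 0, 1), (1.5, 0, 1), (10, 0, 1), (10, 1.5, 1), (20, 0, 1), (21, 5, 1)]
print(detect_collisions(circles))  # [(0, 1), (2, 3)]
```

The last two circles pass the CPU's cheap check but fail the exact test, which is the kind of split where each side does the part it is good at.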

So what does this mean in real-world performance?

This is cutting-edge technology, which is why I am a little annoyed when I see PC gamers call the PS4 setup old. It's not. The fact is that the PCI Express variations and the way the CPU and GPU work together on a PC are the old technology: the bandwidth has only improved incrementally, and the basics have been the same for a very long time. The sheer power the CPU and GPU throw at the problem (in watts and in die area) is covering up how terribly inefficient the process is.

If the PS4 is hUMA (and many of Mark Cerny's talks, together with how hUMA works and the way the PS4 is set up, suggest that it is), then because the CPU and GPU can communicate and work on problems in a way that has not been done before, it could be many times more efficient than its PC equivalents. Exactly how many times remains to be seen.

The other piece of how powerful the PS4 is, is the GPU itself. This is not just a 1.8 TFLOP GPU; it has been modified extensively beyond what GPUs are today, so putting a stock 2 TFLOP AMD GPU up against the PS4 GPU and saying the AMD card is more powerful is very far from the whole picture.

A normal AMD Tahiti (HD 7900) GPU has 2 ACEs (Asynchronous Compute Engines) and, I think, only 4 queues in total (not sure on this). I believe the one in the Xbone has between 2 and 8 queues per ACE, for a total of 4 or 16 queues to be worked on at any one time. (Sorry, I can only find four queues at the moment via Google, i.e. 2x2, but I'm sure I read a later spec sheet that said 8 per ACE, making 16, so I can't be sure which is correct. Maybe someone here can confirm.)

Sorry I am so vague on the PC and Xbone ACE specs; as soon as I find the relevant information I will update the OP.

Anyway, the PS4 has 8 ACEs and 64 queues. These are mostly used to manage the GPU's compute (GPGPU) work; physics, collision detection and audio ray tracing are examples of compute.
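Just to lay the queue arithmetic side by side, a trivial Python sketch. The PS4 row (8 x 8 = 64) matches the figure above; the Tahiti and Xbone per-ACE numbers are my uncertain readings from earlier, so treat those rows as placeholders until someone confirms them.

```python
def total_queues(aces: int, queues_per_ace: int) -> int:
    """Compute queues the hardware can track at once."""
    return aces * queues_per_ace


# (ACEs, queues per ACE). The Tahiti and Xbone rows are unconfirmed readings.
configs = {
    "Tahiti HD 7900 (my reading)": (2, 2),
    "Xbone (low reading)": (2, 2),
    "Xbone (high reading)": (2, 8),
    "PS4": (8, 8),
}
for name, (aces, per_ace) in configs.items():
    print(f"{name}: {aces} ACEs x {per_ace} queues = {total_queues(aces, per_ace)}")
```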

What does this mean?

OK, I have read (from a dev) that shadow maps are a terribly inefficient job for GPUs. I don't know exactly how much of a GPU's real resources this job uses, but let's say for the sake of argument that it is 50 percent, give or take, on GPUs as a whole (to keep things simple). An Nvidia Titan will still do this job really quickly thanks to brute power, even though it is only using 50 percent of its resources, but in this (made-up) example it is effectively only doing the job at 50 percent of its theoretical rating (roughly 4.5 TFLOPs for the GTX Titan).

In the meantime, the CPU wants to offload a compute task (collision detection on multiple objects, for example) and sends it to the Titan. This is not straightforward because of the separation detailed earlier. Because the developer doesn't know that a state-of-the-art GPU is on the other end, they may even decide collision detection is best done on the CPU, given the bandwidth, communication and OS overhead, the high-level API wrapper, and the fact that a less powerful GPU might be present.

(A game optimised for Nvidia or AMD may do the job on that brand's GPU, because the right drivers and hardware support are known to be present.)

Anyway, the upshot is that both the CPU and GPU in a PC are more than capable of doing whatever compute jobs are there; it's just that the PC is inherently inefficient at CPU-to-GPU communication. PCIe 3.0 x16 gives about 16 GB/s in each direction, and it has all the other issues mentioned earlier.

The PS4 has about 30 GB/s of transfer between its CPU and GPU. Also, thanks to direct communication and the volatile tags, cache misses and inefficiencies are cut down.
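Here's a back-of-the-envelope Python calculation of raw transfer time for the two figures above. The 32 MB payload is a made-up number, and this deliberately ignores latency and the other overheads, which are the bigger part of my argument.

```python
def transfer_ms(megabytes: float, gb_per_s: float) -> float:
    """Pure transfer time in milliseconds (1 GB = 1000 MB); ignores latency and overhead."""
    return megabytes / gb_per_s  # MB / (GB/s) works out to milliseconds


frame_budget_ms = 1000 / 30  # a 30fps frame
for name, bandwidth in [("PCIe 3.0 x16", 16), ("PS4 CPU<->GPU (post's figure)", 30)]:
    t = transfer_ms(32, bandwidth)  # hypothetical 32 MB of data per frame
    print(f"{name}: {t:.2f} ms, {t / frame_budget_ms:.1%} of a 30fps frame")
```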

On top of this, while the PS4 GPU is running the shadow map job at 50 percent of its capability, it can (theoretically) use the other 50 percent of the GPU for many compute jobs, with 8 ACEs and 64 queues ready to fill any slack in the GPU with compute.
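To picture "filling the slack", here's a toy Python scheduler. The 18 compute units are the PS4 GPU's commonly cited count; treating the shadow-map job as holding exactly half the CUs, and handing out one unit of work per idle CU per tick, are my own simplifications, not how the ACEs actually dispatch work.

```python
def fill_idle_cus(total_cus: int, graphics_cus: int, queue_work: list[int], ticks: int):
    """Toy scheduler: graphics holds graphics_cus CUs every tick, and each idle CU
    takes one unit of work from the compute queues, visited round-robin.

    Returns (GPU utilisation, work left in each queue).
    """
    remaining = list(queue_work)
    busy = 0  # CU-ticks spent doing something
    q = 0     # next queue to visit
    for _ in range(ticks):
        busy += graphics_cus
        idle = total_cus - graphics_cus
        while idle > 0 and any(remaining):
            if remaining[q] > 0:
                remaining[q] -= 1
                busy += 1
                idle -= 1
            q = (q + 1) % len(remaining)
    return busy / (total_cus * ticks), remaining


# 18 CUs with the shadow-map job holding half of them, over 10 ticks.
print(fill_idle_cus(18, 9, [], 10))        # graphics only
print(fill_idle_cus(18, 9, [20] * 4, 10))  # plus four compute queues
```

With no compute queued the GPU sits at 50 percent; with four small queues it gets to about 94 percent, and with enough queued work it would be fully busy.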

Let's be clear: the Cell had a rating of about 216 GFLOPs, if I remember correctly, to use on compute and GPU tasks (but it also had to do audio, AI etc.). This monster has shared resources of up to 1.8 TFLOPs for GPU and compute tasks. In the 50 percent scenario, that means 900 GFLOPs, a little over four times the Cell, can be used for compute jobs.
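Checking that arithmetic in Python, using the commonly cited PS4 GPU configuration (18 compute units of 64 ALUs at 800 MHz, with a fused multiply-add counted as two flops), which is where the roughly 1.8 TFLOP figure comes from. The Cell number is my from-memory 216.

```python
# Commonly cited PS4 GPU configuration (not from this post): 18 GCN compute units,
# 64 ALUs each, 800 MHz, and a fused multiply-add counted as 2 flops per clock.
CUS, ALUS_PER_CU, FLOPS_PER_CLOCK, CLOCK_MHZ = 18, 64, 2, 800
CELL_GFLOPS = 216  # the figure from the post, from memory

peak_gflops = CUS * ALUS_PER_CU * FLOPS_PER_CLOCK * CLOCK_MHZ / 1000
half_for_compute = peak_gflops * 0.5
print(f"peak: {peak_gflops:.1f} GFLOPs")
print(f"50% left for compute: {half_for_compute:.1f} GFLOPs "
      f"= {half_for_compute / CELL_GFLOPS:.2f}x the Cell")
print(f"using the rounded 900 GFLOPs: {900 / CELL_GFLOPS:.2f}x")
```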

And it's even better than that: it's not just one job at 50 percent. It can run many jobs at once, balancing what the CPU is good at against what the GPU is good at, with the CPU and GPU constantly reading, writing and updating fine-grained compute spread over multiple jobs.

In conclusion:

What does this mean?

While on paper the components may look middle of the road, the actual design from a software and hardware point of view means the tech is state of the art. Do not confuse a lower TFLOP rating with old technology. This is brand new technology in the following ways:

The most powerful APU in existence
A GPU with huge bandwidth to a very large amount of GDDR5
A GPU that in theory can use all of its 1.8 TFLOPs for both graphics and GPGPU at the same time, in a way no other GPU on the market can at the moment, especially when you consider that the API is "close to the metal"
The CPU and GPU are not only part of an APU, i.e. on the same die; they also look to support hUMA, so both can freely share jobs on the fly, playing to the strengths of each, with no bottleneck in communication or bandwidth between two processors that are bound tightly together.

Speculation on what that means in games

Basically, think of what the Cell was able to do in first-party hands despite the fairly weak GPU. It produced beautiful games like TLOU, God of War and Uncharted, for example, and there are beautiful games like GT6 and Beyond: Two Souls still to come.

When Mark Cerny talks about the PS4 being easy to use (accessed as a pure GPU), he means the GPU is easy to use thanks to unified memory, with no concern for eDRAM or hard-to-use custom hardware (like the Cell).

When he was talking about having something in the tank, or exploitable resources down the line, he was talking about fine-grained compute with hUMA and the modified GPU.

Having a 1.8 TFLOP GPU means we should see, and are seeing, a noticeable increase in visual fidelity, as in Killzone Shadow Fall at 1080p 30fps (or 1080p 60fps for multiplayer).

What we might see in the future has less to do with common GPU things like tessellation, shadows and visual fidelity, and more to do with destructible environments not seen before and effects that are just impossible on old consoles.

The PS4 may even push PCs in ways that have not been done before, through physical effects not yet seen in games, because the GPGPU resources are available in a very real way rather than as a one-off effect in a single game.

I believe we are already seeing glimpses of this future: the collisions, physics and use of voxels in Resogun, the particle effects in Infamous Second Son, and the volumetric effects in Deep Down.

I am very excited about the future of the PS4 and will be very interested in the technical aspects of the games of the current generation.

(NOTE: I will re-edit this post for mistakes etc., but I wanted to post it in case I lose the text, so please bear with me.)