nVidia announced the GeForce Titan Z today (03-25-2014). It is based on two GK-110 GPUs, has 5760 CUDA cores and is equipped with 12 GB of VRAM. However, it may require CC to take full advantage of all that power.

We have added AND / OR selections to our results...

On the Benchmark Results - Detailed Results page we have added a few new filter and advanced search options to make it easier to navigate through all the results in whatever way you want. Simply click on the 'Filter' button to start a search (the button is a toggle), or click on the 'Binocular' button to start an advanced search.

Remember to clear all filters when you want to return to the default overview.

CS6 anomalies and AME problems...

For the attentive reader, it is clear that CS6 has serious problems with caching of material in a timeline, or more accurately, with the use of interprocess memory versus regular memory. Take a look at the Top-20 Chart and look closely at the 4th chart, named 'MPEG2-DVD Time in seconds'. Notice there are 14 observations in the range of 18 to 29 seconds. The rest of the observations are in the 44+ second range. It looks like there is a clear dichotomy in this chart. That is correct, because when you scroll down to the 10th chart, named 'MPEG2-DVD Time in seconds, CS6 only', you will see that all 14 of those observations have mysteriously disappeared. In fact, only 3 entries are left from the previous chart.

Investigating this further revealed that identical systems running the same test took a performance penalty of more than 250% in CS6 over CS5/5.5.

But that is not all. The performance delta between Direct Export and AME Queue has never been that big.

Testing on the same machine with CS6 showed huge differences in performance, with a delta of nearly 500% in extended tests. In other words, using Direct Export is almost five times faster than using the AME Queue, at least with the Disk I/O test. If I take my own results from PPBM5 as a starting point and then correct the results for both of these problems, the score would have been very different. This corrected result is called the 'Hypothetical Monster'. When using a different .dll for the video output, I could reduce the Disk I/O result to 24 seconds.

Computer ID              Total  RPI  Disk I/O  MPEG DVD  H.264 BR  MPE Off  MPE On
Hypothetical Monster        61   54        10        17        33       39       1
Harm's Monster             111  100        33        44        33       39       1
Monster + modified .dll    102   96        24        44        33       39       1
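As a quick arithmetic check on the table, here is a minimal Python sketch. It rests on one assumption that the text does not state: in all three rows the Total happens to equal Disk I/O + MPEG DVD + H.264 BR + MPE On, with MPE Off left out. That relation is inferred from the numbers, not from a documented formula. The names ROWS, timed_sum and slowdown are made up for this illustration.

```python
# Rows copied from the table above. All values are seconds, except RPI (an index).
ROWS = {
    "Hypothetical Monster": {
        "total": 61, "rpi": 54, "disk_io": 10, "mpeg_dvd": 17,
        "h264_br": 33, "mpe_off": 39, "mpe_on": 1,
    },
    "Harm's Monster": {
        "total": 111, "rpi": 100, "disk_io": 33, "mpeg_dvd": 44,
        "h264_br": 33, "mpe_off": 39, "mpe_on": 1,
    },
    "Monster + modified .dll": {
        "total": 102, "rpi": 96, "disk_io": 24, "mpeg_dvd": 44,
        "h264_br": 33, "mpe_off": 39, "mpe_on": 1,
    },
}


def timed_sum(row):
    """Sum of the four timed tests; MPE Off is excluded (inferred, not documented)."""
    return row["disk_io"] + row["mpeg_dvd"] + row["h264_br"] + row["mpe_on"]


def slowdown(actual, hypothetical):
    """How many times longer the actual total is than the hypothetical total."""
    return actual["total"] / hypothetical["total"]


# Compare the inferred sum with the Total column for every row.
for name, row in ROWS.items():
    print(f"{name}: sum of timed tests = {timed_sum(row)}, table Total = {row['total']}")

ratio = slowdown(ROWS["Harm's Monster"], ROWS["Hypothetical Monster"])
print(f"Harm's Monster total / Hypothetical Monster total = {ratio:.2f}")
```

The ratio of the two totals comes out at roughly 1.8.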

The conclusion is obvious: CS6 carries severe limitations in the Disk I/O and MPEG2-DVD department. The same computer with the same setup and the same tests would be nearly twice as fast had Adobe not introduced these severe limitations in CS6. The chart shows this graphically, with the RPI as index numbers and the rest in seconds.

Video card performance

With the number of observations available now (1200+) we see a clear tendency that more CUDA cores do help performance, at least with the older architecture. The usual performance increase of hardware acceleration over software is around 12x, with the cards with fewer CUDA cores lagging behind and the ones with more cores pulling ahead. The increase in performance over software shows that the 9800 GT and GTX 260 are indeed clearly lagging when compared to the GTX 480/580/680. The newer architecture shows its benefits.

On the other hand, when Kepler cards arrived with around three times the number of CUDA cores of the Fermi cards, people expected triple the performance, and they were disappointed when that expectation proved wrong.

With all the results we have from our PPBM5 benchmark it looks like memory bandwidth may be the decisive factor for performance. It is no longer just the number of CUDA cores. Here are some results from our testing with an i7-2600K system, overclocked to 4.4 GHz, with 32 GB memory and the project on two Samsung 840 PRO SSDs in RAID 0. Note specifically that the memory bandwidth figures in the chart are displayed as (250 GB/s - Effective Bandwidth GB/s), so the grey line behaves like time: lower is better in the display. The only reason for this approach was to improve legibility of the chart.
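The display transformation described above can be sketched in a few lines of Python. The 250 GB/s constant comes from the text. The function name and the three sample bandwidth values are hypothetical and are not taken from the chart.

```python
CHART_CEILING_GBPS = 250.0  # constant quoted in the text


def inverted_bandwidth(effective_gbps, ceiling=CHART_CEILING_GBPS):
    """Return the value plotted in the chart: ceiling minus effective bandwidth.

    A card with more bandwidth gets a smaller plotted value, so the line
    reads like the timing results: lower is better.
    """
    return ceiling - effective_gbps


# Hypothetical effective bandwidths in GB/s, for illustration only.
sample = {"card_a": 100.0, "card_b": 150.0, "card_c": 200.0}

plotted = {name: inverted_bandwidth(bw) for name, bw in sample.items()}
for name, value in plotted.items():
    print(f"{name}: effective {sample[name]} GB/s -> plotted {value}")
```

With these made-up inputs the card with the highest bandwidth ends up with the lowest plotted value, which is the inversion the chart relies on.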

There is a clear relation between MPE performance and the memory bandwidth of the video card. The number of cores shows a much muddier picture, because there are three generations of GPUs in this overview, and there are obvious differences between the 4xx, 5xx and 6xx generations. They are displayed in slightly different colors for easier recognition.

A rough guideline on where to spend your money with a new system

Many questions are asked about building a new system that will perform well with PR CS5+/6. Of course the answer depends on budget, but if we take a typical $2,000 system as an Economical choice, a $5,000 system as a Warrior choice and a $7,500 system as a Monster choice, you would typically see a cost distribution over the major components of a system like this:

 

*The cost of the case and PSU is high for the Monster because of the drive cages included for hot-swappable disks.

Notice that whether you go for a $2K system, a $5K system or an even more expensive one, more than 35% of your investment should go into your disk setup to get good performance. Use this graph to build your own 'balanced' system without overspending on a single component. 'Other components' covers the cost of BD-R burners, fans, cables, power splitters and the like.
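To turn the 35% guideline into dollar amounts, here is a minimal Python sketch. It uses only the three budget tiers and the 35% figure named in the text. Because the text says "more than 35%", the results are a lower bound, not a recommended exact amount. The names TIERS and min_disk_budget are made up for this illustration.

```python
# Budget tiers in USD, as named in the text.
TIERS = {"Economical": 2000, "Warrior": 5000, "Monster": 7500}

MIN_DISK_PERCENT = 35  # "more than 35%" in the text, so treat this as a floor


def min_disk_budget(total_usd, percent=MIN_DISK_PERCENT):
    """Lower bound in USD for the disk setup, given the total system budget."""
    return total_usd * percent / 100


for tier, total in TIERS.items():
    print(f"{tier}: ${total} system -> at least ${min_disk_budget(total):.0f} on disks")
```

For the three tiers this gives floors of $700, $1,750 and $2,625.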


For more information about Balanced Systems and what you should strive for, see the Benchmark Results page.