The source material is heavily mixed: DV AVI in PAL, HDV 1080i PAL, XDCAM-EX HQ PAL, Canon 422 MXF, Red 4K and AVCHD 1080i NTSC. The timeline is exported to MPEG2-DVD NTSC High Quality Widescreen. It is loaded with effects and transitions, many of them keyframed with bezier curves, and uses up to 6 tracks.

This means that on export there is a lot of scaling and rendering, as well as field reversal from UFF to LFF for some sources. In all, a nightmarish timeline to export.

Time for the MPE to get off its exalted fence and get to work.

We are talking about compression to MPEG2, which is not very hard on the CPU. Because the output is only moderately compressed, threading lets the CPU get through most of its share of the processing in a few steps. It then hands the results to RAM, which passes them to the MPE for rendering and scaling; the MPE hands them back to RAM, and from there they are written to the disk(s) in bursts. However, while waiting for the disk(s) to finish writing, RAM is also holding frames from the source material that the CPU still needs to process. The more memory is available, the more frames can be held there for faster processing.
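The effect of a larger frame buffer can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the function name `simulate_export`, the tick-based timing, one frame produced per tick, and the disk burst parameters are all made up for illustration and do not describe how Premiere Pro or the MPE actually schedules work.

```python
def simulate_export(total_frames, buffer_frames, disk_busy_ticks=4, disk_burst=5):
    """Return the number of ticks needed to push total_frames through a
    bounded RAM buffer to a disk that writes in periodic bursts.

    All parameters are hypothetical; this models the idea, not the software.
    """
    buffered = 0   # frames currently waiting in RAM
    produced = 0   # frames the CPU/MPE stage has finished
    written = 0    # frames flushed to disk
    tick = 0
    while written < total_frames:
        tick += 1
        # Processing stage: finish one frame per tick, but only if RAM has room.
        if produced < total_frames and buffered < buffer_frames:
            buffered += 1
            produced += 1
        # Disk stage: busy for disk_busy_ticks ticks, then writes one burst.
        if tick % (disk_busy_ticks + 1) == 0:
            burst = min(buffered, disk_burst)
            buffered -= burst
            written += burst
    return tick


for frames_in_ram in (2, 5):
    print(frames_in_ram, "frame buffer:", simulate_export(100, frames_in_ram), "ticks")
```

With these assumed parameters, a 2-frame buffer needs 250 ticks for 100 frames and a 5-frame buffer needs 100, because the small buffer fills up and stalls the processing stage while the disk is busy.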

The basic ingredients here are the amount of RAM, number of cores and the clock speed.

More RAM means more frames in memory; more cores mean faster processing by the CPU and data leaving the queue in RAM sooner; and a higher clock speed means everything goes faster. If the amount of RAM is limited, the speed of the disk(s) can become the bottleneck.

The difference between hardware and software assisted MPE encoding.

If hardware assisted MPE is enabled, there is a lot of traffic from RAM to the CUDA card, to VRAM and back, which causes delays. This test involves a lot of scaling and rendering, and the CUDA card always uses maximum quality settings, in contrast to software only mode. Software only mode therefore avoids the latency of RAM - GPU - VRAM communication, which means less overhead, and it does not use maximum quality by default.

It is a sign of how effective hardware CUDA/MPE is that, despite the latency overhead and the maximum quality setting, the performance penalty is limited to 30 - 40% on very fast systems. The slower the system, the smaller the penalty, and on slow systems using CUDA/MPE may even bring both a performance and a quality benefit.
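Why the penalty shrinks on slower systems can be sketched with a simple cost model. Everything here is an assumption for illustration: the constants `ENCODE_WORK`, `RENDER_WORK` and `GPU_FIXED`, the two function names, and the idea that the GPU path costs a fixed amount of time regardless of CPU speed. The numbers are not measurements from this test, and the model ignores the quality difference between the two modes.

```python
# Toy cost model; every number below is an assumption, not a measurement.
ENCODE_WORK = 60.0  # MPEG2 encoding work on the CPU, in seconds at CPU speed 1.0
RENDER_WORK = 40.0  # scaling/rendering work if done on the CPU, same units
GPU_FIXED = 18.0    # assumed fixed cost of GPU rendering plus RAM - GPU - VRAM traffic


def software_time(cpu_speed):
    """Software only: the CPU does both encoding and rendering."""
    return (ENCODE_WORK + RENDER_WORK) / cpu_speed


def hardware_time(cpu_speed):
    """Hardware assisted: the CPU encodes, the GPU path costs a fixed amount."""
    return ENCODE_WORK / cpu_speed + GPU_FIXED


for cpu_speed in (1.0, 2.0, 4.0):
    sw = software_time(cpu_speed)
    hw = hardware_time(cpu_speed)
    change = 100 * (hw / sw - 1)  # positive means hardware is slower
    print(f"CPU speed {cpu_speed}: software {sw:.0f}s, hardware {hw:.0f}s ({change:+.0f}%)")
```

In this sketch the fixed GPU cost outweighs the rendering time saved on the fastest CPU, so hardware mode is slower there, while on the slowest CPU the same fixed cost is smaller than the CPU rendering time it replaces, so hardware mode comes out ahead.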

Adding memory makes a huge difference!!

Performance gains of 50% or more when doubling memory are not uncommon.

The impact of clock speed decreases with more memory installed.