Wednesday, November 02, 2005
Exploring the performance impact of memory latency
Is 2-2-2-5-1T really worth it?
Lower latencies are a good thing, of course, but how much can they really improve system performance? Are exotic, low-latency DIMMs worth the price premium? Read on as we explore the effects of memory latency on Athlon 64 performance in synthetic memory benchmarks, games, and real-world applications.

Low latency DIMMs: Worth the premium?
Memory latency?
Before diving into our benchmark results, it's worth taking a moment to go over how memory access works and where the various latencies come into play. Memory is organized like a spreadsheet, with data stored in cells that can be identified by a corresponding column and row. Spreadsheets can also be made up of multiple sheets, and similarly, memory can be made up of multiple banks. If we want to access a specific cell of memory, the system must first activate the sheet, or bank, containing the desired row. Next, the system sends an active command to the desired row. Once the row is activated, the system can issue read or write commands to specific columns in the row. When reading or writing has been completed, a precharge command is sent to close the row.
There are delays between each of the steps in memory access. These delays are referred to as latencies and expressed as a number of clock cycles. Here's a brief explanation of some of the most common, and important, memory timing parameters that affect access latencies:
- RAS-to-CAS delay (tRCD) — The RAS-to-CAS delay occurs between the time a row is activated and when the first read or write operation is performed.
- CAS latency (CL) — CAS latency refers to the delay between when a read operation is issued and when the data returned by that read is considered valid.
- RAS precharge (tRP) — The RAS precharge is the delay between when a precharge command is issued to close a row and when the next active command can be issued.
- Active-to-precharge delay (tRAS) — This latency actually spans several steps in the memory access process. The active-to-precharge delay refers to the minimum number of cycles that must elapse between an active and precharge command.
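To see how these delays stack up, here is a minimal Python sketch that counts the cycles before the first data arrives for a read in three simplified cases: the target row is already open, the bank is idle, or a different row is open and must be closed first. The function name and the three-case model are illustrative assumptions. Real memory controllers also juggle tRAS, command rate, burst length, and refresh, none of which is modeled here.

```python
def read_latency_cycles(cl, trcd, trp):
    """Cycles until first data for three simplified read scenarios."""
    return {
        # Row already active: only the CAS latency applies.
        "row_open": cl,
        # Bank idle: activate the row (tRCD), then read (CL).
        "bank_idle": trcd + cl,
        # Wrong row open: precharge (tRP), activate (tRCD), then read (CL).
        "row_conflict": trp + trcd + cl,
    }


# The two timing sets discussed in this article (CL, tRCD, tRP).
low_latency = read_latency_cycles(cl=2, trcd=2, trp=2)
value = read_latency_cycles(cl=2.5, trcd=4, trp=4)

for case in low_latency:
    print(f"{case:>12}: {low_latency[case]} vs {value[case]} cycles")
```

Under this simplified model, the gap between the two timing sets is smallest when the row is already open and largest when a row conflict forces a precharge first.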
Since latencies refer to delays, lower is better. That doesn't mean you should hop into your motherboard's BIOS and set each memory timing option to its lowest possible value, though. Memory modules are rated for a specific set of latencies at a given clock speed, and they're generally not stable with lower latencies. A DIMM's latencies are usually expressed as a series of four hyphenated numbers corresponding to the CAS latency, RAS-to-CAS delay, RAS precharge, and active-to-precharge delay. Low latency DDR400, for example, is generally rated for 2-2-2-5 timings at 400MHz. That refers to two cycles of CAS latency, RAS-to-CAS delay, and RAS precharge, and five cycles of active-to-precharge delay.
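The four-number notation described above is easy to unpack in code. The following is a rough sketch; the `Timings` type and `parse_timings` function are hypothetical names introduced here for illustration, and the parser only handles the four-field form (it does not accept a trailing command rate such as `1T`).

```python
from typing import NamedTuple


class Timings(NamedTuple):
    cl: float   # CAS latency (may be a half cycle, e.g. 2.5)
    trcd: int   # RAS-to-CAS delay
    trp: int    # RAS precharge
    tras: int   # Active-to-precharge delay


def parse_timings(rating: str) -> Timings:
    """Split a rating such as '2-2-2-5' into its four named latencies."""
    parts = rating.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected four hyphenated values, got {rating!r}")
    cl, trcd, trp, tras = parts
    return Timings(float(cl), int(trcd), int(trp), int(tras))


print(parse_timings("2-2-2-5"))
print(parse_timings("2.5-4-4-8"))
```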

OCZ's Enhanced Latency Platinum Rev 2 DDR400 rated for 2-2-2-5 latencies
Our testing methods
We've tested several different memory configurations to illustrate the performance impact of key memory timing settings, including DRAM command rate. Tests were conducted with a set of low-latency OCZ DIMMs rated for 2-2-2-5 timings at 400MHz. We also tested with 2.5-4-4-8 timings to simulate the performance of more affordable "value" memory. Some budget memory is rated with CAS latencies as high as three cycles, but since CAS 2.5 memory is already quite affordable, we've limited our testing to CAS 2 and 2.5. In addition to testing system performance with 2-2-2-5 and 2.5-4-4-8 memory timings, we've also tested each configuration with both 1T and 2T command rates.
All tests were run three times, and their results were averaged, using the following test system.
Processor | AMD Athlon 64 FX-53 2.4GHz
System bus | HyperTransport 16-bit/1GHz
Motherboard | DFI LANParty UT NF4 Ultra-D
BIOS revision | N4D623-3
North bridge | NVIDIA nForce4 Ultra
South bridge |
Chipset drivers | ForceWare 6.66
Memory size | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs)
Memory type | OCZ PC3200 EL Platinum Rev 2 DDR SDRAM at 400MHz
CAS latency (CL) | 2 | 2 | 2.5 | 2.5
RAS-to-CAS delay (tRCD) | 2 | 2 | 4 | 4
RAS precharge (tRP) | 2 | 2 | 4 | 4
Active-to-precharge delay (tRAS) | 5 | 5 | 8 | 8
Command rate | 1T | 2T | 1T | 2T
Hard drives | Western Digital Raptor WD360GD 37GB SATA
Audio | nForce4/ALC850
Audio driver | Realtek 3.75
Graphics | NVIDIA GeForce 6800 GT with ForceWare 77.77 drivers
OS | Microsoft Windows XP Professional
OS updates | Service Pack 2, DirectX 9.0c
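The four memory configurations in the table are every pairing of the two timing sets with the two command rates, and each result is the mean of three runs. A small sketch of that bookkeeping follows; the variable names and the `average_runs` helper are assumptions made for illustration, not part of any benchmark tool we used.

```python
from itertools import product
from statistics import mean

timing_sets = ["2-2-2-5", "2.5-4-4-8"]
command_rates = ["1T", "2T"]

# Every timing set paired with every command rate: four configurations.
configs = [f"{timings}-{rate}" for timings, rate in product(timing_sets, command_rates)]
print(configs)


def average_runs(runs):
    """Average the three runs recorded for one test on one configuration."""
    if len(runs) != 3:
        raise ValueError("expected exactly three runs per test")
    return mean(runs)
```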
Our test system was powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor's Choice winners in our latest PSU round-up.
We used the following versions of our test applications:
- WorldBench 5.0
- DOOM 3 1.3 with trdelta1 and trdemo2 demos
- Quake 4 with trhangar1 and trtram demos
- Far Cry 1.30 with tr1-volcano demo
- Splinter Cell: Chaos Theory 1.05 with trpenthouse demo
- Battlefield 2 1.03
- FutureMark 3DMark05 Build 120
- FRAPS 2.6.4
- Cinebench 2003
- Sphinx 3.3
- SiSoft Sandra Standard 2005 SR2a
The test systems' Windows desktop was set at 1280x1024 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Memory performance
We begin with some synthetic memory subsystem benchmarks that should easily expose any performance differences between the various settings.
Cinebench 2003
Sphinx
WorldBench overall performance
WorldBench uses scripting to step through a series of tasks in common Windows applications. It then produces an overall score. WorldBench also spits out individual results for its component application tests, allowing us to compare performance in each. We'll look at the overall score, and then we'll show individual application results alongside the results from some of our own application tests.
Multimedia editing and encoding
MusicMatch Jukebox
Windows Media Encoder
Adobe Premiere
VideoWave Movie Creator
Image processing
Adobe Photoshop
ACDSee PowerPack
Multitasking and office applications
Microsoft Office
Mozilla
Mozilla and Windows Media Encoder
Other applications
WinZip
Nero
Gaming performance
We conducted our gaming tests with two sets of in-game quality settings. First, we tested at low resolutions with medium quality levels and antialiasing and anisotropic filtering disabled. We then tested at higher resolutions and detail levels, with antialiasing and aniso, to better reflect how most users would play games on a system of this caliber. The latter settings may bottleneck performance at the graphics card, but that's how things are with the vast majority of today's games.
3DMark05
Since you won't find anyone playing 3DMark05, we've limited our testing to the app's default settings.
Far Cry
DOOM 3
Quake 4
Timedemos in Quake 4 don't appear to render all of the game's eye candy effects, but since we're only changing memory latencies and command rates, that shouldn't impact our results.
Unreal Tournament 2004
Splinter Cell: Chaos Theory
Battlefield 2
Conclusions
Although tighter memory timings and a 1T command rate can certainly improve the performance of the Athlon 64's memory subsystem, that improvement doesn't always translate to better application performance. In fact, with the exception of the Sphinx speech recognition engine, moving to tighter memory timings or a more aggressive command rate generally didn't improve performance by more than a few percentage points, if at all, in our tests. Lower latencies only improved WorldBench's overall score by a single point, and performance gains in games were generally limited to lower resolutions and detail levels.
So how much does the modest performance improvement brought by tighter memory latencies cost? Close to twice as much. As I write, a single 512MB stick of OCZ Value DDR400 memory rated at 2.5-4-4-8 sells for between $45 and $52 online, while a 512MB Platinum Rev 2 2-2-2-5 DDR400 module sells for between $81 and $94. Looking at dual-channel kits, a pair of 512MB OCZ Value DDR400 DIMMs rated for 2.5-4-4-8 timings sells for between $91 and $103 online, while a pair of 512MB Platinum Rev 2 sticks rated for 2-2-2-5 costs between $155 and $191.
OCZ isn't the only DIMM maker charging that sort of premium for ultra-low-latency modules. In fact, it's common. To cite another example, a pair of 512MB Corsair Value DDR400 DIMMs rated for 2.5-4-4-8 will set you back between $80 and $159, while a couple of the company's 512MB TWINX1024-3200XL 2-2-2-5 DDR400 modules run from $189 all the way up to $325.
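To put a number on that premium, here is a quick sketch that compares the price ranges quoted above. Taking the midpoint of each range is an assumption made purely for illustration, since street prices vary from vendor to vendor, and the dictionary labels are our own shorthand.

```python
# (value price range, low-latency price range) in US dollars, as quoted above.
prices = {
    "OCZ single 512MB": ((45, 52), (81, 94)),
    "OCZ 2x512MB kit": ((91, 103), (155, 191)),
    "Corsair 2x512MB kit": ((80, 159), (189, 325)),
}


def midpoint(price_range):
    low, high = price_range
    return (low + high) / 2


def premium(value_range, low_latency_range):
    """Ratio of low-latency to value pricing, using range midpoints."""
    return midpoint(low_latency_range) / midpoint(value_range)


for name, (value_range, low_latency_range) in prices.items():
    print(f"{name}: {premium(value_range, low_latency_range):.2f}x")
```

By this midpoint measure, the OCZ modules come in at roughly 1.8 times the price of their value counterparts, and the Corsair pair at a little over twice the price.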
For most users, the price premium associated with exotic 2-2-2-5 memory won't be worth the relatively modest performance gains that it offers. Low-latency memory does have an ace up its sleeve for overclockers, though. Most low-latency modules are capable of running at much higher clock speeds if you back off on their latencies a little. We've had our OCZ Platinum Rev 2 DIMMs, which are rated for 2-2-2-5 latencies at 400MHz, cranked all the way up to 560MHz with more relaxed 2.5-4-4-8 timings. Overclocking success is never guaranteed, of course, but low-latency memory modules tend to use higher quality chips that respond better to overclocking.
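Looser timings at a higher clock speed don't necessarily mean longer delays in absolute terms, because each cycle is shorter. The sketch below converts both sets of timings to nanoseconds. It assumes the 400MHz and 560MHz figures are effective DDR data rates, so the actual clock runs at half that speed; the helper names are hypothetical.

```python
def cycle_time_ns(data_rate_mhz):
    """Length of one clock cycle, assuming DDR: clock = data rate / 2."""
    clock_mhz = data_rate_mhz / 2
    return 1000 / clock_mhz


def to_ns(cycles, data_rate_mhz):
    """Convert a latency expressed in cycles to nanoseconds."""
    return cycles * cycle_time_ns(data_rate_mhz)


stock = {"CL": 2, "tRCD": 2, "tRP": 2, "tRAS": 5}          # 2-2-2-5 at 400MHz
overclocked = {"CL": 2.5, "tRCD": 4, "tRP": 4, "tRAS": 8}  # 2.5-4-4-8 at 560MHz

for name in stock:
    print(
        f"{name}: {to_ns(stock[name], 400):.1f} ns at 400MHz vs "
        f"{to_ns(overclocked[name], 560):.1f} ns at 560MHz"
    )
```

Under those assumptions, the overclocked CAS latency works out slightly shorter in nanoseconds than the stock setting, while tRCD, tRP, and tRAS are longer. The appeal of the overclocked configuration is the extra bandwidth from the higher clock rather than lower latency across the board.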
At the end of the day, the appeal of low-latency memory modules may be limited to overclockers and enthusiasts intent on squeezing every last drop of performance from a system. More pedestrian "value" memory should be plenty fast enough for everyone else, especially since you can practically afford twice as much.
Source: http://techreport.com/etc/2005q4/mem-latency/index.x?pg=1