Intel's Dunnington Six-core Processor

We tested the most powerful x86-based server for data centers to date, built on Intel's latest Dunnington platform. And yes, it really rocks

Rakesh Sharma

Saturday, November 01, 2008


Just a few months ago we reviewed the Harpertown processor, Intel's 45 nm Penryn-based Xeon 5400 series part. This time we had the opportunity to test the next in line from Intel: the Dunnington processor. The server we received had four processors with six cores each. As not many applications use all available cores, this processor is meant for very high-end computing and for virtualization in large enterprises. It can also be used in cloud computing or in rack-optimized and ultra-dense SKUs.

Technology in Dunnington
To understand and appreciate the tech used in Dunnington, we'll start with a bit of history of the previous generation of server processors. The Xeon 5100 series, codenamed Woodcrest and based on Intel's Core microarchitecture, was the server and workstation version of the Intel Core 2 processor. The fastest processor in this category operated at 3.0 GHz, with Intel claiming better performance and lower energy consumption than previous processors. In Jan 2007, Intel launched its quad-core part, the server version of the Core 2 Quad, as the 3200 series, which comprised two separate dual-core dies placed next to each other in one CPU package. This was targeted at blade servers. The 3300 series was similar to the 3200 series but was manufactured using a 45 nm process and featured the XD bit and virtualization technology.

Direct Hit!
Applies To: Data centers
Price: Yet to be released
USP: How six-core processors rev up the performance of your server
Primary Link: www.intel.com
Keywords: dunnington

True to Intel's tick-tock release cycle of processors, where tick means a refresh of the current architecture and tock means a brand new architecture, the clock ticked and the Harpertown Xeons were released in late 2007. This family consisted of dual-die quad-core processors manufactured on a 45 nm process, featuring a 1333 to 1600 MHz front side bus and lower TDPs, rated between 50 W and 150 W depending on the model.

And now Intel has become the first in the x86 processor market to launch a processor with six cores. Its previous offering in this segment was the 7300 series, codenamed Tigerton, which consisted of two dual-core silicon chips on a single ceramic module. Boasting greater processing capabilities, Tigerton was based on Intel's Caneland (Clarksboro) platform. The Dunnington, or 7400 series, features a single-die six-core design and is based on Intel's 45 nm Penryn core. Like the Harpertown Xeon, it is built from dual-core units, three of which are clubbed together here. Compared to its predecessor, this processor has significantly more cache: a 16 MB L3 cache shared among all six cores, a 3 MB L2 cache shared between each pair of cores, and 96 KB L1 cache. The larger cache should improve performance, mainly by reducing the latency of access to frequently used data. The clock speed remains approximately the same, ranging from 2.13 to 2.66 GHz. The FSB of Dunnington (1066 MHz) is, however, much slower than that of Harpertown (1600 MHz, reviewed in Jan 2008). This can be a bottleneck, but the larger cache is said to reduce it to a great extent. One good thing is that if you already have a Tigerton system with its mPGA604 socket, you just have to plug this CPU into it. It is compatible with Caneland chipsets too.
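To get a rough feel for the FSB concern, here is a back-of-the-envelope sketch in Python. It assumes an 8-byte-wide FSB data path and treats the quoted FSB figures as effective transfer rates, so the results are theoretical peaks per socket, not measurements. The function name and the per-core split are our own illustration.

```python
# Theoretical peak FSB bandwidth per socket, and the share per core.
# Assumptions (not from the review): an 8-byte-wide FSB data path, and the
# quoted FSB figures taken as millions of transfers per second.

FSB_WIDTH_BYTES = 8

def fsb_bandwidth_gbs(transfers_mhz: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_mhz * 1e6 * FSB_WIDTH_BYTES / 1e9

# name -> (FSB figure in MHz, cores per processor), as quoted in the article
chips = {
    "Dunnington (6 cores, 1066 MHz FSB)": (1066, 6),
    "Harpertown (4 cores, 1600 MHz FSB)": (1600, 4),
}

for name, (fsb_mhz, cores) in chips.items():
    total = fsb_bandwidth_gbs(fsb_mhz)
    print(f"{name}: {total:.2f} GB/s peak, {total / cores:.2f} GB/s per core")
```

Under these assumptions each Dunnington core gets well under half the peak bus bandwidth of a Harpertown core, which is why the cache has to absorb more of the memory traffic.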

The above image shows six cores in the Dunnington CPU, having 16 MB L3 cache shared among all the cores.

This 7400 series processor also supports Intel's virtualization technology (VT-x), including Intel FlexMigration and FlexPriority. Earlier, successful live migration of a virtual machine depended on the compatibility of the two CPUs between which the migration was done, and on the VM remaining stable after the migration. These issues are taken care of by FlexMigration, which also removes the need to buy a resource pool of identical CPUs: the pool can span multiple generations of Xeon processors. This gives you the option of choosing the right server platform for your enterprise with respect to performance, cost and power. FlexPriority is another hardware feature that helps optimize virtualization, by improving virtual machine access to the task priority register.

How we tested
To test the performance of the server we ran several benchmarks: SunGard, Linpack, POVRay and Cinebench. We ran them on Windows Server 2008 64-bit, first with 2 processors and 8 GB RAM, and then with 4 processors and 16 GB RAM. The hard disks were configured in RAID 0 so that disk I/O would not create a bottleneck during benchmarking. For the first run we took out 2 processors and 8 GB of RAM and ran the benchmarks. Then we put the processors and RAM back and ran the benchmarks again to get the full system performance. To check power consumption, we connected the server to the mains through a wattmeter and recorded the maximum, minimum and average power consumption.

The performance graph while running SunGard on Dunnington (left) and Harpertown (right). Dunnington took 95 secs less than Harpertown.

Benchmark results
We started with Cinebench 10, which measures the performance of the processor and the graphics card and reports a Cinebench score for each. The test consists of two parts: the first is processor intensive and the second is graphics intensive. The processor test first renders using a single CPU and then repeats the render using all the cores. The second test, i.e. the graphics test, runs inside a 3D window. An animated scene is played, starting with a low demand for graphics that increases later on. The score reflects how well the system keeps up when working at maximum speed to display the scene properly. The higher the scores, the better the server performance.

Results: With 2 processors and 8 GB RAM, Cinebench gave a score of 3262 CB-CPU when rendering with 1 CPU and 26816 CB-CPU when rendering with all the CPUs. The GPU score came to 190 CB-GFX, which is good for a server processor like this one. With all 4 CPUs and 16 GB RAM, i.e. the full-blown configuration, this monster scored 3266 CB-CPU when rendering with 1 CPU, which is of course practically the same as before. When it rendered with all the CPUs, the score rose to 31372 CB-CPU, an increase of about 17% over the earlier configuration. However, please note that this benchmark, 'CINEBENCH 10 64-bit', didn't use more than 16 cores.
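The scaling between the two configurations can be checked with a few lines of arithmetic. The Python sketch below computes the gain relative to the 2-processor score and compares it with the ideal gain if only 16 of the 24 cores are used. The function name is ours, and the "ideal" figure assumes perfectly linear scaling with core count.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100.0

score_2p = 26816   # CB-CPU with 2 processors (12 cores), from the results above
score_4p = 31372   # CB-CPU with 4 processors (24 cores), from the results above

gain = pct_increase(score_2p, score_4p)

# If the benchmark tops out at 16 cores, the usable core count only grows
# from 12 to 16, so even perfect scaling stays far below a 100% gain.
ideal_gain = pct_increase(12, 16)

print(f"measured gain: {gain:.1f}%")
print(f"ideal gain with a 16-core cap: {ideal_gain:.1f}%")
```

Seen this way, the modest improvement says more about the benchmark's core limit than about the processors.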

For the next benchmark we used POVRay, a ray tracing program that is widely used for CPU benchmarking. It uses the ray tracing rendering technique to calculate an image by simulating how light travels in the real world. For benchmarking with POVRay, we used the standard 'benchmark.pov', as this file exercises every internal feature of POVRay and stresses the CPU to its limits. One more reason for using this file is that it is the standard for all processors, which makes it easier for others to compare scores.

All 24 cores being utilized while running the SunGard benchmark. Apart from Linpack, this benchmark was the only one to stretch all 24 cores.

Results: With 2 processors and 8 GB RAM, POVRay rendered an average of 120.38 PPS (pixels per second) over 147456 pixels, and with 4 processors and 16 GB RAM it rendered an average of 120.25 PPS. POVRay used a maximum of 3 cores for executing the benchmark, so the extra processors made no difference.

Then we used SunGard Adaptive Analytics, a component of SunGard's suite of risk management products. More precisely, the benchmark is a stripped-down version of the actual product. It uses a Monte Carlo financial engine to predict the future of a fictitious portfolio. It requires two files to run: the first contains sample data representing actual market conditions, and the second contains a sample customer's investment portfolio. The benchmark score is the run time in seconds, so the less time it takes to run, the better the server performed.
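For readers unfamiliar with the Monte Carlo method, the Python sketch below shows the general idea on a toy portfolio. It is not SunGard's engine: the model (yearly steps of geometric Brownian motion), the function name and every parameter value are assumptions chosen purely for illustration.

```python
import math
import random

def simulate_final_values(start_value, annual_return, annual_volatility,
                          years, n_paths, seed=0):
    """Return n_paths simulated end-of-horizon portfolio values.

    Each path takes yearly steps of geometric Brownian motion. All
    parameters are illustrative, not taken from the SunGard benchmark.
    """
    rng = random.Random(seed)
    drift = annual_return - 0.5 * annual_volatility ** 2
    finals = []
    for _ in range(n_paths):
        log_growth = 0.0
        for _ in range(years):
            # One random yearly shock per step
            log_growth += drift + annual_volatility * rng.gauss(0.0, 1.0)
        finals.append(start_value * math.exp(log_growth))
    return finals

# Hypothetical portfolio: 100,000 invested for 5 years
finals = sorted(simulate_final_values(100_000, 0.07, 0.20,
                                      years=5, n_paths=20_000))
mean_value = sum(finals) / len(finals)
worst_5pct = finals[int(0.05 * len(finals))]   # 5th percentile outcome

print(f"mean final value: {mean_value:,.0f}")
print(f"5th percentile:   {worst_5pct:,.0f}")
```

Because every simulated path is independent of the others, this kind of workload can be split across many cores with little coordination, which fits with this benchmark keeping all 24 cores busy.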

Results: In the first test, with 2 processors and 8 GB RAM, the benchmark took 156.2 seconds to run, and with 4 processors and 16 GB RAM it took only 105.9 seconds. Harpertown with 8 cores and 16 GB RAM took around 200 seconds, so the Dunnington server took about 47% less time than Harpertown.
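The SunGard timings can be put in perspective with a short calculation. This Python sketch derives the speedup from doubling the processors and the time saved against Harpertown. The helper names are ours, the Harpertown time is the rounded "around 200 seconds" figure, and "efficiency" assumes that doubling the processors would ideally halve the run time.

```python
def speedup(t_before: float, t_after: float) -> float:
    """How many times faster the second run is than the first."""
    return t_before / t_after

def pct_less_time(t_slow: float, t_fast: float) -> float:
    """Time saved by the faster run, as a percentage of the slower run."""
    return (t_slow - t_fast) / t_slow * 100.0

t_dunnington_2p = 156.2   # seconds, 2 processors / 8 GB RAM
t_dunnington_4p = 105.9   # seconds, 4 processors / 16 GB RAM
t_harpertown = 200.0      # seconds, approximate figure from the review

scaling = speedup(t_dunnington_2p, t_dunnington_4p)
efficiency = scaling / 2 * 100          # ideal speedup for 2x processors is 2.0
saved = pct_less_time(t_harpertown, t_dunnington_4p)

print(f"2P -> 4P speedup: {scaling:.2f}x ({efficiency:.0f}% of ideal)")
print(f"time saved vs Harpertown: {saved:.0f}%")
```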

Comparison of the time taken by different machines to execute SunGard.

Next we ran Linpack, which brings almost any server to its knees. It measures a system's floating point computing power by making the system solve an N by N system of linear equations (i.e. Ax = b), and reports how many GFlops the system can sustain. The higher the GFlops figure, the better the system.
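To make the idea concrete, here is a minimal pure-Python sketch of a Linpack-style measurement: solve a random N by N system with Gaussian elimination and divide the conventional operation count, 2N^3/3 + 2N^2, by the elapsed time. It is an illustration only, not the optimized Linpack binary used in this test, and an interpreted solver like this will report a tiny fraction of the GFlops a tuned build achieves. The function names and the problem size are our own choices.

```python
import random
import time

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of n floats, b a list of n floats. The inputs
    are copied, not modified.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def linpack_flops(n):
    """Operation count conventionally credited for an n x n solve."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2

n = 120   # small, illustrative problem size
rng = random.Random(1)
a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start

# Largest error when the solution is substituted back into the equations
residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))
print(f"n={n}  time={elapsed:.3f}s  "
      f"rate={linpack_flops(n) / elapsed / 1e9:.4f} GFlops  "
      f"residual={residual:.2e}")
```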

Results: With 2 processors and 8 GB RAM the system generated 53.69 GFlops, and with 4 processors and 16 GB RAM it gave 62.02 GFlops, which is lower than the figure for Harpertown (65 GFlops). We got a lower score for Dunnington because the Linpack build we had was customized for Harpertown.

To check the minimum power consumption we kept the system idle, and it drew 438 W. At maximum load, the wattmeter showed 715 W.
