
Breaking the Barriers of Cores

There has been a lot of buzz in the field of microprocessors, what with dies getting smaller and smaller, and more and more cores being added. We take a look at the technologies behind these developments.

Saurangshu Kanunjna

Thursday, January 03, 2008


In 1965, Gordon Moore predicted that the number of transistors on a chip would double every year, a pace he revised in 1975 to about every two years. More than four decades have passed since the original prediction, and all leading chip manufacturers still follow Moore's Law. This one prediction has shaped the way processor manufacturers look at the future of processor technology.
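
As a rough illustration of what doubling about every two years implies, the growth factor after a given number of years can be sketched as below. The function name and the sample intervals are our own choices for illustration, and real chips do not track the curve exactly.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which the transistor count grows after `years`,
    assuming one doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


# Sample intervals chosen only for illustration.
for years in (2, 10, 20):
    print(f"{years:>2} years -> x{growth_factor(years):,.0f}")
```

Ten years of steady doubling already gives a 32-fold increase, which is why the prediction has had such a hold on processor roadmaps.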

Today we have processors built on 45 nm technology, a huge advance over the early 800 nm processors. Over the last few years, processor frequency has also reached a saturation point, and pushing beyond it doesn't seem feasible, mainly because of excessive power consumption and heat generation. The feasible alternative was to add more cores to a single die, which improved both power efficiency and performance. Single-core processors ruled the market for a long time, but the jump from single to dual core and now to quad core has been rather fast. Heavy demand from all quarters of the industry has pushed vendors to think out of the box and deliver solutions that meet every need. Does adding more cores help? Will shrinking the processor die and increasing the number of transistors on a chip be of any use?

Will 45 nm suffice, or will we shrink the die further? If so, how much further can we shrink it? Which applications will be able to use so much processing power? These are the questions on most minds, and in this story we'll try to make things clearer.

The shrinking die
One trend that has gathered speed over the last few years is shrinking the die and cramming more and more transistors onto it. A couple of years back, 90 nm was all we had, but the transition from 90 nm to half of that has been really fast, and we are already moving to 32 nm in the near future. Does shrinking the die and adding more transistors make sense? Yes, it does, and for good reason. Smaller dies mean more processing power in less area. They also mean more efficient power consumption and less heat dissipation. Today's data centers house hundreds of servers powered by thousands of processor cores, so even an incremental increase in power per core translates into a large increase in overall power consumption. So much for shrinking dies and the move to multiple cores in general. How do these trends show up in processors from the various vendors? Let's find out.

The server champs
The server domain is not new to multi-core CPUs. In fact, vendors like Sun and IBM have had multi-core processors for a long time. It's only recently that the x86 CPU giants Intel and AMD have introduced their multi-core offerings. Therefore, we'll focus on the latest developments in server processors by all the key players.

Sun's Niagara
In 2005, Sun's UltraSPARC T1 processor (codenamed Niagara) was launched with 8 cores, each supporting 4 threads. This year Sun released its successor, the UltraSPARC T2 processor (codenamed Niagara 2), which also has 8 SPARC cores. All 8 cores are connected to 4 MB of shared L2 cache.

Each of the cores is capable of eight-way simultaneous multithreading, enabling a total of 64 simultaneous threads of execution.
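
The thread counts follow directly from multiplying cores by threads per core. A small sketch of that arithmetic, using only the core and thread figures quoted here (the function and variable names are ours):

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total simultaneous hardware threads on one chip."""
    return cores * threads_per_core


niagara1 = hardware_threads(cores=8, threads_per_core=4)  # UltraSPARC T1
niagara2 = hardware_threads(cores=8, threads_per_core=8)  # UltraSPARC T2

print(f"Niagara 1: {niagara1} threads, Niagara 2: {niagara2} threads")
```

With the core count unchanged, doubling the threads per core is what doubles the number of threads the chip can keep in flight.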

All this thread processing power delivers twice the throughput and a performance-per-watt gain over Niagara 1. The UltraSPARC T2 is among the first 'system on chip' processors with this many cores and threads. Niagara 2 is fabricated on a 65 nm process and has 503 million transistors, though we won't be surprised if Sun launches a 45 nm processor, i.e. Niagara 3, sometime soon. What is expected from Niagara 3 is more processing power, with emphasis on power consumption and better memory bandwidth. It has been designed in such a way that delays in accessing memory are avoided. For the time being, Sun is planning to provide the OpenSPARC T2 RTL (register transfer level) processor design to the open source community under the GPL.

IBM's POWER
POWER6 is the latest in IBM's series of server processors and was launched in the middle of 2007. Running at 3.5, 4.2 and 4.7 GHz, the POWER6 promises to deliver twice the speed of the previous-generation POWER5 CPU.

The most impressive part of this new processor is its memory bandwidth of about 300 Gbps. At that rate, it could download the entire iTunes catalog in about 60 seconds. Servers based on the POWER6 processor come with specialized hardware and software that allow them to create many 'virtual' servers on a single box.

It is the first UNIX microprocessor able to perform decimal floating point arithmetic in hardware. There is also a vast improvement in the way instructions are executed inside the chip. Performance has been enhanced by keeping the number of pipeline stages static, making each stage faster, and doing more work in parallel. The result is shorter execution time and, ultimately, less energy consumed.
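
To see why decimal floating point arithmetic matters, compare how binary and decimal formats treat a simple sum. The sketch below uses Python's software `decimal` module purely as an illustration of the arithmetic; it says nothing about how POWER6 implements it in hardware.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly 0.3.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)

# Decimal floating point stores base-10 digits, so the same sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))
```

Financial and commercial workloads, where amounts are defined in decimal digits, are the usual beneficiaries, and doing this in hardware avoids the cost of a software library.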

Another major advantage of POWER6 is that the processor clock can be dynamically turned off or on as required. IBM also plans to give customers the ability to move live virtual machines from one physical UNIX server to another while maintaining continuous availability. This function, known as POWER6 Live Partition Mobility, will be a welcome addition. POWER6 is based on 65 nm technology, and if the usual pattern holds, POWER7 may well be based on 45 nm technology.

Intel's Xeon
Xeon has now become a household name in the server domain. Since the first Xeon-branded processor, the 'Pentium II Xeon' of 1998 (codenamed “Drake”), there has been huge demand for Xeon processors in servers. Intel has designed the Xeon family so that each sequence targets a specific segment. The 3000 sequence (3040, 3050, 3060, etc.) is for SMBs, while the 5000 sequence (5100, 5110, 5120) is the most commonly used among enterprises. Then there is the 7000 sequence (7100, 7200), which is meant for large-scale enterprise computing and server consolidation. If you want even more computing power within your server, you can opt for the Itanium 9000 sequence, meant for massive, mission-critical computing and RISC replacement. Itanium can scale up to 512 dual-core processors and a whopping 1000 TB of RAM.

The advantage of multi-core over single core is that different applications can be handled by dedicated threads, which enables faster processing of tasks.
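
The idea of handing independent tasks to dedicated threads can be sketched in software as follows. This is a minimal illustration with a made-up `checksum` workload and made-up job data; it shows the structure of dispatching work to a pool of workers, not a benchmark. Note that in standard CPython, CPU-bound work usually needs separate processes rather than threads to occupy several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum(data: bytes) -> int:
    """Stand-in for an independent task (hypothetical workload)."""
    return sum(data) % 65536


def run_jobs(jobs, workers=4):
    """Hand each job to a worker thread and collect results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, jobs))


# Four independent jobs, one per worker (sizes chosen for illustration).
jobs = [bytes([i]) * 1000 for i in range(1, 5)]
results = run_jobs(jobs)
print(results)
```

Because the jobs share no data, they can run side by side without coordination, which is the situation in which extra cores help most.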

Future of server processors
It's not possible to drill down into the details of each server processor, as each upgrade brings with it a different codename or an additional socket, a different FSB or cache design, a different microarchitecture, and so on. So we'll concentrate on the future plans of the processor majors and see what architectures they come up with.

This year Intel released relabeled versions of its Core 2 Quad processor, codenamed Kentsfield, as the Xeon 3200 series. This 2x2 'quad-core' comprises two separate dual-core dies placed next to each other in one CPU package. The 7300 series, codenamed Tigerton, was also announced this year. It is a more capable quad-core processor for four-socket servers, consisting of two dual-core Core 2 architecture silicon chips on a single ceramic module. It uses Intel's Caneland platform and offers twice the performance of the previous generation of processors. With the launch of the 45 nm Penryn processor, Intel has announced Penryn-based Xeon models (the 5400 series), codenamed Harpertown, with a higher FSB of 1600 MHz.

The dual-core version of the CPU, codenamed Wolfdale, will be available from 1.89 to 3.4 GHz. Intel also plans to launch a quad-core processor codenamed Gainestown, based on its new Nehalem microarchitecture, which will also use 45 nm technology. Soon after, we will see both quad-core and dual-core processors based on the Westmere and Gesher architectures, which will mark the arrival of 32 nm technology.

Like Intel, AMD too has jumped onto the quad-core bandwagon, though a little late. Its new Barcelona is the first 'native' quad-core processor, as it is not made up of two dual-core dies like Intel's Kentsfield. It is based on 65 nm technology, and the major change in it is the inclusion of what AMD terms SSE128. The earlier K8 architecture can execute two SSE operations in parallel, but its SSE execution units are only 64 bits wide, so when a 128-bit SSE operation arrives, K8 handles it as two 64-bit operations. With Barcelona the execution units have been widened to 128 bits, so 128-bit SSE operations no longer have to be broken up into two 64-bit operations, which frees up decode bandwidth.
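
The difference between 64-bit and 128-bit execution units can be modelled with a toy packed add. The sketch below is a deliberately simplified software model, assuming a 128-bit vector of four 32-bit lanes; the function name, the pass counter and the sample values are ours, and real hardware tracks internal operations rather than Python loop iterations.

```python
def packed_add(a, b, unit_width_bits, lane_bits=32):
    """Add two packed vectors lane by lane.

    Returns (result, passes), where `passes` counts how many trips through
    an execution unit of `unit_width_bits` the operation needs.
    """
    lanes_per_pass = unit_width_bits // lane_bits
    result, passes = [], 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


# One 128-bit operand = four 32-bit lanes (sample values).
a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

k8_result, k8_passes = packed_add(a, b, unit_width_bits=64)
barcelona_result, barcelona_passes = packed_add(a, b, unit_width_bits=128)

print(k8_passes, barcelona_passes)
```

Both widths produce the same sums; the wider unit simply gets there in one pass instead of two.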

AMD's Opteron 2300 series, which supports configurations of up to two processors, and Opteron 8300 series, which supports up to eight, are both quad-core offerings based on 65 nm technology. With the launch of Phenom in the coming month, we will see quad-core Opteron processors based on 45 nm technology from AMD as well. A future processor will implement the Montreal core on a 45 nm fabrication node, manufactured using MCM (Multi-Chip Module) techniques. AMD even plans to incorporate the Bulldozer core in an upcoming server processor, with support for the SSE5 instruction set allowing enhanced HPC and cryptographic computation. Bulldozer is AMD's codename for its next-generation CPUs, which are intended to improve the processor's performance per watt.

The desktop kings
We spoke about quad core for the server domain, but as soon as it became popular there, demand arose in the desktop domain too. Intel's Core 2 Extreme quad-core QX6700 is much faster than the Core 2 Extreme X6800. This new processor doesn't save on power by any means and is meant for heavy computational usage, such as engineering analysis and financial applications that require heavy computation power.

AMD was not far behind with its Quad FX platform, which includes two sockets and four processor cores and is designed for extreme multi-tasking, or 'megatasking', by PC enthusiasts and power users who run the most demanding tasks simultaneously. Specialized segments demanded more processing power from the chip, and hence the Intel Core 2 Extreme found its way into most gamer machines. AMD's Athlon has always been a favorite when it comes to gaming, and AMD too came up with a limited-edition 6400+ Black Edition processor running at 3.2 GHz, probably the only one with such a high frequency within its range. Gamers and power workers need more processing power, and hence there is high demand for processors with higher frequency and processing power. Upcoming desktop processors will be based on 45 nm technology, as with servers, and soon we may see quad core ruling the desktop market as well.

45 nm and its benefits
Pre-45 nm technologies used silicon dioxide as the transistor gate dielectric. With the introduction of 45 nm processors, an entirely different material is being used. Both Intel and AMD have devised their own solutions and are using different materials to replace silicon dioxide.

Intel has devised a combination of hafnium-based high-k (Hi-k) gate dielectrics and a new metal material for the gates. The hafnium-based dielectric significantly reduces electrical leakage and provides the high capacitance necessary for good transistor performance. This helps the current lot of processors attain higher performance while reducing the electrical leakage from transistors that can hamper chip and PC design, size, power consumption, and cost. The 45 nm processors will see increased transistor switching speed, enabling higher core and bus clock frequencies and more performance in the same power and thermal envelope.

Compared to 65 nm, the 45 nm technology provides a 30% reduction in transistor switching power and double the transistor density. Many people believe this is the biggest change in transistor technology since the introduction of the polysilicon gate MOS transistor in the 1960s.
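
The doubling of transistor density is roughly what simple geometry predicts. As a back-of-the-envelope sketch, assume (as an idealization of ours, not a vendor figure) that the area of a transistor scales with the square of the process feature size:

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized density gain when shrinking from old_nm to new_nm,
    assuming transistor area scales with the square of the feature size."""
    return (old_nm / new_nm) ** 2


gain_65_to_45 = ideal_density_gain(65, 45)
gain_90_to_45 = ideal_density_gain(90, 45)

print(f"65 nm -> 45 nm: about {gain_65_to_45:.2f}x the density")
print(f"90 nm -> 45 nm: about {gain_90_to_45:.2f}x the density")
```

Under this assumption the move from 65 nm to 45 nm gives a little over twice the density, in line with the figure quoted above. Real layouts do not shrink perfectly uniformly, so actual gains vary.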

Did we hear Triple Core?
So far we have heard about dual cores and quad cores, and now we hear about as many as 8 cores, so the trend has been to double the number of cores on a single die. But there is no rule that says cores can only be doubled. Demand for more versatile and unique products has finally prompted processor manufacturers to think out of the box, hence the emergence of triple-core processors. AMD has decided to launch a triple-core Phenom processor alongside its current dual-core and quad-core processors. Whether a tri-core processor makes sense, only time will tell, but for now AMD is the first to cash in on this new concept. Technologically there is nothing new or exclusive here: it is simply a cut-down version of the quad-core processor with one core disabled. The initial Socket AM2+ tri-core processor will be identical in specs to the Phenom quad-core processor, including 512 KB of L2 cache per core, 2 MB of shared L3 cache, architectural enhancements to processing resources, and many other quad-core advantages; the only difference is that one of its cores will be disabled. For the time being it will appear only in desktops, but it could be a useful processor for notebooks as well. With one core disabled, and especially if that core is electrically isolated from the others in the parent native quad-core design, it could serve as a power-efficient multi-core processor for mobile designs. What lies in the future is yet to be seen, but the exclusivity of a triple core makes it an interesting segment to look out for.

Intel SSE4 Instruction with 45 nm CPUs
45 nm processors from Intel will come with Intel's Streaming SIMD Extensions 4 (SSE4) instructions. This new instruction set will deliver further performance gains for SIMD (single instruction, multiple data) software and will enable the new microprocessors to deliver superior performance and energy efficiency across a broad range of 32-bit and 64-bit software. Applications involving graphics, video encoding and processing, 3-D imaging, and gaming will surely benefit from this new instruction set. It will also boost high-performance workloads such as audio, image, and compression algorithms. It will be interesting to see how it performs, as it promises dramatic performance gains.

SOI: AMD's next choice
While Intel opted for high-k material to replace silicon dioxide, AMD opted for Silicon on Insulator (SOI). Here, the conventional silicon substrate is replaced by a layered silicon-insulator-silicon substrate, mainly to reduce parasitic device capacitance and hence improve performance. SOI substrates are compatible with most conventional fab processes. The only barrier to SOI implementation is the higher substrate cost, which increases overall manufacturing costs.

Improved cache design
The next lot of processors from Intel and AMD will see better cache design. Intel's Penryn processor will include a 50% larger L2 cache with 24-way set associativity, to further improve the hit rate and to maximize utilization. So dual cores will have up to 6 MB of L2 cache and quad cores will have up to 12 MB. AMD also plans to have a shared L3 cache in addition to the 512 KB of L2 cache per core, with up to 2 MB of L3 cache shared across 4 cores. The key benefit being touted for this cache is that it raises the probability that each core finds the data it needs in cache, thereby improving performance.
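
Why a better hit rate matters can be seen from the standard estimate of average memory access time: the cache hit time plus the miss rate multiplied by the miss penalty. In the sketch below, every latency and miss rate is a made-up illustrative value, not a measurement of any Intel or AMD part.

```python
def average_access_time(hit_time_ns: float, miss_rate: float,
                        miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: a larger cache halves the miss rate.
smaller_cache = average_access_time(hit_time_ns=5.0, miss_rate=0.10,
                                    miss_penalty_ns=100.0)
larger_cache = average_access_time(hit_time_ns=5.0, miss_rate=0.05,
                                   miss_penalty_ns=100.0)

print(f"smaller cache: {smaller_cache:.1f} ns, larger cache: {larger_cache:.1f} ns")
```

Because a trip to main memory costs far more than a cache hit, even a modest drop in the miss rate noticeably lowers the average access time.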

Tech beyond multi-cores
Virtualization is a key driver behind multi-core CPUs, and both Intel and AMD have independently developed their own virtualization extensions. Intel has further plans to add Virtualization Technology for Directed I/O (VT-d), which will provide a way of configuring interrupt delivery to individual virtual machines and an IOMMU for preventing a virtual machine from using DMA to break isolation.
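
How an IOMMU keeps DMA from breaking isolation can be pictured with a toy model. The class, device names and page numbers below are entirely hypothetical, and the model covers only the permission check; a real IOMMU also translates device addresses to physical addresses in hardware.

```python
class ToyIommu:
    """Hypothetical model: a device may DMA only into pages mapped for it."""

    def __init__(self):
        self._allowed = {}  # device name -> set of permitted page numbers

    def map_page(self, device: str, page: int) -> None:
        """Grant `device` access to `page`."""
        self._allowed.setdefault(device, set()).add(page)

    def dma_write(self, device: str, page: int) -> bool:
        """Allow the write only if the page was mapped for this device."""
        if page not in self._allowed.get(device, set()):
            raise PermissionError(f"DMA fault: {device} -> page {page}")
        return True


iommu = ToyIommu()
iommu.map_page("nic-assigned-to-vm1", page=7)

allowed = iommu.dma_write("nic-assigned-to-vm1", page=7)

try:
    iommu.dma_write("nic-assigned-to-vm1", page=8)  # page owned by another VM
    blocked = False
except PermissionError:
    blocked = True

print(allowed, blocked)
```

A device handed to one virtual machine can reach only the memory mapped for it, so a stray or malicious DMA request faults instead of touching another machine's pages.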

AMD has similar plans to add specifications for an I/O Memory Management Unit (IOMMU) that would provide a way of configuring interrupt delivery to individual virtual machines. It will also play an important role in advanced OSes. HyperTransport, whose primary use is to replace the FSB, is a bidirectional serial/parallel, high-bandwidth, low-latency point-to-point link.

An HTX (HyperTransport eXpansion) plug-in card was developed to support direct access to a CPU and DMA access to the system RAM. It was designed mainly to tackle the issue of bandwidth between the CPU and a co-processor. AMD has already announced an initiative named Torrenza to promote the use of HyperTransport for plug-in cards and coprocessors. This technology is widely used by AMD, Transmeta, NVIDIA, VIA and SiS.

Another technology that finds varied usage in servers is CPU-based VMX (AltiVec), an instruction set that can apply a single processing instruction to multiple data elements. Macro Fusion, a term coined by Intel, refers to a processor's ability to combine several instructions into one, making for faster execution. Besides SMP (Symmetric Multiprocessing), technologies such as SMT (Simultaneous Multithreading), SIMD instruction sets like 3DNow!, and L3 cache have contributed to the success of multi-core processors.

In the multi-core domain, research in tera-scale computing is under way, where terabytes of data must be handled by a platform capable of teraflops of computing performance. Tera-scale computing is the way to bring the massive compute capabilities of supercomputers to everyday devices such as servers, desktops and notebooks. So we will eventually have processors capable of delivering tera-scale computing power to desktops and servers.
