Intel Platform Roadmap Update


P4 Performance Issues

Intel has remained strangely silent about the uncertainties regarding the P4's clock-for-clock performance versus the P3 and Athlon. On this subject Intel has merely stated that the 20-stage pipeline is "not an issue" and that the P4 will be able to outperform the P3 by virtue of its faster clock speeds. Since the P4 will now launch at 1.5GHz and because the 1.13GHz P3 has been temporarily removed from the equation, we believe that Intel's claim will prove true in many cases. A 50% clock speed difference can conceal many potential weaknesses in the P4.
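To put the 50% figure in perspective, here is a rough sketch of the break-even point. It assumes a deliberately simple model in which performance equals per-clock throughput multiplied by clock speed, and it assumes the fastest shipping P3 runs at 1GHz now that the 1.13GHz part is withdrawn. Both assumptions are ours, not Intel's.

```python
def break_even_per_clock_ratio(new_clock_ghz, old_clock_ghz):
    """Per-clock performance, relative to the old chip, at which both chips tie.

    Simplified model (an assumption): performance = per-clock work * clock speed.
    """
    return old_clock_ghz / new_clock_ghz


p4_clock = 1.5  # P4 launch speed quoted in the article
p3_clock = 1.0  # assumed fastest P3 with the 1.13GHz part withdrawn

clock_advantage = p4_clock / p3_clock - 1
ratio = break_even_per_clock_ratio(p4_clock, p3_clock)

print(f"P4 clock advantage: {clock_advantage:.0%}")
print(f"P4 ties the P3 if it does {ratio:.2f}x the P3's work per clock")
```

Under this model the P4 could do roughly a third less work per clock than the P3 and still match it, which is why the clock gap can hide so much.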

Willamette has 256K of cache, while Northwood will increase that to 512K. It appears that both will use a 128-byte cache line, as compared to the 32-byte cache line of the P3 processor. This means that all or most P4 external bus transactions will be long bursts. Lack of granularity in the cache line length can result in a performance hit due to increased cache thrashing. Conversely, rapid context switching could benefit from the P4's longer burst. Long bursts can also be an advantage for some data-intensive scientific, workstation and media creation applications. Intel generously supplied the example of Windows Media Encoder as one application that benefits in this manner. We speculate that some of the tests in Viewperf might also demonstrate the advantage of a long burst, but this advantage will not hold for most mainstream business and personal productivity applications.
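The granularity argument can be illustrated with some simple arithmetic. The sketch below counts how many distinct lines each cache can hold and what fraction of a line fill is data the program actually asked for. The 256K cache with 32-byte lines is a hypothetical comparison point of our own choosing, and the model ignores associativity, prefetching and everything else that matters in a real cache.

```python
import math

KB = 1024


def line_count(cache_bytes, line_bytes):
    """Number of distinct lines a cache of this size can hold."""
    return cache_bytes // line_bytes


def fill_efficiency(useful_bytes, line_bytes):
    """Fraction of fetched bytes that the program actually wanted.

    Assumes the wanted bytes are contiguous and aligned to a line start.
    """
    lines_fetched = math.ceil(useful_bytes / line_bytes)
    return useful_bytes / (lines_fetched * line_bytes)


caches = [
    ("Willamette, 256K, 128-byte lines", 256 * KB, 128),
    ("Northwood, 512K, 128-byte lines", 512 * KB, 128),
    ("Hypothetical 256K, 32-byte lines", 256 * KB, 32),  # comparison point only
]

for name, size, line in caches:
    print(f"{name}: {line_count(size, line)} lines")

# Small scattered accesses waste most of a long burst; streaming accesses do not.
for useful in (8, 32, 128):
    print(
        f"{useful:3d} useful bytes: "
        f"{fill_efficiency(useful, 32):.0%} of a 32-byte fill, "
        f"{fill_efficiency(useful, 128):.0%} of a 128-byte fill"
    )
```

With fewer, longer lines, code that touches many scattered addresses evicts more useful data per miss, while code that streams through memory uses every byte of each burst.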

At launch, Intel will show numerous P4-optimized demonstration programs that significantly outperform 1GHz P3 platforms. Some of these were actually on display at IDF, though not widely discussed. One example is Cycore's Cult3D interactive 3D object viewer for the Internet. By optimizing for SSE2 and the P4 pipeline, this application is reported to outperform a 1GHz P3 by about 70%.

In order to show well at product launch, Intel is hard at work getting some benchmarks recompiled using P4-optimized compilers. One in particular is 3DMark. Intel and MadOnion are coordinating a new release of the benchmark, called 3DMark 2001, which will debut, or at least be demonstrated, around the time of the P4 launch. This is similar in many respects to AMD's occasional practice of offering an Athlon-optimized DLL for use with specific benchmarks. Of course Intel publicly frowns on this. Instead of merely offering a modified DLL, Intel chooses to exercise its clout to induce third-party companies to release a whole new benchmark update.

Compiler optimization will play a major role in getting good benchmark results in other environments as well. Another example is the SPECint and SPECfp benchmarks. Strong P4 results from these compiler-optimized benchmarks have already been leaked to the world by Intel.

Recently, several third-party P4 benchmark results have been released on the web. One case is a set of CliBench scores that 2CPU.com published from a beta 1GHz P4 platform. The P4 test system in question might have been an early beta with less than optimal performance, but the figures could be useful in discerning some of the potential performance differences between the P4 and Athlon.

CliBench Mk III 0.7.10     Unit      Athlon 1GHz    P4 1GHz       Athlon vs P4
                                     PC133-222      RDRAM
                                     (InQuest)      (2cpu.com)
Dhrystone 2.1              mips      2139           1263          1.7x
Whetstone                  mflops    571            240           2.4x
8 queens problem           pps       3496           2477          1.4x
Matrix operations          K ops     47251          46405         1.0x
Number crunch              K ops     82167          61744         1.3x
Floating point             K ops     8571           3622          2.4x
Mem throughput             MB/s      151.9          303.8         0.5x
Overall Performance Delta                                         1.5x

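CliBench indicates a remarkable performance difference between these two platforms. The production 1GHz Athlon beats the 1GHz P4 beta platform by up to 2.4x, in the Whetstone and floating point tests. The one exception is memory throughput, where the RDRAM-equipped P4 delivers twice the Athlon's bandwidth. Averaged across all seven tests, the delta is 1.5x in favor of Athlon.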
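For readers who want to check the arithmetic, the following sketch recomputes each Athlon-to-P4 ratio from the raw scores in the table. It assumes the overall delta is a plain arithmetic mean of the seven ratios. That is our assumption, but it does reproduce the 1.5x figure.

```python
# (test name, Athlon 1GHz score, P4 1GHz score), copied from the table above.
results = [
    ("Dhrystone 2.1 (mips)", 2139, 1263),
    ("Whetstone (mflops)", 571, 240),
    ("8 queens problem (pps)", 3496, 2477),
    ("Matrix operations (K ops)", 47251, 46405),
    ("Number crunch (K ops)", 82167, 61744),
    ("Floating point (K ops)", 8571, 3622),
    ("Mem throughput (MB/s)", 151.9, 303.8),
]

# A ratio above 1.0 favors the Athlon; below 1.0 favors the P4.
ratios = [athlon / p4 for _, athlon, p4 in results]

for (name, _, _), ratio in zip(results, ratios):
    print(f"{name:28s} {ratio:.1f}x")

# Assumed averaging method: unweighted arithmetic mean of the ratios.
average = sum(ratios) / len(ratios)
print(f"{'Overall (arithmetic mean)':28s} {average:.1f}x")
```

Note that a single unweighted average gives the memory throughput result the same weight as each compute test, so the overall figure depends heavily on which tests are included.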

A Japanese website that hosts a public domain MP3 audio file encoder has tested numerous platforms using its high-speed encoder. Among the test systems recorded are high-speed Athlons, P3s and a 1.4GHz P4. Tests were conducted by different individuals under different operating systems, so the comparison is a little less than scientific as a platform benchmark - but the data could be useful to get an idea of what to expect from the P4.

GOGO-no-coda ver 2.36 (MP3)    Athlon 1.0GHz    P4 1.4GHz    P4 vs Athlon
                               77.3             91.1         17.8%

The results indicate that a 1.4GHz P4 outperforms a 1GHz Athlon by about 18%. This is not a large margin considering the P4's 40% clock speed advantage. If we were to assume that this encoder scales well with clock speed and/or with memory bandwidth improvements, it is quite possible that a 1.2GHz Athlon with DDR might easily overtake the 1.4GHz P4 under this type of application load.
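The projection can be made concrete with a quick calculation. The sketch below assumes the encoder score scales linearly with clock speed, which is a best case, and it ignores any extra gain from DDR memory. The scores are taken as reported; the source does not state their unit.

```python
athlon_score, athlon_clock_ghz = 77.3, 1.0
p4_score, p4_clock_ghz = 91.1, 1.4

# Margin by which the 1.4GHz P4 leads the 1GHz Athlon.
p4_lead = p4_score / athlon_score - 1

# Score per GHz, a crude clock-for-clock comparison.
athlon_per_ghz = athlon_score / athlon_clock_ghz
p4_per_ghz = p4_score / p4_clock_ghz

# Assumption: perfectly linear scaling with clock speed, no DDR benefit counted.
target_clock_ghz = 1.2
projected_athlon = athlon_score * target_clock_ghz / athlon_clock_ghz

print(f"P4 lead over 1GHz Athlon: {p4_lead:.1%}")
print(f"Score per GHz: Athlon {athlon_per_ghz:.1f}, P4 {p4_per_ghz:.1f}")
print(f"Projected 1.2GHz Athlon: {projected_athlon:.1f} vs P4 {p4_score}")
```

Under the linear-scaling assumption the 1.2GHz Athlon edges out the 1.4GHz P4 even before any DDR gain, though the margin is small enough that less-than-perfect scaling could erase it.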
