Until recently, Intel was the dominant force in the desktop CPU market, especially amongst power users, as its chips were renowned for their high performance.
Unfortunately, these high-performing chips carried a significant price tag. Intel's main rival, AMD, eventually began to churn out chips that were the equals of their Intel counterparts. With the inclusion of "3DNow!", AMD chips were finally proving themselves in the hardest of all performance arenas - games. And thanks to lower pricing, AMD's K6 chips were selling extremely well, posing a threat to Intel's market dominance.
Intel's response was to release the Celeron. These budget CPUs would turn out to be among the most popular chips available, but not for the reasons that Intel had intended...
Thanks to exceptionally high manufacturing quality and the lack of the off-die Level 2 cache found on Pentium II CPUs, the Celeron was incredibly easy to overclock. The Celeron normally ran with a 66MHz front side bus speed, but it was possible to push the chip to run at 100MHz instead. Because the core clock is the bus speed multiplied by a fixed multiplier, this provided a 50% increase in core clock speed, which in turn gave rise to some astounding performance increases.
Of course, the early Celerons were limited by their lack of L2 cache, which seriously hampered performance as most of the data required by the CPU had to be brought in from main memory. At 66MHz this is a very slow procedure, and it hurt the chip's performance. Even at 100MHz there was a noticeable difference in speed between a Celeron and the equivalent Pentium II.
Enter The 300A
Eventually, though, Intel managed to squeeze 128KB of L2 cache onto the processor die itself, producing the famous Celeron 300A processor. Thanks to this on-die cache, the performance of the overclocked Celerons was now equal to, and sometimes better than, that of similarly clocked Pentium II CPUs!
This was all thanks to the smaller but much faster cache on the Celeron. The Pentium II has 512KB of cache that runs at half the speed of the CPU core. So at 300MHz, the cache on a Pentium II is running at 150MHz. On a 300MHz Celeron, though, the cache is running at the full 300MHz.
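The cache clock arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in the text; the function and constant names are our own, and the 450MHz case is simply the same rule applied to an overclocked core.

```python
def l2_cache_clock_mhz(core_mhz: float, cache_divider: float) -> float:
    """Return the L2 cache clock for a given core clock and cache divider."""
    if cache_divider <= 0:
        raise ValueError("cache_divider must be positive")
    return core_mhz / cache_divider


# Dividers as described in the text (the names are illustrative):
# the Pentium II cache runs at half core speed, the Celeron 300A's
# on-die cache runs at full core speed.
PENTIUM_II_DIVIDER = 2.0
CELERON_A_DIVIDER = 1.0

if __name__ == "__main__":
    for core in (300, 450):
        p2 = l2_cache_clock_mhz(core, PENTIUM_II_DIVIDER)
        cel = l2_cache_clock_mhz(core, CELERON_A_DIVIDER)
        print(f"{core}MHz core: Pentium II cache {p2:.0f}MHz, Celeron cache {cel:.0f}MHz")
```

Because the Celeron's cache is tied directly to the core clock, every overclocked megahertz speeds the cache up by the same amount, whereas the Pentium II's cache gains only half as much.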
This difference in clock speed, along with an increased associativity (a measure of how many different places in the cache a given block of memory can be stored in, which affects how often useful data gets pushed out), made the Celeron an extremely good performer.
It wasn't until recently that Intel made changes to its processors to redress this balance. With the move from the old 0.25 micron technology to a new 0.18 micron process, it was now possible to pack more transistors into a smaller space.
The Coppermine core was born, and used in the Pentium III E. This featured 256KB of on-die L2 cache, which had a higher associativity and a wider data bus. Instead of the old 64-bit bus width, the new Coppermine cache could use a 256-bit transfer bus, which increased the efficiency of the caching process.
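To put the wider bus in perspective, here is a rough Python sketch of what the two bus widths mean per transfer. The peak figure assumes one transfer per cache clock, which is our simplifying assumption rather than anything Intel has stated, so treat it as a theoretical ceiling and not a measured result. The function names are our own.

```python
def bytes_per_transfer(bus_width_bits: int) -> int:
    """Bytes moved in a single transfer across a bus of the given width."""
    if bus_width_bits <= 0 or bus_width_bits % 8:
        raise ValueError("bus width must be a positive multiple of 8 bits")
    return bus_width_bits // 8


def peak_bandwidth_mb_s(bus_width_bits: int, cache_clock_mhz: float) -> float:
    """Theoretical ceiling in MB/sec (1MB = 10^6 bytes).

    ASSUMPTION: exactly one transfer per cache clock. Real caches spend
    extra cycles on latency, so measured figures will be lower.
    """
    return bytes_per_transfer(bus_width_bits) * cache_clock_mhz


if __name__ == "__main__":
    old, new = bytes_per_transfer(64), bytes_per_transfer(256)
    print(f"64-bit bus: {old} bytes per transfer")
    print(f"256-bit bus: {new} bytes per transfer ({new // old}x as much)")
```

Whatever the real-world efficiency turns out to be, the 256-bit bus moves four times as much data per transfer as the 64-bit one.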
The Celeron II has also been forged from this process, and as we saw in our article last week, it is just as overclockable as ever. As with the Celeron 300A, the front side bus speed of the new 566MHz Celeron II can be increased from the standard 66MHz to 100MHz, providing a 50% increase in clock speed, in this case to 850MHz.
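For anyone who wants to check the overclocking sums, the short Python sketch below infers each chip's multiplier from its rated speed and then applies it to a 100MHz bus. It assumes the multiplier stays fixed when the bus speed changes, and it treats the nominal "66MHz" bus as 66.67MHz (200/3). The function names are our own.

```python
from fractions import Fraction

# The nominal "66MHz" front side bus actually runs at 66.67MHz (200/3).
STOCK_FSB_MHZ = Fraction(200, 3)
OVERCLOCKED_FSB_MHZ = Fraction(100)


def multiplier(rated_core_mhz: int) -> Fraction:
    """Infer the clock multiplier from the rated speed, to the nearest 0.5."""
    half_steps = round(2 * Fraction(rated_core_mhz) / STOCK_FSB_MHZ)
    return Fraction(half_steps, 2)


def overclocked_core_mhz(rated_core_mhz: int) -> Fraction:
    """Core clock on a 100MHz bus, ASSUMING the multiplier does not change."""
    return multiplier(rated_core_mhz) * OVERCLOCKED_FSB_MHZ


if __name__ == "__main__":
    for name, rated in (("Celeron 300A", 300), ("Celeron II 566", 566)):
        print(f"{name}: x{float(multiplier(rated))} multiplier, "
              f"{float(overclocked_core_mhz(rated)):.0f}MHz on a 100MHz bus")
```

The 300A works out at a 4.5x multiplier and 450MHz, and the 566MHz Celeron II at 8.5x and 850MHz, which matches the speeds quoted in this article.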
Strangely though, we have not seen the same kind of performance from the new Celeron II as we saw with the Celeron 300A. A Celeron II running at 850MHz can barely match a Pentium III 600E in some real-world tests. Let's take a look at why this might be the case...
The cores themselves are both forged on the same 0.18 micron process, and it has been said that the Celeron II core is in fact a Pentium III E core with half the cache disabled.
This has been hinted at by Intel, and it would make a great deal of sense. Instead of requiring a new production line, they can just take chips from the existing Coppermine process, and modify them so that half the cache doesn't work. This may seem like a waste of money, but it is far more efficient for Intel to be able to churn out one chip and modify it later, than it is for them to have two separate production lines.
Okay, so they have disabled half the cache. But the associativity and bus width of the remaining cache are unchanged. It is therefore possible to say that in the majority of tests the Celeron II should perform at a level near that of the equivalent Pentium III E, especially in applications that aren't particularly heavy on the cache.
With SiSoft's SANDRA benchmarking utility, the raw performance figures come out at the same levels when comparing the two chips. This particular test has no need to utilise the L2 cache on either chip, and so indicates that, excluding the L2 cache, both processor cores are essentially the same.
The use of "Quake 3 : Arena" as a benchmark has shown that the Celeron II is significantly slower than an equivalent Pentium III E though. This tends to imply that Quake 3 is potentially a more cache-happy application, and seems to favour 256KB over 128KB. Using 3DMark 2000 has also shown that there is some speed difference between the two chips, with the Celeron II overclocked to 850MHz performing at about the same level as a Pentium III 700E. Once again this shows that potentially 3DMark 2000 is happier with a larger cache.
Cache In Hand
If we move away from performance-orientated benchmarks towards diagnostic programs, we can see that despite this real-world performance difference, there is virtually no difference between the two CPUs apart from the size of their L2 cache.
Using the CacheMem benchmark, the following numbers were obtained -
It is interesting to see here that there is very little difference in terms of cache bandwidth between the Pentium III E and the Celeron II. Both L1 and L2 caches are equally effective on the two chips. At 256KB, though, it becomes very clear that, while the Pentium III E can still pull about 3GB/sec from its cache, the Celeron II has run out of L2 cache and must now go to the main memory on the motherboard. This explains why the read bandwidth of the Celeron II drops to about 750MB/sec, and the number of clock cycles required to complete the operation increases to eight.
Looking at the latency results, it is also very plain to see that there is no difference between the two chips until they reach the magic 256KB mark. Both caches are operating with exactly the same latency until the Celeron II has to jump to using the much slower main memory, at which point the increase in latency is both obvious and expected.
It really looks like Intel have just halved the cache on the chip, and that's it. There doesn't seem to be anything more mysterious that explains the huge performance difference, at least nothing that Intel has done.
And the larger the cache the better. We have recently seen that the "SETI@Home" client occupies 384KB, which is too large for the Pentium III E cache, but not for the old 512KB cache found on earlier chips. It is hardly surprising then that the chips with larger caches perform better at SETI, even at lower clock speeds.
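The idea of a program "fitting" in the cache can be shown with a deliberately simple Python model. It assumes a program's working set either fits entirely in L2 or spills to main memory, which ignores associativity, the L1 cache and everything else a real chip does, so take it as a rough picture only. The cache sizes are those quoted in this article, and the names are our own.

```python
# L2 cache sizes in KB, as quoted in this article.
L2_SIZES_KB = {
    "Celeron II": 128,
    "Pentium III E": 256,
    "older 512KB-cache chips": 512,
}


def fits_in_l2(working_set_kb: int, l2_kb: int) -> bool:
    """SIMPLIFIED MODEL: the working set fits if it is no larger than L2."""
    return working_set_kb <= l2_kb


def chips_that_fit(working_set_kb: int) -> list[str]:
    """Names of the chips whose L2 can hold the whole working set."""
    return [name for name, size in L2_SIZES_KB.items()
            if fits_in_l2(working_set_kb, size)]


if __name__ == "__main__":
    for label, size_kb in (("CacheMem 256KB block", 256), ("SETI@Home client", 384)):
        fits = ", ".join(chips_that_fit(size_kb)) or "none"
        print(f"{label} ({size_kb}KB) fits in: {fits}")
```

Under this model the 256KB CacheMem block fits the Pentium III E but not the Celeron II, and the 384KB SETI client fits only the older 512KB caches, which is the same pattern the benchmarks show.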
So does this explain why the Celeron II is slower? To some extent, yes. With programs like SETI highlighting the performance difference that exists due to cache size differences, it is entirely possible that some day-to-day apps and games also have certain minimum cache expectations, most of which seem to surpass the 128KB of the Celeron II but not the 256KB of the Pentium III E core. It certainly seems to be the case that the average program requires 256KB for efficient operation.
It is interesting to note the Quake 3 results though. The original Celeron 300A, when overclocked to 450MHz, managed to rival the equivalent Pentium II in Quake 2, yet strangely the Celeron II at 850MHz can't even match a Pentium III 700E, which certainly highlights the difference in cache requirements for the two games.
If this is any indication of things to come, how long will it be before 256KB isn't enough, and today's Pentium III E processors become tomorrow's Celerons?