The 21064 and the 21066 have the same (EV4) CPU core. If the same
program is run on a 21064 and a 21066, at the same CPU speed,
then the difference in performance comes only as a result of
system Bcache/memory bandwidth. Any code thread that has a high
hit-rate on the internal caches will perform the same.
There are two big performance killers:
- Code that is write-intensive. Even though the 21064 and the
21066 have write buffers to swallow some of the delays, code that
is write-intensive will be throttled by write bandwidth at the
system bus. This arises because the on-chip caches are
write-through.
- Code that wants to treat floats as integers. The Alpha
architecture does not allow register-register transfers from
integer registers to floating point registers. Such a transfer
has to be done via memory (and therefore, because the on-chip
caches are write-through, via the Bcache). (Editor's note: it
seems that both the EV4 and EV45 can perform the transfer
through the primary data cache (Dcache), provided that the memory
is already cached. In that case, the store in the transfer
sequence updates the Dcache and the subsequent load is, under
certain circumstances, able to read the updated Dcache value,
thus avoiding a costly round trip to the Bcache. In particular, it
seems best to execute the stq/ldt or stt/ldq instructions
back-to-back, which is somewhat counter-intuitive.)
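To make the second point concrete, here is a minimal sketch in C of
what "treating a float as an integer" looks like at the source level.
The function names double_bits and bits_to_double are made up for
this illustration, and the code assumes IEEE 64-bit doubles. On the
chips discussed here a compiler would presumably have to turn each
memcpy into a store followed by a load (the stt/ldq and stq/ldt
pairs mentioned above); the exact code generated depends on the
compiler and its options.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return the raw 64-bit pattern of a double.
 * Hypothetical helper, assuming IEEE 64-bit doubles. With no
 * FP-to-integer register move available, this copy goes through
 * memory: a floating point store followed by an integer load.
 */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* FP store, then integer load */
    return u;
}

/*
 * The reverse direction: build a double from a raw 64-bit pattern.
 * Again the copy goes through memory: an integer store followed by
 * a floating point load.
 */
double bits_to_double(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);   /* integer store, then FP load */
    return d;
}
```

Code that calls helpers like these in an inner loop (for example,
hand-written float-to-int tricks or software floating point
routines) pays for the memory round trip on every call.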
If you make the same comparison between a 21064A and a 21066A,
there is an additional factor due to the different Icache and
Dcache sizes between the two chips.
Now, the 21164 solves both these problems: it achieves
much higher system bus bandwidth (despite having the
same number of signal pins; yes, I know it's got about
twice as many pins as a 21064, but all those extra ones are power
and ground, really!) and it has write-back caches. The
only remaining problem is the answer to the question "how much
does it cost?"