==Pentium III's SSE implementation==
[[File:Pentium III on motherboard.jpg|thumb|Pentium III CPU mounted on a motherboard]]
Because Katmai was built on the same 0.25 µm process as the Pentium II "Deschutes", it had to implement SSE using as little silicon as possible. To do so, Intel implemented the 128-bit architecture by double-cycling the existing 64-bit data paths and by merging the SIMD-FP multiplier with the x87 scalar FPU multiplier into a single unit. To use the existing 64-bit data paths, Katmai issues each 128-bit SIMD-FP instruction as two μops, each operating on one 64-bit half (two single-precision values). To compensate partially for implementing only half of SSE's architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued in the same cycle, bringing the peak throughput back to four floating-point operations per cycle (two multiplies and two adds), at least for code with an even mix of multiplies and adds.<ref>Diefendorff, Keith ([[March 8]], [[1999]]). "Pentium III = Pentium II + SSE: Internet SSE Architecture Boosts Multimedia Performance". ''Microprocessor Report''. Volume 13, Number 3.</ref>
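The throughput argument can be illustrated with a simplified model. The sketch below is hypothetical and not taken from the cited source: it assumes that the multiplier port and the adder port each accept one 64-bit half-μop per cycle, that both units are fully pipelined, and that all instructions are independent. Decode, retirement, memory traffic and contention with x87 code for the shared multiplier are ignored. The names <code>issue_cycles</code> and <code>flops_per_cycle</code> are illustrative only.

```python
# Hypothetical, simplified model of Katmai-style double-cycled SSE issue.
# Assumptions (not from the source): one 64-bit half-micro-op per cycle on the
# multiplier port and one on the adder port, fully pipelined, no dependencies,
# no x87 contention for the shared multiplier.

LANES = 4   # single-precision floats in one 128-bit SSE register
HALVES = 2  # each 128-bit SIMD-FP instruction is issued as two 64-bit micro-ops


def issue_cycles(num_mul: int, num_add: int) -> int:
    """Minimum cycles to issue the given numbers of 128-bit SIMD multiplies and adds."""
    mul_halves = HALVES * num_mul  # all multiply halves go to the multiplier port
    add_halves = HALVES * num_add  # all add halves go to the separate adder port
    # The two ports work in parallel, so the busier port sets the pace.
    return max(mul_halves, add_halves)


def flops_per_cycle(num_mul: int, num_add: int) -> float:
    """Single-precision floating-point operations completed per cycle in this model."""
    cycles = issue_cycles(num_mul, num_add)
    if cycles == 0:
        return 0.0
    return LANES * (num_mul + num_add) / cycles


if __name__ == "__main__":
    for muls, adds in [(8, 8), (16, 0), (12, 4)]:
        print(f"{muls} mul + {adds} add: {flops_per_cycle(muls, adds):.2f} FLOPs/cycle")
```

In this model an even mix (8 multiplies and 8 adds) reaches the peak of 4.0 operations per cycle, while a multiply-only stream reaches only 2.0 because the adder port sits idle, which is why the peak figure holds only for code with a balanced mix of multiplies and adds.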