2002-07-14 08:21:08

by BALBIR SINGH

Subject: Pentium IV cache line size

Hello, All,

Dave Jones sent out a patch for the Pentium IV cache line size;
please see

http://marc.theaimsgroup.com/?l=linux-kernel&m=100297450316163&w=2

to which Manfred Spraul responded

http://marc.theaimsgroup.com/?l=linux-kernel&m=100299763026680&w=2

I think the patch is correct and should be applied.

From Pentium IV, System Programming Guide, Section 9.1, Page 9-2,
Table 9-1. Order # 245472


L1 Data Cache - Pentium 4 and Intel Xeon processors: 8 KBytes, 4-way set
associative, 64-byte cache line size.

L2 Unified Cache - Pentium 4 and Intel Xeon processors: 256 KBytes, 8-way
set associative, sectored, 64-byte cache line size.

The point is that, according to the specs, both the L1 and L2 cache line
sizes are 64 bytes.

Comments,
Balbir
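
As a quick check of what those table entries imply, the number of sets in
each cache is capacity / (ways * line size). A minimal sketch in C; the
function name cache_sets is made up for illustration and does not come from
the kernel or the Intel manual:

```c
#include <assert.h>

/*
 * Number of sets in a set-associative cache:
 *   sets = capacity / (ways * line size)
 * All sizes are in bytes. The name cache_sets is hypothetical.
 */
unsigned cache_sets(unsigned size_bytes, unsigned ways, unsigned line_bytes)
{
    /* Guard against a zero divisor from bad table data. */
    assert(ways != 0 && line_bytes != 0);
    return size_bytes / (ways * line_bytes);
}
```

With the figures quoted above, cache_sets(8 * 1024, 4, 64) gives 32 sets for
the L1 data cache and cache_sets(256 * 1024, 8, 64) gives 512 sets for the L2.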



2002-07-14 20:49:29

by Mark Hahn

Subject: Re: Pentium IV cache line size

> I think the patch is correct and should be applied.

P4 SMP alignment should be 128B, and that seems to be the main
use of this constant. I'd *love* to see the spurious "L1" removed;
in fact, there's already SMP_CACHE_BYTES, which should be used instead.
or maybe just CACHE_SHIFT/BYTES, since the alignment is used on
uniprocessor builds, too.

perhaps a janitor/trivial patch for 2.5?


> From Pentium IV, System Programming Guide, Section 9.1, Page 9-2,
> Table 9-1. Order # 245472

dueling references! mine are from the p4 and xeon opt guide. page 7-9:
Key Practices of Thread Synchronization:
...
- Place each synchronization variable alone,
separated by 128 byte or in a separate cache line.

see also table 1.1. I'm not sure it matters whether you consider lines 128B
or 64B; the fact that cacheline reads always happen at 128B is probably
the dominant concern. table 1.1 page 7-18 ("placement of shared
synchronization variable") repeats this.
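
The guide's advice can be sketched in C11 roughly as follows. This is a
userspace illustration, not kernel code: SYNC_ALIGN_BYTES, struct padded_sync
and struct sync_pair are hypothetical names, and the value 128 is taken from
the optimization guide text quoted above.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */

/* Hypothetical constant: 128 bytes, per the optimization guide quoted above. */
#define SYNC_ALIGN_BYTES 128

/*
 * One synchronization variable, alone in its own 128-byte block.
 * Aligning the member raises the struct's alignment to 128, so the
 * compiler also pads sizeof(struct padded_sync) up to 128.
 */
struct padded_sync {
    _Alignas(SYNC_ALIGN_BYTES) volatile int value;
};

/* Two variables that different CPUs might spin on. */
struct sync_pair {
    struct padded_sync a;
    struct padded_sync b;
};

/* Compile-time checks: each variable occupies a full 128-byte block ... */
static_assert(sizeof(struct padded_sync) == SYNC_ALIGN_BYTES,
              "padded_sync should fill exactly one 128-byte block");
/* ... and the two variables start 128 bytes apart. */
static_assert(offsetof(struct sync_pair, b) - offsetof(struct sync_pair, a)
              == SYNC_ALIGN_BYTES,
              "a and b should be separated by 128 bytes");
```

In the kernel the comparable annotation is ____cacheline_aligned_in_smp,
which aligns to SMP_CACHE_BYTES, so the value of that constant decides
whether the separation is 64 or 128 bytes.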

2002-07-17 00:50:51

by Mikael Pettersson

Subject: Re: Pentium IV cache line size

On Sun, 14 Jul 2002 16:58:11 -0400 (EDT), Mark Hahn wrote:
> - Place each synchronization variable alone,
> separated by 128 byte or in a separate cache line.
>
>see also table 1.1. I'm not sure it matters whether you consider lines 128B
>or 64B; the fact that cacheline reads always happen at 128B is probably
>the dominant concern. table 1.1 page 7-18 ("placement of shared
>synchronization variable") repeats this.

For SW synchronisation variables this makes sense.

However, I've been in contact with some people doing high-speed routing
with Linux boxes, and they had major performance problems on a dual P4/Xeon.
Somehow, the 128 byte alignment affected how the gigabit NIC driver they
used programmed the NIC, with the effect that buffering of PCI writes
(or something like that, this is from memory) didn't work and performance
dropped like a rock. They manually set the alignment back to 64 bytes in
the NIC driver and performance increased to expected levels.

/Mikael