2009-06-22 14:05:39

by Erik Jacobson

Subject: Re: [PATCH 0/3]: Discard reserved PXM bits for SRAT v1

I just wanted to post in to the thread that this patch set solved both
the node numbering (where there was a node0 and a node2) and the issue
where all CPUs were showing up on the first node.

I applied the three patches against 2.6.30-git14.

We look forward to seeing the changes integrated.

On this 2-socket Intel 5500 based system (Nehalem), the /sys stuff looks
good:

$ ls /sys/devices/system/node/node?
/sys/devices/system/node/node0:
cpu0 cpu2 cpulist distance numastat
cpu1 cpu3 cpumap meminfo scan_unevictable_pages

/sys/devices/system/node/node1:
cpu4 cpu6 cpulist distance numastat
cpu5 cpu7 cpumap meminfo scan_unevictable_pages


Whereas before, it looked like this (node numbers not contiguous, all CPUs
showing up in node 0):

# ls -ld /sys/devices/system/node/node*
drwxr-xr-x 2 root root 0 2009-06-18 09:30 /sys/devices/system/node/node0
drwxr-xr-x 2 root root 0 2009-06-18 09:30 /sys/devices/system/node/node2

[root@cct201 ~]# ls /sys/devices/system/node/node0
cpu0 cpu2 cpu4 cpu6 cpulist distance numastat
cpu1 cpu3 cpu5 cpu7 cpumap meminfo scan_unevictable_pages


System details:
- SGI Altix XE 270
- Supermicro X8DTN v 1.1 mainboard
- 2 sockets of Xeon X5570


/proc/cpuinfo snippet:

[root@cct201 erikj]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5866.85
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 8212.05
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5865.76
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5865.76
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 16
initial apicid : 16
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5828.64
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
apicid : 18
initial apicid : 18
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5865.81
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
apicid : 20
initial apicid : 20
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5865.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 22
initial apicid : 22
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 5865.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:


2009-07-14 16:54:35

by Kurt Garloff

Subject: [PATCH 0/3] Resend: Discard reserved PXM bits for SRAT v1

Erik,

On Mon, Jun 22, 2009 at 09:05:22AM -0500, Erik Jacobson wrote:
> I just wanted to post in to the thread that this patch set solved both
> the node numbering (where there was a node0 and a node2) and the issue
> where all CPUs were showing up on the first node.
>
> I applied the three patches against 2.6.30-git14.
>
> We look forward to seeing the changes integrated.

So do I.
The patches are already in SUSE kernels; I'd like to see them
in mainline too.

> On this 2-socket Intel 5500 based system (Nehalem), the /sys stuff looks
> good:
>
> $ ls /sys/devices/system/node/node?
> /sys/devices/system/node/node0:
> cpu0 cpu2 cpulist distance numastat
> cpu1 cpu3 cpumap meminfo scan_unevictable_pages
>
> /sys/devices/system/node/node1:
> cpu4 cpu6 cpulist distance numastat
> cpu5 cpu7 cpumap meminfo scan_unevictable_pages
>
>
> Whereas before, it looked like this (node numbers not contiguous, all CPUs
> showing up in node 0):
>
> # ls -ld /sys/devices/system/node/node*
> drwxr-xr-x 2 root root 0 2009-06-18 09:30 /sys/devices/system/node/node0
> drwxr-xr-x 2 root root 0 2009-06-18 09:30 /sys/devices/system/node/node2
>
> [root@cct201 ~]# ls /sys/devices/system/node/node0
> cpu0 cpu2 cpu4 cpu6 cpulist distance numastat
> cpu1 cpu3 cpu5 cpu7 cpumap meminfo scan_unevictable_pages
>
>
> System details:
> - SGI Altix XE 270
> - Supermicro X8DTN v 1.1 mainboard
> - 2 sockets of Xeon X5570

OK, so you really observe the same thing here as FZJ did.
Thanks for reporting!

So let me resend the patches, rediffed against current 2.6.31-rc git,
in follow-up mails.

Len, Linus, please adopt.

Best,
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.


2009-07-14 16:56:48

by Kurt Garloff

Subject: [PATCH 1/3] Resend: Store SRAT revision

Hi,

In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The additional bits were reserved in v1.
According to the ACPI spec, the OS must disregard reserved fields.
To know whether to disregard them or not, we must know which
revision the SRAT table has.

This patch stores the SRAT table revision for later consumption
by arch specific __init functions.
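
As a rough illustration of the idea (not the patch itself), recording the
revision could look like the sketch below. It is plain user-space C; the
struct is a cut-down stand-in for the ACPI table header, and every name
starting with sketch_ is an assumption made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for the ACPI table header (hypothetical name).
 * Only the fields this sketch needs are present. */
struct sketch_acpi_header {
    char     signature[4];   /* "SRAT" for this table */
    uint32_t length;         /* total table length in bytes */
    uint8_t  revision;       /* 1 = SRAT v1, 2 = SRAT v2, ... */
};

/* Revision remembered at parse time (hypothetical name). */
static int sketch_srat_revision;

/* Record the table revision once, so that code running later can decide
 * how many PXM bits are valid. */
static void sketch_parse_srat(const struct sketch_acpi_header *hdr)
{
    assert(hdr != NULL);
    sketch_srat_revision = hdr->revision;
}
```

The arch code then only has to compare the stored value against 2 when it
reads a PXM field.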

This is patch 1/3.

Signed-off-by: Kurt Garloff <[email protected]>
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.


2009-07-14 16:57:57

by Kurt Garloff

Subject: [PATCH 2/3] Resend: x86-64: Handle SRAT v1 and v2 consistently

Hi,

In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The additional bits were reserved in v1.
According to the ACPI spec, the OS must disregard reserved fields.

x86-64 was rather inconsistent prior to this patch; it used 8 bits
for the pxm field in cpu_affinity, but 32 bits in mem_affinity.
This patch makes it consistent: Either use 8 bits consistently (SRAT
rev 1 or lower) or 32 bits (SRAT rev 2 or higher).
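
A minimal sketch of the consistent handling described above, written as
stand-alone C rather than kernel code. The struct layouts are simplified and
the sketch_ names are assumptions for illustration; the revision is passed
in as a parameter here instead of being read from a stored variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified processor affinity entry: the low PXM byte plus the three
 * high bytes that were reserved in SRAT v1 (hypothetical layout). */
struct sketch_cpu_affinity {
    uint8_t proximity_domain_lo;
    uint8_t proximity_domain_hi[3];
};

/* Simplified memory affinity entry with a 32-bit PXM field. */
struct sketch_mem_affinity {
    uint32_t proximity_domain;
};

/* CPU PXM: 8 bits for rev 1 or lower, all 32 bits for rev 2 or higher. */
static uint32_t sketch_cpu_pxm(const struct sketch_cpu_affinity *pa,
                               int srat_rev)
{
    uint32_t pxm;

    assert(pa != NULL);
    pxm = pa->proximity_domain_lo;
    if (srat_rev >= 2) {
        pxm |= (uint32_t)pa->proximity_domain_hi[0] << 8;
        pxm |= (uint32_t)pa->proximity_domain_hi[1] << 16;
        pxm |= (uint32_t)pa->proximity_domain_hi[2] << 24;
    }
    return pxm;
}

/* Memory PXM: mask off the formerly reserved bits for rev 1 or lower. */
static uint32_t sketch_mem_pxm(const struct sketch_mem_affinity *ma,
                               int srat_rev)
{
    uint32_t pxm;

    assert(ma != NULL);
    pxm = ma->proximity_domain;
    if (srat_rev < 2)
        pxm &= 0xff;
    return pxm;
}
```

With garbage in the reserved bytes of a rev 1 table, both helpers return the
same 8-bit value, so CPUs and memory end up in the same node.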

This is patch 2/3.

Signed-off-by: Kurt Garloff <[email protected]>
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.


2009-07-14 16:58:44

by Kurt Garloff

Subject: [PATCH 3/3] Resend: Consider SRAT rev on ia64

Hi,

In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
32 bits for these. The additional bits were reserved in v1.
According to the ACPI spec, the OS must disregard reserved fields.

ia64 handled the PXM fields almost consistently, but the behavior depended
on whether it ran on SGI's sn2 platform. This patch leaves the sn2 logic in,
but also uses 16/32 bits for PXM if the SRAT has rev 2 or higher.

The patch also adds __init to the two pxm accessor functions, as they
access __initdata now and are called from an __init function only anyway.

Note that the code only uses 16 bits for the PXM field in the processor
proximity field; the patch does not address this as 16 bits are more than
enough.
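
The resulting ia64 rule can be sketched roughly as follows. This is
stand-alone C for illustration only: the struct layouts are simplified, the
sketch_ia64_ names are made up, and the sn2 platform check is reduced to a
boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified processor affinity entry (hypothetical layout). */
struct sketch_ia64_cpu_affinity {
    uint8_t proximity_domain_lo;
    uint8_t proximity_domain_hi[3];
};

/* Simplified memory affinity entry with a 32-bit PXM field. */
struct sketch_ia64_mem_affinity {
    uint32_t proximity_domain;
};

/* Processor PXM: 8 bits by default, 16 bits on sn2 or with SRAT rev >= 2.
 * The upper two bytes are deliberately not used, as noted above. */
static uint32_t sketch_ia64_cpu_pxm(const struct sketch_ia64_cpu_affinity *pa,
                                    bool is_sn2, int srat_rev)
{
    uint32_t pxm;

    assert(pa != NULL);
    pxm = pa->proximity_domain_lo;
    if (is_sn2 || srat_rev >= 2)
        pxm |= (uint32_t)pa->proximity_domain_hi[0] << 8;
    return pxm;
}

/* Memory PXM: full 32 bits on sn2 or with SRAT rev >= 2, else 8 bits. */
static uint32_t sketch_ia64_mem_pxm(const struct sketch_ia64_mem_affinity *ma,
                                    bool is_sn2, int srat_rev)
{
    uint32_t pxm;

    assert(ma != NULL);
    pxm = ma->proximity_domain;
    if (!is_sn2 && srat_rev < 2)
        pxm &= 0xff;
    return pxm;
}
```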

This is patch 3/3.

Signed-off-by: Kurt Garloff <[email protected]>
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.


2009-07-17 22:10:51

by Yinghai Lu

Subject: Re: [PATCH 2/3] Resend: x86-64: Handle SRAT v1 and v2 consistently

On Tue, Jul 14, 2009 at 9:58 AM, Kurt Garloff <[email protected]> wrote:
> Hi,
>
> In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides
> 32 bits for these. The additional bits were reserved in v1.
> According to the ACPI spec, the OS must disregard reserved fields.
>
> x86-64 was rather inconsistent prior to this patch; it used 8 bits
> for the pxm field in cpu_affinity, but 32 bits in mem_affinity.
> This patch makes it consistent: Either use 8 bits consistently (SRAT
> rev 1 or lower) or 32 bits (SRAT rev 2 or higher).
>
> This is patch 2/3.
>
> Signed-off-by: Kurt Garloff <[email protected]>
> --

For patches 1 and 2:

Acked-by: Yinghai Lu <[email protected]>

YH