Subject: x86/NUMA: Reason for ignoring too small NUMA nodes?

Hi Andi,

while experimenting with a system with a memory-less NUMA node I
stumbled upon code in the Linux kernel which ignores nodes containing
less than a certain amount of RAM, apparently to work around systems
with a buggy BIOS.
Can you elaborate on this? What kind of incorrect entry have you seen?
To map the memory-less node correctly I wrote a patch that accepts at
least nodes with exactly zero bytes of memory (read: no SRAT memory
entry). Was this special condition also present on the buggy machines?
Another comment reads:
/*
 * Don't confuse VM with a node that doesn't have the
 * minimum amount of memory:
 */
Is that still a valid statement? How can the VM get confused by a node
with already exhausted memory resources?
(found in arch/x86/mm/{srat,numa}_64.c)
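
The check in question can be sketched roughly as follows. This is a
minimal standalone sketch, not the kernel code: the struct, the function
names, the allow_memoryless flag, and the 4 MB threshold are all
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold for this sketch; the kernel's value may differ. */
#define SKETCH_NODE_MIN_SIZE (4ULL * 1024 * 1024)

/* Hypothetical description of one node's memory range. */
struct sketch_node {
	uint64_t start;	/* first byte of the node's memory range */
	uint64_t end;	/* one past the last byte; start == end means no memory */
};

/* Bytes covered by the node; a malformed range (end < start) counts as empty. */
static uint64_t sketch_node_size(const struct sketch_node *n)
{
	return n->end > n->start ? n->end - n->start : 0;
}

/*
 * Return true if the node should be kept.
 *
 * allow_memoryless models the patch described above: a node with exactly
 * zero bytes is accepted, while a node that has some memory but less than
 * the minimum is still rejected as a likely misparsed entry.
 */
static bool sketch_node_usable(const struct sketch_node *n, bool allow_memoryless)
{
	uint64_t size = sketch_node_size(n);

	if (size == 0)
		return allow_memoryless;
	return size >= SKETCH_NODE_MIN_SIZE;
}
```

With allow_memoryless set to false the sketch mirrors the behaviour being
asked about (every node below the minimum is dropped); with it set to true
only the "nonzero but too small" case is treated as bogus.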

I'd be grateful for some hints!

Thanks,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


2009-11-09 12:38:21

by Andi Kleen

Subject: Re: x86/NUMA: Reason for ignoring too small NUMA nodes?

On Mon, Nov 09, 2009 at 01:27:15PM +0100, Andre Przywara wrote:
> while experimenting with a system with a memory-less NUMA node I stumbled
> upon code in the Linux kernel which ignores nodes containing less than a
> certain amount of RAM, obviously to fix systems with a buggy BIOS.
> Can you elaborate on this? What kind of incorrect entry have you seen?
> To correctly map the memory less node I did a patch to accept at least
> nodes with exactly zero bytes of memory (read: no SRAT memory entry), was
> this special condition also present in the buggy machines?

It was a misparsed NUMA node, not a zero-sized one. I don't remember
whether the bug was in Linux or in the BIOS. This was a sanity check
to catch all such cases. I haven't seen misparsed nodes for quite some
time, so in theory it could be removed, I guess.

Zero-size nodes were not supported in the VM back then. I still think
the concept doesn't make too much sense: a memory range without
memory (and it bitrots all the time even today, see recent patches).

-Andi