2020-10-21 09:05:40

by Valentin Schneider

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes


Hi,

Nit on the subject: this only increases the default, the max is still 2¹⁰.

On 20/10/20 18:34, Vanshidhar Konda wrote:
> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
> reach or exceed 16. Increase the number to 64 (matching x86_64).
>
> Signed-off-by: Vanshidhar Konda <[email protected]>
> ---
> arch/arm64/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 893130ce1626..3e69d3c981be 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -980,7 +980,7 @@ config NUMA
> config NODES_SHIFT
> int "Maximum NUMA Nodes (as a power of 2)"
> range 1 10
> - default "2"
> + default "6"

This leads to more statically allocated memory for things like node to CPU
maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
issue.
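
As a rough userspace sketch of that scaling (the names below are only
stand-ins for the kernel's MAX_NUMNODES-sized tables; in the kernel,
MAX_NUMNODES is (1 << CONFIG_NODES_SHIFT), see include/linux/numa.h):

#include <stdio.h>

/* Stand-in for a per-node static table such as a node-to-cpumask map;
 * its footprint scales directly with the Kconfig default. */
#define NODES_SHIFT   6                   /* proposed new default */
#define MAX_NUMNODES  (1 << NODES_SHIFT)  /* 64 possible nodes */

static unsigned long node_map_stub[MAX_NUMNODES];

int main(void)
{
	printf("MAX_NUMNODES=%d, stub table is %zu bytes\n",
	       MAX_NUMNODES, sizeof(node_map_stub));
	return 0;
}

Going from the current default of 2 to 6 grows every such table from 4 to
64 entries.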

AIUI this also directly correlates to how many more page->flags bits are
required: are we sure the max 10 works on any aarch64 platform? I'm
genuinely asking here, given that I'm mostly a stranger to the mm
world. The default should be something we're somewhat confident works
everywhere.

> depends on NEED_MULTIPLE_NODES
> help
> Specify the maximum number of NUMA Nodes available on the target


2020-10-21 12:25:16

by Anshuman Khandual

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes



On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>
> Hi,
>
> Nit on the subject: this only increases the default, the max is still 2¹⁰.

Agreed.

>
> On 20/10/20 18:34, Vanshidhar Konda wrote:
>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>
>> Signed-off-by: Vanshidhar Konda <[email protected]>
>> ---
>> arch/arm64/Kconfig | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 893130ce1626..3e69d3c981be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -980,7 +980,7 @@ config NUMA
>> config NODES_SHIFT
>> int "Maximum NUMA Nodes (as a power of 2)"
>> range 1 10
>> - default "2"
>> + default "6"
>
> This leads to more statically allocated memory for things like node to CPU
> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> issue.

Smaller systems should not be required to waste that memory in the
default case, unless there is a real, available larger system with
that many nodes.

>
> AIUI this also directly correlates to how many more page->flags bits are
> required: are we sure the max 10 works on any aarch64 platform? I'm

We will have to test that. Besides, 256 (2^8) is the first threshold
to be crossed here.

> genuinely asking here, given that I'm mostly a stranger to the mm
> world. The default should be something we're somewhat confident works
> everywhere.

Agreed. Do we really need to match x86 right now? Do we really have
systems that have 64 nodes? We should not increase the default node
value and then have to solve new problems when there might not be any
system that could even use it. I would suggest increasing the
NODES_SHIFT value only as far as required by a real, available system.

>
>> depends on NEED_MULTIPLE_NODES
>> help
>> Specify the maximum number of NUMA Nodes available on the target
>

2020-10-21 13:43:55

by Jonathan Cameron

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On Wed, 21 Oct 2020 09:43:21 +0530
Anshuman Khandual <[email protected]> wrote:

> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
> >
> > Hi,
> >
> > Nit on the subject: this only increases the default, the max is still 2¹⁰.
>
> Agreed.
>
> >
> > On 20/10/20 18:34, Vanshidhar Konda wrote:
> >> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
> >> reach or exceed 16. Increase the number to 64 (matching x86_64).
> >>
> >> Signed-off-by: Vanshidhar Konda <[email protected]>
> >> ---
> >> arch/arm64/Kconfig | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 893130ce1626..3e69d3c981be 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -980,7 +980,7 @@ config NUMA
> >> config NODES_SHIFT
> >> int "Maximum NUMA Nodes (as a power of 2)"
> >> range 1 10
> >> - default "2"
> >> + default "6"
> >
> > This leads to more statically allocated memory for things like node to CPU
> > maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> > issue.
>
> The smaller systems should not be required to waste those memory in
> a default case, unless there is a real and available larger system
> with those increased nodes.
>
> >
> > AIUI this also directly correlates to how many more page->flags bits are
> > required: are we sure the max 10 works on any aarch64 platform? I'm
>
> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
> to be crossed here.
>
> > genuinely asking here, given that I'm mostly a stranger to the mm
> > world. The default should be something we're somewhat confident works
> > everywhere.
>
> Agreed. Do we really need to match X86 right now ? Do we really have
> systems that has 64 nodes ? We should not increase the default node
> value and then try to solve some new problems, when there might not
> be any system which could even use that. I would suggest increase
> NODES_SHIFT value upto as required by a real and available system.

I'm not going to give precise numbers on near-future systems, but it is
public that we ship 8-NUMA-node ARM64 systems today. Things will get more
interesting as CXL and CCIX enter the market on ARM systems, given that
chances are every CXL device will look like another NUMA node (the CXL
spec says they should be presented as such) and you may be able to rack
up lots of them.

So I'd argue the minimum that makes sense today is 16 nodes, but looking
forward even a little, 64 is not a great stretch. I'd make the jump to 64
so we can forget about this again for a year or two. People will want to
run today's distros on these new machines, and we'd rather not have to go
around all the distros asking them to carry a patch increasing this count
(I assume they are already carrying such a patch due to those 8-node
systems).

Jonathan

>
> >
> >> depends on NEED_MULTIPLE_NODES
> >> help
> >> Specify the maximum number of NUMA Nodes available on the target
> >
>


2020-10-22 05:18:18

by Vanshidhar Konda

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On Tue, Oct 20, 2020 at 07:09:36PM +0100, Valentin Schneider wrote:
>
>Hi,
>
>Nit on the subject: this only increases the default, the max is still 2¹⁰.
>
>On 20/10/20 18:34, Vanshidhar Konda wrote:
>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>
>> Signed-off-by: Vanshidhar Konda <[email protected]>
>> ---
>> arch/arm64/Kconfig | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 893130ce1626..3e69d3c981be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -980,7 +980,7 @@ config NUMA
>> config NODES_SHIFT
>> int "Maximum NUMA Nodes (as a power of 2)"
>> range 1 10
>> - default "2"
>> + default "6"
>
>This leads to more statically allocated memory for things like node to CPU
>maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>issue.
>
>AIUI this also directly correlates to how many more page->flags bits are
>required: are we sure the max 10 works on any aarch64 platform? I'm

I created an experimental setup in which I enabled 1024 NUMA nodes in
SRAT, SLIT and configured NODES_SHIFT=10 for the kernel. 1022 of these
nodes were memory-only NUMA nodes. This configuration booted and
recognized the NUMA nodes correctly.

>genuinely asking here, given that I'm mostly a stranger to the mm
>world. The default should be something we're somewhat confident works
>everywhere.
>
>> depends on NEED_MULTIPLE_NODES
>> help
>> Specify the maximum number of NUMA Nodes available on the target

2020-10-22 07:39:03

by Valentin Schneider

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes


Hi,

On 21/10/20 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <[email protected]> wrote:
>>
>> Agreed. Do we really need to match X86 right now ? Do we really have
>> systems that has 64 nodes ? We should not increase the default node
>> value and then try to solve some new problems, when there might not
>> be any system which could even use that. I would suggest increase
>> NODES_SHIFT value upto as required by a real and available system.
>
> I'm not going to give precise numbers on near future systems but it is public
> that we ship 8 NUMA node ARM64 systems today. Things will get more
> interesting as CXL and CCIX enter the market on ARM systems,
> given chances are every CXL device will look like another NUMA
> node (CXL spec says they should be presented as such) and you
> may be able to rack up lots of them.
>
> So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> even a little and 64 is not a great stretch.
> I'd make the jump to 64 so we can forget about this again for a year or two.
> People will want to run today's distros on these new machines and we'd
> rather not have to go around all the distros asking them to carry a patch
> increasing this count (I assume they are already carrying such a patch
> due to those 8 node systems)
>

I agree that 4 nodes is somewhat anemic; I've had to bump that just to
run some scheduler tests under QEMU. However I still believe we should
exercise caution before cranking it too high, especially when seeing things
like:

ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")

To give some numbers, a defconfig build gives me:

SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
BITS_PER_LONG=64 NR_PAGEFLAGS=24

IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
plenty, however this can get cramped fairly easily with any combination of:

CONFIG_SPARSEMEM_VMEMMAP=n (-18)
CONFIG_IDLE_PAGE_TRACKING=y (-2)
CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
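
For reference, a minimal standalone sketch of that arithmetic (the
identifiers mirror the defconfig numbers quoted above; the real macros
live in include/linux/page-flags-layout.h):

#include <stdio.h>

#define BITS_PER_LONG      64
#define NR_PAGEFLAGS       24
#define SECTIONS_WIDTH      0
#define ZONES_WIDTH         2
#define LAST_CPUPID_SHIFT  (8 + 8)

int main(void)
{
	/* 64 - 24 - 0 - 2 - 16 = 22, i.e. the "NODES_SHIFT <= 22" bound;
	 * each of the options above subtracts further from this budget. */
	int budget = BITS_PER_LONG - NR_PAGEFLAGS
		   - SECTIONS_WIDTH - ZONES_WIDTH - LAST_CPUPID_SHIFT;

	printf("max NODES_SHIFT with everything kept in page->flags: %d\n",
	       budget);
	return 0;
}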

Taking Arnd's above example, a randconfig build picking !VMEMMAP already
limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
the flags (it gets a dedicated field at the tail of struct page
otherwise). If that is something we don't care too much about, then
consider my concerns taken care of.


One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
Anshuman in that we should set it to match the max encountered on platforms
that are in use right now.

> Jonathan
>
>>
>> >
>> >> depends on NEED_MULTIPLE_NODES
>> >> help
>> >> Specify the maximum number of NUMA Nodes available on the target
>> >
>>

2020-10-22 12:45:44

by Robin Murphy

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On 2020-10-21 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <[email protected]> wrote:
>
>> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>
>>> Hi,
>>>
>>> Nit on the subject: this only increases the default, the max is still 2¹⁰.
>>
>> Agreed.
>>
>>>
>>> On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>>>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>
>>>> Signed-off-by: Vanshidhar Konda <[email protected]>
>>>> ---
>>>> arch/arm64/Kconfig | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 893130ce1626..3e69d3c981be 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -980,7 +980,7 @@ config NUMA
>>>> config NODES_SHIFT
>>>> int "Maximum NUMA Nodes (as a power of 2)"
>>>> range 1 10
>>>> - default "2"
>>>> + default "6"
>>>
>>> This leads to more statically allocated memory for things like node to CPU
>>> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>> issue.
>>
>> The smaller systems should not be required to waste those memory in
>> a default case, unless there is a real and available larger system
>> with those increased nodes.
>>
>>>
>>> AIUI this also directly correlates to how many more page->flags bits are
>>> required: are we sure the max 10 works on any aarch64 platform? I'm
>>
>> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>> to be crossed here.
>>
>>> genuinely asking here, given that I'm mostly a stranger to the mm
>>> world. The default should be something we're somewhat confident works
>>> everywhere.
>>
>> Agreed. Do we really need to match X86 right now ? Do we really have
>> systems that has 64 nodes ? We should not increase the default node
>> value and then try to solve some new problems, when there might not
>> be any system which could even use that. I would suggest increase
>> NODES_SHIFT value upto as required by a real and available system.
>
> I'm not going to give precise numbers on near future systems but it is public
> that we ship 8 NUMA node ARM64 systems today. Things will get more
> interesting as CXL and CCIX enter the market on ARM systems,
> given chances are every CXL device will look like another NUMA
> node (CXL spec says they should be presented as such) and you
> may be able to rack up lots of them.
>
> So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> even a little and 64 is not a great stretch.
> I'd make the jump to 64 so we can forget about this again for a year or two.
> People will want to run today's distros on these new machines and we'd
> rather not have to go around all the distros asking them to carry a patch
> increasing this count (I assume they are already carrying such a patch
> due to those 8 node systems)

Nit: I doubt any sane distro is going to carry a patch to adjust the
*default* value of a Kconfig option. They might tune the actual value in
their config, but, well, isn't that the whole point of configs? ;)

Robin.

>
> Jonathan
>
>>
>>>
>>>> depends on NEED_MULTIPLE_NODES
>>>> help
>>>> Specify the maximum number of NUMA Nodes available on the target
>>>
>>

2020-10-29 13:39:25

by Catalin Marinas

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
> On 21/10/20 12:02, Jonathan Cameron wrote:
> > On Wed, 21 Oct 2020 09:43:21 +0530
> > Anshuman Khandual <[email protected]> wrote:
> >> Agreed. Do we really need to match X86 right now ? Do we really have
> >> systems that has 64 nodes ? We should not increase the default node
> >> value and then try to solve some new problems, when there might not
> >> be any system which could even use that. I would suggest increase
> >> NODES_SHIFT value upto as required by a real and available system.
> >
> > I'm not going to give precise numbers on near future systems but it is public
> > that we ship 8 NUMA node ARM64 systems today. Things will get more
> > interesting as CXL and CCIX enter the market on ARM systems,
> > given chances are every CXL device will look like another NUMA
> > node (CXL spec says they should be presented as such) and you
> > may be able to rack up lots of them.
> >
> > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> > even a little and 64 is not a great stretch.
> > I'd make the jump to 64 so we can forget about this again for a year or two.
> > People will want to run today's distros on these new machines and we'd
> > rather not have to go around all the distros asking them to carry a patch
> > increasing this count (I assume they are already carrying such a patch
> > due to those 8 node systems)
>
> I agree that 4 nodes is somewhat anemic; I've had to bump that just to
> run some scheduler tests under QEMU. However I still believe we should
> exercise caution before cranking it too high, especially when seeing things
> like:
>
> ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
>
> To give some numbers, a defconfig build gives me:
>
> SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
> BITS_PER_LONG=64 NR_PAGEFLAGS=24
>
> IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
> plenty, however this can get cramped fairly easily with any combination of:
>
> CONFIG_SPARSEMEM_VMEMMAP=n (-18)
> CONFIG_IDLE_PAGE_TRACKING=y (-2)
> CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
>
> Taking Arnd's above example, a randconfig build picking !VMEMMAP already
> limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
> the flags (it gets a dedicated field at the tail of struct page
> otherwise). If that is something we don't care too much about, then
> consider my concerns taken care of.

I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
disabled but the option is in the core mm/Kconfig file. We could make
NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
but hopefully that's going away soon).

> One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
> it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
> Anshuman in that we should set it to match the max encountered on platforms
> that are in use right now.

I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
distros have a 10-year view, they can always ship a kernel configured to
64 nodes, no need to change Kconfig (distros never ship with defconfig).

It may have an impact on more memory constrained platforms but that's
not what defconfig is about. It should allow existing hardware to run
Linux but not necessarily run it in the most efficient way possible.

--
Catalin

2020-10-29 19:51:31

by Vanshidhar Konda

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
>On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
>> On 21/10/20 12:02, Jonathan Cameron wrote:
>> > On Wed, 21 Oct 2020 09:43:21 +0530
>> > Anshuman Khandual <[email protected]> wrote:
>> >> Agreed. Do we really need to match X86 right now ? Do we really have
>> >> systems that has 64 nodes ? We should not increase the default node
>> >> value and then try to solve some new problems, when there might not
>> >> be any system which could even use that. I would suggest increase
>> >> NODES_SHIFT value upto as required by a real and available system.
>> >
>> > I'm not going to give precise numbers on near future systems but it is public
>> > that we ship 8 NUMA node ARM64 systems today. Things will get more
>> > interesting as CXL and CCIX enter the market on ARM systems,
>> > given chances are every CXL device will look like another NUMA
>> > node (CXL spec says they should be presented as such) and you
>> > may be able to rack up lots of them.
>> >
>> > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
>> > even a little and 64 is not a great stretch.
>> > I'd make the jump to 64 so we can forget about this again for a year or two.
>> > People will want to run today's distros on these new machines and we'd
>> > rather not have to go around all the distros asking them to carry a patch
>> > increasing this count (I assume they are already carrying such a patch
>> > due to those 8 node systems)
>>
>> I agree that 4 nodes is somewhat anemic; I've had to bump that just to
>> run some scheduler tests under QEMU. However I still believe we should
>> exercise caution before cranking it too high, especially when seeing things
>> like:
>>
>> ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
>>
>> To give some numbers, a defconfig build gives me:
>>
>> SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
>> BITS_PER_LONG=64 NR_PAGEFLAGS=24
>>
>> IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
>> plenty, however this can get cramped fairly easily with any combination of:
>>
>> CONFIG_SPARSEMEM_VMEMMAP=n (-18)
>> CONFIG_IDLE_PAGE_TRACKING=y (-2)
>> CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
>>
>> Taking Arnd's above example, a randconfig build picking !VMEMMAP already
>> limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
>> the flags (it gets a dedicated field at the tail of struct page
>> otherwise). If that is something we don't care too much about, then
>> consider my concerns taken care of.
>
>I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
>disabled but the option is in the core mm/Kconfig file. We could make
>NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
>but hopefully that's going away soon).
>
>> One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
>> it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
>> Anshuman in that we should set it to match the max encountered on platforms
>> that are in use right now.
>
>I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
>distros have a 10-year view, they can always ship a kernel configured to
>64 nodes, no need to change Kconfig (distros never ship with defconfig).
>
>It may have an impact on more memory constrained platforms but that's
>not what defconfig is about. It should allow existing hardware to run
>Linux but not necessarily run it in the most efficient way possible.
>

From the discussion it looks like NODES_SHIFT=4 is an acceptable value to
support current hardware. I'll send a patch with NODES_SHIFT set to 4. Is
it still possible to add this change to the 5.10 kernel?

Vanshi

>--
>Catalin

2020-10-30 10:25:14

by Catalin Marinas

Subject: Re: [PATCH] arm64: NUMA: Kconfig: Increase max number of nodes

On Thu, Oct 29, 2020 at 12:48:50PM -0700, Vanshidhar Konda wrote:
> On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
> > > On 21/10/20 12:02, Jonathan Cameron wrote:
> > > > On Wed, 21 Oct 2020 09:43:21 +0530
> > > > Anshuman Khandual <[email protected]> wrote:
> > > >> Agreed. Do we really need to match X86 right now ? Do we really have
> > > >> systems that has 64 nodes ? We should not increase the default node
> > > >> value and then try to solve some new problems, when there might not
> > > >> be any system which could even use that. I would suggest increase
> > > >> NODES_SHIFT value upto as required by a real and available system.
> > > >
> > > > I'm not going to give precise numbers on near future systems but it is public
> > > > that we ship 8 NUMA node ARM64 systems today. Things will get more
> > > > interesting as CXL and CCIX enter the market on ARM systems,
> > > > given chances are every CXL device will look like another NUMA
> > > > node (CXL spec says they should be presented as such) and you
> > > > may be able to rack up lots of them.
> > > >
> > > > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> > > > even a little and 64 is not a great stretch.
> > > > I'd make the jump to 64 so we can forget about this again for a year or two.
> > > > People will want to run today's distros on these new machines and we'd
> > > > rather not have to go around all the distros asking them to carry a patch
> > > > increasing this count (I assume they are already carrying such a patch
> > > > due to those 8 node systems)
> > >
> > > I agree that 4 nodes is somewhat anemic; I've had to bump that just to
> > > run some scheduler tests under QEMU. However I still believe we should
> > > exercise caution before cranking it too high, especially when seeing things
> > > like:
> > >
> > > ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
> > >
> > > To give some numbers, a defconfig build gives me:
> > >
> > > SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
> > > BITS_PER_LONG=64 NR_PAGEFLAGS=24
> > >
> > > IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
> > > plenty, however this can get cramped fairly easily with any combination of:
> > >
> > > CONFIG_SPARSEMEM_VMEMMAP=n (-18)
> > > CONFIG_IDLE_PAGE_TRACKING=y (-2)
> > > CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
> > >
> > > Taking Arnd's above example, a randconfig build picking !VMEMMAP already
> > > limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
> > > the flags (it gets a dedicated field at the tail of struct page
> > > otherwise). If that is something we don't care too much about, then
> > > consider my concerns taken care of.
> >
> > I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
> > disabled but the option is in the core mm/Kconfig file. We could make
> > NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
> > but hopefully that's going away soon).
> >
> > > One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
> > > it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
> > > Anshuman in that we should set it to match the max encountered on platforms
> > > that are in use right now.
> >
> > I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
> > distros have a 10-year view, they can always ship a kernel configured to
> > 64 nodes, no need to change Kconfig (distros never ship with defconfig).
> >
> > It may have an impact on more memory constrained platforms but that's
> > not what defconfig is about. It should allow existing hardware to run
> > Linux but not necessarily run it in the most efficient way possible.
> >
>
> From the discussion it looks like 4 is an acceptable number to support
> current hardware. I'll send a patch with NODES_SHIFT set to 4. Is it still
> possible to add this change to the 5.10 kernel?

I think we can but I'll leave the decision to Will (and don't forget to
cc the arm64 maintainers on your next post).

--
Catalin