2005-09-30 15:24:07

by Dave Hansen

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> These patches implement NUMA memory node emulation for regular i386 PCs.
>
> NUMA emulation could be used to provide coarse-grained memory resource control
> using CPUSETS. Another use is as a test environment for NUMA memory code or
> CPUSETS using an i386 emulator such as QEMU.
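
As a rough illustration of what "memory node emulation" amounts to, the sketch below splits one contiguous page-frame range into several fake nodes of nearly equal size. It is not taken from the patch set: `struct fake_node` and `split_into_fake_nodes()` are hypothetical names, and the even-split policy is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one emulated node: a half-open
 * page-frame range [start_pfn, end_pfn). */
struct fake_node {
    unsigned long start_pfn;  /* first page frame in the node */
    unsigned long end_pfn;    /* one past the last page frame */
};

/*
 * Split [start_pfn, end_pfn) into nr_nodes contiguous ranges.
 * Any remainder is handed out one page at a time to the first nodes,
 * so node sizes differ by at most one page.
 */
static void split_into_fake_nodes(unsigned long start_pfn,
                                  unsigned long end_pfn,
                                  struct fake_node *nodes,
                                  size_t nr_nodes)
{
    assert(nodes != NULL && nr_nodes > 0 && end_pfn >= start_pfn);

    unsigned long total = end_pfn - start_pfn;
    unsigned long base = total / nr_nodes;
    unsigned long extra = total % nr_nodes;
    unsigned long pfn = start_pfn;

    for (size_t i = 0; i < nr_nodes; i++) {
        unsigned long len = base + (i < extra ? 1 : 0);

        nodes[i].start_pfn = pfn;
        nodes[i].end_pfn = pfn + len;
        pfn += len;
    }
}
```

Each resulting range could then be exposed as its own memory node, which is what lets a cpuset restrict a task to a slice of RAM on a machine with no real NUMA topology.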

This patch set basically allows the "NUMA depends on SMP" dependency to
be removed. I'm not sure this is the right approach. There will likely
never be a real-world NUMA system without SMP. So, this set would seem
to include some increased (#ifdef) complexity for supporting NUMA && !SMP,
which will likely never happen in the real world.

Also, I worry that simply #ifdef'ing things out like CPUsets' update
means that CPUsets lacks some kind of abstraction that it should have
been using in the first place. An #ifdef just papers over the real
problem.

I think it would likely be cleaner if the approach was to emulate an SMP
NUMA system where each NUMA node simply doesn't have all of its CPUs
online.

-- Dave


2005-10-03 02:08:39

by Magnus Damm

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

On 10/1/05, Dave Hansen wrote:
> On Fri, 2005-09-30 at 16:33 +0900, Magnus Damm wrote:
> > These patches implement NUMA memory node emulation for regular i386 PCs.
> >
> > NUMA emulation could be used to provide coarse-grained memory resource control
> > using CPUSETS. Another use is as a test environment for NUMA memory code or
> > CPUSETS using an i386 emulator such as QEMU.
>
> This patch set basically allows the "NUMA depends on SMP" dependency to
> be removed. I'm not sure this is the right approach. There will likely
> never be a real-world NUMA system without SMP. So, this set would seem
> to include some increased (#ifdef) complexity for supporting NUMA && !SMP,
> which will likely never happen in the real world.

Yes, this patch set removes "NUMA depends on SMP". It also adds some
simple NUMA emulation code, but I am sure you are aware of that! =)

I agree that a single-processor NUMA system is very unlikely to exist
in the real world. So yes, "[PATCH 02/07] i386: numa on non-smp" adds
_some_ extra complexity. But because SMP is set when supporting more
than one CPU, and NUMA is set when supporting more than one memory
node, I see no reason why they should depend on each other, except
that they do today and separating them will increase complexity a bit.

> Also, I worry that simply #ifdef'ing things out like CPUsets' update
> means that CPUsets lacks some kind of abstraction that it should have
> been using in the first place. An #ifdef just papers over the real
> problem.

Maybe. CPUSETS has two bitmaps, one for cpus and one for mems. So
depending on SMP or NUMA seems logical to me. Regarding the #ifdef, it
was added because partition_sched_domain() is only implemented for
SMP. That symbol has no prototype or implementation when CONFIG_SMP is
not set. Maybe it is better to add an empty inline function in
linux/sched.h for !SMP?
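
A minimal sketch of the empty-inline-stub idea follows. The function name is the one used in this thread; its two-mask signature, the `cpumask_t` typedef, the call counter, and the caller `example_cpuset_update()` are assumptions made so the fragment compiles on its own outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's cpumask_t (assumption for this sketch). */
typedef unsigned long cpumask_t;

/* Counts real repartitioning calls so the effect can be observed. */
static int repartition_calls;

#ifdef CONFIG_SMP
/* SMP build: the real scheduler-domain work would happen here. */
static void partition_sched_domain(cpumask_t *part1, cpumask_t *part2)
{
    assert(part1 != NULL && part2 != NULL);
    repartition_calls++;
}
#else
/* !SMP build: empty inline stub, so callers need no #ifdef. */
static inline void partition_sched_domain(cpumask_t *part1, cpumask_t *part2)
{
    (void)part1;
    (void)part2;
}
#endif

/* Hypothetical caller in the role of cpuset.c: identical source
 * for both configurations. */
static void example_cpuset_update(cpumask_t *part1, cpumask_t *part2)
{
    assert(part1 != NULL && part2 != NULL);
    partition_sched_domain(part1, part2);
}
```

With the stub living in the header, the conditional compilation sits in one place and the compiler can drop the call entirely on uniprocessor builds.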

> I think it would likely be cleaner if the approach was to emulate an SMP
> NUMA system where each NUMA node simply doesn't have all of its CPUs
> online.

Absolutely. And that removes the need for some of my patches. QEMU
runs SMP kernels. It is possible to run SMP kernels on UP hardware.
But there is of course a certain performance loss introduced by all
the SMP locks. I'd rather not force !SMP users to run SMP kernels if
they want coarse-grained memory resource control.

Thanks for your input!

/ magnus

2005-10-03 03:22:11

by Paul Jackson

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

Dave wrote:
> Also, I worry that simply #ifdef'ing things out like CPUsets' update
> means that CPUsets lacks some kind of abstraction that it should have
> been using in the first place.

In the abstract, cpusets should just assume that the system has one or
more CPUs, and one or more Memory Nodes. Ideally, it should not
require either SMP or NUMA. Indeed, if you (Magnus) can get it
to compile with just one or the other of those two:

 config CPUSETS
        bool "Cpuset support"
-       depends on SMP
+       depends on SMP || NUMA

then I would hope that it would compile with neither. The cpuset
hierarchy on such a system would be rather boring, with all cpusets
having the same one CPU and one Memory Node, but it should work ... in
theory of course.
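
To make the "boring hierarchy" concrete, here is a toy model of a cpuset as two bitmasks, one for CPUs and one for memory nodes, with the rule that a child may only use what its parent has. `struct toy_cpuset` and `toy_cpuset_valid()` are hypothetical and much simpler than the kernel's structures; plain `unsigned long` masks are an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cpuset: bit i of cpus means CPU i, bit i of mems means
 * memory node i. parent is NULL for the root cpuset. */
struct toy_cpuset {
    unsigned long cpus;
    unsigned long mems;
    const struct toy_cpuset *parent;
};

/* A cpuset is valid if both of its masks are subsets of its parent's. */
static bool toy_cpuset_valid(const struct toy_cpuset *cs)
{
    assert(cs != NULL);

    if (cs->parent == NULL)
        return true;

    return (cs->cpus & ~cs->parent->cpus) == 0 &&
           (cs->mems & ~cs->parent->mems) == 0;
}
```

On a !SMP, !NUMA machine the root would hold only bit 0 in each mask, so every non-empty child necessarily ends up with the same one CPU and one memory node.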

In practice of course, there may be details on the edges that depend on
the current SMP/NUMA limitations, such as:

Magnus wrote:
> Regarding the #ifdef, it
> was added because partition_sched_domain() is only implemented for
> SMP. That symbol has no prototype or implementation when CONFIG_SMP is
> not set. Maybe it is better to add an empty inline function in
> linux/sched.h for !SMP?

An empty inline partition_sched_domain() would be better than ifdef's
in cpuset.c, yes. Or at least, that's usually the case. Probably here
too.

In theory at least, I applaud Magnus's work here. The asymmetry of the
SMP/NUMA define structure has always annoyed me slightly, and only been
explainable in my view as a consequence of the historical order of
development. I had a PC with a second memory board in an ISA slot,
which would qualify as a one CPU, two Memory Node system.

Or, as for what might bite us in the future (that PC was a long time
ago), the kinks in the current setup might be a thorn in our side as we
extend to increasingly interesting architectures.

Aside - for those reading this thread on lkml, it originated
on linux-mm. It looks like Dave added lkml to the cc list.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401

2005-10-03 05:05:36

by Magnus Damm

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

On 10/3/05, Paul Jackson wrote:
> Dave wrote:
> > Also, I worry that simply #ifdef'ing things out like CPUsets' update
> > means that CPUsets lacks some kind of abstraction that it should have
> > been using in the first place.
>
> In the abstract, cpusets should just assume that the system has one or
> more CPUs, and one or more Memory Nodes. Ideally, it should not
> require either SMP or NUMA. Indeed, if you (Magnus) can get it
> to compile with just one or the other of those two:
>
>  config CPUSETS
>         bool "Cpuset support"
> -       depends on SMP
> +       depends on SMP || NUMA
>
> then I would hope that it would compile with neither. The cpuset
> hierarchy on such a system would be rather boring, with all cpusets
> having the same one CPU and one Memory Node, but it should work ... in
> theory of course.

I just tested this on top of my patches:
@@ -245,7 +245,6 @@ config IKCONFIG_PROC

 config CPUSETS
        bool "Cpuset support"
-       depends on SMP || NUMA
        help

and it seems to work ok in practice too, on a regular !SMP !NUMA PC
anyway. As you note, the hierarchy is not that exciting. =) Anyway,
either SMP || NUMA or no dependency at all seems to work. Once
partition_sched_domain() gets fixed, that is.

> In practice of course, there may be details on the edges that depend on
> the current SMP/NUMA limitations, such as:
>
> Magnus wrote:
> > Regarding the #ifdef, it
> > was added because partition_sched_domain() is only implemented for
> > SMP. That symbol has no prototype or implementation when CONFIG_SMP is
> > not set. Maybe it is better to add an empty inline function in
> > linux/sched.h for !SMP?
>
> An empty inline partition_sched_domain() would be better than ifdef's
> in cpuset.c, yes. Or at least, that's usually the case. Probably here
> too.

I agree.

> In theory at least, I applaud Magnus's work here. The asymmetry of the
> SMP/NUMA define structure has always annoyed me slightly, and only been
> explainable in my view as a consequence of the historical order of
> development. I had a PC with a second memory board in an ISA slot,
> which would qualify as a one CPU, two Memory Node system.
>
> Or, as for what might bite us in the future (that PC was a long time
> ago), the kinks in the current setup might be a thorn in our side as we
> extend to increasingly interesting architectures.

Nice to hear that you like the idea.

Maybe I should have broken down my patches into three smaller sets:

1) i386: NUMA without SMP
2) CPUSETS: NUMA || SMP
3) i386: NUMA emulation

If people like 1) then it's probably a good idea to convert other
architectures too. Both 2) and 3) above are separate but related
issues. And now seems like a good time to solve 2).

So, Paul, please let me know if you prefer SMP || NUMA or no
dependencies in the Kconfig. Once I know that, I will create a new
patch that hopefully can get into -mm later on.

> Aside - for those reading this thread on lkml, it originated
> on linux-mm. It looks like Dave added lkml to the cc list.

Huh? I sent my patches both to lkml and linux-mm...

Thank you for the feedback!

/ magnus

2005-10-03 05:32:40

by Hirokazu Takahashi

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

Hi,

> > In theory at least, I applaud Magnus's work here. The asymmetry of the
> > SMP/NUMA define structure has always annoyed me slightly, and only been
> > explainable in my view as a consequence of the historical order of
> > development. I had a PC with a second memory board in an ISA slot,
> > which would qualify as a one CPU, two Memory Node system.
> >
> > Or, as for what might bite us in the future (that PC was a long time
> > ago), the kinks in the current setup might be a thorn in our side as we
> > extend to increasingly interesting architectures.
>
> Nice to hear that you like the idea.
>
> Maybe I should have broken down my patches into three smaller sets:
>
> 1) i386: NUMA without SMP
> 2) CPUSETS: NUMA || SMP
> 3) i386: NUMA emulation
>
> If people like 1) then it's probably a good idea to convert other
> architectures too. Both 2) and 3) above are separate but related
> issues. And now seems like a good time to solve 2).
>
> So, Paul, please let me know if you prefer SMP || NUMA or no
> dependencies in the Kconfig. Once I know that, I will create a new
> patch that hopefully can get into -mm later on.

The latter seems like a good idea to me if you're going to enhance
CPUSETS to make it usable for CPUMETER or something like that.

Thanks.

2005-10-03 05:34:08

by Paul Jackson

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

Magnus wrote:
> So, Paul, please let me know if you prefer SMP || NUMA or no
> dependencies in the Kconfig.

In theory, I prefer none. But the devil is in the details here,
and I really don't care that much.

So pick whichever you prefer, or whichever provides the nicest
looking code or patch, or flip a coin ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401

2005-10-03 05:34:54

by Paul Jackson

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

Magnus wrote:
> I sent my patches both to lkml and linux-mm...

Must be confusion on my end then. Sorry.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401

2005-10-03 05:59:34

by Magnus Damm

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

On 10/3/05, Paul Jackson wrote:
> Magnus wrote:
> > So, Paul, please let me know if you prefer SMP || NUMA or no
> > dependencies in the Kconfig.
>
> In theory, I prefer none. But the devil is in the details here,
> and I really don't care that much.
>
> So pick whichever you prefer, or whichever provides the nicest
> looking code or patch, or flip a coin ;).

I'm tempted to consult the magic eight-ball, but I think I will stick
with the advice from Takahashi-san instead. =) So, the dependency will
be removed.

/ magnus

2005-10-03 07:27:14

by Paul Jackson

Subject: Re: [PATCH 00/07][RFC] i386: NUMA emulation

Magnus wrote:
> I think I will stick with the advice from Takahashi-san

Yes - Takahashi-san gives much better advice than an eight-ball.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson 1.925.600.0401