LinuxLists.cc - [PATCH] Introduce nodemask

2004-03-18 23:05:19

Subject: [PATCH] Introduce nodemask_t ADT [0/7]

I've got a fairly good size patch set to implement an ADT (Abstract Data
Type) for nodemasks, which follows the path blazed by wli with the
cpumask_t code. The basic idea is to create a generic,
platform/arch-agnostic nodemask data type, complete with operations to
do most anything you'd want to do with a nodemask. This stops us from
open-coding nodemask operations, allows non-consecutive node numbering
(ie: nodes don't have to be numbered 0...numnodes-1), gets rid of
numnodes entirely (replaced with num_online_nodes()), and will
facilitate the hotplugging of whole nodes.
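
To make the idea concrete, here is a minimal userspace sketch of what such an ADT could look like. It is not the code from the patches: MAX_NUMNODES, the bitmap layout, and the helpers nodes_clear(), node_set(), node_isset(), nodes_weight(), next_node_from() and for_each_node_mask() are assumptions made for illustration. Only node_online_map, node_set_online(), num_online_nodes() and for_each_online_node() are names that appear in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical limit and layout, chosen for this sketch only. */
#define MAX_NUMNODES   64
#define BITS_PER_LONG  ((int)(sizeof(unsigned long) * CHAR_BIT))
#define NODEMASK_LONGS ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct { unsigned long bits[NODEMASK_LONGS]; } nodemask_t;

static inline void nodes_clear(nodemask_t *m)
{
    memset(m, 0, sizeof(*m));
}

static inline void node_set(int nid, nodemask_t *m)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    m->bits[nid / BITS_PER_LONG] |= 1UL << (nid % BITS_PER_LONG);
}

static inline int node_isset(int nid, const nodemask_t *m)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    return (int)((m->bits[nid / BITS_PER_LONG] >> (nid % BITS_PER_LONG)) & 1UL);
}

/* Number of bits set in the mask. */
static inline int nodes_weight(const nodemask_t *m)
{
    int nid, count = 0;

    for (nid = 0; nid < MAX_NUMNODES; nid++)
        count += node_isset(nid, m);
    return count;
}

/* First set bit at or after 'start', or MAX_NUMNODES if there is none. */
static inline int next_node_from(int start, const nodemask_t *m)
{
    int nid;

    for (nid = start; nid < MAX_NUMNODES; nid++)
        if (node_isset(nid, m))
            return nid;
    return MAX_NUMNODES;
}

#define for_each_node_mask(nid, m)                              \
    for ((nid) = next_node_from(0, (m)); (nid) < MAX_NUMNODES;  \
         (nid) = next_node_from((nid) + 1, (m)))

/* The online map and its helpers, built on the generic operations. */
static nodemask_t node_online_map;

static inline void node_set_online(int nid)
{
    node_set(nid, &node_online_map);
}

static inline int num_online_nodes(void)
{
    return nodes_weight(&node_online_map);
}

#define for_each_online_node(nid) for_each_node_mask(nid, &node_online_map)

/* Sparse numbering: bring nodes 0, 4 and 9 online and sum their ids. */
static int demo_sparse_sum(void)
{
    int nid, sum = 0;

    nodes_clear(&node_online_map);
    node_set_online(0);
    node_set_online(4);
    node_set_online(9);
    for_each_online_node(nid)
        sum += nid;
    return sum;
}
```

Callers never touch the bitmap directly, so the representation could change (a single word, an array, something per-arch) without the users noticing.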

As I mentioned, the code is heavily based on Bill Irwin's cpumask_t
code. The changes are broken into seven patches:

nodemask_t-01-definitions.patch
The basic definitions of the nodemask_t type: include/linux/nodemask.h,
include/asm-generic/{nodemask.h, nodemask_arith.h, nodemask_array.h,
nodemask_const_reference.h, nodemask_const_value.h, nodemask_nonuma.h},
and some small changes to include/linux/mmzone.h (removing the existing
definition of node_online_map and helper functions).

nodemask_t-02-core.patch
Changes to arch-independent code. Surprisingly few references to
numnodes, open-coded node loops, etc. Most important result of this
patch is that no generic code assumes anything about node numbering.
This allows individual arches to use sparse numbering if they care to.

nodemask_t-03-i386.patch
Changes to i386 specific code. As with most arch changes, it involves
close-coding loops (ie: for_each_online_node(nid) rather than
for(nid=0;nid<numnodes;nid++)) and replacing the use of numnodes with
num_online_nodes() and node_set_online(nid).

nodemask_t-04-ppc64.patch
Changes to ppc64 specific code. Untested. Code review & testing
requested.

nodemask_t-05-x86_64.patch
Changes to x86_64 specific code. Untested. Code review & testing
requested.

nodemask_t-06-ia64.patch
Changes to ia64 specific code. Untested. Code review & testing
requested.

nodemask_t-07-other.patch
Changes to other arch-specific code (alpha, arm, mips, sparc64 & sh).
Untested. Code review & testing requested.
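
The loop conversion described for the arch patches can be sketched as follows. This is a simplified userspace illustration, not code from the patch set: the single-word node_online_map, MAX_NUMNODES, node_online() and the visit_*() helpers are assumptions made here. It shows why the open-coded loop breaks once node ids are sparse.

```c
#include <assert.h>

#define MAX_NUMNODES 32   /* hypothetical limit for this sketch */

/* Stand-in for node_online_map: one bit per node id (assumes <= 32 nodes). */
static unsigned long node_online_map;

static void node_set_online(int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    node_online_map |= 1UL << nid;
}

static int node_online(int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    return (int)((node_online_map >> nid) & 1UL);
}

static int num_online_nodes(void)
{
    int nid, n = 0;

    for (nid = 0; nid < MAX_NUMNODES; nid++)
        n += node_online(nid);
    return n;
}

/* Simplified iterator; the trailing 'if' means it must not be followed
 * directly by an 'else'. */
#define for_each_online_node(nid)                       \
    for ((nid) = 0; (nid) < MAX_NUMNODES; (nid)++)      \
        if (node_online(nid))

/* Old style: assumes node ids are exactly 0..numnodes-1. */
static int visit_open_coded(int numnodes, int *visited)
{
    int nid, n = 0;

    for (nid = 0; nid < numnodes; nid++)
        visited[n++] = nid;
    return n;
}

/* New style: walks whichever node ids are actually online. */
static int visit_online(int *visited)
{
    int nid, n = 0;

    for_each_online_node(nid)
        visited[n++] = nid;
    return n;
}
```

With nodes 0, 4 and 9 online, visit_online() reports 0, 4, 9, while visit_open_coded(num_online_nodes(), ...) reports 0, 1, 2 and so touches two nodes that do not exist.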

2004-03-19 00:01:41

by Matthew Dobson

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thu, 2004-03-18 at 15:23, Jesse Barnes wrote:
> On Thursday 18 March 2004 3:04 pm, Matthew Dobson wrote:
> > do most anything you'd want to do with a nodemask. This stops us from
> > open-coding nodemask operations, allows non-consecutive node numbering
> > (ie: nodes don't have to be numbered 0...numnodes-1), gets rid of
> > numnodes entirely (replaced with num_online_nodes()), and will
> > facilitate the hotplugging of whole nodes.
>
> My hero! :) I think this has been needed for awhile, but now that I

Anything for a damsel in distress! ;)

> think about it, it begs the question of what a node is. Is it a set
> of CPUs and blocks of memory (that seems to be the most commonly used
> definition in the code), just memory, just CPUs, or what?

There have been arguments about exactly what a node is since there has
been a concept of a node at all. In the kernel, it isn't defined. A
node doesn't *have* to have CPUs on it (see nr_cpus_node()), doesn't
*have* to have memory, doesn't *have* to have I/O. It's supposed to be
just a container for those 3 things, but the containers can be empty.
This code doesn't get into what a node is, just makes sure they're used
properly... ;)

> On sn2
> hardware, we have the concept of a node without CPUs. And due to our
> wacky I/O layout, we also have nodes without CPUs *or* memory! (The
> I/O guys call these "ionodes".)

Yep... I saw both numnodes and numionodes perusing the ia64 code. You
should be able to put these CPU/memless nodes in the node_online_map
now... If there's code that's assuming a node contains either CPUs or
memory, I'd like to find it! :)

> And then of course, there are CPUs
> that aren't particularly close to any memory (i.e. they have none of
> their own, and have to go several hops and/or through other CPUs to
> get at memory at all).

node_distance(from, to)
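
As a rough illustration of how node_distance(from, to) could answer that question, here is a userspace sketch that picks the closest node with memory for a CPU on a memoryless node. The distance table, the node_has_memory[] array and nearest_memory_node() are made up for this example, and node_distance() is written as a plain function here, which may not match how any given arch defines it.

```c
#include <assert.h>

#define MAX_NUMNODES 4   /* hypothetical, kept small for illustration */

/* Made-up distances: smaller means closer, the diagonal is "local". */
static const int distance_table[MAX_NUMNODES][MAX_NUMNODES] = {
    { 10, 20, 40, 40 },
    { 20, 10, 20, 40 },
    { 40, 20, 10, 20 },
    { 40, 40, 20, 10 },
};

/* Node 0 has CPUs but no memory; node 3 stands in for an "ionode". */
static const int node_has_memory[MAX_NUMNODES] = { 0, 1, 1, 0 };

static int node_distance(int from, int to)
{
    assert(from >= 0 && from < MAX_NUMNODES);
    assert(to >= 0 && to < MAX_NUMNODES);
    return distance_table[from][to];
}

/* Closest node that has memory, or -1 if no node has any. */
static int nearest_memory_node(int from)
{
    int nid, best = -1;

    for (nid = 0; nid < MAX_NUMNODES; nid++) {
        if (!node_has_memory[nid])
            continue;
        if (best < 0 || node_distance(from, nid) < node_distance(from, best))
            best = nid;
    }
    return best;
}
```

In this table a CPU on node 0 would allocate from node 1, and a device on node 3 would use node 2.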

> I'll take a look at the ia64 bits when I get them (I've only received
> two of the seven patches thus far).
>
> Jesse

Super. I'd really like feedback on the ia64 code (well, actually all
the non-i386 code). I did what I thought was right, but eyes more
familiar with the code are extremely welcome.

Cheers!

-Matt

2004-03-19 00:01:42

by Jesse Barnes

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thursday 18 March 2004 3:43 pm, Martin J. Bligh wrote:
> > It's probably not too late to change this to
> > pcibus_to_nodemask(pci_bus *), or pci_to_nodemask(pci_dev *), there
> > aren't that many callers, are there (my grep is still running)?
>
> It probably shouldn't have anything to do with PCI directly either,
> so .... ;-) My former thought was that you might just want the most
> local memory for DMAing into.

Right, we want local memory (or potentially remote memory) for DMA,
but what about interrupt redirection? Some chipsets don't support
interrupt round robin, and just target interrupts at one CPU. In that
case (and probably the round robin case too), you want to know which
CPU(s) to send the interrupt at. Can't immediately think of other
in-kernel uses though (administrators will of course want to be able
to locate a given PCI device in a multirack system, but that's another
subject--one that Martin Hicks posted on yesterday).

Jesse

2004-03-19 00:05:40

by Matthew Dobson

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thu, 2004-03-18 at 15:32, Martin J. Bligh wrote:
> > On Thursday 18 March 2004 3:04 pm, Matthew Dobson wrote:
> >> do most anything you'd want to do with a nodemask. This stops us from
> >> open-coding nodemask operations, allows non-consecutive node numbering
> >> (ie: nodes don't have to be numbered 0...numnodes-1), gets rid of
> >> numnodes entirely (replaced with num_online_nodes()), and will
> >> facilitate the hotplugging of whole nodes.
> >
> > My hero! :) I think this has been needed for awhile, but now that I
> > think about it, it begs the question of what a node is. Is it a set
> > of CPUs and blocks of memory (that seems to be the most commonly used
> > definition in the code), just memory, just CPUs, or what? On sn2
> > hardware, we have the concept of a node without CPUs. And due to our
> > wacky I/O layout, we also have nodes without CPUs *or* memory! (The
> > I/O guys call these "ionodes".) And then of course, there are CPUs
> > that aren't particularly close to any memory (i.e. they have none of
> > their own, and have to go several hops and/or through other CPUs to
> > get at memory at all).
>
> I think the closest answer we have is that it's a grouping of cpus and
> memory, where either may be NULL.
>
> I/O isn't directly associated with a node, though it should fit into the
> topo infrastructure, to give distances from io buses to nodes (for which
> I think we currently use cpumasks, which is probably wrong in retrospect,
> but then life is tough and flawed ;-))
>
> M.

Yeah... We used cpumasks because that seemed like a good idea at the
time. Nodemasks may be a better choice now... We can write a quicky
inline function nodemask_to_cpumask() as well, to keep the current users
happy.
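
Something along these lines, perhaps. This is only a userspace sketch with simplified types: both masks are a single unsigned long, and the node_cpus[] table is a made-up stand-in for whatever per-node CPU information the arch provides.

```c
#include <assert.h>

#define MAX_NUMNODES 8   /* hypothetical limit for this sketch */

typedef unsigned long nodemask_t;   /* one bit per node, simplified */
typedef unsigned long cpumask_t;    /* one bit per CPU, simplified */

/* Made-up topology; an empty mask means a CPU-less node. */
static const cpumask_t node_cpus[MAX_NUMNODES] = {
    0x00fUL,   /* node 0: CPUs 0-3 */
    0x0f0UL,   /* node 1: CPUs 4-7 */
    0x000UL,   /* node 2: no CPUs  */
    0x300UL,   /* node 3: CPUs 8-9 */
};

/* Union of the CPU masks of every node set in 'nodes'. */
static inline cpumask_t nodemask_to_cpumask(nodemask_t nodes)
{
    cpumask_t cpus = 0;
    int nid;

    assert((nodes >> MAX_NUMNODES) == 0);   /* no bits beyond the limit */
    for (nid = 0; nid < MAX_NUMNODES; nid++)
        if (nodes & (1UL << nid))
            cpus |= node_cpus[nid];
    return cpus;
}
```

A CPU-less node simply contributes nothing, so existing cpumask users would see an empty mask for it rather than an error.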

-Matt

2004-03-18 23:48:31

by Jesse Barnes

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thursday 18 March 2004 3:04 pm, Matthew Dobson wrote:
> do most anything you'd want to do with a nodemask. This stops us from
> open-coding nodemask operations, allows non-consecutive node numbering
> (ie: nodes don't have to be numbered 0...numnodes-1), gets rid of
> numnodes entirely (replaced with num_online_nodes()), and will
> facilitate the hotplugging of whole nodes.

My hero! :) I think this has been needed for awhile, but now that I
think about it, it begs the question of what a node is. Is it a set
of CPUs and blocks of memory (that seems to be the most commonly used
definition in the code), just memory, just CPUs, or what? On sn2
hardware, we have the concept of a node without CPUs. And due to our
wacky I/O layout, we also have nodes without CPUs *or* memory! (The
I/O guys call these "ionodes".) And then of course, there are CPUs
that aren't particularly close to any memory (i.e. they have none of
their own, and have to go several hops and/or through other CPUs to
get at memory at all).

I'll take a look at the ia64 bits when I get them (I've only received
two of the seven patches thus far).

Jesse

2004-03-19 00:22:10

by Martin J. Bligh

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

> On Thursday 18 March 2004 3:04 pm, Matthew Dobson wrote:
>> do most anything you'd want to do with a nodemask. This stops us from
>> open-coding nodemask operations, allows non-consecutive node numbering
>> (ie: nodes don't have to be numbered 0...numnodes-1), gets rid of
>> numnodes entirely (replaced with num_online_nodes()), and will
>> facilitate the hotplugging of whole nodes.
>
> My hero! :) I think this has been needed for awhile, but now that I
> think about it, it begs the question of what a node is. Is it a set
> of CPUs and blocks of memory (that seems to be the most commonly used
> definition in the code), just memory, just CPUs, or what? On sn2
> hardware, we have the concept of a node without CPUs. And due to our
> wacky I/O layout, we also have nodes without CPUs *or* memory! (The
> I/O guys call these "ionodes".) And then of course, there are CPUs
> that aren't particularly close to any memory (i.e. they have none of
> their own, and have to go several hops and/or through other CPUs to
> get at memory at all).

I think the closest answer we have is that it's a grouping of cpus and
memory, where either may be NULL.

I/O isn't directly associated with a node, though it should fit into the
topo infrastructure, to give distances from io buses to nodes (for which
I think we currently use cpumasks, which is probably wrong in retrospect,
but then life is tough and flawed ;-))

M.

2004-03-19 00:22:08

by Jesse Barnes

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thursday 18 March 2004 3:32 pm, Martin J. Bligh wrote:
> I think the closest answer we have is that it's a grouping of cpus and
> memory, where either may be NULL.

Yep, that seems to make the most sense, but then part of me wants to
drop the term node and never use it again :)

> I/O isn't directly associated with a node, though it should fit into the
> topo infrastructure, to give distances from io buses to nodes (for which
> I think we currently use cpumasks, which is probably wrong in retrospect,
> but then life is tough and flawed ;-))

It's probably not too late to change this to
pcibus_to_nodemask(pci_bus *), or pci_to_nodemask(pci_dev *), there
aren't that many callers, are there (my grep is still running)?

Thanks,
Jesse

2004-03-19 00:22:07

by Martin J. Bligh

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

--On Thursday, March 18, 2004 15:37:10 -0800 Jesse Barnes wrote:

> On Thursday 18 March 2004 3:32 pm, Martin J. Bligh wrote:
>> I think the closest answer we have is that it's a grouping of cpus and
>> memory, where either may be NULL.
>
> Yep, that seems to make the most sense, but then part of me wants to
> drop the term node and never use it again :)

Hey, *I* wasn't the one who started splitting their h/w into weirdo pieces ;-)
Anyway, it's a damned sight shorter than "cpumemset".

>> I/O isn't directly associated with a node, though it should fit into the
>> topo infrastructure, to give distances from io buses to nodes (for which
>> I think we currently use cpumasks, which is probably wrong in retrospect,
>> but then life is tough and flawed ;-))
>
> It's probably not too late to change this to
> pcibus_to_nodemask(pci_bus *), or pci_to_nodemask(pci_dev *), there
> aren't that many callers, are there (my grep is still running)?

It probably shouldn't have anything to do with PCI directly either,
so .... ;-) My former thought was that you might just want the most
local memory for DMAing into.

M.

2004-03-19 00:58:10

by Zwane Mwaikambo

Subject: Re: [PATCH] Introduce nodemask_t ADT [0/7]

On Thu, 18 Mar 2004, Martin J. Bligh wrote:

> >> I/O isn't directly associated with a node, though it should fit into the
> >> topo infrastructure, to give distances from io buses to nodes (for which
> >> I think we currently use cpumasks, which is probably wrong in retrospect,
> >> but then life is tough and flawed ;-))
> >
> > It's probably not too late to change this to
> > pcibus_to_nodemask(pci_bus *), or pci_to_nodemask(pci_dev *), there
> > aren't that many callers, are there (my grep is still running)?
>
> It probably shouldn't have anything to do with PCI directly either,
> so .... ;-) My former thought was that you might just want the most
> local memory for DMAing into.

That knowledge should be in the DMA API, shouldn't it? But in its
current incarnation I don't see how that's possible.
