2004-01-25 23:54:20

by Rusty Russell

Subject: New NUMA scheduler and hotplug CPU

Hi Nick!

Looking at your new scheduler in -mm, it uses cpu_online_map
a lot in arch_init_sched_domains. This means with hotplug CPU that it
would need to be modified: certainly possible to do, but messy.

The other option is to use cpu_possible_map to create the full
topology up front, and then it need never change. AFAICT, no other
changes are necessary: you already check against moving tasks to
offline cpus.
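
A minimal sketch of the idea, for illustration only (build_domain_for()
here is a hypothetical stand-in for whatever per-CPU setup
arch_init_sched_domains actually does):

	/* Sketch: build the topology over every possible CPU once at
	 * boot, rather than over cpu_online_map, so it never has to
	 * change at hotplug time. */
	static void __init init_domains_over_possible(void)
	{
		int i;

		for (i = 0; i < NR_CPUS; i++)
			if (cpu_possible(i))	/* rather than cpu_online(i) */
				build_domain_for(i);
	}

The balancer then only needs its existing checks against moving tasks to
offline cpus, as above.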

Anyway, I was just porting the hotplug CPU patches over to -mm, and
came across this, so I thought I'd ask.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


2004-01-26 08:28:11

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Rusty Russell wrote:

>Hi Nick!
>
> Looking at your new scheduler in -mm, it uses cpu_online_map
>a lot in arch_init_sched_domains. This means with hotplug CPU that it
>would need to be modified: certainly possible to do, but messy.
>
> The other option is to use cpu_possible_map to create the full
>topology up front, and then it need never change. AFAICT, no other
>changes are necessary: you already check against moving tasks to
>offline cpus.
>
>Anyway, I was just porting the hotplug CPU patches over to -mm, and
>came across this, so I thought I'd ask.
>

Hi Rusty,
Yes I'd like to use the cpu_possible_map to create the full
topology straight up. Martin?


2004-01-26 16:34:47

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

>> Looking at your new scheduler in -mm, it uses cpu_online_map
>> a lot in arch_init_sched_domains. This means with hotplug CPU that it
>> would need to be modified: certainly possible to do, but messy.
>>
>> The other option is to use cpu_possible_map to create the full
>> topology up front, and then it need never change. AFAICT, no other
>> changes are necessary: you already check against moving tasks to
>> offline cpus.
>>
>> Anyway, I was just porting the hotplug CPU patches over to -mm, and
>> came across this, so I thought I'd ask.
>>
>
> Hi Rusty,
> Yes I'd like to use the cpu_possible_map to create the full
> topology straight up. Martin?

Well isn't it a bad idea to have cpus in the data that are offline?
It'll throw off all your balancing calculations, won't it? You seemed
to be careful to do things like divide the total load on the node by
the number of CPUs on the node, and that'll get totally borked if you
have fake CPUs in there.
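
(To put rough numbers on the concern: if a node's two online CPUs carry a
total load of 8, the per-CPU average is 4; count two more possible-but-offline
CPUs in that node and the same division gives 2, so the node looks half as
busy as it really is.)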

To me, it'd make more sense to add the CPUs to the scheduler structures
as they get brought online. I can also imagine machines where you have
a massive (infinite?) variety of possible CPUs that could appear -
like an NUMA box where you could just plug arbitrary numbers of new
nodes in as you wanted.

Moreover, as the CPUs aren't fixed numbers in advance, how are you going
to know which node to put them in, etc? Setting up every possible thing
in advance seems like an infeasible way to do hotplug to me.

M.

2004-01-26 23:01:58

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Martin J. Bligh wrote:

>>> Looking at your new scheduler in -mm, it uses cpu_online_map
>>>a lot in arch_init_sched_domains. This means with hotplug CPU that it
>>>would need to be modified: certainly possible to do, but messy.
>>>
>>> The other option is to use cpu_possible_map to create the full
>>>topology up front, and then it need never change. AFAICT, no other
>>>changes are necessary: you already check against moving tasks to
>>>offline cpus.
>>>
>>>Anyway, I was just porting the hotplug CPU patches over to -mm, and
>>>came across this, so I thought I'd ask.
>>>
>>>
>>Hi Rusty,
>>Yes I'd like to use the cpu_possible_map to create the full
>>topology straight up. Martin?
>>
>
>Well isn't it a bad idea to have cpus in the data that are offline?
>It'll throw off all your balancing calculations, won't it? You seemed
>to be careful to do things like divide the total load on the node by
>the number of CPUs on the node, and that'll get totally borked if you
>have fake CPUs in there.
>

I think it mostly does a good job at making sure to only take
online cpus into account. If there are places where it doesn't
then it shouldn't be too hard to fix.


>
>To me, it'd make more sense to add the CPUs to the scheduler structures
>as they get brought online. I can also imagine machines where you have
>a massive (infinite?) variety of possible CPUs that could appear -
>like an NUMA box where you could just plug arbitrary numbers of new
>nodes in as you wanted.
>

I guess so, but you'd still need NR_CPUS to be >= that arbitrary
number.

>
>Moreover, as the CPUs aren't fixed numbers in advance, how are you going
>to know which node to put them in, etc? Setting up every possible thing
>in advance seems like an infeasible way to do hotplug to me.
>

Well, this would be the problem. I guess it's quite possible that
one doesn't know the topology of newly added CPUs beforehand.

Well OK, this would require a per architecture function to handle
CPU hotplug. It could possibly just default to arch_init_sched_domains,
and just completely reinitialise everything which would be the simplest.
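
A rough sketch of what such a per-architecture hook could look like,
assuming the notifier interface from the hotplug CPU patches
(register_cpu_notifier() with CPU_ONLINE/CPU_DEAD events);
rebuild_sched_domains() is a hypothetical wrapper around
arch_init_sched_domains, and the locking is glossed over:

	/* Sketch only: re-run the generic domain setup whenever a CPU
	 * comes or goes. */
	static int sd_cpu_callback(struct notifier_block *nb,
				   unsigned long action, void *hcpu)
	{
		switch (action) {
		case CPU_ONLINE:
		case CPU_DEAD:
			rebuild_sched_domains();	/* hypothetical */
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block sd_cpu_nb = {
		.notifier_call = sd_cpu_callback,
	};

	/* and somewhere in sched_init():  register_cpu_notifier(&sd_cpu_nb); */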


2004-01-26 23:24:47

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

>> Well isn't it a bad idea to have cpus in the data that are offline?
>> It'll throw off all your balancing calculations, won't it? You seemed
>> to be careful to do things like divide the total load on the node by
>> the number of CPUs on the node, and that'll get totally borked if you
>> have fake CPUs in there.
>
> I think it mostly does a good job at making sure to only take
> online cpus into account. If there are places where it doesn't
> then it shouldn't be too hard to fix.

It'd make the code a damned sight simpler and cleaner if you dropped
all that stuff, and updated the structures when you hotplugged a CPU,
which is really the only sensible way to do it anyway ...

For instance, if I remove cpu X, then bring back a new CPU on another node
(or in another HT sibling pair) as CPU X, then you'll need to update all
that stuff anyway. CPUs aren't fixed position in that map - the ordering
handed out is arbitrary.

>> To me, it'd make more sense to add the CPUs to the scheduler structures
>> as they get brought online. I can also imagine machines where you have
>> a massive (infinite?) variety of possible CPUs that could appear -
>> like an NUMA box where you could just plug arbitrary numbers of new
>> nodes in as you wanted.
>
> I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> number.

Yup ... but you don't have to enumerate all possible positions that way.
See Linus' argument re dynamic device numbers and iSCSI disks, etc.
Same thing applies.

> Well, this would be the problem. I guess it's quite possible that
> one doesn't know the topology of newly added CPUs beforehand.
>
> Well OK, this would require a per architecture function to handle
> CPU hotplug. It could possibly just default to arch_init_sched_domains,
> and just completely reinitialise everything which would be the simplest.

Yeah, it's not trivially simple. But then neither is the rest of CPU
hotplug, to do it right ;-) Requiring CPU hotplug callback hooks does
seem to be the right way to interface with the sched code though ...

M.

2004-01-26 23:41:28

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Martin J. Bligh wrote:

>>>Well isn't it a bad idea to have cpus in the data that are offline?
>>>It'll throw off all your balancing calculations, won't it? You seemed
>>>to be careful to do things like divide the total load on the node by
>>>the number of CPUs on the node, and that'll get totally borked if you
>>>have fake CPUs in there.
>>>
>>I think it mostly does a good job at making sure to only take
>>online cpus into account. If there are places where it doesn't
>>then it shouldn't be too hard to fix.
>>
>
>It'd make the code a damned sight simpler and cleaner if you dropped
>all that stuff, and updated the structures when you hotplugged a CPU,
>which is really the only sensible way to do it anyway ...
>
>For instance, if I remove cpu X, then bring back a new CPU on another node
>(or in another HT sibling pair) as CPU X, then you'll need to update all
>that stuff anyway. CPUs aren't fixed position in that map - the ordering
>handed out is arbitrary.
>
>
>>>To me, it'd make more sense to add the CPUs to the scheduler structures
>>>as they get brought online. I can also imagine machines where you have
>>>a massive (infinite?) variety of possible CPUs that could appear -
>>>like an NUMA box where you could just plug arbitrary numbers of new
>>>nodes in as you wanted.
>>>
>>I guess so, but you'd still need NR_CPUS to be >= that arbitrary
>>number.
>>
>
>Yup ... but you don't have to enumerate all possible positions that way.
>See Linus' argument re dynamic device numbers and iSCSI disks, etc.
>Same thing applies.
>
>
>>Well, this would be the problem. I guess it's quite possible that
>>one doesn't know the topology of newly added CPUs beforehand.
>>
>>Well OK, this would require a per architecture function to handle
>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>and just completely reinitialise everything which would be the simplest.
>>
>
>Yeah, it's not trivially simple. But then neither is the rest of CPU
>hotplug, to do it right ;-) Requiring CPU hotplug callback hooks does
>seem to be the right way to interface with the sched code though ...
>

OK you've convinced me.


2004-01-26 23:43:58

by Andrew Theurer

Subject: Re: New NUMA scheduler and hotplug CPU

> >To me, it'd make more sense to add the CPUs to the scheduler structures
> >as they get brought online. I can also imagine machines where you have
> >a massive (infinite?) variety of possible CPUs that could appear -
> >like an NUMA box where you could just plug arbitrary numbers of new
> >nodes in as you wanted.
>
> I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> number.
>
> >Moreover, as the CPUs aren't fixed numbers in advance, how are you going
> >to know which node to put them in, etc? Setting up every possible thing
> >in advance seems like an infeasible way to do hotplug to me.
>
> Well, this would be the problem. I guess it's quite possible that
> one doesn't know the topology of newly added CPUs beforehand.
>
> Well OK, this would require a per architecture function to handle
> CPU hotplug. It could possibly just default to arch_init_sched_domains,
> and just completely reinitialise everything which would be the simplest.

Call me crazy, but why not let the topology be determined via userspace at a
more appropriate time? When you hotplug, you tell it where in the scheduler
to plug it. Have structures in the scheduler which represent the
nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu structs with
simple pointers), but let the topology be built based on user's desires thru
hotplug.

For example, you boot on just the boot cpu, which by default is in the first
node on the first runqueue. For all other cpus, whether being "booted" for
the first time or hotplugged (maybe now there's really no difference),
the hotplugging tells where the cpu should be, in what node and what
runqueue. HT cpus work even better, because you can hotplug siblings, one
at a time if you wanted, to the same runqueue. Or you have cpus sharing a
die, same thing, lots of choices here. This removes any per-arch updates to
the kernel for things like scheduler topology, and lets them go somewhere
else more easily changed, like userspace.

Forgive me if this sounds stupid; I have not been following the discussion
closely.



2004-01-27 00:12:33

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Andrew Theurer wrote:

>>>To me, it'd make more sense to add the CPUs to the scheduler structures
>>>as they get brought online. I can also imagine machines where you have
>>>a massive (infinite?) variety of possible CPUs that could appear -
>>>like an NUMA box where you could just plug arbitrary numbers of new
>>>nodes in as you wanted.
>>>
>>I guess so, but you'd still need NR_CPUS to be >= that arbitrary
>>number.
>>
>>
>>>Moreover, as the CPUs aren't fixed numbers in advance, how are you going
>>>to know which node to put them in, etc? Setting up every possible thing
>>>in advance seems like an infeasible way to do hotplug to me.
>>>
>>Well, this would be the problem. I guess it's quite possible that
>>one doesn't know the topology of newly added CPUs beforehand.
>>
>>Well OK, this would require a per architecture function to handle
>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>and just completely reinitialise everything which would be the simplest.
>>
>
>Call me crazy, but why not let the topology be determined via userspace at a
>more appropriate time? When you hotplug, you tell it where in the scheduler
>to plug it. Have structures in the scheduler which represent the
>nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu structs with
>simple pointers), but let the topology be built based on user's desires thru
>hotplug.
>

Well isn't userspace's idea of topology just what the kernel tells it?
I'm not sure what it would buy you... but I guess it wouldn't be too
much harder than doing it in kernel, just a matter of making the userspace
API.

BTW. I guess you haven't seen my sched domains code. It can describe
arbitrary topologies.


2004-01-27 00:09:51

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

> Call me crazy, but why not let the topology be determined via userspace at a
> more appropriate time? When you hotplug, you tell it where in the scheduler
> to plug it. Have structures in the scheduler which represent the
> nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu structs with
> simple pointers), but let the topology be built based on user's desires thru
> hotplug.

Well, I agree with the "at a more appropriate time" bit. But there's no
real need to make a bunch of complicated stuff out in userspace for this -
we're trying to lay out the scheduler domains according to the hardware
topology of the machine. It's not a userspace namespace or anything.
Having userspace fishing down way deep in hardware specific stuff is
silly - the kernel is there as a hardware abstraction layer.

Now if you wanted to use sched domains for workload management or something
and involve userspace, then yes ... that'd be more appropriate.

> For example, you boot on just the boot cpu, which by default is in the first
> node on the first runqueue. For all other cpus, whether being "booted" for
> the first time or hotplugged (maybe now there's really no difference),
> the hotplugging tells where the cpu should be, in what node and what
> runqueue. HT cpus work even better, because you can hotplug siblings, one
> at a time if you wanted, to the same runqueue. Or you have cpus sharing a
> die, same thing, lots of choices here. This removes any per-arch updates to
> the kernel for things like scheduler topology, and lets them go somewhere
> else more easily changed, like userspace.

Ummm ... but *none* of that is dictated as policy stuff - it's all just
the hardware layout of the machine. You cannot "decide" as the sysadmin
which node a CPU is in, or which HT sibling it has. It's just there ;-)
The only thing you could possibly dictate is the CPU number you want
assigned to the new CPU, which frankly, I think is pointless - they're
arbitrary tags, and always have been.

M.

2004-01-27 02:16:58

by Andrew Theurer

Subject: Re: New NUMA scheduler and hotplug CPU

On Monday 26 January 2004 18:09, Martin J. Bligh wrote:
> > For example, you boot on just the boot cpu, which by default is in the
> > first node on the first runqueue. For all other cpus, whether being "booted"
> > for the first time or hotplugged (maybe now there's really no
> > difference), the hotplugging tells where the cpu should be, in what node
> > and what runqueue. HT cpus work even better, because you can hotplug
> > siblings, one at a time if you wanted, to the same runqueue. Or you
> > have cpus sharing a die, same thing, lots of choices here. This removes
> > any per-arch updates to the kernel for things like scheduler topology,
> > and lets them go somewhere else more easily changed, like userspace.
>
> Ummm ... but *none* of that is dictated as policy stuff - it's all just
> the hardware layout of the machine. You cannot "decide" as the sysadmin
> which node a CPU is in, or which HT sibling it has. It's just there ;-)
> The only thing you could possibly dictate is the CPU number you want
> assigned to the new CPU, which frankly, I think is pointless - they're
> arbitrary tags, and always have been.

How many cpus share a runqueue could, IMO, be a policy thing. Some HT cpus may
be better off sharing a runqueue, where others (lots and lots of siblings in one
core) may not.

2004-01-27 02:20:51

by Andrew Theurer

Subject: Re: New NUMA scheduler and hotplug CPU

On Monday 26 January 2004 18:07, Nick Piggin wrote:
> >>Well OK, this would require a per architecture function to handle
> >>CPU hotplug. It could possibly just default to arch_init_sched_domains,
> >>and just completely reinitialise everything which would be the simplest.
> >
> >Call me crazy, but why not let the topology be determined via userspace at
> > a more appropriate time? When you hotplug, you tell it where in the
> > scheduler to plug it. Have structures in the scheduler which represent
> > the nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu
> > structs with simple pointers), but let the topology be built based on
> > user's desires thru hotplug.
>
> Well isn't userspace's idea of topology just what the kernel tells it?
> I'm not sure what it would buy you... but I guess it wouldn't be too
> much harder than doing it in kernel, just a matter of making the userspace
> API.

Sort of; the cpu-to-node mapping is pretty much what the kernel says it is, but
the cpu-to-runqueue mapping IMO is not a clear-cut thing.

2004-01-27 02:40:47

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Andrew Theurer wrote:

>On Monday 26 January 2004 18:07, Nick Piggin wrote:
>
>>>>Well OK, this would require a per architecture function to handle
>>>>CPU hotplug. It could possibly just default to arch_init_sched_domains,
>>>>and just completely reinitialise everything which would be the simplest.
>>>>
>>>Call me crazy, but why not let the topology be determined via userspace at
>>>a more appropriate time? When you hotplug, you tell it where in the
>>>scheduler to plug it. Have structures in the scheduler which represent
>>>the nodes-runqueues-cpus topology (in the past I tried a node/rq/cpu
>>>structs with simple pointers), but let the topology be built based on
>>>user's desires thru hotplug.
>>>
>>Well isn't userspace's idea of topology just what the kernel tells it?
>>I'm not sure what it would buy you... but I guess it wouldn't be too
>>much harder than doing it in kernel, just a matter of making the userspace
>>API.
>>
>
>Sort of; the cpu-to-node mapping is pretty much what the kernel says it is, but
>the cpu-to-runqueue mapping IMO is not a clear-cut thing.
>
>

But userspace still can't know more than the kernel tells it.
Apart from that, the SMT stuff in the sched domains patch means
SMT CPUs need not share runqueues.


2004-01-27 02:40:36

by Rusty Russell

Subject: Re: New NUMA scheduler and hotplug CPU

In message <31860000.1075159471@flay> you write:
> > I think it mostly does a good job at making sure to only take
> > online cpus into account. If there are places where it doesn't
> > then it shouldn't be too hard to fix.
>
> It'd make the code a damned sight simpler and cleaner if you dropped
> all that stuff, and updated the structures when you hotplugged a CPU,
> which is really the only sensible way to do it anyway ...

No, actually, it wouldn't. Take it from someone who has actually
looked at the code with an eye to doing this.

Replacing static structures by dynamic ones for an architecture which
doesn't yet exist is NOT a good idea.

> For instance, if I remove cpu X, then bring back a new CPU on another node
> (or in another HT sibling pair) as CPU X, then you'll need to update all
> that stuff anyway. CPUs aren't fixed position in that map - the ordering
> handed out is arbitrary.

Sure, if they were stupid they'd do it this way.

If (when) an architecture has hotpluggable CPUs and NUMA
characteristics, they probably will have fixed CPU *slots*, and number
CPUs based on what slot they are in. Since the slots don't move, all
your fancy dynamic logic will be wasted.

When someone really has dynamic hotplug CPU capability with variable
attributes, *they* can code up the dynamic hierarchy. Because *they*
can actually test it!

> > I guess so, but you'd still need NR_CPUS to be >= that arbitrary
> > number.
>
> Yup ... but you don't have to enumerate all possible positions that way.
> See Linus' arguement re dynamic device numbers and ISCSI disks, etc.
> Same thing applies.

Crap. When all the fixed per-cpu arrays have been removed from the
kernel, come back and talk about instantiation and location of
arbitrary CPUs.

You're way overdesigning: have you been sharing food with the AIX guys?

Cheers!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-27 04:38:36

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

> No, actually, it wouldn't. Take it from someone who has actually
> looked at the code with an eye to doing this.
>
> Replacing static structures by dynamic ones for an architecture which
> doesn't yet exist is NOT a good idea.

Trying to force a dynamic infrastructure into the static bitmap arrays
that we have is the bad idea, IMHO. Why on earth would you want offline
CPUs in the scheduler domains? Just to make your coding easier? Sorry,
but that just doesn't cut it for me.

> Sure, if they were stupid they'd do it this way.
>
> If (when) an architecture has hotpluggable CPUs and NUMA
> characteristics, they probably will have fixed CPU *slots*, and number
> CPUs based on what slot they are in. Since the slots don't move, all
> your fancy dynamic logic will be wasted.
>
> When someone really has dynamic hotplug CPU capability with variable
> attributes, *they* can code up the dynamic hierarchy. Because *they*
> can actually test it!

The cpu numbers are now dynamically allocated tags. I don't see why
we should sacrifice that just to get cpu hotplug. Sure, it makes your
coding a little harder, but ....

>> Yup ... but you don't have to enumerate all possible positions that way.
>> See Linus' argument re dynamic device numbers and iSCSI disks, etc.
>> Same thing applies.
>
> Crap. When all the fixed per-cpu arrays have been removed from the
> kernel, come back and talk about instantiation and location of
> arbitrary CPUs.
>
> You're way overdesigning: have you been sharing food with the AIX guys?

A cheap shot. Please, I'd expect better flaming from you.

Sorry if this makes your coding harder, but it seems clear to me that
it's the right way to go. I guess the final decision is up to Andrew,
but I really don't want to see this kind of stuff. You don't start
kthreads for every possible cpu, do you?

M.

2004-01-27 05:40:31

by Nick Piggin

Subject: Re: New NUMA scheduler and hotplug CPU



Martin J. Bligh wrote:

>>No, actually, it wouldn't. Take it from someone who has actually
>>looked at the code with an eye to doing this.
>>
>>Replacing static structures by dynamic ones for an architecture which
>>doesn't yet exist is NOT a good idea.
>>
>
>Trying to force a dynamic infrastructure into the static bitmap arrays
>that we have is the bad idea, IMHO. Why on earth would you want offline
>CPUs in the scheduler domains? Just to make your coding easier? Sorry,
>but that just doesn't cut it for me.
>
>
>>Sure, if they were stupid they'd do it this way.
>>
>>If (when) an architecture has hotpluggable CPUs and NUMA
>>characteristics, they probably will have fixed CPU *slots*, and number
>>CPUs based on what slot they are in. Since the slots don't move, all
>>your fancy dynamic logic will be wasted.
>>
>>When someone really has dynamic hotplug CPU capability with variable
>>attributes, *they* can code up the dynamic hierarchy. Because *they*
>>can actually test it!
>>
>
>The cpu numbers are now dynamically allocated tags. I don't see why
>we should sacrifice that just to get cpu hotplug. Sure, it makes your
>coding a little harder, but ....
>
>
>>>Yup ... but you don't have to enumerate all possible positions that way.
>>>See Linus' argument re dynamic device numbers and iSCSI disks, etc.
>>>Same thing applies.
>>>
>>Crap. When all the fixed per-cpu arrays have been removed from the
>>kernel, come back and talk about instantiation and location of
>arbitrary CPUs.
>>
>>You're way overdesigning: have you been sharing food with the AIX guys?
>>
>
>A cheap shot. Please, I'd expect better flaming from you.
>
>Sorry if this makes your coding harder, but it seems clear to me that
>it's the right way to go. I guess the final decision is up to Andrew,
>but I really don't want to see this kind of stuff. You don't start
>kthreads for every possible cpu, do you?
>
>

Well, let's not worry too much about this for now. We could use
static arrays and cpu_possible for now until we get a feel
for what specific architectures want.

To be honest I haven't seen the hotplug CPU code and I don't
know about what architectures want to be doing with it, so
this is my preferred direction just out of ignorance.

An easy next step toward a dynamic scheme would be just to
re-init the entire sched domain topology (the generic init uses
the generic NUMA topology info which will have to be handled
by these architectures anyway). Modulo a small locking problem.

There aren't any fundamental design issues (with sched domains)
that I can see preventing a more dynamic system so we can keep
that in mind.

2004-01-27 07:19:56

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

> Well, let's not worry too much about this for now. We could use
> static arrays and cpu_possible for now until we get a feel
> for what specific architectures want.
>
> To be honest I haven't seen the hotplug CPU code and I don't
> know about what architectures want to be doing with it, so
> this is my preferred direction just out of ignorance.
>
> An easy next step toward a dynamic scheme would be just to
> re-init the entire sched domain topology (the generic init uses
> the generic NUMA topology info which will have to be handled
> by these architectures anyway). Modulo a small locking problem.
>
> There aren't any fundamental design issues (with sched domains)
> that I can see preventing a more dynamic system so we can keep
> that in mind.

Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
why he's trying to do it that way now. However, one other thought occurs
to me ... it'd be good to use the same infrastructure (sched domains)
for the workload management stuff as well (where the domains would be
defined from userspace). That'd also necessitate them being dynamic,
if you think that'd work out as a usage model.

The cpu_possible stuff might work for a first cut at hotplug I guess.
I still think it's ugly though ;-)

M.

2004-01-27 15:27:19

by Martin J. Bligh

Subject: Re: New NUMA scheduler and hotplug CPU

> Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
> why he's trying to do it that way now.

BTW, Rusty - what are the locking rules for cpu_online_map under hotplug?
Is it RCU or something? The sched domains usage of it doesn't seem to take
any locks.

M.

2004-01-28 00:59:25

by Rusty Russell

Subject: Re: New NUMA scheduler and hotplug CPU

In message <368660000.1075217230@[10.10.2.4]> you write:
> > Yeah, I talked it over with Rusty some on IRC. I have more of a feeling
> > why he's trying to do it that way now.
>
> BTW, Rusty - what are the locking rules for cpu_online_map under hotplug?
> Is it RCU or something? The sched domains usage of it doesn't seem to take
> any locks.

The trivial usage is to take the cpucontrol sem (down_cpucontrol()).
There's a grace period between taking the cpu offline and actually
killing it too, so for most usages RCU is sufficient.

Fortunately, I've yet to hit a case where this isn't sufficient. For
the scheduler there's an explicit "move all tasks off the CPU" call
which takes the tasklist lock and walks the tasks.
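
A sketch of what a reader of cpu_online_map might do under those rules (the
cpucontrol name is taken from the description above and may not match what
finally goes in; look_at_cpu() is a hypothetical stand-in for the per-cpu
work):

	static void walk_online_cpus(void)
	{
		int i;

		/* Simple-minded variant: hold the cpucontrol semaphore
		 * across the whole walk so no CPU can come or go. */
		down(&cpucontrol);
		for (i = 0; i < NR_CPUS; i++)
			if (cpu_online(i))
				look_at_cpu(i);
		up(&cpucontrol);
	}

	/* Alternatively, since a CPU that leaves cpu_online_map is not
	 * actually killed until after an RCU grace period, most readers
	 * can get away with rcu_read_lock()/rcu_read_unlock() around the
	 * walk. */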

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.