Recently I had some thoughts on how to realise CPU attachment and
detachment in a running Linux system (based on the 2.4 kernel).
CPU attachment and detachment would make sense on an S/390 when there
are several Linuxes running, each in its own logical partition. This
way a CPU could be taken from one partition and be given to another
partition (e.g. dependent on the current workload) on the fly without
the need to reboot anything.
Now the question is: how can this goal be achieved?
Attachment of a new CPU:
The idea is to synchronize all CPUs and then start the new CPU with a
sigp. To synchronize n CPUs one can create n kernel threads and give
them a high priority to make sure they will be executed soon (e.g. by
setting p->policy to SCHED_RR and p->rt_priority to a very high
value). As soon as all CPUs are in a synchronized state (with
interrupts disabled) the new CPU can be started. But before this can
be done there are some other things left to do:
First of all a new cpu_idle task needs to be created for the new CPU.
Unfortunately there are several other parts of the kernel that need
to be updated when a new CPU is attached to the running system.
For example the slabcache has a per-CPU cache for each of its caches.
This implies that, with the arrival of a new CPU, a new per-CPU cache
needs to be allocated for each of these caches. This is of course
only one issue in the common part of the kernel that needs to be
addressed.
Considering this, maybe it would be a good idea for each part of the
kernel that has per-CPU data structures needing an update to register
a function which will be called before a new CPU is attached to the
system.
Then the attachment of a new CPU should work the following way:
- synchronize all CPUs via kernel threads
- create a new idle task
- update all parts of the kernel that have per-CPU dependencies with
  the prior registered functions
- and finally start the new CPU (out of one of the kernel threads).
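Just to make the synchronization step a bit more concrete, a capture
thread could look roughly like the sketch below (capture_thread,
cpus_captured and release_cpus are only placeholder names; binding the
thread to its cpu and setting the SCHED_RR priority are left out):

static atomic_t cpus_captured = ATOMIC_INIT(0);
static volatile int release_cpus = 0;

static int capture_thread(void *unused)
{
        /* assumed to be bound to exactly one cpu and to run with a high
           rt_priority so that it gets onto its cpu quickly */
        __cli();                        /* disable interrupts locally */
        atomic_inc(&cpus_captured);     /* report arrival */
        while (!release_cpus)           /* spin until released */
                barrier();
        __sti();
        return 0;
}

The coordinating cpu would wait until cpus_captured equals smp_num_cpus,
create the new idle task, call the registered update functions, start the
new cpu with sigp and finally set release_cpus.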
Detachment of a CPU:
Detaching a CPU should work nearly the same way:
- synchronize all CPUs via kernel threads
- stop the selected CPU (out of a kernel thread)
- update all parts of the kernel that have per-CPU dependencies with
  registered functions
- and finally remove the cpu_idle task of the released CPU (and the
  kernel thread which ran on the released CPU until the CPU was
  stopped).
Detaching a CPU is a bit more difficult than attaching a CPU because
one has to think, for example, of pending tasklets on the CPU that is
to be stopped. But these could simply be moved to another CPU.
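For the pending tasklets something conceptually like the sketch below
might do; steal_tasklet() is a made-up helper which would have to unlink
one pending tasklet from the stopped CPU's queue (and clear its scheduled
state), while tasklet_schedule() is the existing interface and requeues
the tasklet on the cpu that executes this code:

static void move_pending_tasklets(int stopped_cpu)
{
        struct tasklet_struct *t;

        /* drain the stopped cpu's queue and requeue everything locally */
        while ((t = steal_tasklet(stopped_cpu)) != NULL)
                tasklet_schedule(t);
}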
The general question is: what do all of you think of the idea of an
interface where the parts of the kernel that have per-CPU
dependencies should register two functions (one for attaching a CPU
and one for detaching)?
Any comments on this would be appreciated.
Best regards,
Heiko
Heiko,
If I'm not mistaken, this sort of thing has been done by the beowulf folks.
Matthew D. Pitts
[email protected]
----- Original Message -----
From: <[email protected]>
To: <[email protected]>
Sent: Monday, December 11, 2000 9:03 AM
Subject: CPU attachment and detachment in a running Linux system
[snip]
Heiko and Matthew -
I'm pretty certain this is not something beowulfish, unless perhaps
you are thinking in terms of Mosix and some of the other batch/queueing
systems. Beowulf after all is a set of distributed processors,
not SMP (although an individual node may be SMP).
regards,
Per Jessen, London
On Mon, 11 Dec 2000 13:11:11 -0500, Matthew D. Pitts wrote:
>Heiko,
>If I'm not mistaken, this sort of thing has been done by the beowulf folks.
>
>Matthew D. Pitts
>[email protected]
>
>----- Original Message -----
>From: <[email protected]>
>To: <[email protected]>
>Sent: Monday, December 11, 2000 9:03 AM
>Subject: CPU attachment and detachment in a running Linux system
[snip]
On Mon, 11 Dec 2000 15:03:47 [email protected] wrote:
>
> Recently I had some thoughts on how to realise CPU attachment and
> detachment in a running Linux system (based on the 2.4 kernel).
>
> CPU attachment and detachment would make sense on an S/390 when there
> are several Linuxes running, each in its own logical partition. This
> way a CPU could be taken from one partition and be given to another
> partition (e.g. dependent on the current workload) on the fly without
> the need to reboot anything.
>
Perhaps the PSet project can help you; take a look at
http://isunix.it.ilstu.edu/~thockin/pset/
I think it can be a good thing, now that Linux has to manage many
CPUs. But I think it is discontinued.
--
Juan Antonio Magallon Lacarta #> cd /pub
mailto:[email protected] #> more beer
Linux werewolf 2.2.18-vm #1 SMP Mon Dec 11 02:36:30 CET 2000 i686
>> sigp. To synchronize n CPUs one can create n kernel threads and give
>> them a high priority to make sure they will be executed soon (e.g. by
>> setting p->policy to SCHED_RR and p->rt_priority to a very high
>> value). As soon as all CPUs are in synchronized state (with
>> interrupts disabled) the new CPU can be started. But before this can
>> be done there are some other things left to do:
>
>You don't IMHO need to use such a large hammer. We already do similar
>sequences for tlb invalidation on X86 for example. You can broadcast an
>interprocessor interrupt and have the other processors set a flag each.
>You spin until they are all captured, then when you clear the flag they
>all continue. You just need to watch two processors doing it at the same
>time 8)
Alan,
thanks for your input but I think it won't work this way because the value
of smp_num_cpus needs to be increased by one right before a new cpu gets
started. Then one can imagine the following situation at one of the cpus
that needs to be captured:
read the value of smp_num_cpus;
- interrupt that is intended to capture this cpu -
the value of smp_num_cpus will be increased and the new cpu will be started
by another cpu before this cpu continues with normal operation;
- end of interrupt handling -
do something that relies on the prior read value of smp_num_cpus (which is
now wrong);
The result would be an inconsistency. This problem would not occur if all
cpus were captured by kernel threads.
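To illustrate, the broken sequence on the captured cpu would look
something like this (do_something_per_cpu() is of course made up):

void example_race(void)
{
        int i;
        int cpus = smp_num_cpus;        /* read before the capture interrupt */

        /* <- the capture interrupt could arrive here; while this cpu is
              held, smp_num_cpus is increased and the new cpu is started
              by another cpu */

        for (i = 0; i < cpus; i++)      /* stale count, the new cpu is missed */
                do_something_per_cpu(i);
}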
I still wonder what you and other people think about the idea of an
interface where the parts of the kernel with per-cpu dependencies should
register two functions...
Best regards,
Heiko
By the way, I changed the subject of your original reply because I sent my
first mail twice (with and without a subject line).
I'm sorry for my own stupidity :)
> thanks for your input but I think it won't work this way because the value
> of smp_num_cpus needs to be increased by one right before a new cpu gets
> started. Then one can imagine the following situation at one of the cpus
> that needs to be captured:
This is fine providing the code is aware of the potential race. You capture
all the CPUs, up the count by one, set the new cpu going and wait for it
to jump to being captured (even though not by an interrupt)?
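A rough sketch of that ordering (all helper names below are made up, not
existing kernel functions):

static void attach_cpu(int cpu)
{
        capture_all_cpus();     /* broadcast IPI: every cpu sets a flag and
                                   spins with interrupts disabled */
        smp_num_cpus++;         /* changed only while everyone is captive */
        start_new_cpu(cpu);     /* sigp; the new cpu jumps straight into
                                   the capture loop, not via an interrupt */
        wait_for_capture(cpu);
        release_all_cpus();     /* clear the flag, all cpus continue */
}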
> I still wonder what you and other people think about the idea of an
> interface where the parts of the kernel with per-cpu dependencies should
> register two functions...
The other approach would be to make sure the per-CPU structures which
consume little memory are allocated with space already for the extra
processors - eg with the arrays of pointers etc.
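A minimal illustration of that approach, with made-up names (foo_percpu,
foo_cpudata, foo_attach_cpu): the pointer array is sized for NR_CPUS at
compile time, so attaching a cpu only means filling in one slot.

#include <linux/threads.h>      /* NR_CPUS */
#include <linux/slab.h>         /* kmalloc */
#include <linux/string.h>       /* memset */
#include <linux/errno.h>

struct foo_percpu {
        unsigned long hits;
};

/* one slot per possible cpu, present from boot onwards */
static struct foo_percpu *foo_cpudata[NR_CPUS];

int foo_attach_cpu(int cpu)
{
        struct foo_percpu *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return -ENOMEM;
        memset(p, 0, sizeof(*p));
        foo_cpudata[cpu] = p;
        return 0;
}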
Hi!
> I still wonder what you and other people think about the idea of an
> interface where the parts of the kernel with per-cpu dependencies should
> register two functions...
Why not compile the kernel with structures big enough for 32 processors,
and then just add CPUs up to the limit without changing anything?
Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]
Hi,
>> I still wonder what you and other people think about the idea of an
>> interface where the parts of the kernel with per-cpu dependencies should
>> register two functions...
>Why not compile the kernel with structures big enough for 32 processors,
>and then just add CPUs up to the limit without changing anything?
That's a good point and it would probably work for attachment of cpus, but
it won't work for detachment because there are some data structures that
need to be updated if a cpu gets detached. For example it would be nice
to flush the per-cpu cache of the detached cpu in the slabcache. Then one
has to think of pending tasklets for the detached cpu, which should be
moved to another cpu, and then there are a lot of per-cpu data structures
in the networking part of the kernel. Most of them seem to be for
statistics only, but I think these structures should be updated in any
case.
So at least for detaching it would make sense to register functions which
will be called whenever a cpu gets detached.
Heiko
On Mon, 18 Dec 2000 [email protected] wrote:
> [snip]
> So at least for detaching it would make sense to register functions which
> will be called whenever a cpu gets detached.
Plus userspace CPU monitors will need to know when the CPU arrangement has
changed.
> [snip]
> So at least for detaching it would make sense to register functions which
> will be called whenever a cpu gets detached.
I remember someone from SGI had a patch to merge all the per cpu structures
together which would make this easier. It would also save bytes especially
on machines like the e10k where we must have NR_CPUS = 64.
Anton
Hi,
>> That's a good point and it would probably work for attachment of cpus, but
>> it won't work for detachment because there are some data structures that
>> need to be updated if a cpu gets detached. For example it would be nice
>> [...]
>> So at least for detaching it would make sense to register functions which
>> will be called whenever a cpu gets detached.
>I remember someone from SGI had a patch to merge all the per cpu structures
>together which would make this easier. It would also save bytes especially
>on machines like the e10k where we must have NR_CPUS = 64.
Thanks for your comment, but I thought of an additional kernel parameter
max_dyn_cpus which would limit the maximum number of cpus that are allowed
to run. This way at least the waste of dynamically allocated memory which
depends on smp_num_cpus will be limited. This could be done by replacing
appropriate occurrences of smp_num_cpus with a macro MAX_DYN_CPUS which
could be defined the following way:
#ifdef CONFIG_DYN_CPU
extern volatile int smp_num_cpus; /* smp_num_cpus may change */
extern int max_dyn_cpus;
#define MAX_DYN_CPUS max_dyn_cpus
#else
extern int smp_num_cpus; /* smp_num_cpus won't change */
#define MAX_DYN_CPUS smp_num_cpus
#endif
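For example (my_percpu_counters is just an invented name), a dynamic
per-cpu allocation would then be sized with the attachment limit instead
of the current number of cpus:

static unsigned long *my_percpu_counters;
...
/* room for every cpu that may ever be attached, not just the current ones */
my_percpu_counters = kmalloc(MAX_DYN_CPUS * sizeof(unsigned long), GFP_KERNEL);
if (!my_percpu_counters)
        return -ENOMEM;
memset(my_percpu_counters, 0, MAX_DYN_CPUS * sizeof(unsigned long));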
Coming back to the question of how to realize an interface where the
per-cpu dependent parts of the kernel could register a function to be
called whenever a cpu gets detached, I think the following approach would
work fine:
To register a function the following structure would be used:
typedef struct smp_dyncpu_func_s smp_dyncpu_func_t;
struct smp_dyncpu_func_s {
        void (*f)(int);
        smp_dyncpu_func_t *next;
};
The function used to register such a structure would look like this:
smp_dyncpu_func_t *dyncpu_func; /* NULL */
...
void smp_register_dyncpu_func(smp_dyncpu_func_t *func)
{
        func->next = dyncpu_func;
        dyncpu_func = func;
        return;
}
And finally, every part of the kernel that needs such a cleanup function
for its per-cpu data structures would register it with some additional
code like this:
static smp_dyncpu_func_t smp_cleanup_func;
...
void local_dyncpu_handler(int killed_cpu){...}
...
static int __init local_dyncpu_init(void)
{
        smp_cleanup_func.f = &local_dyncpu_handler;
        smp_register_dyncpu_func(&smp_cleanup_func);
        return 0;
}
...
__initcall(local_dyncpu_init);
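Such a handler would then do whatever per-cpu cleanup the subsystem needs.
Purely as an illustration (the my_stats array is invented), folding the
detached cpu's statistics into cpu 0 could look like this:

static unsigned long my_stats[NR_CPUS];

void local_dyncpu_handler(int killed_cpu)
{
        /* fold the departing cpu's counter into cpu 0 and clear its slot */
        my_stats[0] += my_stats[killed_cpu];
        my_stats[killed_cpu] = 0;
}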
Thinking of modules, which may also have per-cpu structures, there could
be a second function which allows unregistering previously registered
functions.
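A sketch of the matching unregister function (it just unlinks the entry
from the singly linked list built above; locking against concurrent
registration is ignored here):

void smp_unregister_dyncpu_func(smp_dyncpu_func_t *func)
{
        smp_dyncpu_func_t **p;

        for (p = &dyncpu_func; *p; p = &(*p)->next) {
                if (*p == func) {
                        *p = func->next;
                        return;
                }
        }
}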
So what do you think of this approach? I would appreciate any comments
on this.
Best regards,
Heiko