2008-06-29 16:49:49

by Oleg Nesterov

Subject: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Untested, needs Heiko's ack.

Now that it is safe to use get_online_cpus() we can revert

[S390] cpu topology: Fix possible deadlock.
commit: fd781fa25c9e9c6fd1599df060b05e7c4ad724e5

and call arch_reinit_sched_domains() directly from topology_work_fn().

Signed-off-by: Oleg Nesterov <[email protected]>

--- 26-rc2/arch/s390/kernel/topology.c~WQ_5_TOPOLOGY 2008-05-18 15:43:28.000000000 +0400
+++ 26-rc2/arch/s390/kernel/topology.c 2008-06-29 20:42:38.000000000 +0400
@@ -9,7 +9,6 @@
#include <linux/device.h>
#include <linux/bootmem.h>
#include <linux/sched.h>
-#include <linux/kthread.h>
#include <linux/workqueue.h>
#include <linux/cpu.h>
#include <linux/smp.h>
@@ -230,20 +229,9 @@ void arch_update_cpu_topology(void)
}
}

-static int topology_kthread(void *data)
-{
- arch_reinit_sched_domains();
- return 0;
-}
-
static void topology_work_fn(struct work_struct *work)
{
- /* We can't call arch_reinit_sched_domains() from a multi-threaded
- * workqueue context since it may deadlock in case of cpu hotplug.
- * So we have to create a kernel thread in order to call
- * arch_reinit_sched_domains().
- */
- kthread_run(topology_kthread, NULL, "topology_update");
+ arch_reinit_sched_domains();
}

void topology_schedule_update(void)


2008-06-30 13:45:49

by Heiko Carstens

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

On Sun, Jun 29, 2008 at 08:51:32PM +0400, Oleg Nesterov wrote:
> Untested, needs Heiko's ack.
>
> Now that it is safe to use get_online_cpus() we can revert
>
> [S390] cpu topology: Fix possible deadlock.
> commit: fd781fa25c9e9c6fd1599df060b05e7c4ad724e5
>
> and call arch_reinit_sched_domains() directly from topology_work_fn().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 26-rc2/arch/s390/kernel/topology.c~WQ_5_TOPOLOGY 2008-05-18 15:43:28.000000000 +0400
> +++ 26-rc2/arch/s390/kernel/topology.c 2008-06-29 20:42:38.000000000 +0400
> @@ -9,7 +9,6 @@
> #include <linux/device.h>
> #include <linux/bootmem.h>
> #include <linux/sched.h>
> -#include <linux/kthread.h>
> #include <linux/workqueue.h>
> #include <linux/cpu.h>
> #include <linux/smp.h>
> @@ -230,20 +229,9 @@ void arch_update_cpu_topology(void)
> }
> }
>
> -static int topology_kthread(void *data)
> -{
> - arch_reinit_sched_domains();
> - return 0;
> -}
> -
> static void topology_work_fn(struct work_struct *work)
> {
> - /* We can't call arch_reinit_sched_domains() from a multi-threaded
> - * workqueue context since it may deadlock in case of cpu hotplug.
> - * So we have to create a kernel thread in order to call
> - * arch_reinit_sched_domains().
> - */
> - kthread_run(topology_kthread, NULL, "topology_update");
> + arch_reinit_sched_domains();

Thank you! Tested and still works ;)

Acked-by: Heiko Carstens <[email protected]>

2008-06-30 19:01:29

by Max Krasnyansky

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Heiko Carstens wrote:
> On Sun, Jun 29, 2008 at 08:51:32PM +0400, Oleg Nesterov wrote:
>> static void topology_work_fn(struct work_struct *work)
>> {
>> - /* We can't call arch_reinit_sched_domains() from a multi-threaded
>> - * workqueue context since it may deadlock in case of cpu hotplug.
>> - * So we have to create a kernel thread in order to call
>> - * arch_reinit_sched_domains().
>> - */
>> - kthread_run(topology_kthread, NULL, "topology_update");
>> + arch_reinit_sched_domains();
>
> Thank you! Tested and still works ;)

btw Ideally we should not be calling arch_reinit_sched_domains() here
because it'll nuke all sched domains created by the cpuset. We should
instead notify cpusets subsystem and let it rebuild sched domains. In
order to do that we either have to make rebuild_sched_domains() callable
from any context or change arch_init_sched_domains() to no destroy
current domains. Paul M. sent some patches to address first option and
I'm working on the second option.

Max

2008-06-30 19:33:51

by Paul Jackson

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Max wrote:
> we either have to ... or change arch_init_sched_domains() to no destroy
> current domains.

I might be misreading this, but I doubt that just not destroying
current domains is an option. Once any CPU goes on or off line, the
only way back to the new correct sched domain configuration is via the
rebuild_sched_domains() routine in kernel/cpuset.c.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-30 19:49:37

by Max Krasnyansky

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Paul Jackson wrote:
> Max wrote:
>> we either have to ... or change arch_init_sched_domains() to no destroy
>> current domains.
>
> I might be misreading this, but I doubt that just not destroying
> current domains is an option. Once any CPU goes on or off line, the
> only way back to the new correct sched domain configuration is via the
> rebuild_sched_domains() routine in kernel/cpuset.c.
>

Despite all the typos and missing words, you read it correctly :). Here
is what I'm thinking.
When a CPU goes off line, the overall partitioning does not change; we
just need to update the current domains and remove the CPU that is no
longer available. When a CPU goes online, it always ends up in the root
cpuset, which means it can be added to the first load-balanced sched
domain.

In other words, I'm thinking of simulating what rebuild_sched_domains()
would've done on hotplug events and calling partition_sched_domains()
directly from the sched cpu hotplug code.
That way we can avoid cpuset/cgroup locking in that path.
Now, I haven't really looked into the details; maybe it's not feasible,
in which case Paul M.'s new locking scheme is the way to go.

Max

2008-06-30 20:28:53

by Paul Jackson

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Max wrote:
> When a CPU goes off line overall partitioning does not change we just
> need to update current domains and remove the CPU that is no longer
> available.

I don't believe that this is correct.

If one had say just the following three cpusets in a system:

/dev/cpuset # sched_load_balance == 0
/dev/cpuset/alpha # sched_load_balance == 1
/dev/cpuset/beta # sched_load_balance == 1

where the 'cpus' of alpha and beta overlapped by a single CPU,
and if one then took that single CPU offline, then the overall
partitioning of the system would change, from a single sched
domain covering the combined 'cpus' of alpha and beta, to two
separate sched domains, handling the 'cpus' of alpha and beta
separately.

> When a CPU goes online it always ends up in the root cpuset,
> which means it can be added to the first load-balanced sched domain.

Also not correct, but at least in this case, one might be able
to avoid doing a full fledged 'rebuild_sched_domains()' call,
by the following reasoning.

When bringing CPUs online, either the top cpuset has
sched_load_balance set (1) or off (0). If it is set, then one
has a single sched domain covering all online CPUs, and yes one
could just add the new CPU to that sched domain. If off, then
the newly online CPU would only be in the top cpuset, which does
not by itself put that CPU in any sched domain, and that new CPU
should not be added to -any- cpuset.

However, since we have to handle the offline case as well as the online
case, and since that case requires (to the best of my understanding)
calling rebuild_sched_domains(), I think it is best to just call that
routine in all cases.

An earlier version of this sched domain code always attempted to
incrementally adjust sched domains to online and offline events,
and that code ended up being a maintenance nightmare. I will be most
reluctant to attempt to go back to such calculations.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-06-30 20:47:37

by Max Krasnyansky

Subject: Re: [PATCH 2/2] S390 topology: don't use kthread() for arch_reinit_sched_domains()

Paul Jackson wrote:
> Max wrote:
>> When a CPU goes off line overall partitioning does not change we just
>> need to update current domains and remove the CPU that is no longer
>> available.
>
> I don't believe that this is correct.
>
> If one had say just the following three cpusets in a system:
>
> /dev/cpuset # sched_load_balance == 0
> /dev/cpuset/alpha # sched_load_balance == 1
> /dev/cpuset/beta # sched_load_balance == 1
>
> where the 'cpus' of alpha and beta overlapped by a single CPU,
> and if one then took that single CPU offline, then the overall
> partitioning of the system would change, from a single sched
> domain covering the combined 'cpus' of alpha and beta, to two
> separate sched domains, handling the 'cpus' of alpha and beta
> separately.
>
>> When a CPU goes online it always ends up in the root cpuset,
>> which means it can be added to the first load-balanced sched domain.
>
> Also not correct, but at least in this case, one might be able
> to avoid doing a full fledged 'rebuild_sched_domains()' call,
> by the following reasoning.
>
> When bringing CPUs online, either the top cpuset has
> sched_load_balance set (1) or off (0). If it is set, then one
> has a single sched domain covering all online CPUs, and yes one
> could just add the new CPU to that sched domain. If off, then
> the newly online CPU would only be in the top cpuset, which does
> not by itself put that CPU in any sched domain, and that new CPU
> should not be added to -any- cpuset.
>
> However, since we have to handle the offline case as well as the online
> case, and since that case requires (to the best of my understanding)
> calling rebuild_sched_domains(), I think it is best to just call that
> routine in all cases.
>
> An earlier version of this sched domain code always attempted to
> incrementally adjust sched domains to online and offline events,
> and that code ended up being a maintenance nightmare. I will be most
> reluctant to attempt to go back to such calculations.

Makes perfect sense. Thanks for the explanation. You've just saved me a
bunch of time :). I'll give up on that idea then and go play with
Paul M.'s latest patch instead.

btw, we should update s390 to not call arch_reinit_sched_domains()
directly once we know that rebuild_sched_domains() is safe to call from
most contexts.

Max