2017-06-20 22:08:49

by Thiago Jung Bauermann

Subject: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Calling arch_update_cpu_topology from a CPU hotplug state machine callback
hits a deadlock because the function tries to get a read lock on
cpu_hotplug_lock while the state machine still holds a write lock on it.

Since all callers of arch_update_cpu_topology except rtasd already hold
cpu_hotplug_lock, this patch changes the function to use
stop_machine_cpuslocked and creates a separate function for rtasd which
still tries to obtain the lock.

Michael Bringmann investigated the bug and provided a detailed analysis
of the deadlock on this previous RFC for an alternate solution:

https://patchwork.ozlabs.org/patch/771293/

Signed-off-by: Thiago Jung Bauermann <[email protected]>
---

Notes:
This patch applies on tip/smp/hotplug, it should probably be carried there.

arch/powerpc/include/asm/topology.h | 6 ++++++
arch/powerpc/kernel/rtasd.c | 2 +-
arch/powerpc/mm/numa.c | 22 +++++++++++++++++++---
3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8b3b46b7b0f2..a2d36b7703ae 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -43,6 +43,7 @@ extern void __init dump_numa_cpu_topology(void);

extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_update_cpu_topology(bool cpus_locked);

#else

@@ -57,6 +58,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
 						int nid)
 {
 }
+
+static inline int numa_update_cpu_topology(bool cpus_locked)
+{
+	return 0;
+}
#endif /* CONFIG_NUMA */

#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 3650732639ed..0f0b1b2f3b60 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -283,7 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 	 * the RTAS event.
 	 */
 	pseries_devicetree_update(-prrn_update_scope);
-	arch_update_cpu_topology();
+	numa_update_cpu_topology(false);
 }

static DECLARE_WORK(prrn_work, prrn_work_fn);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 371792e4418f..b95c584ce19d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1311,8 +1311,10 @@ static int update_lookup_table(void *data)
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
+ *
+ * cpus_locked says whether we already hold cpu_hotplug_lock.
  */
-int arch_update_cpu_topology(void)
+int numa_update_cpu_topology(bool cpus_locked)
 {
 	unsigned int cpu, sibling, changed = 0;
 	struct topology_update_data *updates, *ud;
@@ -1400,15 +1402,23 @@ int arch_update_cpu_topology(void)
 	if (!cpumask_weight(&updated_cpus))
 		goto out;

-	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
+	if (cpus_locked)
+		stop_machine_cpuslocked(update_cpu_topology, &updates[0],
+					&updated_cpus);
+	else
+		stop_machine(update_cpu_topology, &updates[0], &updated_cpus);

 	/*
 	 * Update the numa-cpu lookup table with the new mappings, even for
 	 * offline CPUs. It is best to perform this update from the stop-
 	 * machine context.
 	 */
-	stop_machine(update_lookup_table, &updates[0],
+	if (cpus_locked)
+		stop_machine_cpuslocked(update_lookup_table, &updates[0],
 		     cpumask_of(raw_smp_processor_id()));
+	else
+		stop_machine(update_lookup_table, &updates[0],
+			     cpumask_of(raw_smp_processor_id()));

 	for (ud = &updates[0]; ud; ud = ud->next) {
 		unregister_cpu_under_node(ud->cpu, ud->old_nid);
@@ -1426,6 +1436,12 @@ int arch_update_cpu_topology(void)
 	return changed;
 }

+int arch_update_cpu_topology(void)
+{
+	lockdep_assert_cpus_held();
+	return numa_update_cpu_topology(true);
+}
+
 static void topology_work_fn(struct work_struct *work)
 {
 	rebuild_sched_domains();
--
2.7.4
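
The split between stop_machine() and stop_machine_cpuslocked() that the patch relies on can be illustrated outside the kernel. What follows is only a user-space sketch, not kernel code: the lock is modelled as a plain nesting counter, and every model_* name is hypothetical. It assumes, as in the tip smp/hotplug branch, that stop_machine() takes cpu_hotplug_lock around the _cpuslocked variant and that the _cpuslocked variant asserts the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of cpu_hotplug_lock: a nesting counter, not a real lock. */
static int lock_depth;     /* how many times the "lock" is currently held */
static int max_lock_depth; /* deepest nesting observed so far */

void model_cpus_lock(void)
{
	lock_depth++;
	if (lock_depth > max_lock_depth)
		max_lock_depth = lock_depth;
}

void model_cpus_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Hypothetical callback standing in for update_cpu_topology(). */
int model_update_fn(void *data)
{
	int *counter = data;

	(*counter)++;
	return 0;
}

/* Models stop_machine_cpuslocked(): the caller must already hold the lock. */
int model_stop_machine_cpuslocked(int (*fn)(void *), void *data)
{
	assert(lock_depth > 0); /* plays the role of lockdep_assert_cpus_held() */
	return fn(data);
}

/* Models stop_machine(): takes the lock itself around the locked variant. */
int model_stop_machine(int (*fn)(void *), void *data)
{
	int ret;

	model_cpus_lock();
	ret = model_stop_machine_cpuslocked(fn, data);
	model_cpus_unlock();
	return ret;
}

/* Models numa_update_cpu_topology(bool cpus_locked) from the patch. */
int model_numa_update(bool cpus_locked, int *counter)
{
	if (cpus_locked)
		return model_stop_machine_cpuslocked(model_update_fn, counter);
	return model_stop_machine(model_update_fn, counter);
}
```

In this model, a caller that already holds the lock and passes true never takes it a second time, so the nesting depth stays at 1. In the kernel, the equivalent second acquisition from a hotplug callback is the read-lock attempt that deadlocks against the write lock held by the state machine.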


2017-06-21 10:10:23

by Michael Ellerman

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Thiago Jung Bauermann <[email protected]> writes:

> Calling arch_update_cpu_topology from a CPU hotplug state machine callback
> hits a deadlock because the function tries to get a read lock on
> cpu_hotplug_lock while the state machine still holds a write lock on it.
>
> Since all callers of arch_update_cpu_topology except rtasd already hold
> cpu_hotplug_lock, this patch changes the function to use
> stop_machine_cpuslocked and creates a separate function for rtasd which
> still tries to obtain the lock.
>
> Michael Bringmann investigated the bug and provided a detailed analysis
> of the deadlock on this previous RFC for an alternate solution:
>
> https://patchwork.ozlabs.org/patch/771293/

Do we know when this broke? Or has it never worked?

Should it go to stable? (can't in its current form AFAICS)

> Signed-off-by: Thiago Jung Bauermann <[email protected]>
> ---
>
> Notes:
> This patch applies on tip/smp/hotplug, it should probably be carried there.

stop_machine_cpuslocked() doesn't exist in mainline so I think it has to
be carried there right?

cheers

2017-06-22 01:14:57

by Thiago Jung Bauermann

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd


Michael Ellerman <[email protected]> writes:
> Thiago Jung Bauermann <[email protected]> writes:
>
>> Calling arch_update_cpu_topology from a CPU hotplug state machine callback
>> hits a deadlock because the function tries to get a read lock on
>> cpu_hotplug_lock while the state machine still holds a write lock on it.
>>
>> Since all callers of arch_update_cpu_topology except rtasd already hold
>> cpu_hotplug_lock, this patch changes the function to use
>> stop_machine_cpuslocked and creates a separate function for rtasd which
>> still tries to obtain the lock.
>>
>> Michael Bringmann investigated the bug and provided a detailed analysis
>> of the deadlock on this previous RFC for an alternate solution:
>>
>> https://patchwork.ozlabs.org/patch/771293/
>
> Do we know when this broke? Or has it never worked?

It's been broken since at least v4.4, I think. I don't know about
earlier versions.

> Should it go to stable? (can't in its current form AFAICS)

It's not hard to backport both this patch and commit fe5595c07400
("stop_machine: Provide stop_machine_cpuslocked()") from branch
smp/hotplug in tip.git for stable.

rtasd only started calling arch_update_cpu_topology in v4.11, so for
earlier versions this patch can be simplified to making that function
call stop_machine_cpuslocked unconditionally instead of defining a
separate function.

>> Signed-off-by: Thiago Jung Bauermann <[email protected]>
>> ---
>>
>> Notes:
>> This patch applies on tip/smp/hotplug, it should probably be carried there.
>
> stop_machine_cpuslocked() doesn't exist in mainline so I think it has to
> be carried there right?

Yes. I said "probably" because I don't know if you want to wait
until that branch is merged so that you can carry this patch in your
tree.

--
Thiago Jung Bauermann
IBM Linux Technology Center

2017-06-22 12:24:51

by Thomas Gleixner

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

On Wed, 21 Jun 2017, Thiago Jung Bauermann wrote:
> Michael Ellerman <[email protected]> writes:
> >> Notes:
> >> This patch applies on tip/smp/hotplug, it should probably be carried there.
> >
> > stop_machine_cpuslocked() doesn't exist in mainline so I think it has to
> > be carried there right?
>
> Yes. I said "probably" because I don't know if you want to wait
> until that branch is merged so that you can carry this patch in your
> tree.

I'll pick it up for 4.12 and you folks can figure out the backporting once
that hits Linus' tree.

Thanks,

tglx

2017-06-22 13:07:31

by Michael Ellerman

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Thiago Jung Bauermann <[email protected]> writes:

> Michael Ellerman <[email protected]> writes:
>> Thiago Jung Bauermann <[email protected]> writes:
>>
>>> Calling arch_update_cpu_topology from a CPU hotplug state machine callback
>>> hits a deadlock because the function tries to get a read lock on
>>> cpu_hotplug_lock while the state machine still holds a write lock on it.
>>>
>>> Since all callers of arch_update_cpu_topology except rtasd already hold
>>> cpu_hotplug_lock, this patch changes the function to use
>>> stop_machine_cpuslocked and creates a separate function for rtasd which
>>> still tries to obtain the lock.
>>>
>>> Michael Bringmann investigated the bug and provided a detailed analysis
>>> of the deadlock on this previous RFC for an alternate solution:
>>>
>>> https://patchwork.ozlabs.org/patch/771293/
>>
>> Do we know when this broke? Or has it never worked?
>
> It's been broken since at least v4.4, I think. I don't know about
> earlier versions.

OK.

Just to be clear, this is happening on a 4.12-rcX system with no other
patches?


The code in arch_update_cpu_topology() has used stop_machine() since
30c05350c39d ("powerpc/pseries: Use stop machine to update cpu maps")
which went into v3.10, about 4 years ago.

Prior to that it used get/put_online_cpus(), since 9eff1a38407c
("powerpc/pseries: Poll VPA for topology changes and update NUMA maps"),
which was 2.6.38 in 2010.

I wouldn't rule out the possibility it's been broken for 7 years, but I
wonder if something else has changed to cause it to break.

We really need to work it out before we backport anything.

>> Should it go to stable? (can't in its current form AFAICS)
>
> It's not hard to backport both this patch and commit fe5595c07400
> ("stop_machine: Provide stop_machine_cpuslocked()") from branch
> smp/hotplug in tip.git for stable.

Yeah but it's not really my business backporting that unfortunately.

> rtasd only started calling arch_update_cpu_topology in v4.11, so for
> earlier versions this patch can be simplified to making that function
> call stop_machine_cpuslocked unconditionally instead of defining a
> separate function.

OK.

cheers

2017-06-22 19:31:56

by Thiago Jung Bauermann

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd


Michael Ellerman <[email protected]> writes:

> Thiago Jung Bauermann <[email protected]> writes:
>
>> Michael Ellerman <[email protected]> writes:
>>> Thiago Jung Bauermann <[email protected]> writes:
>>>
>>>> Calling arch_update_cpu_topology from a CPU hotplug state machine callback
>>>> hits a deadlock because the function tries to get a read lock on
>>>> cpu_hotplug_lock while the state machine still holds a write lock on it.
>>>>
>>>> Since all callers of arch_update_cpu_topology except rtasd already hold
>>>> cpu_hotplug_lock, this patch changes the function to use
>>>> stop_machine_cpuslocked and creates a separate function for rtasd which
>>>> still tries to obtain the lock.
>>>>
>>>> Michael Bringmann investigated the bug and provided a detailed analysis
>>>> of the deadlock on this previous RFC for an alternate solution:
>>>>
>>>> https://patchwork.ozlabs.org/patch/771293/
>>>
>>> Do we know when this broke? Or has it never worked?
>>
>> It's been broken since at least v4.4, I think. I don't know about
>> earlier versions.
>
> OK.
>
> Just to be clear, this is happening on a 4.12-rcX system with no other
> patches?
>
> The code in arch_update_cpu_topology() has used stop_machine() since
> 30c05350c39d ("powerpc/pseries: Use stop machine to update cpu maps")
> which went into v3.10, about 4 years ago.
>
> Prior to that it used get/put_online_cpus(), since 9eff1a38407c
> ("powerpc/pseries: Poll VPA for topology changes and update NUMA maps"),
> which was 2.6.38 in 2010.
>
> I wouldn't rule out the possibility it's been broken for 7 years, but I
> wonder if something else has changed to cause it to break.
>
> We really need to work it out before we backport anything.

Michael Bringmann provided this information:

We need at least one patch to show the issue in the latest 4.12
codebase:

[PATCH V6 2/2] powerpc/numa: Update CPU topology when VPHN enabled
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/159341.html

Reason: Prior to this patch we were not exercising the PowerPC VPHN
hcall nor the associated path through the kernel/sched code that
encounters the problem. All CPUs, whether present at boot or
hot-added, are added to node 0 without this patch. This
representation of the topology is incorrect in many/most cases.

cpu_hotplug_begin() + get_online_cpus() have potentially been broken
since the implementation of multithreading for _cpu_up() and
_cpu_down().

Reason: 'cpu_hotplug.active_writer' check in get_online_cpus() is
dependent upon the nested routines that call get_online_cpus() to
execute in the same thread as the one that invokes
'cpu_hotplug_begin'.

PowerPC's version of arch_update_cpu_topology() has used
stop_machine() or get_online_cpus() for years, since at least 2011.

Practically speaking, until the recent patch, in the 4.12 codebase
PowerPC CPUs are being added only to nodes that were online at boot,
and the topology did not change enough to trigger the paths through
'stop_machine'.

I can't say for certain about the earlier code bases where
'arch_update_cpu_topology' used 'get_online_cpus'/'put_online_cpus'
directly.
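
The same-thread dependency described above can be sketched with a small model. This is not the kernel implementation: tasks are represented by hypothetical integer ids where the kernel compares against `current`, and the model returns false where the real get_online_cpus() would sleep waiting for the writer. All model_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_WRITER (-1)

/* Simplified stand-in for the old cpu_hotplug state. */
struct model_hotplug {
	int active_writer; /* id of the task inside cpu_hotplug_begin(), or NO_WRITER */
	int refcount;      /* number of readers currently admitted */
};

/* Models cpu_hotplug_begin(): record which task is the writer. */
void model_hotplug_begin(struct model_hotplug *hp, int task)
{
	hp->active_writer = task;
}

/* Models cpu_hotplug_done(): the writer leaves. */
void model_hotplug_done(struct model_hotplug *hp)
{
	hp->active_writer = NO_WRITER;
}

/*
 * Models get_online_cpus(). Returns true if the caller gets in, false if
 * it would have to wait for the writer (the real code blocks instead).
 */
bool model_get_online_cpus(struct model_hotplug *hp, int task)
{
	if (hp->active_writer == task)
		return true;  /* nested call from the writer itself: let it through */
	if (hp->active_writer != NO_WRITER)
		return false; /* a different task holds the write side */
	hp->refcount++;
	return true;
}
```

The shortcut in the first branch only helps when the nested call runs in the task that called the begin function. Once part of the hotplug work runs in another thread, that thread takes the second branch and waits on a writer that is in turn waiting for it.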

>>> Should it go to stable? (can't in its current form AFAICS)
>>
>> It's not hard to backport both this patch and commit fe5595c07400
>> ("stop_machine: Provide stop_machine_cpuslocked()") from branch
>> smp/hotplug in tip.git for stable.
>
> Yeah but it's not really my business backporting that unfortunately.

Sorry, I wasn't clear. I was offering to provide backported patches for
the relevant stable branches.

Though that will only be necessary if we also backport the topology
fixes as well.

--
Thiago Jung Bauermann
IBM Linux Technology Center

2017-06-22 21:42:00

by Thomas Gleixner

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

On Thu, 22 Jun 2017, Thiago Jung Bauermann wrote:
> Michael Bringmann provided this information:
> >> It's not hard to backport both this patch and commit fe5595c07400
> >> ("stop_machine: Provide stop_machine_cpuslocked()") from branch
> >> smp/hotplug in tip.git for stable.
> >
> > Yeah but it's not really my business backporting that unfortunately.
>
> Sorry, I wasn't clear. I was offering to provide backported patches for
> the relevant stable branches.
>
> Though that will only be necessary if we also backport the topology
> fixes as well.

So shall I pick up the fix and route it through tip smp/hotplug, which holds
the cpu hotplug core changes that patch depends on?

Thanks,

tglx

2017-06-23 04:13:35

by Michael Ellerman

Subject: Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Thomas Gleixner <[email protected]> writes:

> On Thu, 22 Jun 2017, Thiago Jung Bauermann wrote:
>> Michael Bringmann provided this information:
>> >> It's not hard to backport both this patch and commit fe5595c07400
>> >> ("stop_machine: Provide stop_machine_cpuslocked()") from branch
>> >> smp/hotplug in tip.git for stable.
>> >
>> > Yeah but it's not really my business backporting that unfortunately.
>>
>> Sorry, I wasn't clear. I was offering to provide backported patches for
>> the relevant stable branches.
>>
>> Though that will only be necessary if we also backport the topology
>> fixes as well.
>
> So shall I pick up the fix and route it through tip smp/hotplug, which holds
> the cpu hotplug core changes that patch depends on?

Yes please. Here's an ack if you like:

Acked-by: Michael Ellerman <[email protected]>

cheers

Subject: [tip:smp/hotplug] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Commit-ID: 3e401f7a2e5199151f735aee6a5c6b4776e6a35e
Gitweb: http://git.kernel.org/tip/3e401f7a2e5199151f735aee6a5c6b4776e6a35e
Author: Thiago Jung Bauermann <[email protected]>
AuthorDate: Tue, 20 Jun 2017 19:08:30 -0300
Committer: Thomas Gleixner <[email protected]>
CommitDate: Fri, 23 Jun 2017 09:32:11 +0200

powerpc: Only obtain cpu_hotplug_lock if called by rtasd

Calling arch_update_cpu_topology from a CPU hotplug state machine callback
hits a deadlock because the function tries to get a read lock on
cpu_hotplug_lock while the state machine still holds a write lock on it.

Since all callers of arch_update_cpu_topology except rtasd already hold
cpu_hotplug_lock, this patch changes the function to use
stop_machine_cpuslocked and creates a separate function for rtasd which
still tries to obtain the lock.

Michael Bringmann investigated the bug and provided a detailed analysis
of the deadlock in a previous RFC for an alternate solution (see the
patchwork Link: tag below).

Signed-off-by: Thiago Jung Bauermann <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: John Allen <[email protected]>
Cc: Michael Bringmann <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://patchwork.ozlabs.org/patch/771293/

---
arch/powerpc/include/asm/topology.h | 6 ++++++
arch/powerpc/kernel/rtasd.c | 2 +-
arch/powerpc/mm/numa.c | 22 +++++++++++++++++++---
3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8b3b46b..a2d36b7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -43,6 +43,7 @@ extern void __init dump_numa_cpu_topology(void);

extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_update_cpu_topology(bool cpus_locked);

#else

@@ -57,6 +58,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
 						int nid)
 {
 }
+
+static inline int numa_update_cpu_topology(bool cpus_locked)
+{
+	return 0;
+}
#endif /* CONFIG_NUMA */

#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 3650732..0f0b1b2 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -283,7 +283,7 @@ static void prrn_work_fn(struct work_struct *work)
 	 * the RTAS event.
 	 */
 	pseries_devicetree_update(-prrn_update_scope);
-	arch_update_cpu_topology();
+	numa_update_cpu_topology(false);
 }

static DECLARE_WORK(prrn_work, prrn_work_fn);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 371792e..b95c584 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1311,8 +1311,10 @@ static int update_lookup_table(void *data)
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
+ *
+ * cpus_locked says whether we already hold cpu_hotplug_lock.
  */
-int arch_update_cpu_topology(void)
+int numa_update_cpu_topology(bool cpus_locked)
 {
 	unsigned int cpu, sibling, changed = 0;
 	struct topology_update_data *updates, *ud;
@@ -1400,15 +1402,23 @@ int arch_update_cpu_topology(void)
 	if (!cpumask_weight(&updated_cpus))
 		goto out;

-	stop_machine(update_cpu_topology, &updates[0], &updated_cpus);
+	if (cpus_locked)
+		stop_machine_cpuslocked(update_cpu_topology, &updates[0],
+					&updated_cpus);
+	else
+		stop_machine(update_cpu_topology, &updates[0], &updated_cpus);

 	/*
 	 * Update the numa-cpu lookup table with the new mappings, even for
 	 * offline CPUs. It is best to perform this update from the stop-
 	 * machine context.
 	 */
-	stop_machine(update_lookup_table, &updates[0],
+	if (cpus_locked)
+		stop_machine_cpuslocked(update_lookup_table, &updates[0],
 		     cpumask_of(raw_smp_processor_id()));
+	else
+		stop_machine(update_lookup_table, &updates[0],
+			     cpumask_of(raw_smp_processor_id()));

 	for (ud = &updates[0]; ud; ud = ud->next) {
 		unregister_cpu_under_node(ud->cpu, ud->old_nid);
@@ -1426,6 +1436,12 @@ out:
 	return changed;
 }

+int arch_update_cpu_topology(void)
+{
+	lockdep_assert_cpus_held();
+	return numa_update_cpu_topology(true);
+}
+
 static void topology_work_fn(struct work_struct *work)
 {
 	rebuild_sched_domains();