2023-06-08 12:32:45

by Junhao He

Subject: [PATCH] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

The driver needs to migrate the perf context if the CPU currently in use is
going to be torn down. By the time the cpuhp::teardown() callback is called,
cpu_online_mask has not been updated yet and still includes the CPU going to
be torn down. In the current driver implementation we may migrate the context
to the CPU being torn down, which leads to the call trace below:

...
[ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
[ 368.113699][ T932] Call trace:
[ 368.116834][ T932] __switch_to+0x7c/0xbc
[ 368.120924][ T932] __schedule+0x338/0x6f0
[ 368.125098][ T932] schedule+0x50/0xe0
[ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
[ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
[ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
[ 368.144573][ T932] mutex_lock+0x50/0x60
[ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
[ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
[ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
[ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
[ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
[ 368.175099][ T932] kthread+0x108/0x13c
[ 368.179012][ T932] ret_from_fork+0x10/0x18
...

Use cpumask_any_but() to pick an online CPU other than the one being torn
down, which fixes this issue.

Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Junhao He <[email protected]>
---
drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
index 0bc8dc36aff5..14f8b4b03337 100644
--- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
+++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
@@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)

 	pcie_pmu->on_cpu = -1;
 	/* Choose a new CPU from all online cpus. */
-	target = cpumask_first(cpu_online_mask);
+	target = cpumask_any_but(cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids) {
 		pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
 		return 0;
--
2.30.0
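
To illustrate the difference between the two selectors in the patch above,
here is a minimal userspace sketch. It is not kernel code: model_cpumask_t,
model_cpumask_first() and model_cpumask_any_but() are hypothetical stand-ins
that model a CPU mask as a 64-bit integer. They only assume the kernel
convention of returning a value >= the number of CPUs when no CPU is found.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t model_cpumask_t;

/*
 * Return the lowest CPU set in the mask, or nr_cpus if the mask is empty.
 * This can return the CPU that is currently being torn down, because that
 * CPU is still set in the online mask while the teardown callback runs.
 */
static unsigned int model_cpumask_first(model_cpumask_t mask,
					unsigned int nr_cpus)
{
	assert(nr_cpus <= 64);
	for (unsigned int i = 0; i < nr_cpus; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return nr_cpus;
}

/*
 * Return the lowest CPU set in the mask other than 'cpu', or nr_cpus if
 * there is no such CPU. Skipping 'cpu' is what keeps the migration target
 * away from the CPU that is going down.
 */
static unsigned int model_cpumask_any_but(model_cpumask_t mask,
					  unsigned int nr_cpus,
					  unsigned int cpu)
{
	assert(nr_cpus <= 64);
	for (unsigned int i = 0; i < nr_cpus; i++)
		if (i != cpu && (mask & (UINT64_C(1) << i)))
			return i;
	return nr_cpus;
}
```

In this model, with CPUs 0-3 online and CPU 0 going down,
model_cpumask_first(0xF, 4) returns 0, the CPU being torn down, while
model_cpumask_any_but(0xF, 4, 0) returns 1. If CPU 0 is the only CPU set,
model_cpumask_any_but(0x1, 4, 0) returns 4, which corresponds to the
"target >= nr_cpu_ids" check in the driver.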



2023-06-08 12:36:27

by Yicong Yang

Subject: Re: [PATCH] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

On 2023/6/8 19:43, Junhao He wrote:
> The driver needs to migrate the perf context if the CPU currently in use is
> going to be torn down. By the time the cpuhp::teardown() callback is called,
> cpu_online_mask has not been updated yet and still includes the CPU going to
> be torn down. In the current driver implementation we may migrate the context
> to the CPU being torn down, which leads to the call trace below:
>
> ...
> [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
> [ 368.113699][ T932] Call trace:
> [ 368.116834][ T932] __switch_to+0x7c/0xbc
> [ 368.120924][ T932] __schedule+0x338/0x6f0
> [ 368.125098][ T932] schedule+0x50/0xe0
> [ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
> [ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
> [ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
> [ 368.144573][ T932] mutex_lock+0x50/0x60
> [ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
> [ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
> [ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
> [ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
> [ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
> [ 368.175099][ T932] kthread+0x108/0x13c
> [ 368.179012][ T932] ret_from_fork+0x10/0x18
> ...
>
> Use cpumask_any_but() to pick an online CPU other than the one being torn
> down, which fixes this issue.
>
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <[email protected]>
Reviewed-by: Yicong Yang <[email protected]>

> ---
> drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> index 0bc8dc36aff5..14f8b4b03337 100644
> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> @@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
>
> pcie_pmu->on_cpu = -1;
> /* Choose a new CPU from all online cpus. */
> - target = cpumask_first(cpu_online_mask);
> + target = cpumask_any_but(cpu_online_mask, cpu);
> if (target >= nr_cpu_ids) {
> pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
> return 0;
>

2023-06-08 12:46:57

by Mark Rutland

Subject: Re: [PATCH] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

On Thu, Jun 08, 2023 at 07:43:26PM +0800, Junhao He wrote:
> The driver needs to migrate the perf context if the CPU currently in use is
> going to be torn down. By the time the cpuhp::teardown() callback is called,
> cpu_online_mask has not been updated yet and still includes the CPU going to
> be torn down. In the current driver implementation we may migrate the context
> to the CPU being torn down, which leads to the call trace below:
>
> ...
> [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
> [ 368.113699][ T932] Call trace:
> [ 368.116834][ T932] __switch_to+0x7c/0xbc
> [ 368.120924][ T932] __schedule+0x338/0x6f0
> [ 368.125098][ T932] schedule+0x50/0xe0
> [ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
> [ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
> [ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
> [ 368.144573][ T932] mutex_lock+0x50/0x60
> [ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
> [ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
> [ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
> [ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
> [ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
> [ 368.175099][ T932] kthread+0x108/0x13c
> [ 368.179012][ T932] ret_from_fork+0x10/0x18
> ...
>
> Use cpumask_any_but() to pick an online CPU other than the one being torn
> down, which fixes this issue.
>
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <[email protected]>

Acked-by: Mark Rutland <[email protected]>

I assume that Will can pick this up.

I did a quick check, and all other perf drivers seem to do the right thing
here, either using cpumask_any_but(), or generating a temporary mask with the
cpu being offlined removed.

Mark.
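
For readers unfamiliar with the second pattern mentioned above, a temporary
mask with the outgoing CPU removed, the following userspace sketch shows the
idea. It is not taken from any driver: model_cpumask_t and
model_pick_target() are hypothetical names, and the mask is modelled as a
64-bit integer rather than a kernel cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t model_cpumask_t;

/*
 * Build a temporary copy of the online mask with the outgoing CPU cleared,
 * then return the lowest CPU still set in that copy. Returns nr_cpus when
 * no other CPU is available, following the ">= number of CPUs means not
 * found" convention.
 */
static unsigned int model_pick_target(model_cpumask_t online,
				      unsigned int nr_cpus,
				      unsigned int dying_cpu)
{
	assert(nr_cpus <= 64 && dying_cpu < nr_cpus);

	/* Temporary mask: everything online except the CPU going down. */
	model_cpumask_t tmp = online & ~(UINT64_C(1) << dying_cpu);

	for (unsigned int i = 0; i < nr_cpus; i++)
		if (tmp & (UINT64_C(1) << i))
			return i;
	return nr_cpus;
}
```

Both patterns give the same guarantee: the chosen target is never the CPU
that is being offlined. The temporary mask is mainly useful when the
candidate set must be narrowed further before picking a target.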

> ---
> drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> index 0bc8dc36aff5..14f8b4b03337 100644
> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> @@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
>
> pcie_pmu->on_cpu = -1;
> /* Choose a new CPU from all online cpus. */
> - target = cpumask_first(cpu_online_mask);
> + target = cpumask_any_but(cpu_online_mask, cpu);
> if (target >= nr_cpu_ids) {
> pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
> return 0;
> --
> 2.30.0
>

2023-06-08 13:37:53

by Jonathan Cameron

Subject: Re: [PATCH] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

On Thu, 8 Jun 2023 19:43:26 +0800
Junhao He <[email protected]> wrote:

> The driver needs to migrate the perf context if the CPU currently in use is
> going to be torn down. By the time the cpuhp::teardown() callback is called,
> cpu_online_mask has not been updated yet and still includes the CPU going to
> be torn down. In the current driver implementation we may migrate the context
> to the CPU being torn down, which leads to the call trace below:
>
> ...
> [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
> [ 368.113699][ T932] Call trace:
> [ 368.116834][ T932] __switch_to+0x7c/0xbc
> [ 368.120924][ T932] __schedule+0x338/0x6f0
> [ 368.125098][ T932] schedule+0x50/0xe0
> [ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
> [ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
> [ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
> [ 368.144573][ T932] mutex_lock+0x50/0x60
> [ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
> [ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
> [ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
> [ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
> [ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
> [ 368.175099][ T932] kthread+0x108/0x13c
> [ 368.179012][ T932] ret_from_fork+0x10/0x18
> ...
>
> Use cpumask_any_but() to pick an online CPU other than the one being torn
> down, which fixes this issue.
>
> Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
> Signed-off-by: Junhao He <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>

> ---
> drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> index 0bc8dc36aff5..14f8b4b03337 100644
> --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c
> +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c
> @@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
>
> pcie_pmu->on_cpu = -1;
> /* Choose a new CPU from all online cpus. */
> - target = cpumask_first(cpu_online_mask);
> + target = cpumask_any_but(cpu_online_mask, cpu);
> if (target >= nr_cpu_ids) {
> pci_err(pcie_pmu->pdev, "There is no CPU to set\n");
> return 0;


2023-06-09 11:28:54

by Will Deacon

Subject: Re: [PATCH] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown

On Thu, 8 Jun 2023 19:43:26 +0800, Junhao He wrote:
> The driver needs to migrate the perf context if the CPU currently in use is
> going to be torn down. By the time the cpuhp::teardown() callback is called,
> cpu_online_mask has not been updated yet and still includes the CPU going to
> be torn down. In the current driver implementation we may migrate the context
> to the CPU being torn down, which leads to the call trace below:
>
> ...
> [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008
> [ 368.113699][ T932] Call trace:
> [ 368.116834][ T932] __switch_to+0x7c/0xbc
> [ 368.120924][ T932] __schedule+0x338/0x6f0
> [ 368.125098][ T932] schedule+0x50/0xe0
> [ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24
> [ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc
> [ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30
> [ 368.144573][ T932] mutex_lock+0x50/0x60
> [ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0
> [ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu]
> [ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650
> [ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190
> [ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0
> [ 368.175099][ T932] kthread+0x108/0x13c
> [ 368.179012][ T932] ret_from_fork+0x10/0x18
> ...
>
> [...]

Applied to will (for-next/perf), thanks!

[1/1] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
https://git.kernel.org/will/c/7a6a9f1c5a0a

Cheers,
--
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev