2022-08-22 02:31:46

by Pingfan Liu

Subject: [RFC 00/10] arm64/riscv: Introduce fast kexec reboot

On an SMP arm64 machine, it may take a long time to kexec-reboot into a
new kernel, and the time grows linearly with the number of CPUs. On an
80-CPU machine it takes about 15 seconds, while with this series the
time drops dramatically to about one second.

*** Current situation 'slow kexec reboot' ***

At present, some architectures rely on smp_shutdown_nonboot_cpus() to
implement "kexec -e". Since smp_shutdown_nonboot_cpus() tears down the
CPUs serially, it is very slow.

Taking a closer look, the cpu_down() processing of a single CPU can be
divided into roughly two stages:
-1. from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU
-2. from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD, which is driven by
stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu)) and runs
on the CPU being torn down.
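
For reference, the serial flow looks roughly like this (a simplified
paraphrase of the upstream smp_shutdown_nonboot_cpus()/_cpu_down()
logic with error handling omitted, not the literal kernel code):

/* Each CPU completes both teardown stages before the loop moves on
 * to the next one, so the total time grows linearly with the number
 * of CPUs. */
void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
{
	unsigned int cpu;

	cpu_maps_update_begin();
	for_each_online_cpu(cpu) {
		if (cpu == primary_cpu)
			continue;
		/*
		 * Stage 1: CPUHP_ONLINE -> CPUHP_TEARDOWN_CPU, driven by
		 * the per-cpu hotplug thread.
		 * Stage 2: CPUHP_TEARDOWN_CPU -> CPUHP_AP_IDLE_DEAD, via
		 * stop_machine_cpuslocked(take_cpu_down, ...) on the
		 * dying CPU itself.
		 */
		_cpu_down(cpu, 0, CPUHP_OFFLINE);	/* blocks until dead */
	}
	cpu_maps_update_done();
}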

If these per-CPU stages could run in parallel, the reboot could be
sped up. That is the aim of this series.

*** Contrast with other implementations ***

X86 and PowerPC have their own machine_shutdown(), which does not rely
on the CPU hot-removal mechanism. They simply single out a few critical
components and tear them down in a per-CPU NMI handler during the kexec
reboot. But on some architectures, arm64 for example, it is not easy to
define these critical components, because implementations vary between
chipmakers.

As a result, sticking to the CPU hot-removal mechanism is the simplest
way to implement the teardown in parallel.


*** Things worthy of consideration ***

1. The definition of a clean boundary between the first kernel and the new kernel
-1.1 firmware
The firmware's internal state should be brought into a proper state so
that it can serve the new kernel. This is achieved by the firmware's
cpuhp_step teardown interface, if any.

-1.2 CPU internal state
For example, whether the cache or the PMU needs a clean shutdown
before rebooting.

2. The dependencies between cpuhp_steps
The boundary of a clean cut involves only a few cpuhp_steps, but they
may propagate to other cpuhp_steps through dependencies. This series
does not try to judge those dependencies; instead, it simply iterates
downward through each cpuhp_step. This strategy demands that the
teardown procedure of every involved cpuhp_step support parallelism.


*** Solution ***

Ideally, if the interface _cpu_down() could be enhanced to enable
parallelism, the fast reboot would follow.

But revisiting the two stages of the current cpu_down() process, the
second stage, stop_machine_cpuslocked(), is a blockade. Packed inside
_cpu_down(), stop_machine_cpuslocked() allows only one CPU at a time
to execute the teardown.

So this series breaks down the process of _cpu_down() and divides the
teardown into three steps:
1. Send each AP from CPUHP_ONLINE to CPUHP_TEARDOWN_CPU in parallel.
2. Sync on the BP, waiting for all APs to enter the CPUHP_TEARDOWN_CPU
state.
3. Send each AP from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD through
the stop_machine_cpuslocked() interface, in parallel.

Finally, the exposed stop_machine_cpuslocked() can be used to support
parallelism.

Step 2 is introduced to satisfy the prerequisite under which
stop_machine_cpuslocked() can start on each CPU: all participants must
have reached CPUHP_TEARDOWN_CPU before the multi-CPU stop work begins.
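
A minimal sketch of the resulting control flow, paraphrasing the
series' cpus_down_no_rollback() rather than quoting it (the
wait_for_ap_state() helper is illustrative, not actual patch code):

static void cpus_down_no_rollback(struct cpumask *cpus)
{
	unsigned int cpu;

	/* Step 1: kick each AP towards CPUHP_TEARDOWN_CPU without
	 * waiting, so all the per-cpu hotplug threads run concurrently. */
	for_each_cpu(cpu, cpus)
		cpuhp_kick_ap_work_async(cpu);

	/* Step 2: sync on the BP until every AP has reached
	 * CPUHP_TEARDOWN_CPU, the prerequisite for the stop machine. */
	for_each_cpu(cpu, cpus)
		wait_for_ap_state(cpu, CPUHP_TEARDOWN_CPU);

	/* Step 3: run take_cpu_down() on all dying CPUs at once through
	 * the now-exposed stop_machine_cpuslocked() interface. */
	stop_machine_cpuslocked(take_cpu_down, NULL, cpus);
}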

The remaining issue is how to support parallelism in steps 1 and 3.
Fortunately, each subsystem has its own carefully designed locking
mechanism. Adapting each cpuhp_step teardown interface to its
subsystem's locking rules makes things work; patch 05 does exactly
that for the arm DSU PMU, for example.


*** No rollback on failure ***

During a kexec reboot, the devices have already been shut down, so
there is no way for the system to roll back to a workable state.
Accordingly, this series does not consider rollback if a failure
happens in cpu_down(); it simply ventures onward.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Kuppuswamy Sathyanarayanan <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]

Pingfan Liu (10):
cpu/hotplug: Make __cpuhp_kick_ap() ready for async
cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on
CONFIG_SHUTDOWN_NONBOOT_CPUS
cpu/hotplug: Introduce fast kexec reboot
cpu/hotplug: Check the capability of kexec quick reboot
perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel
rcu/hotplug: Make rcutree_dead_cpu() parallel
lib/cpumask: Introduce cpumask_not_dying_but()
cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)
genirq/cpuhotplug: Ask migrate_one_irq() to migrate to a real online
cpu
arm64: smp: Make __cpu_disable() parallel

arch/Kconfig | 4 +
arch/arm/Kconfig | 1 +
arch/arm/mach-imx/mmdc.c | 2 +-
arch/arm/mm/cache-l2x0-pmu.c | 2 +-
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/smp.c | 31 +++-
arch/ia64/Kconfig | 1 +
arch/riscv/Kconfig | 1 +
drivers/dma/idxd/perfmon.c | 2 +-
drivers/fpga/dfl-fme-perf.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm-ccn.c | 2 +-
drivers/perf/arm-cmn.c | 4 +-
drivers/perf/arm_dmc620_pmu.c | 2 +-
drivers/perf/arm_dsu_pmu.c | 16 +-
drivers/perf/arm_smmuv3_pmu.c | 2 +-
drivers/perf/fsl_imx8_ddr_perf.c | 2 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/marvell_cn10k_tad_pmu.c | 2 +-
drivers/perf/qcom_l2_pmu.c | 2 +-
drivers/perf/qcom_l3_pmu.c | 2 +-
drivers/perf/xgene_pmu.c | 2 +-
drivers/soc/fsl/qbman/bman_portal.c | 2 +-
drivers/soc/fsl/qbman/qman_portal.c | 2 +-
include/linux/cpuhotplug.h | 2 +
include/linux/cpumask.h | 3 +
kernel/cpu.c | 213 ++++++++++++++++++++---
kernel/irq/cpuhotplug.c | 3 +-
kernel/rcu/tree.c | 3 +-
lib/cpumask.c | 18 ++
31 files changed, 281 insertions(+), 54 deletions(-)

--
2.31.1


2022-08-22 02:31:46

by Pingfan Liu

Subject: [RFC 01/10] cpu/hotplug: Make __cpuhp_kick_ap() ready for async

At present, during a kexec reboot, the teardown of CPUs cannot run in
parallel. As the first step towards parallelism, the initiator must
kick the AP threads one by one instead of waiting for each AP thread
to complete.

Change the prototype of __cpuhp_kick_ap() to cope with this demand.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Mark Rutland <[email protected]>
To: [email protected]
---
kernel/cpu.c | 41 ++++++++++++++++++++++++++++++-----------
1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..338e1d426c7e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -526,7 +526,7 @@ cpuhp_reset_state(int cpu, struct cpuhp_cpu_state *st,
}

/* Regular hotplug invocation of the AP hotplug thread */
-static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
+static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st, bool sync)
{
if (!st->single && st->state == st->target)
return;
@@ -539,20 +539,22 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
smp_mb();
st->should_run = true;
wake_up_process(st->thread);
- wait_for_ap_thread(st, st->bringup);
+ if (sync)
+ wait_for_ap_thread(st, st->bringup);
}

static int cpuhp_kick_ap(int cpu, struct cpuhp_cpu_state *st,
- enum cpuhp_state target)
+ enum cpuhp_state target, bool sync)
{
enum cpuhp_state prev_state;
int ret;

prev_state = cpuhp_set_state(cpu, st, target);
- __cpuhp_kick_ap(st);
- if ((ret = st->result)) {
+ __cpuhp_kick_ap(st, sync);
+ ret = st->result;
+ if (sync && ret) {
cpuhp_reset_state(cpu, st, prev_state);
- __cpuhp_kick_ap(st);
+ __cpuhp_kick_ap(st, true);
}

return ret;
@@ -583,7 +585,7 @@ static int bringup_wait_for_ap(unsigned int cpu)
if (st->target <= CPUHP_AP_ONLINE_IDLE)
return 0;

- return cpuhp_kick_ap(cpu, st, st->target);
+ return cpuhp_kick_ap(cpu, st, st->target, true);
}

static int bringup_cpu(unsigned int cpu)
@@ -835,7 +837,7 @@ cpuhp_invoke_ap_callback(int cpu, enum cpuhp_state state, bool bringup,
st->cb_state = state;
st->single = true;

- __cpuhp_kick_ap(st);
+ __cpuhp_kick_ap(st, true);

/*
* If we failed and did a partial, do a rollback.
@@ -844,7 +846,7 @@ cpuhp_invoke_ap_callback(int cpu, enum cpuhp_state state, bool bringup,
st->rollback = true;
st->bringup = !bringup;

- __cpuhp_kick_ap(st);
+ __cpuhp_kick_ap(st, true);
}

/*
@@ -868,12 +870,29 @@ static int cpuhp_kick_ap_work(unsigned int cpu)
cpuhp_lock_release(true);

trace_cpuhp_enter(cpu, st->target, prev_state, cpuhp_kick_ap_work);
- ret = cpuhp_kick_ap(cpu, st, st->target);
+ ret = cpuhp_kick_ap(cpu, st, st->target, true);
trace_cpuhp_exit(cpu, st->state, prev_state, ret);

return ret;
}

+/* In the async case, tracing is meaningless since the ret value is not yet available */
+static int cpuhp_kick_ap_work_async(unsigned int cpu)
+{
+ struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+ int ret;
+
+ cpuhp_lock_acquire(false);
+ cpuhp_lock_release(false);
+
+ cpuhp_lock_acquire(true);
+ cpuhp_lock_release(true);
+
+ ret = cpuhp_kick_ap(cpu, st, st->target, false);
+
+ return ret;
+}
+
static struct smp_hotplug_thread cpuhp_threads = {
.store = &cpuhp_state.thread,
.thread_should_run = cpuhp_should_run,
@@ -1171,7 +1190,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
if (ret && st->state < prev_state) {
if (st->state == CPUHP_TEARDOWN_CPU) {
cpuhp_reset_state(cpu, st, prev_state);
- __cpuhp_kick_ap(st);
+ __cpuhp_kick_ap(st, true);
} else {
WARN(1, "DEAD callback error for CPU%d", cpu);
}
--
2.31.1

2022-08-22 02:31:59

by Pingfan Liu

Subject: [RFC 10/10] arm64: smp: Make __cpu_disable() parallel

On a dying CPU, take_cpu_down() calls __cpu_disable(), which means
that if the teardown path supports parallelism, __cpu_disable() runs
concurrently on several CPUs and may corrupt cpu_online_mask etc. if
no extra lock provides protection.

At present, the cpumasks are protected by cpu_add_remove_lock, a lock
which sits far above __cpu_disable(). In order to protect
__cpu_disable() from parallelism on the kexec quick-reboot path,
introduce a local lock, cpumap_lock.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Phil Auld <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Ben Dooks <[email protected]>
To: [email protected]
To: [email protected]
---
arch/arm64/kernel/smp.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffc5d76cf695..fee8879048b0 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -287,6 +287,28 @@ static int op_cpu_disable(unsigned int cpu)
return 0;
}

+static DEFINE_SPINLOCK(cpumap_lock);
+
+static void __cpu_clear_maps(unsigned int cpu)
+{
+ /*
+ * In the case of kexec rebooting, the cpu_add_remove_lock mutex cannot provide this protection
+ */
+ if (kexec_in_progress)
+ spin_lock(&cpumap_lock);
+ remove_cpu_topology(cpu);
+ numa_remove_cpu(cpu);
+
+ /*
+ * Take this CPU offline. Once we clear this, we can't return,
+ * and we must not schedule until we're ready to give up the cpu.
+ */
+ set_cpu_online(cpu, false);
+ if (kexec_in_progress)
+ spin_unlock(&cpumap_lock);
+
+}
+
/*
* __cpu_disable runs on the processor to be shutdown.
*/
@@ -299,14 +321,7 @@ int __cpu_disable(void)
if (ret)
return ret;

- remove_cpu_topology(cpu);
- numa_remove_cpu(cpu);
-
- /*
- * Take this CPU offline. Once we clear this, we can't return,
- * and we must not schedule until we're ready to give up the cpu.
- */
- set_cpu_online(cpu, false);
+ __cpu_clear_maps(cpu);
ipi_teardown(cpu);

/*
--
2.31.1

2022-08-22 02:32:23

by Pingfan Liu

Subject: [RFC 02/10] cpu/hotplug: Compile smp_shutdown_nonboot_cpus() conditioned on CONFIG_SHUTDOWN_NONBOOT_CPUS

Only arm/arm64/ia64/riscv share smp_shutdown_nonboot_cpus(), so
compile this code conditionally on the macro
CONFIG_SHUTDOWN_NONBOOT_CPUS. Later, this macro will also guard the
quick kexec reboot code.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Russell King <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Dan Li <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Nick Hawkins <[email protected]>
Cc: John Crispin <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Price <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
---
arch/Kconfig | 4 ++++
arch/arm/Kconfig | 1 +
arch/arm64/Kconfig | 1 +
arch/ia64/Kconfig | 1 +
arch/riscv/Kconfig | 1 +
kernel/cpu.c | 3 +++
6 files changed, 11 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index f330410da63a..be447537d0f6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,10 @@ menu "General architecture-dependent options"
config CRASH_CORE
bool

+config SHUTDOWN_NONBOOT_CPUS
+ select KEXEC_CORE
+ bool
+
config KEXEC_CORE
select CRASH_CORE
bool
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 87badeae3181..711cfdb4f9f4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -129,6 +129,7 @@ config ARM
select PCI_SYSCALL if PCI
select PERF_USE_VMALLOC
select RTC_LIB
+ select SHUTDOWN_NONBOOT_CPUS
select SYS_SUPPORTS_APM_EMULATION
select THREAD_INFO_IN_TASK
select HAVE_ARCH_VMAP_STACK if MMU && ARM_HAS_GROUP_RELOCS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 571cc234d0b3..8c481a0b1829 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -223,6 +223,7 @@ config ARM64
select PCI_SYSCALL if PCI
select POWER_RESET
select POWER_SUPPLY
+ select SHUTDOWN_NONBOOT_CPUS
select SPARSE_IRQ
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 26ac8ea15a9e..8a3ddea97d1b 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -52,6 +52,7 @@ config IA64
select ARCH_CLOCKSOURCE_DATA
select GENERIC_TIME_VSYSCALL
select LEGACY_TIMER_TICK
+ select SHUTDOWN_NONBOOT_CPUS
select SWIOTLB
select SYSCTL_ARCH_UNALIGN_NO_WARN
select HAVE_MOD_ARCH_SPECIFIC
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index ed66c31e4655..02606a48c5ea 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -120,6 +120,7 @@ config RISCV
select PCI_MSI if PCI
select RISCV_INTC
select RISCV_TIMER if RISCV_SBI
+ select SHUTDOWN_NONBOOT_CPUS
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 338e1d426c7e..2be6ba811a01 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1258,6 +1258,8 @@ int remove_cpu(unsigned int cpu)
}
EXPORT_SYMBOL_GPL(remove_cpu);

+#ifdef CONFIG_SHUTDOWN_NONBOOT_CPUS
+
void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
{
unsigned int cpu;
@@ -1299,6 +1301,7 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)

cpu_maps_update_done();
}
+#endif

#else
#define takedown_cpu NULL
--
2.31.1

2022-08-22 02:32:52

by Pingfan Liu

Subject: [RFC 05/10] perf/arm-dsu: Make dsu_pmu_cpu_teardown() parallel

In the case of a kexec quick reboot, dsu_pmu_cpu_teardown() runs in
parallel on several CPUs, and a lock is needed to protect against
contention on a dsu_pmu.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
To: [email protected]
To: [email protected]
---
drivers/perf/arm_dsu_pmu.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index a36698a90d2f..aa9f4393ff0c 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -833,16 +833,23 @@ static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
struct dsu_pmu *dsu_pmu = hlist_entry_safe(node, struct dsu_pmu,
cpuhp_node);

- if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+ raw_spin_lock(&dsu_pmu->pmu_lock);
+ if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu)) {
+ raw_spin_unlock(&dsu_pmu->pmu_lock);
return 0;
+ }

dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
/* If there are no active CPUs in the DSU, leave IRQ disabled */
- if (dst >= nr_cpu_ids)
+ if (dst >= nr_cpu_ids) {
+ raw_spin_unlock(&dsu_pmu->pmu_lock);
return 0;
+ }

- perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+ /* dst must not be in the dying mask, so setting the active CPU before unlocking blocks parallel teardowns */
dsu_pmu_set_active_cpu(dst, dsu_pmu);
+ raw_spin_unlock(&dsu_pmu->pmu_lock);
+ perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);

return 0;
}
@@ -858,6 +865,7 @@ static int __init dsu_pmu_init(void)
if (ret < 0)
return ret;
dsu_pmu_cpuhp_state = ret;
+ cpuhp_set_step_parallel(ret);
return platform_driver_register(&dsu_pmu_driver);
}

--
2.31.1

2022-08-22 02:33:04

by Pingfan Liu

Subject: [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()

During CPU hot-removal, the dying CPUs are still in cpu_online_mask.
Meanwhile, a subsystem migrates its broker from the dying CPU to an
online CPU in its teardown cpuhp_step.

Once CPUs are torn down in parallel, cpu_online_mask can no longer
tell the dying CPUs from the truly online ones.

Introduce a function cpumask_not_dying_but() to pick a truly online
CPU.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Price <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Kuppuswamy Sathyanarayanan <[email protected]>
To: [email protected]
---
include/linux/cpumask.h | 3 +++
kernel/cpu.c | 3 +++
lib/cpumask.c | 18 ++++++++++++++++++
3 files changed, 24 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 0d435d0edbcb..d2033a239a07 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
return i;
}

+/* for parallel kexec reboot */
+int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
+
#define CPU_BITS_NONE \
{ \
[0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL \
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 90debbe28e85..771e344f8ff9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
struct cpuhp_cpu_state *st;
unsigned int cpu;

+ for_each_cpu(cpu, cpus)
+ set_cpu_dying(cpu, true);
+
/* launch ap work one by one, but not wait for completion */
for_each_cpu(cpu, cpus) {
st = per_cpu_ptr(&cpuhp_state, cpu);
diff --git a/lib/cpumask.c b/lib/cpumask.c
index 8baeb37e23d3..6474f07ed87a 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -7,6 +7,24 @@
#include <linux/memblock.h>
#include <linux/numa.h>

+/* Used in parallel kexec-reboot cpuhp callbacks */
+int cpumask_not_dying_but(const struct cpumask *mask,
+ unsigned int cpu)
+{
+ unsigned int i;
+
+ if (CONFIG_SHUTDOWN_NONBOOT_CPUS) {
+ cpumask_check(cpu);
+ for_each_cpu(i, mask)
+ if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
+ break;
+ return i;
+ } else {
+ return cpumask_any_but(mask, cpu);
+ }
+}
+EXPORT_SYMBOL(cpumask_not_dying_but);
+
/**
* cpumask_next_wrap - helper to implement for_each_cpu_wrap
* @n: the cpu prior to the place to search
--
2.31.1

2022-08-22 02:50:04

by Pingfan Liu

Subject: [RFC 08/10] cpuhp: Replace cpumask_any_but(cpu_online_mask, cpu)

On the kexec quick-reboot path, the dying CPUs are still in
cpu_online_mask. During the teardown of a CPU, a subsystem needs to
migrate its broker to a truly online CPU.

This patch replaces cpumask_any_but(cpu_online_mask, cpu) in the
teardown procedures with cpumask_not_dying_but(cpu_online_mask, cpu).

Signed-off-by: Pingfan Liu <[email protected]>
Cc: Russell King <[email protected]>
Cc: Shawn Guo <[email protected]>
Cc: Sascha Hauer <[email protected]>
Cc: Pengutronix Kernel Team <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: NXP Linux Team <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Wu Hao <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: Moritz Fischer <[email protected]>
Cc: Xu Yilun <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Frank Li <[email protected]>
Cc: Shaokun Zhang <[email protected]>
Cc: Qi Liu <[email protected]>
Cc: Andy Gross <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Khuong Dinh <[email protected]>
Cc: Li Yang <[email protected]>
Cc: Yury Norov <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
To: [email protected]
---
arch/arm/mach-imx/mmdc.c | 2 +-
arch/arm/mm/cache-l2x0-pmu.c | 2 +-
drivers/dma/idxd/perfmon.c | 2 +-
drivers/fpga/dfl-fme-perf.c | 2 +-
drivers/gpu/drm/i915/i915_pmu.c | 2 +-
drivers/perf/arm-cci.c | 2 +-
drivers/perf/arm-ccn.c | 2 +-
drivers/perf/arm-cmn.c | 4 ++--
drivers/perf/arm_dmc620_pmu.c | 2 +-
drivers/perf/arm_dsu_pmu.c | 2 +-
drivers/perf/arm_smmuv3_pmu.c | 2 +-
drivers/perf/fsl_imx8_ddr_perf.c | 2 +-
drivers/perf/hisilicon/hisi_uncore_pmu.c | 2 +-
drivers/perf/marvell_cn10k_tad_pmu.c | 2 +-
drivers/perf/qcom_l2_pmu.c | 2 +-
drivers/perf/qcom_l3_pmu.c | 2 +-
drivers/perf/xgene_pmu.c | 2 +-
drivers/soc/fsl/qbman/bman_portal.c | 2 +-
drivers/soc/fsl/qbman/qman_portal.c | 2 +-
19 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index af12668d0bf5..a109a7ea8613 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -220,7 +220,7 @@ static int mmdc_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
if (!cpumask_test_and_clear_cpu(cpu, &pmu_mmdc->cpu))
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index 993fefdc167a..1b0037ef7fa5 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -428,7 +428,7 @@ static int l2x0_pmu_offline_cpu(unsigned int cpu)
if (!cpumask_test_and_clear_cpu(cpu, &pmu_cpu))
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index d73004f47cf4..f3f1ccb55f73 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,7 +528,7 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);

/* migrate events if there is a valid target */
if (target < nr_cpu_ids)
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 587c82be12f7..57804f28357e 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -948,7 +948,7 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
if (cpu != priv->cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 958b37123bf1..f866f9223492 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -1068,7 +1068,7 @@ static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
return 0;

if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
- target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
+ target = cpumask_not_dying_but(topology_sibling_cpumask(cpu), cpu);

/* Migrate events if there is a valid target */
if (target < nr_cpu_ids) {
diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 03b1309875ae..481da937fb9d 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1447,7 +1447,7 @@ static int cci_pmu_offline_cpu(unsigned int cpu)
if (!g_cci_pmu || cpu != g_cci_pmu->cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 728d13d8e98a..573d6906ec9b 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -1205,7 +1205,7 @@ static int arm_ccn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)

if (cpu != dt->cpu)
return 0;
- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;
perf_pmu_migrate_context(&dt->pmu, cpu, target);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 80d8309652a4..1847182a1ed3 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1787,9 +1787,9 @@ static int arm_cmn_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_no
node = dev_to_node(cmn->dev);
if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
- target = cpumask_any(&mask);
+ target = cpumask_not_dying_but(&mask, cpu);
else
- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target < nr_cpu_ids)
arm_cmn_migrate(cmn, target);
return 0;
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 280a6ae3e27c..3a0a2bb92e12 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -611,7 +611,7 @@ static int dmc620_pmu_cpu_teardown(unsigned int cpu,
if (cpu != irq->cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index aa9f4393ff0c..e19ce0406b02 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -236,7 +236,7 @@ static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)

cpumask_and(&online_supported,
&dsu_pmu->associated_cpus, cpu_online_mask);
- return cpumask_any_but(&online_supported, cpu);
+ return cpumask_not_dying_but(&online_supported, cpu);
}

static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c
index 00d4c45a8017..827315d31056 100644
--- a/drivers/perf/arm_smmuv3_pmu.c
+++ b/drivers/perf/arm_smmuv3_pmu.c
@@ -640,7 +640,7 @@ static int smmu_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
if (cpu != smmu_pmu->on_cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 8e058e08fe81..4e0276fc1548 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -664,7 +664,7 @@ static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
if (cpu != pmu->cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c b/drivers/perf/hisilicon/hisi_uncore_pmu.c
index fbc8a93d5eac..8c39da8f4b3c 100644
--- a/drivers/perf/hisilicon/hisi_uncore_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
@@ -518,7 +518,7 @@ int hisi_uncore_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
/* Choose a new CPU to migrate ownership of the PMU to */
cpumask_and(&pmu_online_cpus, &hisi_pmu->associated_cpus,
cpu_online_mask);
- target = cpumask_any_but(&pmu_online_cpus, cpu);
+ target = cpumask_not_dying_but(&pmu_online_cpus, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/marvell_cn10k_tad_pmu.c b/drivers/perf/marvell_cn10k_tad_pmu.c
index 69c3050a4348..268e3288893d 100644
--- a/drivers/perf/marvell_cn10k_tad_pmu.c
+++ b/drivers/perf/marvell_cn10k_tad_pmu.c
@@ -387,7 +387,7 @@ static int tad_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
if (cpu != pmu->cpu)
return 0;

- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 30234c261b05..8823d0bb6476 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -822,7 +822,7 @@ static int l2cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)
/* Any other CPU for this cluster which is still online */
cpumask_and(&cluster_online_cpus, &cluster->cluster_cpus,
cpu_online_mask);
- target = cpumask_any_but(&cluster_online_cpus, cpu);
+ target = cpumask_not_dying_but(&cluster_online_cpus, cpu);
if (target >= nr_cpu_ids) {
disable_irq(cluster->irq);
return 0;
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 1ff2ff6582bf..ba26b2fa0736 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -718,7 +718,7 @@ static int qcom_l3_cache_pmu_offline_cpu(unsigned int cpu, struct hlist_node *no

if (!cpumask_test_and_clear_cpu(cpu, &l3pmu->cpumask))
return 0;
- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;
perf_pmu_migrate_context(&l3pmu->pmu, cpu, target);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0c32dffc7ede..069eb0a0d3ba 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1804,7 +1804,7 @@ static int xgene_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node)

if (!cpumask_test_and_clear_cpu(cpu, &xgene_pmu->cpu))
return 0;
- target = cpumask_any_but(cpu_online_mask, cpu);
+ target = cpumask_not_dying_but(cpu_online_mask, cpu);
if (target >= nr_cpu_ids)
return 0;

diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c
index 4d7b9caee1c4..8ebcf87e7d06 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -67,7 +67,7 @@ static int bman_offline_cpu(unsigned int cpu)
return 0;

/* use any other online CPU */
- cpu = cpumask_any_but(cpu_online_mask, cpu);
+ cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
irq_set_affinity(pcfg->irq, cpumask_of(cpu));
return 0;
}
diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c
index e23b60618c1a..3807a8285ced 100644
--- a/drivers/soc/fsl/qbman/qman_portal.c
+++ b/drivers/soc/fsl/qbman/qman_portal.c
@@ -148,7 +148,7 @@ static int qman_offline_cpu(unsigned int cpu)
pcfg = qman_get_qm_portal_config(p);
if (pcfg) {
/* select any other online CPU */
- cpu = cpumask_any_but(cpu_online_mask, cpu);
+ cpu = cpumask_not_dying_but(cpu_online_mask, cpu);
irq_set_affinity(pcfg->irq, cpumask_of(cpu));
qman_portal_update_sdest(pcfg, cpu);
}
--
2.31.1

2022-08-22 14:31:17

by Yury Norov

Subject: Re: [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()

On Mon, Aug 22, 2022 at 10:15:17AM +0800, Pingfan Liu wrote:
> During CPU hot-removal, the dying CPUs are still in cpu_online_mask.
> Meanwhile, a subsystem migrates its broker from the dying CPU to an
> online CPU in its teardown cpuhp_step.
>
> Once CPUs are torn down in parallel, cpu_online_mask can no longer
> tell the dying CPUs from the truly online ones.
>
> Introduce a function cpumask_not_dying_but() to pick a truly online
> CPU.
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Yury Norov <[email protected]>
> Cc: Andy Shevchenko <[email protected]>
> Cc: Rasmus Villemoes <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Steven Price <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: "Jason A. Donenfeld" <[email protected]>
> Cc: Kuppuswamy Sathyanarayanan <[email protected]>
> To: [email protected]
> ---
> include/linux/cpumask.h | 3 +++
> kernel/cpu.c | 3 +++
> lib/cpumask.c | 18 ++++++++++++++++++
> 3 files changed, 24 insertions(+)
>
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index 0d435d0edbcb..d2033a239a07 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> return i;
> }
>
> +/* for parallel kexec reboot */
> +int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
> +
> #define CPU_BITS_NONE \
> { \
> [0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL \
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 90debbe28e85..771e344f8ff9 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
> struct cpuhp_cpu_state *st;
> unsigned int cpu;
>
> + for_each_cpu(cpu, cpus)
> + set_cpu_dying(cpu, true);
> +
> /* launch ap work one by one, but not wait for completion */
> for_each_cpu(cpu, cpus) {
> st = per_cpu_ptr(&cpuhp_state, cpu);
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8baeb37e23d3..6474f07ed87a 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -7,6 +7,24 @@
> #include <linux/memblock.h>
> #include <linux/numa.h>
>
> +/* Used in parallel kexec-reboot cpuhp callbacks */
> +int cpumask_not_dying_but(const struct cpumask *mask,
> + unsigned int cpu)
> +{
> + unsigned int i;
> +
> + if (CONFIG_SHUTDOWN_NONBOOT_CPUS) {

Hmm... Would it even work? Anyways, the documentation says:
Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
symbol into a C boolean expression, and use it in a normal C conditional:

.. code-block:: c

if (IS_ENABLED(CONFIG_SOMETHING)) {
...
}


> + cpumask_check(cpu);
> + for_each_cpu(i, mask)
> + if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
> + break;
> + return i;
> + } else {
> + return cpumask_any_but(mask, cpu);
> + }
> +}
> +EXPORT_SYMBOL(cpumask_not_dying_but);

I don't like how you create a dedicated function for a random
mask. Dying mask is nothing special, right? What you really
need is probably this:
cpumask_andnot_any_but(mask, cpu_dying_mask, cpu);

Now, if you still think it's worth that, you can add a trivial wrapper
for cpu_dying_mask. (But please pick some other name, because
'not dying but' sounds like a hangover description. :) )

Thanks,
Yury

> +
> /**
> * cpumask_next_wrap - helper to implement for_each_cpu_wrap
> * @n: the cpu prior to the place to search
> --
> 2.31.1

2022-08-23 07:39:28

by Pingfan Liu

Subject: Re: [RFC 07/10] lib/cpumask: Introduce cpumask_not_dying_but()

On Mon, Aug 22, 2022 at 07:15:45AM -0700, Yury Norov wrote:
> On Mon, Aug 22, 2022 at 10:15:17AM +0800, Pingfan Liu wrote:
> > During CPU hot-removal, the dying CPUs are still in cpu_online_mask.
> > Meanwhile, a subsystem migrates its broker from the dying CPU to an
> > online CPU in its teardown cpuhp_step.
> >
> > Once CPUs are torn down in parallel, cpu_online_mask can no longer
> > tell the dying CPUs from the truly online ones.
> >
> > Introduce a function cpumask_not_dying_but() to pick a truly online
> > CPU.
> >
> > Signed-off-by: Pingfan Liu <[email protected]>
> > Cc: Yury Norov <[email protected]>
> > Cc: Andy Shevchenko <[email protected]>
> > Cc: Rasmus Villemoes <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Steven Price <[email protected]>
> > Cc: Mark Rutland <[email protected]>
> > Cc: "Jason A. Donenfeld" <[email protected]>
> > Cc: Kuppuswamy Sathyanarayanan <[email protected]>
> > To: [email protected]
> > ---
> > include/linux/cpumask.h | 3 +++
> > kernel/cpu.c | 3 +++
> > lib/cpumask.c | 18 ++++++++++++++++++
> > 3 files changed, 24 insertions(+)
> >
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index 0d435d0edbcb..d2033a239a07 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -317,6 +317,9 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> > return i;
> > }
> >
> > +/* for parallel kexec reboot */
> > +int cpumask_not_dying_but(const struct cpumask *mask, unsigned int cpu);
> > +
> > #define CPU_BITS_NONE \
> > { \
> > [0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL \
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 90debbe28e85..771e344f8ff9 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -1282,6 +1282,9 @@ static void cpus_down_no_rollback(struct cpumask *cpus)
> > struct cpuhp_cpu_state *st;
> > unsigned int cpu;
> >
> > + for_each_cpu(cpu, cpus)
> > + set_cpu_dying(cpu, true);
> > +
> > /* launch ap work one by one, but not wait for completion */
> > for_each_cpu(cpu, cpus) {
> > st = per_cpu_ptr(&cpuhp_state, cpu);
> > diff --git a/lib/cpumask.c b/lib/cpumask.c
> > index 8baeb37e23d3..6474f07ed87a 100644
> > --- a/lib/cpumask.c
> > +++ b/lib/cpumask.c
> > @@ -7,6 +7,24 @@
> > #include <linux/memblock.h>
> > #include <linux/numa.h>
> >
> > +/* Used in parallel kexec-reboot cpuhp callbacks */
> > +int cpumask_not_dying_but(const struct cpumask *mask,
> > + unsigned int cpu)
> > +{
> > + unsigned int i;
> > +
> > + if (CONFIG_SHUTDOWN_NONBOOT_CPUS) {
>
> Hmm... Would it even work? Anyways, the documentation says:
> Within code, where possible, use the IS_ENABLED macro to convert a Kconfig
> symbol into a C boolean expression, and use it in a normal C conditional:
>
> .. code-block:: c
>
> if (IS_ENABLED(CONFIG_SOMETHING)) {
> ...
> }
>

Yes, it should be as you pointed out.

I changed the code from the "#ifdef" style to the "if (IS_ENABLED())"
style just before sending out the series. Sorry for the haste; I did
not re-run the compile check.

>
> > + cpumask_check(cpu);
> > + for_each_cpu(i, mask)
> > + if (i != cpu && !cpumask_test_cpu(i, cpu_dying_mask))
> > + break;
> > + return i;
> > + } else {
> > + return cpumask_any_but(mask, cpu);
> > + }
> > +}
> > +EXPORT_SYMBOL(cpumask_not_dying_but);
>
> I don't like how you create a dedicated function for a random
> mask. Dying mask is nothing special, right? What you really

Yes, I agree.

> need is probably this:
> cpumask_andnot_any_but(mask, cpu_dying_mask, cpu);
>

That is it.

> Now, if you still think it's worth that, you can add a trivial wrapper
> for cpu_dying_mask. (But please pick some other name, because
> 'not dying but' sounds like a hangover description. :) )
>

I think that since cpumask_andnot_any_but(mask, cpu_dying_mask, cpu)
works properly even if !IS_ENABLED(CONFIG_SHUTDOWN_NONBOOT_CPUS),
replacing the call sites with a "cpumask_andnot() + cpumask_any_but()"
combination would be an option, as in the sketch below.
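
A minimal sketch of that combination, for illustration only (the
helper name follows Yury's suggestion and does not exist in the
kernel; the on-stack cpumask assumes a small NR_CPUS):

/* Illustrative only: pick any CPU in @mask other than @cpu, skipping
 * every CPU found in @exclude (e.g. cpu_dying_mask). */
static inline unsigned int cpumask_andnot_any_but(const struct cpumask *mask,
						  const struct cpumask *exclude,
						  unsigned int cpu)
{
	struct cpumask tmp;

	cpumask_andnot(&tmp, mask, exclude);
	return cpumask_any_but(&tmp, cpu);
}

/* Usage at a teardown call site: */
target = cpumask_andnot_any_but(cpu_online_mask, cpu_dying_mask, cpu);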

I appreciate your help.


Thanks,

Pingfan

> Thanks,
> Yury
>
> > +
> > /**
> > * cpumask_next_wrap - helper to implement for_each_cpu_wrap
> > * @n: the cpu prior to the place to search
> > --
> > 2.31.1