2022-04-22 23:02:31

by Donghai Qiao

[permalink] [raw]
Subject: [PATCH v2 00/11] smp: cross CPU call interface

The intention of this patch set is to turn the existing cross CPU call
mechanism into a more formal interface that is friendlier to kernel
developers.

Basically, the minimal set of functions below can satisfy any demand
for cross CPU calls from kernel consumers. For the sake of simplicity,
self-explanatory naming, less code redundancy, and no ambiguity, the
functions in this interface are renamed, simplified, or eliminated.
But they still inherit the same semantics and parameter lists from
their previous versions.

int smp_call(int cpu, smp_call_func_t func, void *info, unsigned int flags)

int smp_call_cond(int cpu, smp_call_func_t func, void *info,
smp_cond_func_t condf, unsigned int flags)

void smp_call_mask(const struct cpumask *mask, smp_call_func_t func,
void *info, unsigned int flags)

void smp_call_mask_cond(const struct cpumask *mask, smp_call_func_t func,
void *info, smp_cond_func_t condf, unsigned int flags)

int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)

int smp_call_any(const struct cpumask *mask, smp_call_func_t func,
void *info, unsigned int flags)


Here is the explanation about the patch set:

Patch 1: The smp cross call related structures and definitions are
consolidated from smp.c and smp_types.h into smp.h. As a result,
smp_types.h is deleted from the source tree.

Patch 2: The set of smp_call* functions listed above is defined.
The implementation details are filled in by subsequent patches in
this set.

Patch 3: Eliminated the macros SCF_WAIT and SCF_RUN_LOCAL and the
code around them. This is possible because, since commit a32a4d8a815c
("smp: Run functions concurrently in smp_call_function_many_cond()"),
smp_call_function_many_cond() handles the local CPU call only if the
local CPU appears in the cpumask. It was therefore incorrect to force
a local CPU call in the on_each_cpu_cond_mask() code path.

This change and the changes in subsequent patches will eventually
help eliminate the set of on_each_cpu* functions.

Patch 4: Eliminated the percpu global csd_data and let
smp_call_function_single() temporarily hook into smp_call().

Patch 5: Replaced smp_call_function_single_async() with smp_call_private()
and also extended smp_call_private() to support synchronous calls
with preallocated csd structures.

Patch 6: Previously, two special cross call clients, irq_work.c and
core.c, used __smp_call_single_queue(), an smp-internal function.
With some minor changes in this patch, they can use this interface
instead.

Patch 7: Kernel consumers can actually use smp_call() where they
previously used smp_call_function_any(). The extra logic handled by
smp_call_function_any() should be moved out of it, with the kernel
consumers picking the CPU themselves. Because quite a few consumers
need to run the cross call function on any one of a set of CPUs,
there is some advantage to adding smp_call_any() to the interface.

Patch 8: Eliminated smp_call_function, smp_call_function_many,
smp_call_function_many_cond, on_each_cpu, on_each_cpu_mask,
on_each_cpu_cond, on_each_cpu_cond_mask.

Patch 9: Eliminated smp_call_function_single_async.

Patch 10: Eliminated smp_call_function_single.

Patch 11: Modified up.c to adopt the same cross CPU call format.

Note: Each patch in this set depends only on its immediate
predecessor. The kernel can be built and booted with any leading
subset of the series applied, from patch 1 through patch 11.

Donghai Qiao (11):
smp: consolidate the structure definitions to smp.h
smp: define the cross call interface
smp: eliminate SCF_WAIT and SCF_RUN_LOCAL
smp: replace smp_call_function_single() with smp_call()
smp: replace smp_call_function_single_async() with smp_call_private()
smp: use smp_call_private() from irq_work.c and core.c
smp: change smp_call_function_any() to smp_call_any()
smp: replace smp_call_function_many_cond() with
__smp_call_mask_cond()
smp: replace smp_call_function_single_async with smp_call_private
smp: replace smp_call_function_single() with smp_call()
smp: modify up.c to adopt the same format of cross CPU call.

v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros.

arch/alpha/kernel/process.c | 2 +-
arch/alpha/kernel/rtc.c | 4 +-
arch/alpha/kernel/smp.c | 10 +-
arch/arc/kernel/perf_event.c | 2 +-
arch/arc/mm/cache.c | 2 +-
arch/arc/mm/tlb.c | 14 +-
arch/arm/common/bL_switcher.c | 2 +-
arch/arm/kernel/machine_kexec.c | 2 +-
arch/arm/kernel/perf_event_v7.c | 6 +-
arch/arm/kernel/smp_tlb.c | 22 +-
arch/arm/kernel/smp_twd.c | 4 +-
arch/arm/mach-bcm/bcm_kona_smc.c | 2 +-
arch/arm/mach-mvebu/pmsu.c | 4 +-
arch/arm/mm/flush.c | 4 +-
arch/arm/vfp/vfpmodule.c | 2 +-
arch/arm64/kernel/armv8_deprecated.c | 4 +-
arch/arm64/kernel/perf_event.c | 8 +-
arch/arm64/kernel/topology.c | 2 +-
arch/arm64/kvm/arm.c | 6 +-
arch/csky/abiv2/cacheflush.c | 2 +-
arch/csky/kernel/cpu-probe.c | 2 +-
arch/csky/kernel/perf_event.c | 2 +-
arch/csky/kernel/smp.c | 2 +-
arch/csky/mm/cachev2.c | 2 +-
arch/ia64/kernel/mca.c | 4 +-
arch/ia64/kernel/palinfo.c | 3 +-
arch/ia64/kernel/smp.c | 10 +-
arch/ia64/kernel/smpboot.c | 2 +-
arch/ia64/kernel/uncached.c | 4 +-
arch/mips/cavium-octeon/octeon-irq.c | 4 +-
arch/mips/cavium-octeon/setup.c | 4 +-
arch/mips/kernel/crash.c | 2 +-
arch/mips/kernel/machine_kexec.c | 2 +-
arch/mips/kernel/perf_event_mipsxx.c | 7 +-
arch/mips/kernel/process.c | 2 +-
arch/mips/kernel/smp-bmips.c | 3 +-
arch/mips/kernel/smp-cps.c | 8 +-
arch/mips/kernel/smp.c | 10 +-
arch/mips/kernel/sysrq.c | 2 +-
arch/mips/mm/c-r4k.c | 4 +-
arch/mips/sibyte/common/cfe.c | 2 +-
arch/openrisc/kernel/smp.c | 12 +-
arch/parisc/kernel/cache.c | 4 +-
arch/parisc/mm/init.c | 2 +-
arch/powerpc/kernel/dawr.c | 2 +-
arch/powerpc/kernel/kvm.c | 2 +-
arch/powerpc/kernel/security.c | 6 +-
arch/powerpc/kernel/smp.c | 4 +-
arch/powerpc/kernel/sysfs.c | 28 +-
arch/powerpc/kernel/tau_6xx.c | 4 +-
arch/powerpc/kernel/watchdog.c | 4 +-
arch/powerpc/kexec/core_64.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_hv.c | 8 +-
arch/powerpc/mm/book3s64/pgtable.c | 2 +-
arch/powerpc/mm/book3s64/radix_tlb.c | 12 +-
arch/powerpc/mm/nohash/tlb.c | 10 +-
arch/powerpc/mm/slice.c | 4 +-
arch/powerpc/perf/core-book3s.c | 2 +-
arch/powerpc/perf/imc-pmu.c | 2 +-
arch/powerpc/platforms/85xx/smp.c | 8 +-
arch/powerpc/platforms/powernv/idle.c | 2 +-
arch/powerpc/platforms/pseries/lparcfg.c | 2 +-
arch/riscv/mm/cacheflush.c | 4 +-
arch/s390/hypfs/hypfs_diag0c.c | 2 +-
arch/s390/kernel/alternative.c | 2 +-
arch/s390/kernel/perf_cpum_cf.c | 10 +-
arch/s390/kernel/perf_cpum_cf_common.c | 4 +-
arch/s390/kernel/perf_cpum_sf.c | 4 +-
arch/s390/kernel/processor.c | 2 +-
arch/s390/kernel/smp.c | 2 +-
arch/s390/kernel/topology.c | 2 +-
arch/s390/mm/pgalloc.c | 2 +-
arch/s390/pci/pci_irq.c | 4 +-
arch/sh/kernel/smp.c | 14 +-
arch/sh/mm/cache.c | 2 +-
arch/sparc/include/asm/mman.h | 4 +-
arch/sparc/kernel/nmi.c | 16 +-
arch/sparc/kernel/perf_event.c | 4 +-
arch/sparc/kernel/smp_64.c | 8 +-
arch/sparc/mm/init_64.c | 2 +-
arch/x86/events/core.c | 6 +-
arch/x86/events/intel/core.c | 4 +-
arch/x86/kernel/alternative.c | 2 +-
arch/x86/kernel/amd_nb.c | 2 +-
arch/x86/kernel/apic/apic.c | 2 +-
arch/x86/kernel/apic/vector.c | 2 +-
arch/x86/kernel/cpu/aperfmperf.c | 5 +-
arch/x86/kernel/cpu/bugs.c | 2 +-
arch/x86/kernel/cpu/mce/amd.c | 4 +-
arch/x86/kernel/cpu/mce/core.c | 12 +-
arch/x86/kernel/cpu/mce/inject.c | 14 +-
arch/x86/kernel/cpu/mce/intel.c | 2 +-
arch/x86/kernel/cpu/microcode/core.c | 4 +-
arch/x86/kernel/cpu/mtrr/mtrr.c | 2 +-
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 +-
arch/x86/kernel/cpu/sgx/main.c | 5 +-
arch/x86/kernel/cpu/umwait.c | 2 +-
arch/x86/kernel/cpu/vmware.c | 2 +-
arch/x86/kernel/cpuid.c | 2 +-
arch/x86/kernel/kvm.c | 6 +-
arch/x86/kernel/ldt.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 3 +-
arch/x86/kvm/x86.c | 11 +-
arch/x86/lib/cache-smp.c | 4 +-
arch/x86/lib/msr-smp.c | 20 +-
arch/x86/mm/pat/set_memory.c | 4 +-
arch/x86/mm/tlb.c | 12 +-
arch/x86/xen/mmu_pv.c | 4 +-
arch/x86/xen/smp_pv.c | 2 +-
arch/x86/xen/suspend.c | 4 +-
arch/xtensa/kernel/smp.c | 29 +-
block/blk-mq.c | 2 +-
drivers/acpi/processor_idle.c | 4 +-
drivers/char/agp/generic.c | 2 +-
drivers/clocksource/ingenic-timer.c | 2 +-
drivers/clocksource/mips-gic-timer.c | 2 +-
drivers/cpufreq/acpi-cpufreq.c | 10 +-
drivers/cpufreq/powernow-k8.c | 9 +-
drivers/cpufreq/powernv-cpufreq.c | 14 +-
drivers/cpufreq/sparc-us2e-cpufreq.c | 4 +-
drivers/cpufreq/sparc-us3-cpufreq.c | 4 +-
drivers/cpufreq/speedstep-ich.c | 7 +-
drivers/cpufreq/tegra194-cpufreq.c | 8 +-
drivers/cpuidle/coupled.c | 2 +-
drivers/cpuidle/driver.c | 8 +-
drivers/edac/amd64_edac.c | 4 +-
drivers/firmware/arm_sdei.c | 10 +-
drivers/gpu/drm/i915/vlv_sideband.c | 2 +-
drivers/hwmon/fam15h_power.c | 2 +-
.../hwtracing/coresight/coresight-cpu-debug.c | 3 +-
.../coresight/coresight-etm3x-core.c | 11 +-
.../coresight/coresight-etm4x-core.c | 12 +-
.../coresight/coresight-etm4x-sysfs.c | 2 +-
drivers/hwtracing/coresight/coresight-trbe.c | 6 +-
drivers/irqchip/irq-mvebu-pic.c | 4 +-
.../net/ethernet/cavium/liquidio/lio_core.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 34 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 8 +-
drivers/perf/arm_spe_pmu.c | 2 +-
.../intel/speed_select_if/isst_if_mbox_msr.c | 4 +-
drivers/platform/x86/intel_ips.c | 4 +-
drivers/powercap/intel_rapl_common.c | 2 +-
drivers/powercap/intel_rapl_msr.c | 2 +-
drivers/regulator/qcom_spmi-regulator.c | 3 +-
drivers/soc/fsl/qbman/qman.c | 4 +-
drivers/soc/fsl/qbman/qman_test_stash.c | 9 +-
drivers/soc/xilinx/xlnx_event_manager.c | 2 +-
drivers/tty/sysrq.c | 2 +-
drivers/watchdog/booke_wdt.c | 8 +-
fs/buffer.c | 2 +-
include/linux/irq_work.h | 2 +-
include/linux/smp.h | 227 +++++--
include/linux/smp_types.h | 69 --
kernel/cpu.c | 4 +-
kernel/debug/debug_core.c | 2 +-
kernel/events/core.c | 10 +-
kernel/irq_work.c | 4 +-
kernel/profile.c | 4 +-
kernel/rcu/rcutorture.c | 3 +-
kernel/rcu/tasks.h | 4 +-
kernel/rcu/tree.c | 6 +-
kernel/rcu/tree_exp.h | 4 +-
kernel/relay.c | 5 +-
kernel/scftorture.c | 13 +-
kernel/sched/core.c | 4 +-
kernel/sched/fair.c | 2 +-
kernel/sched/membarrier.c | 14 +-
kernel/smp.c | 633 ++++++++----------
kernel/time/clockevents.c | 2 +-
kernel/time/clocksource.c | 2 +-
kernel/time/hrtimer.c | 6 +-
kernel/time/tick-common.c | 2 +-
kernel/trace/ftrace.c | 6 +-
kernel/trace/ring_buffer.c | 2 +-
kernel/trace/trace.c | 12 +-
kernel/trace/trace_events.c | 2 +-
kernel/up.c | 56 +-
mm/kasan/quarantine.c | 2 +-
mm/mmu_gather.c | 2 +-
mm/slab.c | 2 +-
net/bpf/test_run.c | 4 +-
net/core/dev.c | 2 +-
net/iucv/iucv.c | 17 +-
virt/kvm/kvm_main.c | 12 +-
186 files changed, 945 insertions(+), 1002 deletions(-)
delete mode 100644 include/linux/smp_types.h

--
2.27.0


2022-04-22 23:06:13

by Donghai Qiao

[permalink] [raw]
Subject: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private()

Replaced smp_call_function_single_async() with smp_call_private()
and also extended smp_call_private() to support one-CPU synchronous
calls with preallocated csd structures.

Ideally, the new interface smp_call() should be able to do what
smp_call_function_single_async() does. Because the csd is provided
and maintained by the callers, it exposes the risk of corrupting
the call_single_queue[cpu] linked list if the clients manipulate
their csd inappropriately. On the other hand, there should be no
noticeable performance advantage to providing a preallocated csd
for cross call kernel consumers. Thus, in the long run, the
consumers should change to not use this type of preallocated csd.

Signed-off-by: Donghai Qiao <[email protected]>
---
v1 -> v2: removed 'x' from the function names and changed XCALL to SMP_CALL in the new macros

include/linux/smp.h | 3 +-
kernel/smp.c | 163 +++++++++++++++++++++-----------------------
2 files changed, 81 insertions(+), 85 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index bee1e6b5b2fd..0301faf270bf 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -206,7 +206,8 @@ int smp_call_function_single(int cpuid, smp_call_func_t func, void *info,
void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
void *info, bool wait, const struct cpumask *mask);

-int smp_call_function_single_async(int cpu, struct __call_single_data *csd);
+#define smp_call_function_single_async(cpu, csd) \
+ smp_call_private(cpu, csd, SMP_CALL_TYPE_ASYNC)

/*
* Cpus stopping functions in panic. All have default weak definitions.
diff --git a/kernel/smp.c b/kernel/smp.c
index 8998b97d5f72..51715633b4f7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -429,41 +429,6 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
send_call_function_single_ipi(cpu);
}

-/*
- * Insert a previously allocated call_single_data_t element
- * for execution on the given CPU. data must already have
- * ->func, ->info, and ->flags set.
- */
-static int generic_exec_single(int cpu, struct __call_single_data *csd)
-{
- if (cpu == smp_processor_id()) {
- smp_call_func_t func = csd->func;
- void *info = csd->info;
- unsigned long flags;
-
- /*
- * We can unlock early even for the synchronous on-stack case,
- * since we're doing this from the same CPU..
- */
- csd_lock_record(csd);
- csd_unlock(csd);
- local_irq_save(flags);
- func(info);
- csd_lock_record(NULL);
- local_irq_restore(flags);
- return 0;
- }
-
- if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
- csd_unlock(csd);
- return -ENXIO;
- }
-
- __smp_call_single_queue(cpu, &csd->node.llist);
-
- return 0;
-}
-
/**
* generic_smp_call_function_single_interrupt - Execute SMP IPI callbacks
*
@@ -661,52 +626,6 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
}
EXPORT_SYMBOL(smp_call_function_single);

-/**
- * smp_call_function_single_async() - Run an asynchronous function on a
- * specific CPU.
- * @cpu: The CPU to run on.
- * @csd: Pre-allocated and setup data structure
- *
- * Like smp_call_function_single(), but the call is asynchonous and
- * can thus be done from contexts with disabled interrupts.
- *
- * The caller passes his own pre-allocated data structure
- * (ie: embedded in an object) and is responsible for synchronizing it
- * such that the IPIs performed on the @csd are strictly serialized.
- *
- * If the function is called with one csd which has not yet been
- * processed by previous call to smp_call_function_single_async(), the
- * function will return immediately with -EBUSY showing that the csd
- * object is still in progress.
- *
- * NOTE: Be careful, there is unfortunately no current debugging facility to
- * validate the correctness of this serialization.
- *
- * Return: %0 on success or negative errno value on error
- */
-int smp_call_function_single_async(int cpu, struct __call_single_data *csd)
-{
- int err = 0;
-
- preempt_disable();
-
- if (csd->node.u_flags & CSD_FLAG_LOCK) {
- err = -EBUSY;
- goto out;
- }
-
- csd->node.u_flags = CSD_FLAG_LOCK;
- smp_wmb();
-
- err = generic_exec_single(cpu, csd);
-
-out:
- preempt_enable();
-
- return err;
-}
-EXPORT_SYMBOL_GPL(smp_call_function_single_async);
-
/*
* smp_call_function_any - Run a function on any of the given cpus
* @mask: The mask of cpus it can run on.
@@ -1251,16 +1170,92 @@ EXPORT_SYMBOL(smp_call_mask_cond);
* Because the call is asynchonous with a preallocated csd structure, thus
* it can be called from contexts with disabled interrupts.
*
- * Parameters
+ * Ideally this functionality should be part of smp_call_mask_cond().
+ * Because the csd is provided and maintained by the callers, merging this
+ * functionality into smp_call_mask_cond() will result in some extra
+ * complications in it. Before there is better way to facilitate all
+ * kinds of call, let's still handle this case with a separate function.
+ *
+ * The bit CSD_FLAG_LOCK will be set to csd->node.u_flags only if the
+ * call is made as type CSD_TYPE_SYNC or CSD_TYPE_ASYNC.
*
+ * Parameters:
* cpu: Must be a positive value less than nr_cpu_id.
* csd: The private csd provided by the caller.
- *
* Others: see smp_call().
+ *
+ * Return: %0 on success or negative errno value on error.
+ *
+ * The following comments are from smp_call_function_single_async():
+ *
+ * The call is asynchronous and can thus be done from contexts with
+ * disabled interrupts. If the function is called with one csd which
+ * has not yet been processed by previous call, the function will
+ * return immediately with -EBUSY showing that the csd object is
+ * still in progress.
+ *
+ * NOTE: Be careful, there is unfortunately no current debugging
+ * facility to validate the correctness of this serialization.
*/
int smp_call_private(int cpu, call_single_data_t *csd, unsigned int flags)
{
- return 0;
+ int err = 0;
+
+ if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+ pr_warn("cpu ID must be a positive number < nr_cpu_ids and must be currently online\n");
+ return -EINVAL;
+ }
+
+ if (csd == NULL) {
+ pr_warn("csd must not be NULL\n");
+ return -EINVAL;
+ }
+
+ preempt_disable();
+ if (csd->node.u_flags & CSD_FLAG_LOCK) {
+ err = -EBUSY;
+ goto out;
+ }
+
+ /*
+ * CSD_FLAG_LOCK is set for CSD_TYPE_SYNC or CSD_TYPE_ASYNC only.
+ */
+ if ((flags & ~(CSD_TYPE_SYNC | CSD_TYPE_ASYNC)) == 0)
+ csd->node.u_flags = CSD_FLAG_LOCK | flags;
+ else
+ csd->node.u_flags = flags;
+
+ if (cpu == smp_processor_id()) {
+ smp_call_func_t func = csd->func;
+ void *info = csd->info;
+ unsigned long flags;
+
+ /*
+ * We can unlock early even for the synchronous on-stack case,
+ * since we're doing this from the same CPU..
+ */
+ csd_lock_record(csd);
+ csd_unlock(csd);
+ local_irq_save(flags);
+ func(info);
+ csd_lock_record(NULL);
+ local_irq_restore(flags);
+ goto out;
+ }
+
+ /*
+ * Ensure the flags are visible before the csd
+ * goes to the queue.
+ */
+ smp_wmb();
+
+ __smp_call_single_queue(cpu, &csd->node.llist);
+
+ if (flags & CSD_TYPE_SYNC)
+ csd_lock_wait(csd);
+out:
+ preempt_enable();
+ return err;
}
EXPORT_SYMBOL(smp_call_private);

--
2.27.0

2022-04-23 14:04:52

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private()

Hi Donghai,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on rafael-pm/linux-next linus/master v5.18-rc3 next-20220422]
[cannot apply to tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220423/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
git checkout 3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> block/blk-mq.c:1065:39: warning: passing 4-byte aligned argument to 16-byte aligned parameter 2 of 'smp_call_private' may result in an unaligned pointer access [-Walign-mismatch]
smp_call_function_single_async(cpu, &rq->csd);
^
1 warning generated.


vim +/smp_call_private +1065 block/blk-mq.c

963395269c7586 Christoph Hellwig 2020-06-11 1055
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1056 static void blk_mq_complete_send_ipi(struct request *rq)
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1057 {
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1058 struct llist_head *list;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1059 unsigned int cpu;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1060
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1061 cpu = rq->mq_ctx->cpu;
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1062 list = &per_cpu(blk_cpu_done, cpu);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1063 if (llist_add(&rq->ipi_list, list)) {
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1064 INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 @1065 smp_call_function_single_async(cpu, &rq->csd);
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1066 }
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1067 }
f9ab49184af093 Sebastian Andrzej Siewior 2021-01-23 1068

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-04-25 05:58:38

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private()

On Sat, Apr 23, 2022 at 03:30:18PM +0800, kernel test robot wrote:
> Hi Donghai,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on powerpc/next]
> [also build test WARNING on rafael-pm/linux-next linus/master v5.18-rc3 next-20220422]
> [cannot apply to tip/x86/core]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
> base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220423/[email protected]/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Donghai-Qiao/smp-cross-CPU-call-interface/20220423-060436
> git checkout 3b8a90029bebdb77e2d5c0cd16f371e83d8ed2e8
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> >> block/blk-mq.c:1065:39: warning: passing 4-byte aligned argument to 16-byte aligned parameter 2 of 'smp_call_private' may result in an unaligned pointer access [-Walign-mismatch]
> smp_call_function_single_async(cpu, &rq->csd);
> ^
> 1 warning generated.

This warning was fixed by commit 1139aeb1c521 ("smp: Fix
smp_call_function_single_async prototype"), which is undone by this
proposed change, hence the warning.

Cheers,
Nathan

2022-04-25 14:02:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] smp: replace smp_call_function_single_async() with smp_call_private()

On Fri, Apr 22, 2022 at 04:00:34PM -0400, Donghai Qiao wrote:
> Replaced smp_call_function_single_async() with smp_call_private()
> and also extended smp_call_private() to support one-CPU synchronous
> calls with preallocated csd structures.
>
> Ideally, the new interface smp_call() should be able to do what
> smp_call_function_single_async() does. Because the csd is provided
> and maintained by the callers, it exposes the risk of corrupting
> the call_single_queue[cpu] linked list if the clients manipulate
> their csd inappropriately. On the other hand, there should be no
> noticeable performance advantage to providing a preallocated csd
> for cross call kernel consumers. Thus, in the long run, the
> consumers should change to not use this type of preallocated csd.

You're completely missing the point of this interface. *please* look at
it more carefully.

2022-04-27 03:55:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] smp: cross CPU call interface

On Fri, Apr 22, 2022 at 04:00:29PM -0400, Donghai Qiao wrote:
> The intention of this patch set is to turn the existing cross CPU call
> mechanism into a more formal interface that is friendlier to kernel
> developers.

As far as I can tell it mostly does the reverse. The new interfaces
are a lot more verbose for no good reason. E.g. this is a common
pattern:


- on_each_cpu(common_shutdown_1, &args, 0);
+ smp_call(SMP_CALL_ALL, common_shutdown_1, &args, SMP_CALL_TYPE_ASYNC);

or this:

- smp_call_function_single(boot_cpuid, do_remote_read, &x, 1);
+ smp_call(boot_cpuid, do_remote_read, &x, SMP_CALL_TYPE_SYNC);


The old interface more or less made some sense. The new one is just a mess.

> Patch 1: The smp cross call related structures and definitions are
> consolidated from smp.c smp_types.h to smp.h. As a result, smp_types.h
> is deleted from the source tree.

And a lot of definitions that used to be private are now public.