Hi all,
my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
hard - I don't get any output from the console. Bisect points to commit
8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
Reverting this commit fixes the problem.
Please let me know if there is anything I can do to help track down the problem.
Thanks,
Guenter
---
bisect log:
# bad: [34a984f7b0cc6355a1e0c184251d0d4cc86f44d2] Merge branch 'x86-pmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
# good: [39a8804455fb23f09157341d3ba7db6d7ae6ee76] Linux 4.0
git bisect start 'HEAD' 'v4.0'
# good: [c1c21f4e60ed4523292f1a89ff45a208bddd3849] i2c: core: Export bus recovery functions
git bisect good c1c21f4e60ed4523292f1a89ff45a208bddd3849
# good: [e693d73c20ffdb06840c9378f367bad849ac0d5d] parisc: remove use of seq_printf return value
git bisect good e693d73c20ffdb06840c9378f367bad849ac0d5d
# good: [d19d5efd8c8840aa4f38a6dfbfe500d8cc27de46] Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
git bisect good d19d5efd8c8840aa4f38a6dfbfe500d8cc27de46
# good: [96d928ed75c4ba4253e82910a697ec7b06ace8b4] Merge tag 'xtensa-20150416' of git://github.com/czankel/xtensa-linux
git bisect good 96d928ed75c4ba4253e82910a697ec7b06ace8b4
# good: [e2fdae7e7c5a690b10b2d2891ec819e554dc033d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect good e2fdae7e7c5a690b10b2d2891ec819e554dc033d
# good: [510965dd4a0a59504ba38455f77339ea8b4c6a70] Merge tag 'gpio-v4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
git bisect good 510965dd4a0a59504ba38455f77339ea8b4c6a70
# good: [10027551ccf5459cc771c31ac8bc8e5cc8db45f8] f2fs: pass checkpoint reason on roll-forward recovery
git bisect good 10027551ccf5459cc771c31ac8bc8e5cc8db45f8
# good: [d6a24d0640d609138a4e40a4ce9fd9fe7859e24c] Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
git bisect good d6a24d0640d609138a4e40a4ce9fd9fe7859e24c
# bad: [396c9df2231865ef55aa031e3f5df9d99e036869] Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 396c9df2231865ef55aa031e3f5df9d99e036869
# good: [7bdb11de8ee4f4ae195e2fa19efd304e0b36c63b] mnt: Factor out unhash_mnt from detach_mnt and umount_tree
git bisect good 7bdb11de8ee4f4ae195e2fa19efd304e0b36c63b
# good: [e0c9c0afd2fc958ffa34b697972721d81df8a56f] mnt: Update detach_mounts to leave mounts connected
git bisect good e0c9c0afd2fc958ffa34b697972721d81df8a56f
# bad: [8053871d0f7f67c7efb7f226ef031f78877d6625] smp: Fix smp_call_function_single_async() locking
git bisect bad 8053871d0f7f67c7efb7f226ef031f78877d6625
# good: [d7bc3197b41e0a1af6677e83f8736e93a1575ce0] lockdep: Make print_lock() robust against concurrent release
git bisect good d7bc3197b41e0a1af6677e83f8736e93a1575ce0
# first bad commit: [8053871d0f7f67c7efb7f226ef031f78877d6625] smp: Fix smp_call_function_single_async() locking
---
configuration:
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_CPUSETS=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_LBDAF is not set
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_ARCH_VEXPRESS=y
CONFIG_ARCH_VEXPRESS_CA9X4=y
CONFIG_ARCH_VEXPRESS_DCSCB=y
CONFIG_ARCH_VEXPRESS_TC2_PM=y
CONFIG_SMP=y
CONFIG_HAVE_ARM_ARCH_TIMER=y
CONFIG_MCPM=y
CONFIG_VMSPLIT_2G=y
CONFIG_NR_CPUS=8
CONFIG_ARM_PSCI=y
CONFIG_AEABI=y
CONFIG_CMA=y
CONFIG_ZBOOT_ROM_TEXT=0x0
CONFIG_ZBOOT_ROM_BSS=0x0
CONFIG_CMDLINE="console=ttyAMA0"
CONFIG_CPU_IDLE=y
CONFIG_VFP=y
CONFIG_NEON=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
# CONFIG_INET_LRO is not set
# CONFIG_IPV6 is not set
# CONFIG_WIRELESS is not set
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_MTD=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_INTELEXT=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_PHYSMAP_OF=y
CONFIG_MTD_PLATRAM=y
CONFIG_MTD_UBI=y
CONFIG_VIRTIO_BLK=y
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_VIRTIO=y
CONFIG_ATA=y
# CONFIG_SATA_PMP is not set
CONFIG_NETDEVICES=y
CONFIG_VIRTIO_NET=y
CONFIG_SMC91X=y
CONFIG_SMSC911X=y
# CONFIG_WLAN is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_SERIO_SERPORT is not set
CONFIG_SERIO_AMBAKMI=y
CONFIG_LEGACY_PTY_COUNT=16
CONFIG_SERIAL_AMBA_PL011=y
CONFIG_SERIAL_AMBA_PL011_CONSOLE=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_I2C=y
CONFIG_I2C_VERSATILE=y
CONFIG_SENSORS_VEXPRESS=y
CONFIG_REGULATOR=y
CONFIG_REGULATOR_VEXPRESS=y
CONFIG_FB=y
CONFIG_FB_ARMCLCD=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
# CONFIG_SND_DRIVERS is not set
CONFIG_SND_ARMAACI=y
CONFIG_HID_DRAGONRISE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_TWINHAN=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
CONFIG_HID_GREENASIA=y
CONFIG_HID_SMARTJOYPLUS=y
CONFIG_HID_TOPSEED=y
CONFIG_HID_THRUSTMASTER=y
CONFIG_HID_ZEROPLUS=y
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_MON=y
CONFIG_USB_ISP1760_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_MMC=y
CONFIG_MMC_ARMMMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_CPU=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_PL031=y
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
CONFIG_VFAT_FS=y
CONFIG_TMPFS=y
CONFIG_JFFS2_FS=y
CONFIG_UBIFS_FS=y
CONFIG_CRAMFS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_LZO=y
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_9P_FS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_DEBUG_USER=y
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
---
qemu command:
/opt/buildbot/bin/qemu-system-arm -M vexpress-a9 -kernel arch/arm/boot/zImage \
-drive file=core-image-minimal-qemuarm.ext3,if=sd -no-reboot \
-append "root=/dev/mmcblk0 rw console=ttyAMA0,115200 console=tty1" \
-nographic -dtb arch/arm/boot/dts/vexpress-v2p-ca9.dtb
The root file system used in this test is available at
http://server.roeck-us.net/qemu/arm/core-image-minimal-qemuarm.ext3
---
Compiler:
poky 1.7, arm-poky-linux-gnueabi-gcc (GCC) 4.9.1
On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
> Hi all,
>
> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
> hard - I don't get any output from the console. Bisect points to commit
> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
> Reverting this commit fixes the problem.
>
Additional observation: The system boots if I add "-smp cpus=4" to the qemu
options. It does still hang, however, with "-smp cpus=2" and "-smp cpus=3".
Guenter
On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck <[email protected]> wrote:
> On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
>>
>> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
>> hard - I don't get any output from the console. Bisect points to commit
>> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
>> Reverting this commit fixes the problem.
Hmm. It being qemu, can you look at where it seems to lock?
> Additional observation: The system boots if I add "-smp cpus=4" to the qemu
> options. It does still hang, however, with "-smp cpus=2" and "-smp cpus=3".
Funky.
That patch still looks obviously correct to me after looking at it
again, but I guess we need to revert it if somebody can't see what's
wrong.
It does make async (wait=0) smp_call_function_single() possibly be
*really* asynchronous, ie the 'csd' ends up being released and can be
re-used even before the call-single function has completed. That
should be a good thing, but I wonder if that triggers some ARM bug.
Instead of doing a full revert, what happens if you replace this part:
+ /* Do we wait until *after* callback? */
+ if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
+ func(info);
+ csd_unlock(csd);
+ } else {
+ csd_unlock(csd);
+ func(info);
+ }
with just
+ func(info);
+ csd_unlock(csd);
ie keeping the csd locked until the function has actually completed? I
guess for completeness, we should do the same thing for the cpu ==
smp_processor_id() case (see the "We can unlock early" comment).
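For context, the cpu == smp_processor_id() path looks roughly like the
sketch below (an abbreviated reconstruction, not the exact kernel code),
with the tested variant unlocking only after the function has run:
	/* Abbreviated sketch of the local-CPU path in generic_exec_single() */
	if (cpu == smp_processor_id()) {
		unsigned long flags;

		local_irq_save(flags);
		func(info);
		local_irq_restore(flags);
		csd_unlock(csd);	/* test: unlock *after* func(), not before */
		return 0;
	}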
Now, if that makes a difference, I think it implies a bug in the
caller, so it's not the right fix, but it would be an interesting
thing to test.
Linus
On 04/18/2015 05:04 PM, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck <[email protected]> wrote:
>> On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
>>>
>>> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
>>> hard - I don't get any output from the console. Bisect points to commit
>>> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
>>> Reverting this commit fixes the problem.
>
> Hmm. It being qemu, can you look at where it seems to lock?
>
I'll try. It must be very early in the boot process, prior to console
initialization - if I run qemu without -nographic, I only get "Guest
has not initialized the display (yet)".
>> Additional observation: The system boots if I add "-smp cpus=4" to the qemu
>> options. It does still hang, however, with "-smp cpus=2" and "-smp cpus=3".
>
> Funky.
>
> That patch still looks obviously correct to me after looking at it
> again, but I guess we need to revert it if somebody can't see what's
> wrong.
>
> It does make async (wait=0) smp_call_function_single() possibly be
> *really* asynchronous, ie the 'csd' ends up being released and can be
> re-used even before the call-single function has completed. That
> should be a good thing, but I wonder if that triggers some ARM bug.
>
> Instead of doing a full revert, what happens if you replace this part:
>
> + /* Do we wait until *after* callback? */
> + if (csd->flags & CSD_FLAG_SYNCHRONOUS) {
> + func(info);
> + csd_unlock(csd);
> + } else {
> + csd_unlock(csd);
> + func(info);
> + }
>
> with just
>
> + func(info);
> + csd_unlock(csd);
>
> ie keeping the csd locked until the function has actually completed? I
> guess for completeness, we should do the same thing for the cpu ==
> smp_processor_id() case (see the "We can unlock early" comment).
>
> Now, if that makes a difference, I think it implies a bug in the
> caller, so it's not the right fix, but it would be an interesting
> thing to test.
>
I applied the above. No difference. Applying the same change for the cpu ==
smp_processor_id() case does not make a difference either.
Guenter
On 04/18/2015 05:04 PM, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 7:40 PM, Guenter Roeck <[email protected]> wrote:
>> On Sat, Apr 18, 2015 at 04:23:25PM -0700, Guenter Roeck wrote:
>>>
>>> my qemu test for arm:vexpress fails with the latest upstream kernel. It fails
>>> hard - I don't get any output from the console. Bisect points to commit
>>> 8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
>>> Reverting this commit fixes the problem.
>
> Hmm. It being qemu, can you look at where it seems to lock?
>
static void csd_lock_wait(struct call_single_data *csd)
{
+#if 0
while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
cpu_relax();
+#else
+ pr_info("csd_lock_wait: flags=0x%x\n", smp_load_acquire(&csd->flags));
+#endif
}
prints
csd_lock_wait: flags=0x3
repeatedly for each call to csd_lock_wait() [and bypasses the problem].
Further debugging shows that wait==1, and that csd points to the
pre-initialized csd_stack (which has CSD_FLAG_LOCK set).
It seems that CSD_FLAG_LOCK is never reset (there is no call to csd_unlock(), ever).
Further debugging (with added WARN_ON if cpu != 0 in smp_call_function_single) shows:
[<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
[<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
[<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
[<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
[<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
[<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
[<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
[<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
[<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
[<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
[<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
[<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
[<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
[<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
---[ end trace 2f9f1bb8a47b3a1b ]---
smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
generic_exec_single, cpu=1, smp_processor_id()=0
csd_lock_wait: csd=87825ea0, flags=0x3
This is repeated for each secondary CPU. But the secondary CPUs don't respond because
they are not enabled, which I guess explains why the lock is never released.
So, in other words, this happens because the system believes (presumably per configuration
/ fdt data) that there are four CPU cores, but that is not really the case. Previously that
did not matter, and was handled correctly. Now it is fatal.
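For reference, here is a simplified sketch of the locking scheme from
kernel/smp.c of that era (abbreviated, details omitted), which shows why
a target CPU that never comes up leaves the sender spinning:
static void csd_lock_wait(struct call_single_data *csd)
{
	/* The sender spins here until the target clears CSD_FLAG_LOCK. */
	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
		cpu_relax();
}
static void csd_unlock(struct call_single_data *csd)
{
	/*
	 * Normally reached from the IPI path on the target CPU once the
	 * callback has run. A CPU that never comes up never gets here,
	 * so the sender in csd_lock_wait() spins forever.
	 */
	smp_store_release(&csd->flags, 0);
}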
Does this help ?
Guenter
On Sat, Apr 18, 2015 at 06:56:02PM -0700, Guenter Roeck wrote:
> Further debugging (with added WARN_ON if cpu != 0 in smp_call_function_single) shows:
>
> [<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
> [<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
> [<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
> [<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
> [<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
> [<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
> [<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
> [<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
> [<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
> [<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
> [<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
> [<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
> [<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
> [<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
> ---[ end trace 2f9f1bb8a47b3a1b ]---
> smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
> generic_exec_single, cpu=1, smp_processor_id()=0
> csd_lock_wait: csd=87825ea0, flags=0x3
>
> This is repeated for each secondary CPU. But the secondary CPUs don't respond because
> they are not enabled, which I guess explains why the lock is never released.
>
> So, in other words, this happens because the system believes (presumably per configuration
> / fdt data) that there are four CPU cores, but that is not really the case. Previously that
> did not matter, and was handled correctly. Now it is fatal.
>
> Does this help ?
8053871d0f7f67c7efb7f226ef031f78877d6625 moved the csd locking to the
callers, but the offline check, which was earlier done before the csd
locking, was not moved. The following patch moves the checks back to the
earlier point and fixes your boot:
diff --git a/kernel/smp.c b/kernel/smp.c
index 2aaac2c..ba1fb01 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -159,9 +159,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
}
- if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
- return -ENXIO;
-
csd->func = func;
csd->info = info;
@@ -289,6 +286,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
&& !oops_in_progress);
+ if (cpu != smp_processor_id()
+ && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
+ err = -ENXIO;
+ goto out;
+ }
+
csd = &csd_stack;
if (!wait) {
csd = this_cpu_ptr(&csd_data);
@@ -300,6 +303,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
if (wait)
csd_lock_wait(csd);
+out:
put_cpu();
return err;
@@ -328,6 +332,12 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
preempt_disable();
+ if (cpu != smp_processor_id()
+ && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
+ err = -ENXIO;
+ goto out;
+ }
+
/* We could deadlock if we have to wait here with interrupts disabled! */
if (WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK))
csd_lock_wait(csd);
@@ -336,8 +346,9 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
smp_wmb();
err = generic_exec_single(cpu, csd, csd->func, csd->info);
- preempt_enable();
+out:
+ preempt_enable();
return err;
}
EXPORT_SYMBOL_GPL(smp_call_function_single_async);
On 04/18/2015 08:39 PM, Rabin Vincent wrote:
> On Sat, Apr 18, 2015 at 06:56:02PM -0700, Guenter Roeck wrote:
>> Further debugging (with added WARN_ON if cpu != 0 in smp_call_function_single) shows:
>>
>> [<800157ec>] (unwind_backtrace) from [<8001250c>] (show_stack+0x10/0x14)
>> [<8001250c>] (show_stack) from [<80494cb4>] (dump_stack+0x88/0x98)
>> [<80494cb4>] (dump_stack) from [<80024058>] (warn_slowpath_common+0x84/0xb4)
>> [<80024058>] (warn_slowpath_common) from [<80024124>] (warn_slowpath_null+0x1c/0x24)
>> [<80024124>] (warn_slowpath_null) from [<80078fc8>] (smp_call_function_single+0x170/0x178)
>> [<80078fc8>] (smp_call_function_single) from [<80090024>] (perf_event_exit_cpu+0x80/0xf0)
>> [<80090024>] (perf_event_exit_cpu) from [<80090110>] (perf_cpu_notify+0x30/0x48)
>> [<80090110>] (perf_cpu_notify) from [<8003d340>] (notifier_call_chain+0x44/0x84)
>> [<8003d340>] (notifier_call_chain) from [<8002451c>] (_cpu_up+0x120/0x168)
>> [<8002451c>] (_cpu_up) from [<800245d4>] (cpu_up+0x70/0x94)
>> [<800245d4>] (cpu_up) from [<80624234>] (smp_init+0xac/0xb0)
>> [<80624234>] (smp_init) from [<80618d84>] (kernel_init_freeable+0x118/0x268)
>> [<80618d84>] (kernel_init_freeable) from [<8049107c>] (kernel_init+0x8/0xe8)
>> [<8049107c>] (kernel_init) from [<8000f320>] (ret_from_fork+0x14/0x34)
>> ---[ end trace 2f9f1bb8a47b3a1b ]---
>> smp_call_function_single, cpu=1, wait=1, csd_stack=87825ea0
>> generic_exec_single, cpu=1, smp_processor_id()=0
>> csd_lock_wait: csd=87825ea0, flags=0x3
>>
>> This is repeated for each secondary CPU. But the secondary CPUs don't respond because
>> they are not enabled, which I guess explains why the lock is never released.
>>
>> So, in other words, this happens because the system believes (presumably per configuration
>> / fdt data) that there are four CPU cores, but that is not really the case. Previously that
>> did not matter, and was handled correctly. Now it is fatal.
>>
>> Does this help ?
>
> 8053871d0f7f67c7efb7f226ef031f78877d6625 moved the csd locking to the
> callers, but the offline check, which was earlier done before the csd
> locking, was not moved. The following patch moves the checks back to the
> earlier point and fixes your boot:
>
Yes, your patch fixes the problem.
Tested-by: Guenter Roeck <[email protected]>
Thanks,
Guenter
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 2aaac2c..ba1fb01 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -159,9 +159,6 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
> }
>
>
> - if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
> - return -ENXIO;
> -
> csd->func = func;
> csd->info = info;
>
> @@ -289,6 +286,12 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
> && !oops_in_progress);
>
> + if (cpu != smp_processor_id()
> + && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
> + err = -ENXIO;
> + goto out;
> + }
> +
> csd = &csd_stack;
> if (!wait) {
> csd = this_cpu_ptr(&csd_data);
> @@ -300,6 +303,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
> if (wait)
> csd_lock_wait(csd);
>
> +out:
> put_cpu();
>
> return err;
> @@ -328,6 +332,12 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
>
> preempt_disable();
>
> + if (cpu != smp_processor_id()
> + && ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))) {
> + err = -ENXIO;
> + goto out;
> + }
> +
> /* We could deadlock if we have to wait here with interrupts disabled! */
> if (WARN_ON_ONCE(csd->flags & CSD_FLAG_LOCK))
> csd_lock_wait(csd);
> @@ -336,8 +346,9 @@ int smp_call_function_single_async(int cpu, struct call_single_data *csd)
> smp_wmb();
>
> err = generic_exec_single(cpu, csd, csd->func, csd->info);
> - preempt_enable();
>
> +out:
> + preempt_enable();
> return err;
> }
> EXPORT_SYMBOL_GPL(smp_call_function_single_async);
>
On Sun, Apr 19, 2015 at 4:48 AM, Linus Torvalds
<[email protected]> wrote:
>
> Does that smaller patch work equally well?
.. and here's a properly formatted email and patch.
Linus
* Linus Torvalds <[email protected]> wrote:
> On Sun, Apr 19, 2015 at 4:48 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Does that smaller patch work equally well?
>
> .. and here's a properly formatted email and patch.
>
> Linus
> kernel/smp.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 2aaac2c47683..07854477c164 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -159,8 +159,10 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
> }
>
>
> - if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
> + if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
> + csd_unlock(csd);
> return -ENXIO;
> + }
Acked-by: Ingo Molnar <[email protected]>
Btw., in this case we should probably also generate a WARN_ONCE()
warning?
I _think_ most such callers calling an SMP function call for offline
or out of range CPUs are at minimum racy.
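Something like the following, layered on top of the fix above (illustrative
only - the exact message is made up):
	if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
		WARN_ONCE(1, "IPI requested for offline or invalid CPU %d\n", cpu);
		csd_unlock(csd);
		return -ENXIO;
	}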
Thanks,
Ingo
On 04/19/2015 02:31 AM, Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
>> On Sun, Apr 19, 2015 at 4:48 AM, Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> Does that smaller patch work equally well?
>>
>> .. and here's a properly formatted email and patch.
>>
>> Linus
>
>> kernel/smp.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 2aaac2c47683..07854477c164 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -159,8 +159,10 @@ static int generic_exec_single(int cpu, struct call_single_data *csd,
>> }
>>
>>
>> - if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
>> + if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
>> + csd_unlock(csd);
>> return -ENXIO;
>> + }
>
> Acked-by: Ingo Molnar <[email protected]>
>
Tested-by: Guenter Roeck <[email protected]>
> Btw., in this case we should probably also generate a WARN_ONCE()
> warning?
>
> I _think_ most such callers calling an SMP function call for offline
> or out of range CPUs are at minimum racy.
>
Not really; at least the online cpu part is an absolutely normal use
case for qemu-arm.
Sure, you can argue that "this isn't the real system", and that
qemu-arm should be "fixed", but there are reasons - the emulation
is (much) slower if the number of CPUs is set to 4, and not everyone
who wants to use qemu has a system with as many CPUs as the emulated
system would normally have.
Thanks,
Guenter
* Guenter Roeck <[email protected]> wrote:
> > I _think_ most such callers calling an SMP function call for
> > offline or out of range CPUs are at minimum racy.
>
> Not really; at least the online cpu part is an absolutely normal use
> case for qemu-arm.
The problem is that an IPI is attempted to be sent to a non-existent
CPU AFAICS, right?
> Sure, you can argue that "this isn't the real system", and that
> qemu-arm should be "fixed", but there are reasons - the emulation is
> (much) slower if the number of CPUs is set to 4, and not everyone
> who wants to use qemu has a system with as many CPUs as the emulated
> system would normally have.
That's all fine and good, but why is an IPI sent to a non-existent
CPU? It's not like we don't know which CPU is up and down.
Thanks,
Ingo
On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <[email protected]> wrote:
>
> That's all fine and good, but why is an IPI sent to a non-existent
> CPU? It's not like we don't know which CPU is up and down.
I agree that the qemu-arm behavior smells like a bug, plain and
simple. Nobody sane should send random IPI's to CPU's that they don't
even know are up or not.
That said, I could imagine that we would have some valid case where we
just do a cross-cpu call to (for example) do lock wakeup, and the code
would use some optimistic algorithm that prods the CPU after the lock
has been released, and there could be some random race where the lock
data structure has already been released (ie imagine the kind of
optimistic unlocked racy access to "sem->owner" that we discussed as
part of the rwsem_spin_on_owner() thread recently).
So I _could_ imagine that somebody would want to do optimistic "prod
other cpu" calls that in all normal cases are for existing cpus, but
could be racy in theory.
It doesn't sound like the qemu-arm case is that kind of situation,
though. That one just sounds like a stupid "let's send an ipi to a cpu
whether it exists or not".
But I committed it without any new warning, because I could in theory
see it as being a valid use.
Linus
* Linus Torvalds <[email protected]> wrote:
> On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <[email protected]> wrote:
> >
> > That's all fine and good, but why is an IPI sent to a non-existent
> > CPU? It's not like we don't know which CPU is up and down.
>
> I agree that the qemu-arm behavior smells like a bug, plain and
> simple. Nobody sane should send random IPI's to CPU's that they
> don't even know are up or not.
Yeah, and a warning would have caught that bug a bit earlier, at the
cost of restricting the API:
> That said, I could imagine that we would have some valid case where
> we just do a cross-cpu call to (for example) do lock wakeup, and the
> code would use some optimistic algorithm that prods the CPU after
> the lock has been released, and there could be some random race
> where the lock data structure has already been released (ie imagine
> the kind of optimistic unlocked racy access to "sem->owner" that we
> discussed as part of the rwsem_spin_on_owner() thread recently).
>
> So I _could_ imagine that somebody would want to do optimistic "prod
> other cpu" calls that in all normal cases are for existing cpus, but
> could be racy in theory.
Yes, and I don't disagree with such optimizations in principle (it
allows fewer references to be taken in the fast path), but is it really
safe?
If a CPU is going down and we potentially race against that, and send
off an IPI, the IPI might be 'in flight' for an indeterminate amount
of time, especially on wildly non-deterministic hardware like virtual
platforms.
So if the CPU goes down during that time, the typical way a CPU goes
down is that it ignores all IPIs that arrive after that. End result:
the sender of the IPI may hang indefinitely (for the synchronous API),
waiting for the CSD lock to be released...
For the non-wait API we could also hang, but in an even more
interesting way: we won't hang trying to send to the downed CPU, we'd
hang on the _next_ cross-call call, possibly sending to another CPU,
because the CSD_FLAG_LOCK of the sender CPU is never released by the
offlined CPU which was supposed to unlock it.
Also note the existing warning we already have in
flush_smp_call_function_queue():
/* There shouldn't be any pending callbacks on an offline CPU. */
if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
!warned && !llist_empty(head))) {
warned = true;
WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
Couldn't this trigger in the opportunistic, imprecise IPI case, if
IRQs are ever enabled when or after the CPU is marked offline?
My suggested warning would simply catch this kind of unsafe looking
race a bit sooner, instead of silently ignoring the cross-call
attempt.
Now your suggestion could be made to work by:
- polling for CPU status in the CSD-lock code as well. (We don't do
this currently.)
- making sure that if a late IPI arrives to an already-down CPU it
does not attempt to execute CSD functions. (I.e. silence the
above, already existing warning.)
- auditing the CPU-down path to make sure it does not get surprised
by late external IPIs. (I think this should already be so, given
the existing warning.)
- IPIs can be pending indefinitely, so make sure a pending IPI won't
confuse the machinery after a CPU has been onlined again.
- pending IPIs may consume hardware resources when not received
properly. For example I think x86 will keep trying to resend it.
Not sure there's any timeout mechanism on the hardware level - the
sending APIC might even get stuck? Audit all hardware for this
kind of possibility.
So unless I'm missing something, to me it looks like the current
code is only safe to be used on offline CPUs if the 'offline' CPU is
never ever brought online in the first place.
> It doesn't sound like the qemu-arm case is that kind of situation,
> though. That one just sounds like a stupid "let's send an ipi to a
> cpu whether it exists or not".
>
> But I committed it without any new warning, because I could in theory
> see it as being a valid use.
So my point is that we already have a 'late' warning, and I think it
actively hid the qemu-arm bug, because the 'offline' CPU was so
offline (due to being non-existent) that it never had a chance to
print a warning.
With an 'early' warning we could flush out such bugs (or code
uncleanlinesses: clearly the ARM code was at least functional before)
a bit sooner.
But I don't have strong feelings about it, SMP cross-call users are a
lot less frequent than say locking facility users, so the strength of
debugging isn't nearly as important and it's fine to me either way!
Thanks,
Ingo
On Sun, Apr 19, 2015 at 4:08 PM, Guenter Roeck <[email protected]> wrote:
> On 04/19/2015 02:31 AM, Ingo Molnar wrote:
>> * Linus Torvalds <[email protected]> wrote:
>>
>>> On Sun, Apr 19, 2015 at 4:48 AM, Linus Torvalds
>>> <[email protected]> wrote:
>>>>
>>>>
>>>> Does that smaller patch work equally well?
>>>
>>>
>>> .. and here's a properly formatted email and patch.
>>>
>>> Linus
>>
>>
>>> kernel/smp.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/smp.c b/kernel/smp.c
>>> index 2aaac2c47683..07854477c164 100644
>>> --- a/kernel/smp.c
>>> +++ b/kernel/smp.c
>>> @@ -159,8 +159,10 @@ static int generic_exec_single(int cpu, struct
>>> call_single_data *csd,
>>> }
>>>
>>>
>>> - if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu))
>>> + if ((unsigned)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
>>> + csd_unlock(csd);
>>> return -ENXIO;
>>> + }
>>
>>
>> Acked-by: Ingo Molnar <[email protected]>
>>
> Tested-by: Guenter Roeck <[email protected]>
I've bisected a boot regression on a real system to the same commit
8053871d0f7f ("smp: Fix smp_call_function_single_async() locking").
Linus' patch fixes it, so
Tested-by: Geert Uytterhoeven <[email protected]>
>> Btw., in this case we should probably also generate a WARN_ONCE()
>> warning?
>>
>> I _think_ most such callers calling an SMP function call for offline
>> or out of range CPUs are at minimum racy.
>>
> Not really; at least the online cpu part is an absolutely normal use
> case for qemu-arm.
>
> Sure, you can argue that "this isn't the real system", and that
> qemu-arm should be "fixed", but there are reasons - the emulation
> is (much) slower if the number of CPUs is set to 4, and not everyone
> who wants to use qemu has a system with as many CPUs as the emulated
> system would normally have.
In my case boot failed on r8a73a4/ape6evm, where I had added nodes for all
CPU cores to the .dtsi, while the SoC code doesn't have SMP bringup code yet.
This worked fine before.
With CONFIG_DEBUG_LL=y, the boot hung after:
Calibrating delay loop (skipped), value calculated using timer
frequency.. 26.00 BogoMIPS (lpj=130000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: update cpu_capacity 1516
CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Setting up static identity map for 0x40009000 - 0x40009058
With the fix, it continues as expected with:
Brought up 1 CPUs
SMP: Total of 1 processors activated (26.00 BogoMIPS).
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <[email protected]> wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent
> > > CPU? It's not like we don't know which CPU is up and down.
> >
> > I agree that the qemu-arm behavior smells like a bug, plain and
> > simple. Nobody sane should send random IPI's to CPU's that they
> > don't even know are up or not.
>
> Yeah, and a warning would have caught that bug a bit earlier, at the
> cost of restricting the API:
>
> > That said, I could imagine that we would have some valid case where
> > we just do a cross-cpu call to (for example) do lock wakeup, and the
> > code would use some optimistic algorithm that prods the CPU after
> > the lock has been released, and there could be some random race
> > where the lock data structure has already been released (ie imagine
> > the kind of optimistic unlocked racy access to "sem->owner" that we
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> >
> > So I _could_ imagine that somebody would want to do optimistic "prod
> > other cpu" calls that in all normal cases are for existing cpus, but
> > could be racy in theory.
>
> Yes, and I don't disagree with such optimizations in principle (it
> allows fewer references to be taken in the fast path), but is it really
> safe?
>
> If a CPU is going down and we potentially race against that, and send
> off an IPI, the IPI might be 'in flight' for an indeterminate amount
> of time, especially on wildly non-deterministic hardware like virtual
> platforms.
>
> So if the CPU goes down during that time, the typical way a CPU goes
> down is that it ignores all IPIs that arrive after that. End result:
> the sender of the IPI may hang indefinitely (for the synchronous API),
> waiting for the CSD lock to be released...
>
> For the non-wait API we could also hang, but in an even more
> interesting way: we won't hang trying to send to the downed CPU, we'd
> hang on the _next_ cross-call call, possibly sending to another CPU,
> because the CSD_FLAG_LOCK of the sender CPU is never released by the
> offlined CPU which was supposed to unlock it.
>
> Also note the existing warning we already have in
> flush_smp_call_function_queue():
>
> /* There shouldn't be any pending callbacks on an offline CPU. */
> if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
> !warned && !llist_empty(head))) {
> warned = true;
> WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
>
> Couldn't this trigger in the opportunistic, imprecise IPI case, if
> IRQs are ever enabled when or after the CPU is marked offline?
>
> My suggested warning would simply catch this kind of unsafe looking
> race a bit sooner, instead of silently ignoring the cross-call
> attempt.
>
> Now your suggestion could be made to work by:
>
> - polling for CPU status in the CSD-lock code as well. (We don't do
> this currently.)
>
> - making sure that if a late IPI arrives to an already-down CPU it
> does not attempt to execute CSD functions. (I.e. silence the
> above, already existing warning.)
>
> - auditing the CPU-down path to make sure it does not get surprised
> by late external IPIs. (I think this should already be so, given
> the existing warning.)
>
> - IPIs can be pending indefinitely, so make sure a pending IPI won't
> confuse the machinery after a CPU has been onlined again.
>
> - pending IPIs may consume hardware resources when not received
> properly. For example I think x86 will keep trying to resend it.
> Not sure there's any timeout mechanism on the hardware level - the
> sending APIC might even get stuck? Audit all hardware for this
> kind of possibility.
>
> So unless I'm missing something, to me it looks like the current
> code is only safe to be used on offline CPUs if the 'offline' CPU is
> never ever brought online in the first place.
>
> > It doesn't sound like the qemu-arm case is that kind of situation,
> > though. That one just sounds like a stupid "let's send an ipi to a
> > cpu whether it exists or not".
> >
> > But I committed it without any new warning, because I could in theory
> > see it as being a valid use.
>
> So my point is that we already have a 'late' warning, and I think it
> actively hid the qemu-arm bug, because the 'offline' CPU was so
> offline (due to being non-existent) that it never had a chance to
> print a warning.
>
> With an 'early' warning we could flush out such bugs (or code
> uncleanlinesses: clearly the ARM code was at least functional before)
> a bit sooner.
>
> But I don't have strong feelings about it, SMP cross-call users are a
> lot less frequent than say locking facility users, so the strength of
> debugging isn't nearly as important and it's fine to me either way!
In the long term, support for "send an IPI to a CPU that might or might
not be leaving" will probably need an API that returns a success or
failure indication. This will probably simplify things a bit, because
currently the caller needs to participate in the IPI's synchronization.
With a conditional primitive, callers could instead simply rely on the
return value.
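A rough sketch of what such a conditional primitive might look like (the
name, the use of get_online_cpus(), and the exact semantics are all made
up here for illustration):
/* Hypothetical helper, not an existing kernel interface. */
int smp_call_function_single_try(int cpu, smp_call_func_t func, void *info)
{
	int ret = -ENXIO;

	get_online_cpus();	/* keep the target from going away under us */
	if ((unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
		ret = smp_call_function_single(cpu, func, info, 0);
	put_online_cpus();

	return ret;		/* caller handles -ENXIO instead of hanging */
}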
Thanx, Paul
On Sun, Apr 19, 2015 at 08:01:40PM +0200, Ingo Molnar wrote:
> That's all fine and good, but why is an IPI sent to a non-existent
> CPU? It's not like we don't know which CPU is up and down.
The perf events code is trying to call smp_call_function_single() on the
non-existent CPU in perf_event_exit_cpu_context() while handling the
CPU_UP_CANCELED notification. perf_cpu_notify() handles CPU_UP_CANCELED
and CPU_DOWN_PREPARE in the same way.
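From memory, the relevant notifier looked roughly like this (abbreviated
reconstruction, possibly not letter-perfect):
static int
perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
{
	unsigned int cpu = (long)hcpu;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
	case CPU_DOWN_FAILED:
		perf_event_init_cpu(cpu);
		break;
	case CPU_UP_CANCELED:		/* the failed-bringup case hits here */
	case CPU_DOWN_PREPARE:
		perf_event_exit_cpu(cpu);
		break;
	default:
		break;
	}

	return NOTIFY_OK;
}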
(cpu_up() is attempted for the non-existent CPUs because in this case what
is specified in the device tree does not match reality.)
On Sun, Apr 19, 2015 at 10:39 PM, Ingo Molnar <[email protected]> wrote:
>
>>
>> So I _could_ imagine that somebody would want to do optimistic "prod
>> other cpu" calls that in all normal cases are for existing cpus, but
>> could be racy in theory.
>
> Yes, and I don't disagree with such optimizations in principle (it
> allows fewer references to be taken in the fast path), but is it really
> safe?
>
> If a CPU is going down and we potentially race against that, and send
> off an IPI, the IPI might be 'in flight' for an indeterminate amount
> of time, especially on wildly non-deterministic hardware like virtual
> platforms.
Well, it should be easy enough to handle that race in the cpu
offlining: after the cpu is marked "not present", just call
flush_smp_call_function_queue(). In fact, I thought we did exactly
that - it's the reason for the "warn_cpu_offline" argument, isn't it?
So I don't think there should be any real race. Sure, the HW IPI
itself might be in flight, but from a sw perspective it's all done.
No, I was talking about something even more optimistic - the CPU
number we optimistically loaded and sent an IPI to might be completely
bogus just because we loaded it using some unlocked sequence, and
maybe the memory got re-assigned. So it might not even be a CPU number
that is "stale", it could be entirely invalid.
And no, I don't claim that we should do this, I'm just saying that I
could imagine this being a valid thing to do. But it might be a good
idea to add a WARN_ON_ONCE() for now to find the users that are not
being clever like this, they are just being stupid and wrong-headed,
and sending IPI's to bogus CPU's not because they are doing really
subtle smart stuff, but just because they never noticed how stupid
they are..
Linus