2020-11-15 22:59:40

by Oleksandr Natalenko

[permalink] [raw]
Subject: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

Hi.

I'm running v5.10-rc3-rt7 for some time, and I came across this splat in
dmesg:

```
[118769.951010] ------------[ cut here ]------------
[118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013
migration_cpu_stop+0x2e3/0x330
[118769.951018] Modules linked in: uinput uas usb_storage
blocklayoutdriver xt_mark ip6table_nat ip6table_filter ip6_tables rfcomm
fuse rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace
sunrpc nfs_ssc fscache iptable_nat xt_MASQUERADE nf_nat iptable_filter
xt_comment nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cmac
algif_hash algif_skcipher nf_tables af_alg snd_hda_codec_realtek nct6775
bnep tun nfnetlink hwmon_vid snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg iwlmvm soundwire_intel
soundwire_generic_allocation soundwire_cadence nls_iso8859_1 nls_cp437
vfat edac_mce_amd snd_hda_codec fat eeepc_wmi snd_hda_core mac80211
uvcvideo kvm_amd asus_wmi libarc4 soundwire_bus btusb videobuf2_vmalloc
btrtl videobuf2_memops battery btbcm sparse_keymap wmi_bmof
snd_usb_audio mxm_wmi videobuf2_v4l2 btintel snd_usbmidi_lib
videobuf2_common snd_hwdep iwlwifi snd_soc_core snd_rawmidi bluetooth
kvm videodev snd_seq_device snd_compress joydev
[118769.951047] ecdh_generic ac97_bus irqbypass ecc snd_pcm_dmaengine
input_leds mousedev mc crc16 r8169 rapl cfg80211 realtek sp5100_tco
snd_pcm mdio_devres of_mdio k10temp i2c_piix4 snd_timer rfkill fixed_phy
ipmi_devintf igb snd ipmi_msghandler libphy dca soundcore evdev tpm_crb
mac_hid tpm_tis tpm_tis_core pinctrl_amd wmi acpi_cpufreq tcp_bbr
vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock
msr crypto_user ip_tables x_tables xfs dm_thin_pool dm_persistent_data
dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_crypt cbc
encrypted_keys trusted tpm hid_logitech_hidpp hid_logitech_dj
hid_generic usbhid hid dm_mod raid10 md_mod crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd
xhci_pci xhci_pci_renesas ccp cryptd ehci_pci glue_helper xhci_hcd
ehci_hcd rng_core amdgpu gpu_sched ttm i2c_algo_bit drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm agpgart
[118769.951079] CPU: 19 PID: 146 Comm: migration/19 Not tainted
5.10.0-pf0 #1
[118769.951080] Hardware name: System manufacturer System Product
Name/Pro WS X570-ACE, BIOS 2311 10/16/2020
[118769.951081] Stopper: migration_cpu_stop+0x0/0x330 <-
affine_move_task+0x42f/0x620
[118769.951083] RIP: 0010:migration_cpu_stop+0x2e3/0x330
[118769.951084] Code: ff ff 31 db 45 85 ed 0f 89 65 ff ff ff 8b b5 d0 0a
00 00 4c 89 ff e8 cc 43 ff ff 0f b6 d8 66 85 db 75 d8 0f 0b e9 f2 fd ff
ff <0f> 0b e9 eb fd ff ff 44 89 ee 4c 89 ff e8 ab 43 ff ff 84 c0 0f 84
[118769.951085] RSP: 0018:ffffb58c806c7e50 EFLAGS: 00010046
[118769.951086] RAX: ffffa136e7a9c300 RBX: 0000000000000000 RCX:
0000000000000000
[118769.951086] RDX: 000000000000000d RSI: 0000000000000013 RDI:
ffffa136e7a9bf80
[118769.951087] RBP: ffffa13d0eee99c0 R08: 000000000000002f R09:
ffffa13d0ef29af0
[118769.951087] R10: 00000000000000ec R11: 000000000000016a R12:
ffffb58c9118fdd0
[118769.951088] R13: 00000000ffffffff R14: ffffa136e7a9c820 R15:
ffffa136e7a9bf80
[118769.951089] FS: 0000000000000000(0000) GS:ffffa13d0eec0000(0000)
knlGS:0000000000000000
[118769.951089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[118769.951090] CR2: 00007f687474a000 CR3: 000000021319c000 CR4:
0000000000350ee0
[118769.951090] Call Trace:
[118769.951094] ? set_cpus_allowed_ptr+0x10/0x10
[118769.951095] cpu_stopper_thread+0x89/0x130
[118769.951097] ? smpboot_register_percpu_thread+0xe0/0xe0
[118769.951099] smpboot_thread_fn+0x1d8/0x2c0
[118769.951100] kthread+0x190/0x1b0
[118769.951101] ? __kthread_init_worker+0x50/0x50
[118769.951102] ret_from_fork+0x22/0x30
[118769.951105] CPU: 19 PID: 146 Comm: migration/19 Not tainted
5.10.0-pf0 #1
[118769.951106] Hardware name: System manufacturer System Product
Name/Pro WS X570-ACE, BIOS 2311 10/16/2020
[118769.951106] Stopper: migration_cpu_stop+0x0/0x330 <-
affine_move_task+0x42f/0x620
[118769.951107] Call Trace:
[118769.951108] dump_stack+0x6d/0x88
[118769.951111] __warn.cold+0x24/0x3d
[118769.951113] ? migration_cpu_stop+0x2e3/0x330
[118769.951114] report_bug+0xd1/0x100
[118769.951116] handle_bug+0x3a/0xa0
[118769.951118] exc_invalid_op+0x15/0xd0
[118769.951119] asm_exc_invalid_op+0x12/0x20
[118769.951121] RIP: 0010:migration_cpu_stop+0x2e3/0x330
[118769.951122] Code: ff ff 31 db 45 85 ed 0f 89 65 ff ff ff 8b b5 d0 0a
00 00 4c 89 ff e8 cc 43 ff ff 0f b6 d8 66 85 db 75 d8 0f 0b e9 f2 fd ff
ff <0f> 0b e9 eb fd ff ff 44 89 ee 4c 89 ff e8 ab 43 ff ff 84 c0 0f 84
[118769.951122] RSP: 0018:ffffb58c806c7e50 EFLAGS: 00010046
[118769.951123] RAX: ffffa136e7a9c300 RBX: 0000000000000000 RCX:
0000000000000000
[118769.951123] RDX: 000000000000000d RSI: 0000000000000013 RDI:
ffffa136e7a9bf80
[118769.951124] RBP: ffffa13d0eee99c0 R08: 000000000000002f R09:
ffffa13d0ef29af0
[118769.951124] R10: 00000000000000ec R11: 000000000000016a R12:
ffffb58c9118fdd0
[118769.951124] R13: 00000000ffffffff R14: ffffa136e7a9c820 R15:
ffffa136e7a9bf80
[118769.951126] ? set_cpus_allowed_ptr+0x10/0x10
[118769.951127] cpu_stopper_thread+0x89/0x130
[118769.951128] ? smpboot_register_percpu_thread+0xe0/0xe0
[118769.951129] smpboot_thread_fn+0x1d8/0x2c0
[118769.951130] kthread+0x190/0x1b0
[118769.951130] ? __kthread_init_worker+0x50/0x50
[118769.951131] ret_from_fork+0x22/0x30
[118769.951133] ---[ end trace 0000000000000002 ]---
```

which corresponds to the following condition:

```
2007 /*
2008 * When this was migrate_enable() but we no longer have an
2009 * @pending, a concurrent SCA 'fixed' things and we should
be
2010 * valid again. Nothing to do.
2011 */
2012 if (!pending) {
2013 WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
2014 goto out;
2015 }
```

I'm not sure what triggered this, and the system still looks usable
afterwards. I have no idea how to trigger it again ATM, so this is just
a heads up in case you know what could go wrong.

Thanks.

--
Oleksandr Natalenko (post-factum)


2020-11-16 10:29:56

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

Hi.

On 16.11.2020 11:00, Valentin Schneider wrote:
> On 15/11/20 22:32, Oleksandr Natalenko wrote:
>> I'm running v5.10-rc3-rt7 for some time, and I came across this splat
>> in
>> dmesg:
>>
>> ```
>> [118769.951010] ------------[ cut here ]------------
>> [118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013
>
> Err, I didn't pick up on this back then, but isn't that check bogus? If
> the
> task is enqueued elsewhere, it's valid for it not to be affined
> 'here'. Also that is_migration_disabled() check within is_cpu_allowed()
> makes me think this isn't the best thing to call on a remote task.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1218f3ce1713..47d5b677585f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2010,7 +2010,7 @@ static int migration_cpu_stop(void *data)
> * valid again. Nothing to do.
> */
> if (!pending) {
> - WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
> + WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_ptr));
> goto out;
> }

Not sure whether the check is legitimate, but FWIW I've managed to put a
test task [1] (it spawns a lot of threads and applies affinity) into a
permanent unkillable D state here:

```
[<0>] affine_move_task+0x2d3/0x620
[<0>] __set_cpus_allowed_ptr+0x164/0x210
[<0>] sched_setaffinity+0x21a/0x300
[<0>] __x64_sys_sched_setaffinity+0x8c/0xc0
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

I think this corresponds to something around here:

```
$ scripts/faddr2line kernel/sched/core.o affine_move_task+0x2d3
affine_move_task+0x2d3/0x620:
arch_atomic_fetch_sub at
/home/pf/linux-pf-edge/src/linux-5.10/./arch/x86/include/asm/atomic.h:190
(inlined by) atomic_fetch_sub_release at
/home/pf/linux-pf-edge/src/linux-5.10/./include/asm-generic/atomic-instrumented.h:221
(inlined by) __refcount_sub_and_test at
/home/pf/linux-pf-edge/src/linux-5.10/./include/linux/refcount.h:272
(inlined by) __refcount_dec_and_test at
/home/pf/linux-pf-edge/src/linux-5.10/./include/linux/refcount.h:315
(inlined by) refcount_dec_and_test at
/home/pf/linux-pf-edge/src/linux-5.10/./include/linux/refcount.h:333
(inlined by) affine_move_task at
/home/pf/linux-pf-edge/src/linux-5.10/kernel/sched/core.c:2334
```

or:

```
2332 wait_for_completion(&pending->done);
2333
2334 if (refcount_dec_and_test(&pending->refs))
2335 wake_up_var(&pending->refs);
```

I'm not positive about this being directly related to the original
report, but I think it is still worth mentioning.

Thanks.

[1] https://gitlab.com/post-factum/burn_scheduler

--
Oleksandr Natalenko (post-factum)

2020-11-16 10:35:09

by Valentin Schneider

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330


On 16/11/20 10:27, Oleksandr Natalenko wrote:
> Hi.
[...]
> Not sure whether the check is legitimate, but FWIW I've managed to put a
> test task [1] (it spawns a lot of threads and applies affinity) into a
> permanent unkillable D state here:
>
> ```
> [<0>] affine_move_task+0x2d3/0x620
> [<0>] __set_cpus_allowed_ptr+0x164/0x210
> [<0>] sched_setaffinity+0x21a/0x300
> [<0>] __x64_sys_sched_setaffinity+0x8c/0xc0
> [<0>] do_syscall_64+0x33/0x40
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> ```
[...]
>
> I'm not positive about this being directly related to the original
> report, but I think it is still worth mentioning.
>

Aye, thanks, that one should be fixed by:

https://lore.kernel.org/r/[email protected]

> Thanks.
>
> [1] https://gitlab.com/post-factum/burn_scheduler

2020-11-16 10:37:42

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

On 16.11.2020 11:27, Oleksandr Natalenko wrote:
> Not sure whether the check is legitimate, but FWIW I've managed to put
> a test task [1] (it spawns a lot of threads and applies affinity) into
> a permanent unkillable D state here:

Just to make it clear: I neither applied your patch nor saw a splat this
time, just a D state only.

--
Oleksandr Natalenko (post-factum)

2020-11-16 10:37:57

by Oleksandr Natalenko

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

On 16.11.2020 11:31, Valentin Schneider wrote:
> On 16/11/20 10:27, Oleksandr Natalenko wrote:
>> Hi.
> [...]
>> Not sure whether the check is legitimate, but FWIW I've managed to put
>> a
>> test task [1] (it spawns a lot of threads and applies affinity) into a
>> permanent unkillable D state here:
>>
>> ```
>> [<0>] affine_move_task+0x2d3/0x620
>> [<0>] __set_cpus_allowed_ptr+0x164/0x210
>> [<0>] sched_setaffinity+0x21a/0x300
>> [<0>] __x64_sys_sched_setaffinity+0x8c/0xc0
>> [<0>] do_syscall_64+0x33/0x40
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> ```
> [...]
>>
>> I'm not positive about this being directly related to the original
>> report, but I think it is still worth mentioning.
>>
>
> Aye, thanks, that one should be fixed by:
>
> https://lore.kernel.org/r/[email protected]

Great, thanks, I'll pick it up now.

>
>> Thanks.
>>
>> [1] https://gitlab.com/post-factum/burn_scheduler

--
Oleksandr Natalenko (post-factum)

2020-11-17 01:45:38

by Valentin Schneider

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330


On 15/11/20 22:32, Oleksandr Natalenko wrote:
> Hi.
>
> I'm running v5.10-rc3-rt7 for some time, and I came across this splat in
> dmesg:
>
> ```
> [118769.951010] ------------[ cut here ]------------
> [118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013

Err, I didn't pick up on this back then, but isn't that check bogus? If the
task is enqueued elsewhere, it's valid for it not to be affined
'here'. Also that is_migration_disabled() check within is_cpu_allowed()
makes me think this isn't the best thing to call on a remote task.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1218f3ce1713..47d5b677585f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2010,7 +2010,7 @@ static int migration_cpu_stop(void *data)
* valid again. Nothing to do.
*/
if (!pending) {
- WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
+ WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_ptr));
goto out;
}

2020-11-17 11:08:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

On Mon, Nov 16, 2020 at 10:00:14AM +0000, Valentin Schneider wrote:
>
> On 15/11/20 22:32, Oleksandr Natalenko wrote:
> > Hi.
> >
> > I'm running v5.10-rc3-rt7 for some time, and I came across this splat in
> > dmesg:
> >
> > ```
> > [118769.951010] ------------[ cut here ]------------
> > [118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013
>
> Err, I didn't pick up on this back then, but isn't that check bogus? If the
> task is enqueued elsewhere, it's valid for it not to be affined
> 'here'. Also that is_migration_disabled() check within is_cpu_allowed()
> makes me think this isn't the best thing to call on a remote task.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1218f3ce1713..47d5b677585f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2010,7 +2010,7 @@ static int migration_cpu_stop(void *data)
> * valid again. Nothing to do.
> */
> if (!pending) {
> - WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
> + WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_ptr));

Ho humm.. bit of a mess that. I'm trying to figure out if we need that
is_per_cpu_kthread() test here or not.

I suppose not, what we want here is to ensure the CPU is in cpus_mask
and not care about the whole hotplug mess.

Would it makes sense to replace both instances in migration_cpu_stop()
with:

WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_mask));

?

2020-11-17 11:32:13

by Valentin Schneider

[permalink] [raw]
Subject: Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330


On 17/11/20 11:06, Peter Zijlstra wrote:
> On Mon, Nov 16, 2020 at 10:00:14AM +0000, Valentin Schneider wrote:
>>
>> On 15/11/20 22:32, Oleksandr Natalenko wrote:
>> > Hi.
>> >
>> > I'm running v5.10-rc3-rt7 for some time, and I came across this splat in
>> > dmesg:
>> >
>> > ```
>> > [118769.951010] ------------[ cut here ]------------
>> > [118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013
>>
>> Err, I didn't pick up on this back then, but isn't that check bogus? If the
>> task is enqueued elsewhere, it's valid for it not to be affined
>> 'here'. Also that is_migration_disabled() check within is_cpu_allowed()
>> makes me think this isn't the best thing to call on a remote task.
>>
>> ---
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 1218f3ce1713..47d5b677585f 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2010,7 +2010,7 @@ static int migration_cpu_stop(void *data)
>> * valid again. Nothing to do.
>> */
>> if (!pending) {
>> - WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
>> + WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_ptr));
>
> Ho humm.. bit of a mess that. I'm trying to figure out if we need that
> is_per_cpu_kthread() test here or not.
>
> I suppose not, what we want here is to ensure the CPU is in cpus_mask
> and not care about the whole hotplug mess.
>

That was my thought as well. On top of that, is_cpu_allowed(p) does a
p->migration_disabled read, which isn't so great in the remote case.

> Would it makes sense to replace both instances in migration_cpu_stop()
> with:
>
> WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_mask));
>
> ?

I guess so; I was trying to see if we could factorize this, but stopped
mid-swing as I'm really wary of shuffling too much of this code (even with
the help of TLA+; well, maybe *because* of it).