Hi, this is your Linux kernel regression tracker.
On 03.04.22 13:22, Michele Ballabio wrote:
>
> I think I hit a regression in 5.16-stable.
> It is difficult to reproduce, and I'm not sure if it's
> still present in 5.17 and/or if it's caused by flaky hardware.
>
> The machine is a Ryzen 5 1600 with AMD graphics (RX 560).
>
> Kernel 5.16.10 does not have the following regression, 5.16.11-16
5.16.11-16 sounds like this is a distro kernel that might or might not
be patched. Or is 11-16 just meant as a range? Could you clarify?
> do. My machine would freeze completely about once a week, no oops in
> the logs, sysrq won't work either. I managed to log only the
> following (and only once) with netconsole, while running kernel 5.16.16.
> I could not reproduce the problem since.
Hmmm. Of course, ideally all regressions get fixed, but that being said:
5.16 will likely be EOL in roughly two weeks anyway, and getting to
the root of this problem might take some time and effort. That's why I'm
not sure myself what's the best way forward here. Maybe testing 5.17 to
see if the problem still shows up would be good; bisection would help,
but I guess that will be hard here. But there is one thing that
could help: could you maybe decode the panic you have, as described in
this document:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
Ciao, Thorsten
> ----------
> 4,1490,11865947234,-;ieee80211 phy0: rt2x00usb_watchdog_tx_dma: Warning - TX queue 2 DMA timed out, invoke forced reset
> SUBSYSTEM=ieee80211
> DEVICE=+ieee80211:phy0
> 4,1491,11872348272,-;ieee80211 phy0: rt2x00usb_watchdog_tx_dma: Warning - TX queue 2 DMA timed out, invoke forced reset
> SUBSYSTEM=ieee80211
> DEVICE=+ieee80211:phy0
> 0,1493,12767657117,-;traps: PANIC: double fault, error_code: 0x0
> 4,1494,12767657121,-;double fault: 0000 [#1] PREEMPT SMP NOPTI
> 4,1495,12767657123,-;CPU: 4 PID: 16786 Comm: MediaPD~der #12 Not tainted 5.16.16 #1
> 4,1496,12767657126,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
> 4,1497,12767657127,-;RIP: 0010:entry_SYSCALL_64+0x3/0x29
> 4,1498,12767657133,-;Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 01 f8 <65> 48 89 24 25 14 60 00 00 eb 12 0f 20 dc 0f 1f 44 00 00 48 81 e4
> 4,1499,12767657134,-;RSP: 0018:00007f2a8bcbd438 EFLAGS: 00010002
> 4,1500,12767657136,-;RAX: 00000000000000ca RBX: 000000000000005d RCX: 00007f2aa45e8aab
> 4,1501,12767657138,-;RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f2aa4400018
> 4,1502,12767657139,-;RBP: 00007f2aa4400018 R08: 0000000000000000 R09: 00007f2a8ed00000
> 4,1503,12767657140,-;R10: 0000000000000000 R11: 0000000000000282 R12: 00000000000000a8
> 4,1504,12767657141,-;R13: 0000000000000003 R14: 0000000000000030 R15: 00007f2aa4400000
> 4,1505,12767657142,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
> 4,1506,12767657143,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 4,1507,12767657144,-;CR2: 00007f2a8bcbd428 CR3: 00000002953f2000 CR4: 00000000003506e0
> 4,1508,12767657146,-;Call Trace:
> 4,1509,12767657146,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
> x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
> ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
> wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
> crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co4,1509,12767657146,-,ncfrag=966/986;re rapl k10temp wmi
> 4,1510,12767657189,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
> 4,1511,12767657207,-;------------[ cut here ]------------
> 4,1512,12767657207,-;WARNING: CPU: 4 PID: 16786 at kernel/softirq.c:362 __local_bh_enable_ip+0x43/0x70
> 4,1513,12767657212,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
> x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
> ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
> wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
> crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co4,1513,12767657212,-,ncfrag=966/986;re rapl k10temp wmi
> 4,1514,12767657248,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
> 4,1515,12767657252,-;CPU: 4 PID: 16786 Comm: MediaPD~der #12 Not tainted 5.16.16 #1
> 4,1516,12767657254,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
> 4,1517,12767657255,-;RIP: 0010:__local_bh_enable_ip+0x43/0x70
> 4,1518,12767657257,-;Code: 01 35 61 1d f3 7d 65 8b 05 5a 1d f3 7d a9 00 ff ff 00 74 1a bf 01 00 00 00 e8 99 b5 02 00 65 8b 05 42 1d f3 7d 85 c0 74 25 c3 <0f> 0b eb cc 48 c7 c7 d9 53 42 83 e8 4d ec a6 00 65 66 8b 05 25 19
> 4,1519,12767657259,-;RSP: 0018:fffffe00000f69a0 EFLAGS: 00010006
> 4,1520,12767657260,-;RAX: 0000000080110203 RBX: ffff8b0e05bd2000 RCX: ffff8b0e05bd2000
> 4,1521,12767657261,-;RDX: ffff8b0e0ac28000 RSI: 0000000000000201 RDI: ffffffffc12f12c3
> 4,1522,12767657262,-;RBP: ffff8b0e0c977a30 R08: fffffe00000f69e8 R09: ffff8b0e0d085000
> 4,1523,12767657263,-;R10: ffff8b0e03234300 R11: 0000000000000fff R12: ffff8b0e0d0850d0
> 4,1524,12767657264,-;R13: fffffe00000f69e8 R14: ffff8b0e0ddfc980 R15: ffff8b0e0d085a58
> 4,1525,12767657265,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
> 4,1526,12767657266,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> ----------
>
>
On Mon, 4 Apr 2022 09:12:41 +0200
Thorsten Leemhuis <[email protected]> wrote:
> > Kernel 5.16.10 does not have the following regression, 5.16.11-16
>
> 5.16.11-16 sounds like this is a distro kernel that might or might not
> be patched. Or is 11-16 just meant as a range? Could you clarify?
Sorry, I meant the problem occurred on 5.16.11, .12 and .16.
> > do. My machine would freeze completely about once a week, no oops in
> > the logs, sysrq won't work either. I managed to log only the
> > following (and only once) with netconsole, while running kernel
> > 5.16.16. I could not reproduce the problem since.
>
> Hmmm. Of course, ideally all regressions get fixed, but that being
> said: 5.16 will likely be EOL in roughly two weeks anyway, and
> getting to the root of this problem might take some time and effort.
> That's why I'm not sure myself what's the best way forward here.
I'm aware of this, but given the nature of the problem and how difficult
it is to reproduce, I thought it was better to report it.
Meanwhile I'm now on 5.17.1: let's say this is on hold until someone
has a similar problem with 5.17.x.
> Maybe testing 5.17 to see if the problem still shows up would be
> good; bisection would help, but I guess that will be hard here. But
> there is one thing that could help: could you maybe decode the
> panic you have, as described in this document:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
Thanks, I tried but I'm not sure it's of any help:
----------
0,1493,12767657117,-;traps: PANIC: double fault, error_code: 0x0
4,1494,12767657121,-;double fault: 0000 [#1] PREEMPT SMP NOPTI
4,1496,12767657126,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1497,12767657127,-;RIP: entry_SYSCALL_64+0x3/0x29
4,1498,12767657133,-;Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 01 f8 <65> 48 89 24 25 14 60 00 00 eb 12 0f 20 dc 0f 1f 44 00 00 48 81 e4
All code
========
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: cc int3
7: cc int3
8: cc int3
9: cc int3
a: cc int3
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: cc int3
10: cc int3
11: cc int3
12: cc int3
13: cc int3
14: cc int3
15: cc int3
16: cc int3
17: cc int3
18: cc int3
19: cc int3
1a: cc int3
1b: cc int3
1c: cc int3
1d: cc int3
1e: cc int3
1f: cc int3
20: cc int3
21: cc int3
22: cc int3
23: cc int3
24: cc int3
25: cc int3
26: cc int3
27: 0f 01 f8 swapgs
2a:* 65 48 89 24 25 14 60 mov %rsp,%gs:0x6014 <-- trapping instruction
31: 00 00
33: eb 12 jmp 0x47
35: 0f 20 dc mov %cr3,%rsp
38: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3d: 48 rex.W
3e: 81 .byte 0x81
3f: e4 .byte 0xe4
Code starting with the faulting instruction
===========================================
0: 65 48 89 24 25 14 60 mov %rsp,%gs:0x6014
7: 00 00
9: eb 12 jmp 0x1d
b: 0f 20 dc mov %cr3,%rsp
e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
13: 48 rex.W
14: 81 .byte 0x81
15: e4 .byte 0xe4
4,1499,12767657134,-;RSP: 0018:00007f2a8bcbd438 EFLAGS: 00010002
4,1500,12767657136,-;RAX: 00000000000000ca RBX: 000000000000005d RCX: 00007f2aa45e8aab
4,1501,12767657138,-;RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007f2aa4400018
4,1502,12767657139,-;RBP: 00007f2aa4400018 R08: 0000000000000000 R09: 00007f2a8ed00000
4,1503,12767657140,-;R10: 0000000000000000 R11: 0000000000000282 R12: 00000000000000a8
4,1504,12767657141,-;R13: 0000000000000003 R14: 0000000000000030 R15: 00007f2aa4400000
4,1505,12767657142,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1506,12767657143,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
4,1507,12767657144,-;CR2: 00007f2a8bcbd428 CR3: 00000002953f2000 CR4: 00000000003506e0
4,1508,12767657146,-;Call Trace:
4,1509,12767657146,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co4,1509,12767657146,-,ncfrag=966/986;re rapl k10temp wmi
4,1510,12767657189,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1511,12767657207,-;------------[ cut here ]------------
4,1512,12767657207,-;WARNING: CPU: 4 PID: 16786 at kernel/softirq.c:362 __local_bh_enable_ip+0x43/0x70
4,1513,12767657212,-,ncfrag=0/986;Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth ecdh_generic ecc netconsole uas usb_storage snd_seq_dummy snd_hrtimer snd_seq snd_seq_device iptable_filter xt_tcpudp ip_tables
x_tables hwmon_vid 8021q garp mrp stp llc ipv6 fuse rt73usb rt2x00usb rt2x00lib mac80211 hid_logitech cfg80211 joydev hid_generic usbhid hid amdgpu intel_rapl_msr iommu_v2 intel_rapl_common gpu_sched eeepc_wmi asus_wmi drm_ttm_helper
ttm platform_profile battery drm_kms_helper sparse_keymap edac_mce_amd rfkill drm kvm_amd snd_hda_codec_realtek video snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi snd_hda_intel agpgart snd_intel_dspcfg snd_intel_sdw_acpi
wmi_bmof snd_hda_codec evdev i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect sysimgblt snd_hwdep mfd_core snd_pcm r8169 irqbypass snd_timer realtek snd xhci_pci xhci_pci_renesas xhci_hcd mdio_devres crct10dif_pclmul
crc32_pclmul i2c_piix4 soundcore ccp libphy ghash_clmulni_intel i2c_co4,1513,12767657212,-,ncfrag=966/986;re rapl k10temp wmi
4,1514,12767657248,c; acpi_cpufreq gpio_amdpt button gpio_generic loop [last unloaded: netconsole]
4,1516,12767657254,-;Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 4011 04/19/2018
4,1517,12767657255,-;RIP: __local_bh_enable_ip+0x43/0x70
4,1518,12767657257,-;Code: 01 35 61 1d f3 7d 65 8b 05 5a 1d f3 7d a9 00 ff ff 00 74 1a bf 01 00 00 00 e8 99 b5 02 00 65 8b 05 42 1d f3 7d 85 c0 74 25 c3 <0f> 0b eb cc 48 c7 c7 d9 53 42 83 e8 4d ec a6 00 65 66 8b 05 25 19
All code
========
0: 01 35 61 1d f3 7d add %esi,0x7df31d61(%rip) # 0x7df31d67
6: 65 8b 05 5a 1d f3 7d mov %gs:0x7df31d5a(%rip),%eax # 0x7df31d67
d: a9 00 ff ff 00 test $0xffff00,%eax
12: 74 1a je 0x2e
14: bf 01 00 00 00 mov $0x1,%edi
19: e8 99 b5 02 00 call 0x2b5b7
1e: 65 8b 05 42 1d f3 7d mov %gs:0x7df31d42(%rip),%eax # 0x7df31d67
25: 85 c0 test %eax,%eax
27: 74 25 je 0x4e
29: c3 ret
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb cc jmp 0xfffffffffffffffa
2e: 48 c7 c7 d9 53 42 83 mov $0xffffffff834253d9,%rdi
35: e8 4d ec a6 00 call 0xa6ec87
3a: 65 gs
3b: 66 data16
3c: 8b .byte 0x8b
3d: 05 .byte 0x5
3e: 25 .byte 0x25
3f: 19 .byte 0x19
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb cc jmp 0xffffffffffffffd0
4: 48 c7 c7 d9 53 42 83 mov $0xffffffff834253d9,%rdi
b: e8 4d ec a6 00 call 0xa6ec5d
10: 65 gs
11: 66 data16
12: 8b .byte 0x8b
13: 05 .byte 0x5
14: 25 .byte 0x25
15: 19 .byte 0x19
4,1519,12767657259,-;RSP: 0018:fffffe00000f69a0 EFLAGS: 00010006
4,1520,12767657260,-;RAX: 0000000080110203 RBX: ffff8b0e05bd2000 RCX: ffff8b0e05bd2000
4,1521,12767657261,-;RDX: ffff8b0e0ac28000 RSI: 0000000000000201 RDI: ffffffffc12f12c3
4,1522,12767657262,-;RBP: ffff8b0e0c977a30 R08: fffffe00000f69e8 R09: ffff8b0e0d085000
4,1523,12767657263,-;R10: ffff8b0e03234300 R11: 0000000000000fff R12: ffff8b0e0d0850d0
4,1524,12767657264,-;R13: fffffe00000f69e8 R14: ffff8b0e0ddfc980 R15: ffff8b0e0d085a58
4,1525,12767657265,-;FS: 00007f2a8bcbe640(0000) GS:ffff8b110ed00000(0000) knlGS:0000000000000000
4,1526,12767657266,-;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
----------
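
In case it is useful to anyone comparing the two dumps: the byte in
angle brackets on a "Code:" line is the one the decoder flags as the
trapping instruction, and its position in the line is the offset shown
in the listing (2a in both dumps above). A minimal Python sketch for
locating it is below. The function name trapping_offset is made up for
this example, and it assumes the bytes are separated by spaces as in the
lines above.

```python
def trapping_offset(code_line: str) -> int:
    """Return the byte offset of the <xx>-marked byte in an oops 'Code:' line."""
    # Drop any log prefix: keep only what follows the last "Code:".
    # If "Code:" is absent, the whole string is treated as the byte list.
    _, _, tail = code_line.rpartition("Code:")
    for offset, token in enumerate(tail.split()):
        # The marked byte is written as e.g. "<65>" or "<0f>".
        if token.startswith("<") and token.endswith(">"):
            return offset
    raise ValueError("no <xx> marker found in Code: line")


if __name__ == "__main__":
    # Shortened copy of the first Code: line: 39 int3 bytes, swapgs, then <65>.
    line = "4,1498,12767657133,-;Code: " + "cc " * 39 + "0f 01 f8 <65> 48 89 24 25"
    print(hex(trapping_offset(line)))  # 0x2a
```

This only finds the offset; it does not disassemble anything, so the
listings above still need the kernel's decoding scripts.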
Thanks,
Michele Ballabio
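
P.S. The "4,1493,12767657117,-;" prefixes come from netconsole's
extended format, which as far as I understand mirrors /dev/kmsg:
priority, sequence number, timestamp in microseconds and a flag field,
optionally followed by key=value pairs such as ncfrag, then ";" and the
message. A rough Python sketch for splitting one such record is below.
The names Record and parse_record are invented for this example, and it
does not reassemble ncfrag fragments or strip "> " quote markers.

```python
from typing import Dict, NamedTuple


class Record(NamedTuple):
    level: int            # syslog level: lower 3 bits of the first field
    seq: int              # record sequence number
    timestamp_us: int     # timestamp in microseconds
    flags: str            # "-" for a normal record, "c" for a continuation
    extra: Dict[str, str]  # optional key=value fields, e.g. ncfrag
    text: str             # message text after the ";"


def parse_record(line: str) -> Record:
    """Split one extended-format record into header fields and text."""
    header, sep, text = line.partition(";")
    if not sep:
        raise ValueError("no ';' separator: not an extended-format record")
    fields = header.split(",")
    if len(fields) < 4:
        raise ValueError("header has fewer than four fields")
    prio, seq, ts, flags = fields[:4]
    # Anything after the fourth field is treated as key=value metadata.
    extra = dict(f.split("=", 1) for f in fields[4:] if "=" in f)
    return Record(int(prio) & 7, int(seq), int(ts), flags, extra, text)


if __name__ == "__main__":
    rec = parse_record("0,1493,12767657117,-;traps: PANIC: double fault, error_code: 0x0")
    print(rec.level, rec.seq, rec.timestamp_us / 1_000_000, rec.text)
```

Sorting by the sequence number is probably the easiest way to spot
records that never made it over the wire (1492 is missing above, for
instance).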