2021-04-07 20:43:34

by Mikhail Gavrilov

[permalink] [raw]
Subject: Unexpected multihop in swaput - likely driver bug.

Hi!
During the 5.12 testing cycle I observed the repeatable bug when
launching heavy graphic applications.
The kernel log is flooded with the message "Unexpected multihop in
swaput - likely driver bug.".

Trace:
[ 8707.814899] ------------[ cut here ]------------
[ 8707.814920] Unexpected multihop in swaput - likely driver bug.
[ 8707.814998] WARNING: CPU: 19 PID: 28231 at
drivers/gpu/drm/ttm/ttm_bo.c:1484 ttm_bo_swapout+0x40b/0x420 [ttm]
[ 8707.815011] Modules linked in: tun uinput snd_seq_dummy rfcomm
snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp
hid_logitech_dj intel_rapl_msr snd_hda_codec_realtek intel_rapl_common
mt76x2u snd_hda_codec_generic mt76x2_common mt76x02_usb iwlmvm
ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib snd_hda_intel
mt76 snd_intel_dspcfg snd_intel_sdw_acpi mac80211 joydev snd_usb_audio
snd_hda_codec uvcvideo edac_mce_amd videobuf2_vmalloc snd_hda_core
snd_usbmidi_lib videobuf2_memops snd_hwdep iwlwifi snd_rawmidi btusb
videobuf2_v4l2 kvm_amd snd_seq videobuf2_common btrtl btbcm videodev
btintel snd_seq_device kvm mc cfg80211 bluetooth snd_pcm libarc4
eeepc_wmi snd_timer asus_wmi irqbypass xpad sp5100_tco
[ 8707.815065] sparse_keymap ecdh_generic ff_memless video ecc
wmi_bmof i2c_piix4 snd rapl k10temp soundcore rfkill acpi_cpufreq
ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper
crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel
igb ccp nvme dca nvme_core i2c_algo_bit wmi pinctrl_amd fuse
[ 8707.815096] CPU: 19 PID: 28231 Comm: kworker/u64:1 Tainted: G
W --------- --- 5.12.0-0.rc6.184.fc35.x86_64+debug #1
[ 8707.815101] Hardware name: System manufacturer System Product
Name/ROG STRIX X570-I GAMING, BIOS 3603 03/20/2021
[ 8707.815106] Workqueue: ttm_swap ttm_shrink_work [ttm]
[ 8707.815114] RIP: 0010:ttm_bo_swapout+0x40b/0x420 [ttm]
[ 8707.815122] Code: 10 00 00 48 c1 e2 0c 48 c1 e6 0c e8 3f 37 fa c8
e9 71 fe ff ff 83 f8 b8 0f 85 a9 fe ff ff 48 c7 c7 28 32 37 c0 e8 02
2b 98 c9 <0f> 0b e9 96 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
00 0f
[ 8707.815126] RSP: 0018:ffffa306d20e7d58 EFLAGS: 00010292
[ 8707.815130] RAX: 0000000000000032 RBX: ffffffffc0379260 RCX: 0000000000000027
[ 8707.815133] RDX: ffff918c091daae8 RSI: 0000000000000001 RDI: ffff918c091daae0
[ 8707.815136] RBP: ffff918602210058 R08: 0000000000000000 R09: 0000000000000000
[ 8707.815138] R10: ffffa306d20e7b90 R11: ffff918c2e2fffe8 R12: 00000000ffffffb8
[ 8707.815141] R13: ffffffffc03792a0 R14: ffff9186022102c0 R15: 0000000000000001
[ 8707.815145] FS: 0000000000000000(0000) GS:ffff918c09000000(0000)
knlGS:0000000000000000
[ 8707.815148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8707.815151] CR2: 0000325c84d12000 CR3: 0000000776c28000 CR4: 0000000000350ee0
[ 8707.815154] Call Trace:
[ 8707.815164] ttm_shrink+0xa6/0xe0 [ttm]
[ 8707.815171] ttm_shrink_work+0x36/0x40 [ttm]
[ 8707.815177] process_one_work+0x2b0/0x5e0
[ 8707.815185] worker_thread+0x55/0x3c0
[ 8707.815188] ? process_one_work+0x5e0/0x5e0
[ 8707.815192] kthread+0x13a/0x150
[ 8707.815196] ? __kthread_bind_mask+0x60/0x60
[ 8707.815199] ret_from_fork+0x22/0x30
[ 8707.815207] irq event stamp: 0
[ 8707.815209] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 8707.815213] hardirqs last disabled at (0): [<ffffffff890ddafb>]
copy_process+0x91b/0x1e10
[ 8707.815218] softirqs last enabled at (0): [<ffffffff890ddafb>]
copy_process+0x91b/0x1e10
[ 8707.815222] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 8707.815224] ---[ end trace 29252aa87289bbaa ]---

Full kernel log: https://pastebin.com/mmAxwBYc

$ /usr/src/kernels/`uname -r`/scripts/faddr2line
/lib/debug/lib/modules/`uname
-r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_bo_swapout+0x40b
ttm_bo_swapout+0x40b/0x420:
ttm_bo_swapout at
/usr/src/debug/kernel-5.12-rc6/linux-5.12.0-0.rc6.184.fc35.x86_64/drivers/gpu/drm/ttm/ttm_bo.c:1484
(discriminator 1)


$ git blame drivers/gpu/drm/ttm/ttm_bo.c -L 1475,1494
Blaming lines: 1% (20/1530), done.
ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1475)
memset(&hop, 0, sizeof(hop));
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1476)
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1477)
evict_mem = bo->mem;
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1478)
evict_mem.mm_node = NULL;
ce65b874001d7 (Christian König 2020-09-30 16:44:16 +0200 1479)
evict_mem.placement = 0;
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1480)
evict_mem.mem_type = TTM_PL_SYSTEM;
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1481)
ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1482)
ret = ttm_bo_handle_move_mem(bo, &evict_mem, true, &ctx,
&hop);
ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1483)
if (unlikely(ret != 0)) {
ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1484)
WARN(ret == -EMULTIHOP, "Unexpected multihop in
swaput - likely driver bug.\n");
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1485)
goto out;
ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1486)
}
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1487) }
ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1488)
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1489) /**
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1490)
* Make sure BO is idle.
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1491) */
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1492)
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1493)
ret = ttm_bo_wait(bo, false, false);
61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1494)
if (unlikely(ret != 0))




--
Best Regards,
Mike Gavrilov.


2021-04-07 20:54:29

by Christian König

[permalink] [raw]
Subject: Re: Unexpected multihop in swaput - likely driver bug.

What hardware are you using and how do you exactly trigger this?

Thanks,
Christian.

Am 07.04.21 um 11:30 schrieb Mikhail Gavrilov:
> Hi!
> During the 5.12 testing cycle I observed the repeatable bug when
> launching heavy graphic applications.
> The kernel log is flooded with the message "Unexpected multihop in
> swaput - likely driver bug.".
>
> Trace:
> [ 8707.814899] ------------[ cut here ]------------
> [ 8707.814920] Unexpected multihop in swaput - likely driver bug.
> [ 8707.814998] WARNING: CPU: 19 PID: 28231 at
> drivers/gpu/drm/ttm/ttm_bo.c:1484 ttm_bo_swapout+0x40b/0x420 [ttm]
> [ 8707.815011] Modules linked in: tun uinput snd_seq_dummy rfcomm
> snd_hrtimer netconsole nft_objref nf_conntrack_netbios_ns
> nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
> nf_tables nfnetlink cmac bnep sunrpc vfat fat hid_logitech_hidpp
> hid_logitech_dj intel_rapl_msr snd_hda_codec_realtek intel_rapl_common
> mt76x2u snd_hda_codec_generic mt76x2_common mt76x02_usb iwlmvm
> ledtrig_audio snd_hda_codec_hdmi mt76_usb mt76x02_lib snd_hda_intel
> mt76 snd_intel_dspcfg snd_intel_sdw_acpi mac80211 joydev snd_usb_audio
> snd_hda_codec uvcvideo edac_mce_amd videobuf2_vmalloc snd_hda_core
> snd_usbmidi_lib videobuf2_memops snd_hwdep iwlwifi snd_rawmidi btusb
> videobuf2_v4l2 kvm_amd snd_seq videobuf2_common btrtl btbcm videodev
> btintel snd_seq_device kvm mc cfg80211 bluetooth snd_pcm libarc4
> eeepc_wmi snd_timer asus_wmi irqbypass xpad sp5100_tco
> [ 8707.815065] sparse_keymap ecdh_generic ff_memless video ecc
> wmi_bmof i2c_piix4 snd rapl k10temp soundcore rfkill acpi_cpufreq
> ip_tables amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched drm_kms_helper
> crct10dif_pclmul crc32_pclmul crc32c_intel cec drm ghash_clmulni_intel
> igb ccp nvme dca nvme_core i2c_algo_bit wmi pinctrl_amd fuse
> [ 8707.815096] CPU: 19 PID: 28231 Comm: kworker/u64:1 Tainted: G
> W --------- --- 5.12.0-0.rc6.184.fc35.x86_64+debug #1
> [ 8707.815101] Hardware name: System manufacturer System Product
> Name/ROG STRIX X570-I GAMING, BIOS 3603 03/20/2021
> [ 8707.815106] Workqueue: ttm_swap ttm_shrink_work [ttm]
> [ 8707.815114] RIP: 0010:ttm_bo_swapout+0x40b/0x420 [ttm]
> [ 8707.815122] Code: 10 00 00 48 c1 e2 0c 48 c1 e6 0c e8 3f 37 fa c8
> e9 71 fe ff ff 83 f8 b8 0f 85 a9 fe ff ff 48 c7 c7 28 32 37 c0 e8 02
> 2b 98 c9 <0f> 0b e9 96 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> 00 0f
> [ 8707.815126] RSP: 0018:ffffa306d20e7d58 EFLAGS: 00010292
> [ 8707.815130] RAX: 0000000000000032 RBX: ffffffffc0379260 RCX: 0000000000000027
> [ 8707.815133] RDX: ffff918c091daae8 RSI: 0000000000000001 RDI: ffff918c091daae0
> [ 8707.815136] RBP: ffff918602210058 R08: 0000000000000000 R09: 0000000000000000
> [ 8707.815138] R10: ffffa306d20e7b90 R11: ffff918c2e2fffe8 R12: 00000000ffffffb8
> [ 8707.815141] R13: ffffffffc03792a0 R14: ffff9186022102c0 R15: 0000000000000001
> [ 8707.815145] FS: 0000000000000000(0000) GS:ffff918c09000000(0000)
> knlGS:0000000000000000
> [ 8707.815148] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 8707.815151] CR2: 0000325c84d12000 CR3: 0000000776c28000 CR4: 0000000000350ee0
> [ 8707.815154] Call Trace:
> [ 8707.815164] ttm_shrink+0xa6/0xe0 [ttm]
> [ 8707.815171] ttm_shrink_work+0x36/0x40 [ttm]
> [ 8707.815177] process_one_work+0x2b0/0x5e0
> [ 8707.815185] worker_thread+0x55/0x3c0
> [ 8707.815188] ? process_one_work+0x5e0/0x5e0
> [ 8707.815192] kthread+0x13a/0x150
> [ 8707.815196] ? __kthread_bind_mask+0x60/0x60
> [ 8707.815199] ret_from_fork+0x22/0x30
> [ 8707.815207] irq event stamp: 0
> [ 8707.815209] hardirqs last enabled at (0): [<0000000000000000>] 0x0
> [ 8707.815213] hardirqs last disabled at (0): [<ffffffff890ddafb>]
> copy_process+0x91b/0x1e10
> [ 8707.815218] softirqs last enabled at (0): [<ffffffff890ddafb>]
> copy_process+0x91b/0x1e10
> [ 8707.815222] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 8707.815224] ---[ end trace 29252aa87289bbaa ]---
>
> Full kernel log: https://pastebin.com/mmAxwBYc
>
> $ /usr/src/kernels/`uname -r`/scripts/faddr2line
> /lib/debug/lib/modules/`uname
> -r`/kernel/drivers/gpu/drm/ttm/ttm.ko.debug ttm_bo_swapout+0x40b
> ttm_bo_swapout+0x40b/0x420:
> ttm_bo_swapout at
> /usr/src/debug/kernel-5.12-rc6/linux-5.12.0-0.rc6.184.fc35.x86_64/drivers/gpu/drm/ttm/ttm_bo.c:1484
> (discriminator 1)
>
>
> $ git blame drivers/gpu/drm/ttm/ttm_bo.c -L 1475,1494
> Blaming lines: 1% (20/1530), done.
> ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1475)
> memset(&hop, 0, sizeof(hop));
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1476)
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1477)
> evict_mem = bo->mem;
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1478)
> evict_mem.mm_node = NULL;
> ce65b874001d7 (Christian König 2020-09-30 16:44:16 +0200 1479)
> evict_mem.placement = 0;
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1480)
> evict_mem.mem_type = TTM_PL_SYSTEM;
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1481)
> ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1482)
> ret = ttm_bo_handle_move_mem(bo, &evict_mem, true, &ctx,
> &hop);
> ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1483)
> if (unlikely(ret != 0)) {
> ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1484)
> WARN(ret == -EMULTIHOP, "Unexpected multihop in
> swaput - likely driver bug.\n");
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1485)
> goto out;
> ebdf565169af0 (Dave Airlie 2020-10-29 13:58:52 +1000 1486)
> }
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1487) }
> ba4e7d973dd09 (Thomas Hellstrom 2009-06-10 15:20:19 +0200 1488)
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1489) /**
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1490)
> * Make sure BO is idle.
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1491) */
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1492)
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1493)
> ret = ttm_bo_wait(bo, false, false);
> 61ede07055539 (Christian König 2016-06-06 10:17:57 +0200 1494)
> if (unlikely(ret != 0))
>
>
>
>

2021-04-07 21:52:15

by Mikhail Gavrilov

[permalink] [raw]
Subject: Re: Unexpected multihop in swaput - likely driver bug.

On Wed, 7 Apr 2021 at 15:46, Christian König
<[email protected]> wrote:
>
> What hardware are you using

$ inxi -bM
System: Host: fedora Kernel: 5.12.0-0.rc6.184.fc35.x86_64+debug
x86_64 bits: 64 Desktop: GNOME 40.0
Distro: Fedora release 35 (Rawhide)
Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING
v: Rev X.0x serial: <superuser required>
UEFI: American Megatrends v: 3603 date: 03/20/2021
Battery: ID-1: hidpp_battery_0 charge: N/A condition: N/A
CPU: Info: 16-Core (2-Die) AMD Ryzen 9 3950X [MT MCP MCM] speed:
2365 MHz min/max: 2200/3500 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon
RX 6800/6800 XT / 6900 XT] driver: amdgpu v: kernel
Device-2: AVerMedia Live Streamer CAM 513 type: USB driver:
hid-generic,usbhid,uvcvideo
Device-3: AVerMedia Live Gamer Ultra-Video type: USB
driver: hid-generic,snd-usb-audio,usbhid,uvcvideo
Display: wayland server: X.Org 1.21.1 driver: loaded:
amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa
resolution: 3840x2160~60Hz
OpenGL: renderer: AMD SIENNA_CICHLID (DRM 3.40.0
5.12.0-0.rc6.184.fc35.x86_64+debug LLVM 12.0.0)
v: 4.6 Mesa 21.1.0-devel
Network: Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi
Device-2: Intel I211 Gigabit Network driver: igb
Drives: Local Storage: total: 11.35 TiB used: 10.82 TiB (95.3%)
Info: Processes: 805 Uptime: 12h 56m Memory: 31.18 GiB used:
21.88 GiB (70.2%) Shell: Bash inxi: 3.3.02


> and how do you exactly trigger this?

I am running heavy games like "Zombie Army 4: Dead War" and switching
to Gnome Activities and other applications while the game is running.


--
Best Regards,
Mike Gavrilov.

2021-04-12 11:12:02

by Christian König

[permalink] [raw]
Subject: Re: Unexpected multihop in swaput - likely driver bug.

Hi Mikhail,

thanks a lot for pointing this out.

Turned out that this is a known issue, but I've forgot to push the fix
to drm-misc-fixes and just queued it up for the next release.

Please re-test drm-misc-fixes and let's hope there is another -rc before
the final 5.12 kernel.

Thanks,
Christian.

Am 07.04.21 um 20:06 schrieb Mikhail Gavrilov:
> On Wed, 7 Apr 2021 at 15:46, Christian König
> <[email protected]> wrote:
>> What hardware are you using
> $ inxi -bM
> System: Host: fedora Kernel: 5.12.0-0.rc6.184.fc35.x86_64+debug
> x86_64 bits: 64 Desktop: GNOME 40.0
> Distro: Fedora release 35 (Rawhide)
> Machine: Type: Desktop Mobo: ASUSTeK model: ROG STRIX X570-I GAMING
> v: Rev X.0x serial: <superuser required>
> UEFI: American Megatrends v: 3603 date: 03/20/2021
> Battery: ID-1: hidpp_battery_0 charge: N/A condition: N/A
> CPU: Info: 16-Core (2-Die) AMD Ryzen 9 3950X [MT MCP MCM] speed:
> 2365 MHz min/max: 2200/3500 MHz
> Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Navi 21 [Radeon
> RX 6800/6800 XT / 6900 XT] driver: amdgpu v: kernel
> Device-2: AVerMedia Live Streamer CAM 513 type: USB driver:
> hid-generic,usbhid,uvcvideo
> Device-3: AVerMedia Live Gamer Ultra-Video type: USB
> driver: hid-generic,snd-usb-audio,usbhid,uvcvideo
> Display: wayland server: X.Org 1.21.1 driver: loaded:
> amdgpu,ati unloaded: fbdev,modesetting,radeon,vesa
> resolution: 3840x2160~60Hz
> OpenGL: renderer: AMD SIENNA_CICHLID (DRM 3.40.0
> 5.12.0-0.rc6.184.fc35.x86_64+debug LLVM 12.0.0)
> v: 4.6 Mesa 21.1.0-devel
> Network: Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi
> Device-2: Intel I211 Gigabit Network driver: igb
> Drives: Local Storage: total: 11.35 TiB used: 10.82 TiB (95.3%)
> Info: Processes: 805 Uptime: 12h 56m Memory: 31.18 GiB used:
> 21.88 GiB (70.2%) Shell: Bash inxi: 3.3.02
>
>
>> and how do you exactly trigger this?
> I am running heavy games like "Zombie Army 4: Dead War" and switching
> to Gnome Activities and other applications while the game is running.
>
>