2024-02-19 22:02:19

by Erhard Furtner

Subject: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

Greetings!

Running 'modprobe -v ttm-device-test' on my Ryzen 5950X (amd64) box and on my Talos II (ppc64) immediately triggers a list_add corruption.

The machines stay usable via VNC, but the issue seems to cause memory corruption, which shows up later when PAGE_POISONING is enabled:

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: ttm_device
# module: ttm_device_test
1..5
ok 1 ttm_device_init_basic
# ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
num_dev == 3 (0x3)
not ok 2 ttm_device_init_multiple
list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:32!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 6 PID: 2129 Comm: kunit_try_catch Tainted: G N 6.7.5-Zen3 #1
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
RIP: 0010:__list_add_valid_or_report+0x67/0x9c
Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
Call Trace:
<TASK>
? __die_body+0x15/0x65
? die+0x2f/0x48
? do_trap+0x76/0x109
? __list_add_valid_or_report+0x67/0x9c
? __list_add_valid_or_report+0x67/0x9c
? do_error_trap+0x69/0xa6
? __list_add_valid_or_report+0x67/0x9c
? exc_invalid_op+0x4d/0x71
? __list_add_valid_or_report+0x67/0x9c
? asm_exc_invalid_op+0x1a/0x20
? __list_add_valid_or_report+0x67/0x9c
? __list_add_valid_or_report+0x67/0x9c
ttm_device_init+0x10e/0x157 [ttm]
ttm_device_kunit_init+0x3d/0x51 [ttm_kunit_helpers]
ttm_device_fini_basic+0x6d/0x1b3 [ttm_device_test]
? timekeeping_get_ns+0x19/0x3b
? srso_alias_return_thunk+0x5/0xfbef5
? ktime_get_ts64+0x40/0x92
kunit_try_run_case+0xaf/0x163 [kunit]
? kunit_try_catch_throw+0x1b/0x1b [kunit]
? kunit_try_catch_throw+0x1b/0x1b [kunit]
kunit_generic_run_threadfn_adapter+0x15/0x20 [kunit]
kthread+0xcf/0xd7
? kthread_complete_and_exit+0x1a/0x1a
ret_from_fork+0x23/0x35
? kthread_complete_and_exit+0x1a/0x1a
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid_or_report+0x67/0x9c
Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
Key type dns_resolver registered
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
# ttm_device_fini_basic: try timed out
general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#2] SMP NOPTI
CPU: 26 PID: 2119 Comm: modprobe Tainted: G D N 6.7.5-Zen3 #1
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
RIP: 0010:kthread_stop+0x3c/0x78
Code: f0 0f c1 43 28 be 02 00 00 00 85 c0 74 0c 8d 50 01 09 c2 79 0a be 01 00 00 00 e8 f5 31 37 00 48 89 df e8 35 f1 ff ff 48 89 c5 <f0> 80 08 02 48 89 df e8 6a ff ff ff f0 80 4b 02 02 48 89 df e8 f6
RSP: 0018:ffffb23b01fff938 EFLAGS: 00010246
RAX: 6b6b6b6b6b6b6b6b RBX: ffffa0b170ab6040 RCX: 0000000000000000
RDX: 000000006b6b6b6f RSI: 0000000000000002 RDI: 0000000000000000
RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b170ab6040
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
Call Trace:
<TASK>
? __die_body+0x15/0x65
? die_addr+0x37/0x50
? exc_general_protection+0x1b6/0x1ec
? asm_exc_general_protection+0x26/0x30
? kthread_stop+0x3c/0x78
? kthread_stop+0x39/0x78
kunit_try_catch_run+0xc9/0x155 [kunit]
kunit_run_case_catch_errors+0x3f/0x93 [kunit]
kunit_run_tests+0x182/0x516 [kunit]
? kunit_try_run_case_cleanup+0x39/0x39 [kunit]
? kunit_catch_run_case_cleanup+0x85/0x85 [kunit]
__kunit_test_suites_init+0x64/0x83 [kunit]
kunit_module_notify+0xda/0x177 [kunit]
notifier_call_chain+0x5a/0x92
blocking_notifier_call_chain+0x3e/0x60
do_init_module+0xcb/0x218
init_module_from_file+0x7a/0x99
__do_sys_finit_module+0x162/0x223
do_syscall_64+0x6e/0xd8
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f9321f7a479
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 87 89 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2e350908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 00005590b57cef40 RCX: 00007f9321f7a479
RDX: 0000000000000000 RSI: 00005590b5100c7c RDI: 0000000000000007
RBP: 0000000000000000 R08: 00007f9322043b20 R09: 0000000000000000
R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000040000
R13: 00005590b5100c7c R14: 00005590b57cefe0 R15: 0000000000000000
</TASK>
Modules linked in: nfsv4 dns_resolver nfs lockd grace ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid_or_report+0x67/0x9c
Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
=============================================================================
BUG task_struct (Tainted: G D N): Poison overwritten
-----------------------------------------------------------------------------

0xffffa0b170ab6068-0xffffa0b170ab6068 @offset=24680. First byte 0x6c instead of 0x6b
Slab 0xffffea8944c2ac00 objects=8 used=8 fp=0x0000000000000000 flags=0x4000000000000840(slab|head|zone=1)
Object 0xffffa0b170ab6040 @offset=24640 fp=0x0000000000000000

Redzone ffffa0b170ab6000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone ffffa0b170ab6010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone ffffa0b170ab6020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone ffffa0b170ab6030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Object ffffa0b170ab6040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffffa0b170ab6050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffffa0b170ab6060: 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b kkkkkkkklkkkkkkk
Object ffffa0b170ab6070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[...]
Object ffffa0b170ab6fb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffffa0b170ab6fc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone ffffa0b170ab6fd0: bb bb bb bb bb bb bb bb ........
Padding ffffa0b170ab6fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding ffffa0b170ab6ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 13 PID: 2 Comm: kthreadd Tainted: G D N 6.7.5-Zen3 #1
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
Call Trace:
<TASK>
dump_stack_lvl+0x37/0x52
check_bytes_and_report+0xa7/0x107
check_object+0x157/0x253
alloc_debug_processing+0x5d/0x111
___slab_alloc+0x288/0x561
? copy_process+0x35f/0x2276
? kthread_is_per_cpu+0x22/0x22
ret_from_fork+0x23/0x35
? kthread_is_per_cpu+0x22/0x22
ret_from_fork_asm+0x11/0x20
</TASK>
FIX task_struct: Restoring Poison 0xffffa0b170ab6068-0xffffa0b170ab6068=0x6b
FIX task_struct: Marking all objects used
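
For reference, here is a simplified user-space sketch of the sanity check behind the "list_add corruption" message (lib/list_debug.c), plus a list_count_nodes()-style counter. It is not kernel code. struct list_head is re-declared here, and init_list_head(), count_nodes(), list_add_valid(), list_add_tail_checked() and demo_stale_entry_is_caught() are hypothetical helpers. A node that is freed while still linked (modelled by filling it with the 0x6b free poison instead of calling free()) leaves head->prev pointing at it. The next tail insertion then finds prev->next != next, which matches the shape of the amd64 report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical re-declaration of the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Count ring entries, skipping the node we start from (like list_count_nodes()). */
size_t count_nodes(const struct list_head *start)
{
	size_t n = 0;

	assert(start != NULL);
	for (const struct list_head *p = start->next; p != start; p = p->next)
		n++;
	return n;
}

/*
 * Simplified form of the checks done before linking entry between prev and
 * next: the neighbours must still point at each other, and entry must not
 * already be one of them. Messages are simplified, not the kernel's exact text.
 */
bool list_add_valid(const struct list_head *entry,
		    const struct list_head *prev,
		    const struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(const void *)prev, (const void *)next->prev, (const void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(const void *)next, (const void *)prev->next, (const void *)prev);
		return false;
	}
	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: entry=%p\n", (const void *)entry);
		return false;
	}
	return true;
}

/* Insert entry just before head (at the tail); refuse if the check fails. */
bool list_add_tail_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	if (!list_add_valid(entry, prev, next))
		return false;
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/*
 * Link node a, then "free" it without unlinking it. Instead of free() (which
 * would make the next step undefined behaviour), a is filled with 0x6b, the
 * byte SLUB uses to poison freed objects. The following insert sees
 * head->prev == &a but a.next == 0x6b6b..., so it is refused.
 * Returns true if the demo behaves as described.
 */
bool demo_stale_entry_is_caught(void)
{
	struct list_head head, a, b;
	bool first_ok, second_ok;

	init_list_head(&head);
	first_ok = list_add_tail_checked(&a, &head);
	memset(&a, 0x6b, sizeof(a));
	second_ok = list_add_tail_checked(&b, &head);
	return first_ok && !second_ok;
}
```

With three nodes linked, count_nodes() returns 3 whether it starts at the head or at the first node, because starting from a member also counts the head. So, assuming the test's devices are all linked into one shared list head, a count of 4 from ttm_devs[0].device_list would mean one extra entry was still on that list.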


The Talos II (ppc64) trace looks a bit different; there the list_add corruption is hit in ttm_pool_test (ttm_pool_alloc_basic):

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: ttm_pool
# module: ttm_pool_test
1..8
KTAP version 1
# Subtest: ttm_pool_alloc_basic
ok 1 One page
ok 2 More than one page
ok 3 Above the allocation limit
# ttm_pool_alloc_basic: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_pool_test.c:162
Expected err == 0, but
err == -12 (0xfffffffffffffff4)
not ok 4 One page, with coherent DMA mappings enabled
list_add corruption. prev->next should be next (c00800000cf64fc0), but was 0000000000000000. (prev=c0002000061a4ad0).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:32!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
CPU: 29 PID: 934 Comm: kunit_try_catch Tainted: G TN 6.7.5-gentoo-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
NIP: c000000000864744 LR: c000000000864740 CTR: 0000000000000000
REGS: c000200015333a30 TRAP: 0700 Tainted: G TN (6.7.5-gentoo-P9)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000222 XER: 00000000
CFAR: c0000000001d5620 IRQMASK: 0
GPR00: 0000000000000000 c000200015333cd0 c0000000011b4700 0000000000000075
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 c0002007fa4d5e00 c000000000182548 c0002000066aa1c0
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c0002000061a4010 c00800000cf64fc0 c0002000061a4020
GPR28: c0002000061a4ad0 c00800000cf64fa8 c00800000cf64fa0 c0002000061a4010
NIP [c000000000864744] __list_add_valid_or_report+0xd4/0x120
LR [c000000000864740] __list_add_valid_or_report+0xd0/0x120
Call Trace:
[c000200015333cd0] [c000000000864740] __list_add_valid_or_report+0xd0/0x120 (unreliable)
[c000200015333d30] [c00800000cf5eed8] ttm_pool_type_init+0xa0/0x120 [ttm]
[c000200015333d80] [c00800000cf5efec] ttm_pool_init+0x94/0x170 [ttm]
[c000200015333de0] [c00800000cc6b324] ttm_pool_alloc_basic+0x9c/0x670 [ttm_pool_test]
[c000200015333ea0] [c00800000bddf7f0] kunit_try_run_case+0xb8/0x220 [kunit]
[c000200015333f60] [c00800000bde27c8] kunit_generic_run_threadfn_adapter+0x30/0x50 [kunit]
[c000200015333f90] [c000000000182670] kthread+0x130/0x140
[c000200015333fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
Code: f8010070 4b970ea9 60000000 0fe00000 7c0802a6 3c62fff1 7d064378 7d244b78 38639600 f8010070 4b970e85 60000000 <0fe00000> 7c0802a6 3c62fff1 7ca62b78
---[ end trace 0000000000000000 ]---

note: kunit_try_catch[934] exited with irqs disabled
# ttm_pool_alloc_basic: try timed out
BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b6b
Faulting instruction address: 0xc000000000181ae4
Oops: Kernel access of bad area, sig: 11 [#2]
BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
CPU: 17 PID: 921 Comm: modprobe Tainted: G D TN 6.7.5-gentoo-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
NIP: c000000000181ae4 LR: c00800000bde2a54 CTR: c000000000181a80
REGS: c0002000153871b0 TRAP: 0380 Tainted: G D TN (6.7.5-gentoo-P9)
MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 44422282 XER: 00000000
CFAR: c00800000bde53ec IRQMASK: 0
GPR00: c00800000bde2a54 c000200015387450 c0000000011b4700 c0000000b1e34d00
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 000000006b6b6b6c c00800000bde53d8
GPR12: c000000000181a80 c0002007fa4dd600 0000000020000000 0000000020000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000002 0000000020000000 c0000000023d78f8 c0000000023d78a8
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: c0002000153876c0 6b6b6b6b6b6b6b6b c0000000b1e34d00 c0000000b1e34eb8
NIP [c000000000181ae4] kthread_stop+0x64/0x1c0
LR [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
Call Trace:
[c000200015387450] [c0000000001d5934] vprintk+0x84/0xc0 (unreliable)
[c000200015387490] [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
[c000200015387540] [c00800000bde4f14] kunit_run_case_catch_errors+0x60/0xf0 [kunit]
[c0002000153875a0] [c00800000bddf448] kunit_run_tests+0x560/0x680 [kunit]
[c0002000153878d0] [c00800000bddf614] __kunit_test_suites_init+0xac/0x160 [kunit]
[c000200015387970] [c00800000bde349c] kunit_exec_run_tests+0x44/0xb0 [kunit]
[c0002000153879f0] [c00800000bddecbc] kunit_module_notify+0x4d4/0x590 [kunit]
[c000200015387a90] [c0000000001842f0] notifier_call_chain+0xa0/0x190
[c000200015387b30] [c00000000018480c] blocking_notifier_call_chain+0x5c/0xb0
[c000200015387b70] [c00000000020cf64] do_init_module+0x234/0x330
[c000200015387bf0] [c00000000021054c] init_module_from_file+0x9c/0xf0
[c000200015387cc0] [c000000000210740] sys_finit_module+0x190/0x420
[c000200015387d80] [c00000000002b808] system_call_exception+0x1b8/0x3a0
[c000200015387e50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
--- interrupt: 3000 at 0x3fff9eb3d7c8
NIP: 00003fff9eb3d7c8 LR: 0000000000000000 CTR: 0000000000000000
REGS: c000200015387e80 TRAP: 3000 Tainted: G D TN (6.7.5-gentoo-P9)
MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48422244 XER: 00000000
IRQMASK: 0
GPR00: 0000000000000161 00003fffc80d3ab0 00003fff9ec37100 0000000000000007
GPR04: 0000000134f6df90 0000000000000000 000000000000001f 0000000000000045
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00003fff9ef7fbe0 0000000020000000 0000000020000000
GPR16: 0000000000000000 0000000000000000 0000000000000020 0000000020000000
GPR20: 0000000161994850 0000000020000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000161993f90
GPR28: 0000000134f6df90 0000000000040000 0000000000000000 0000000161993cc0
NIP [00003fff9eb3d7c8] 0x3fff9eb3d7c8
LR [0000000000000000] 0x0
--- interrupt: 3000
Code: 40c2fff4 2c090000 41820164 39490001 7d494b78 2c090000 418000f4 813e01a8 6d290020 79295fe2 0b090000 ebbe0738 <7d20e8a8> 61290002 7d20e9ad 40c2fff4
---[ end trace 0000000000000000 ]---

note: modprobe[921] exited with irqs disabled
=============================================================================
BUG task_struct (Tainted: G D TN): Poison overwritten
-----------------------------------------------------------------------------

0xc0000000b1e34ebb-0xc0000000b1e34ebb @offset=20155. First byte 0x6c instead of 0x6b
Slab 0xc00c000002c78c00 objects=5 used=4 fp=0xc0000000b1e33380 flags=0x7ffc0000000840(slab|head|node=0|zone=0|lastcpupid=0x1fff)
Object 0xc0000000b1e34d00 @offset=19712 fp=0xc0000000b1e33380

Redzone c0000000b1e34c80: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34c90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34ca0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34cb0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34cc0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34cd0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34ce0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Redzone c0000000b1e34cf0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
Object c0000000b1e34d00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34d90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34da0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34db0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34dc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34dd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34de0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34df0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34e90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c0000000b1e34eb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b kkkkkkkkkkklkkkk
Object c0000000b1e34ec0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[...]
Object c0000000b1e35cf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Redzone c0000000b1e36580: bb bb bb bb bb bb bb bb ........
Padding c0000000b1e36590: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365a0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365b0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c0000000b1e365f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
CPU: 28 PID: 2 Comm: kthreadd Tainted: G D TN 6.7.5-gentoo-P9 #1
Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
Call Trace:
[c00000000593b890] [c000000000e8ecf8] dump_stack_lvl+0x6c/0xb0 (unreliable)
[c00000000593b8c0] [c00000000041dad0] print_trailer+0x1e0/0x22c
[c00000000593b940] [c0000000004155f4] check_bytes_and_report+0x224/0x240
[c00000000593b9f0] [c00000000041596c] check_object+0x35c/0x4a0
[c00000000593ba40] [c0000000004168dc] alloc_debug_processing+0xdc/0x270
[c00000000593bac0] [c000000000416c8c] get_partial_node.part.0+0x21c/0x460
[c00000000593bb80] [c000000000417148] ___slab_alloc+0x278/0xb20
[c00000000593bc90] [c000000000417b3c] kmem_cache_alloc_node+0x14c/0x630
[c00000000593bd20] [c000000000140618] copy_process+0x408/0x3270
[c00000000593be00] [c0000000001435f4] kernel_clone+0xc4/0x5b0
[c00000000593be80] [c000000000143dc4] kernel_thread+0x84/0xc0
[c00000000593bf40] [c0000000001829bc] kthreadd+0x1ec/0x290
[c00000000593bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
FIX task_struct: Restoring Poison 0xc0000000b1e34ebb-0xc0000000b1e34ebb=0x6b
FIX task_struct: Marking all objects used
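
Both poison reports flag a single 0x6c byte inside an otherwise 0x6b-filled (POISON_FREE) task_struct. On amd64 it is at object offset 0x28 (0xffffa0b170ab6068 - 0xffffa0b170ab6040), and on ppc64 it is at 0x1bb (0xc0000000b1e34ebb - 0xc0000000b1e34d00). The minimal sketch below assumes the stray write was one 32-bit increment of a poisoned field, for instance a reference count taken after the object was freed. The reports themselves do not show which field it was. Under that assumption, the odd byte is the first byte of an aligned word on little-endian amd64 and the last byte on the big-endian ppc64 kernel. first_overwritten_byte() and OBJ_SIZE are hypothetical. The helper loads and stores the word byte by byte, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b  /* byte SLUB writes into freed objects (include/linux/poison.h) */
#define OBJ_SIZE    512   /* hypothetical object size, large enough for both offsets */

/*
 * Poison a fake object, then increment the 32-bit word at byte offset off,
 * reading and writing it in the given byte order. Returns the index of the
 * first byte that no longer holds POISON_FREE, or -1 if nothing changed.
 */
long first_overwritten_byte(size_t off, int big_endian)
{
	unsigned char obj[OBJ_SIZE];
	uint32_t v = 0;

	assert(off + 4 <= sizeof(obj));
	memset(obj, POISON_FREE, sizeof(obj));

	/* Load the poisoned word: 0x6b6b6b6b in either byte order. */
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		v |= (uint32_t)obj[off + i] << shift;
	}

	v++; /* the assumed stray increment: 0x6b6b6b6b -> 0x6b6b6b6c */

	/* Store it back; only the least significant byte differs. */
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		obj[off + i] = (unsigned char)(v >> shift);
	}

	for (size_t i = 0; i < sizeof(obj); i++)
		if (obj[i] != POISON_FREE)
			return (long)i;
	return -1;
}
```

first_overwritten_byte(0x28, 0) returns 0x28, and first_overwritten_byte(0x1b8, 1) returns 0x1bb. Both match the offsets in the two reports.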


Full dmesg and kernel .config of both machines attached.

Regards,
Erhard


Attachments:
(No filename) (24.59 kB)
dmesg_675_zen3_v0.txt (110.87 kB)
dmesg_675_p9_v0.txt (56.74 kB)
config_675_zen3-van (143.48 kB)
config_675_p9 (125.68 kB)

2024-02-20 09:23:33

by Bagas Sanjaya

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On Mon, Feb 19, 2024 at 11:01:16PM +0100, Erhard Furtner wrote:
> Greetings!
>
> 'modprobe -v ttm-device-test' on my Ryzen 5950X amd64 box and on my Talos II (ppc64) leads to immediate list_add corruption.
>
> The machines stay useable via VNC but the issue seems to cause memory corruption which shows up later on when PAGE_POISONING is enabled:
>
> [...]
> BUG task_struct (Tainted: G D N): Poison overwritten
> -----------------------------------------------------------------------------
>
> 0xffffa0b170ab6068-0xffffa0b170ab6068 @offset=24680. First byte 0x6c instead of 0x6b
> Slab 0xffffea8944c2ac00 objects=8 used=8 fp=0x0000000000000000 flags=0x4000000000000840(slab|head|zone=1)
> Object 0xffffa0b170ab6040 @offset=24640 fp=0x0000000000000000
>
> Redzone ffffa0b170ab6000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone ffffa0b170ab6010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone ffffa0b170ab6020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone ffffa0b170ab6030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Object ffffa0b170ab6040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object ffffa0b170ab6050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object ffffa0b170ab6060: 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b kkkkkkkklkkkkkkk
> Object ffffa0b170ab6070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [...]
> Object ffffa0b170ab6fb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object ffffa0b170ab6fc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> Redzone ffffa0b170ab6fd0: bb bb bb bb bb bb bb bb ........
> Padding ffffa0b170ab6fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding ffffa0b170ab6ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> CPU: 13 PID: 2 Comm: kthreadd Tainted: G D N 6.7.5-Zen3 #1
> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> Call Trace:
> <TASK>
> dump_stack_lvl+0x37/0x52
> check_bytes_and_report+0xa7/0x107
> check_object+0x157/0x253
> alloc_debug_processing+0x5d/0x111
> ___slab_alloc+0x288/0x561
> ? copy_process+0x35f/0x2276
> ? kthread_is_per_cpu+0x22/0x22
> ret_from_fork+0x23/0x35
> ? kthread_is_per_cpu+0x22/0x22
> ret_from_fork_asm+0x11/0x20
> </TASK>
> FIX task_struct: Restoring Poison 0xffffa0b170ab6068-0xffffa0b170ab6068=0x6b
> FIX task_struct: Marking all objects used
>
>
> The Talos II ppc64 trace looks a bit different:
>
> [...]
> KTAP version 1
> 1..1
> KTAP version 1
> # Subtest: ttm_pool
> # module: ttm_pool_test
> 1..8
> KTAP version 1
> # Subtest: ttm_pool_alloc_basic
> ok 1 One page
> ok 2 More than one page
> ok 3 Above the allocation limit
> # ttm_pool_alloc_basic: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_pool_test.c:162
> Expected err == 0, but
> err == -12 (0xfffffffffffffff4)
> not ok 4 One page, with coherent DMA mappings enabled
> list_add corruption. prev->next should be next (c00800000cf64fc0), but was 0000000000000000. (prev=c0002000061a4ad0).
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:32!
> Oops: Exception in kernel mode, sig: 5 [#1]
> BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> CPU: 29 PID: 934 Comm: kunit_try_catch Tainted: G TN 6.7.5-gentoo-P9 #1
> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> NIP: c000000000864744 LR: c000000000864740 CTR: 0000000000000000
> REGS: c000200015333a30 TRAP: 0700 Tainted: G TN (6.7.5-gentoo-P9)
> MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000222 XER: 00000000
> CFAR: c0000000001d5620 IRQMASK: 0
> GPR00: 0000000000000000 c000200015333cd0 c0000000011b4700 0000000000000075
> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR12: 0000000000000000 c0002007fa4d5e00 c000000000182548 c0002000066aa1c0
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 c0002000061a4010 c00800000cf64fc0 c0002000061a4020
> GPR28: c0002000061a4ad0 c00800000cf64fa8 c00800000cf64fa0 c0002000061a4010
> NIP [c000000000864744] __list_add_valid_or_report+0xd4/0x120
> LR [c000000000864740] __list_add_valid_or_report+0xd0/0x120
> Call Trace:
> [c000200015333cd0] [c000000000864740] __list_add_valid_or_report+0xd0/0x120 (unreliable)
> [c000200015333d30] [c00800000cf5eed8] ttm_pool_type_init+0xa0/0x120 [ttm]
> [c000200015333d80] [c00800000cf5efec] ttm_pool_init+0x94/0x170 [ttm]
> [c000200015333de0] [c00800000cc6b324] ttm_pool_alloc_basic+0x9c/0x670 [ttm_pool_test]
> [c000200015333ea0] [c00800000bddf7f0] kunit_try_run_case+0xb8/0x220 [kunit]
> [c000200015333f60] [c00800000bde27c8] kunit_generic_run_threadfn_adapter+0x30/0x50 [kunit]
> [c000200015333f90] [c000000000182670] kthread+0x130/0x140
> [c000200015333fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> Code: f8010070 4b970ea9 60000000 0fe00000 7c0802a6 3c62fff1 7d064378 7d244b78 38639600 f8010070 4b970e85 60000000 <0fe00000> 7c0802a6 3c62fff1 7ca62b78
> ---[ end trace 0000000000000000 ]---
>
> note: kunit_try_catch[934] exited with irqs disabled
> # ttm_pool_alloc_basic: try timed out
> BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b6b
> Faulting instruction address: 0xc000000000181ae4
> Oops: Kernel access of bad area, sig: 11 [#2]
> BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> CPU: 17 PID: 921 Comm: modprobe Tainted: G D TN 6.7.5-gentoo-P9 #1
> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> NIP: c000000000181ae4 LR: c00800000bde2a54 CTR: c000000000181a80
> REGS: c0002000153871b0 TRAP: 0380 Tainted: G D TN (6.7.5-gentoo-P9)
> MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 44422282 XER: 00000000
> CFAR: c00800000bde53ec IRQMASK: 0
> GPR00: c00800000bde2a54 c000200015387450 c0000000011b4700 c0000000b1e34d00
> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR08: 0000000000000000 0000000000000000 000000006b6b6b6c c00800000bde53d8
> GPR12: c000000000181a80 c0002007fa4dd600 0000000020000000 0000000020000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000002 0000000020000000 c0000000023d78f8 c0000000023d78a8
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: c0002000153876c0 6b6b6b6b6b6b6b6b c0000000b1e34d00 c0000000b1e34eb8
> NIP [c000000000181ae4] kthread_stop+0x64/0x1c0
> LR [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> Call Trace:
> [c000200015387450] [c0000000001d5934] vprintk+0x84/0xc0 (unreliable)
> [c000200015387490] [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> [c000200015387540] [c00800000bde4f14] kunit_run_case_catch_errors+0x60/0xf0 [kunit]
> [c0002000153875a0] [c00800000bddf448] kunit_run_tests+0x560/0x680 [kunit]
> [c0002000153878d0] [c00800000bddf614] __kunit_test_suites_init+0xac/0x160 [kunit]
> [c000200015387970] [c00800000bde349c] kunit_exec_run_tests+0x44/0xb0 [kunit]
> [c0002000153879f0] [c00800000bddecbc] kunit_module_notify+0x4d4/0x590 [kunit]
> [c000200015387a90] [c0000000001842f0] notifier_call_chain+0xa0/0x190
> [c000200015387b30] [c00000000018480c] blocking_notifier_call_chain+0x5c/0xb0
> [c000200015387b70] [c00000000020cf64] do_init_module+0x234/0x330
> [c000200015387bf0] [c00000000021054c] init_module_from_file+0x9c/0xf0
> [c000200015387cc0] [c000000000210740] sys_finit_module+0x190/0x420
> [c000200015387d80] [c00000000002b808] system_call_exception+0x1b8/0x3a0
> [c000200015387e50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
> --- interrupt: 3000 at 0x3fff9eb3d7c8
> NIP: 00003fff9eb3d7c8 LR: 0000000000000000 CTR: 0000000000000000
> REGS: c000200015387e80 TRAP: 3000 Tainted: G D TN (6.7.5-gentoo-P9)
> MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48422244 XER: 00000000
> IRQMASK: 0
> GPR00: 0000000000000161 00003fffc80d3ab0 00003fff9ec37100 0000000000000007
> GPR04: 0000000134f6df90 0000000000000000 000000000000001f 0000000000000045
> GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR12: 0000000000000000 00003fff9ef7fbe0 0000000020000000 0000000020000000
> GPR16: 0000000000000000 0000000000000000 0000000000000020 0000000020000000
> GPR20: 0000000161994850 0000000020000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000161993f90
> GPR28: 0000000134f6df90 0000000000040000 0000000000000000 0000000161993cc0
> NIP [00003fff9eb3d7c8] 0x3fff9eb3d7c8
> LR [0000000000000000] 0x0
> --- interrupt: 3000
> Code: 40c2fff4 2c090000 41820164 39490001 7d494b78 2c090000 418000f4 813e01a8 6d290020 79295fe2 0b090000 ebbe0738 <7d20e8a8> 61290002 7d20e9ad 40c2fff4
> ---[ end trace 0000000000000000 ]---
>
> note: modprobe[921] exited with irqs disabled
> =============================================================================
> BUG task_struct (Tainted: G D TN): Poison overwritten
> -----------------------------------------------------------------------------
>
> 0xc0000000b1e34ebb-0xc0000000b1e34ebb @offset=20155. First byte 0x6c instead of 0x6b
> Slab 0xc00c000002c78c00 objects=5 used=4 fp=0xc0000000b1e33380 flags=0x7ffc0000000840(slab|head|node=0|zone=0|lastcpupid=0x1fff)
> Object 0xc0000000b1e34d00 @offset=19712 fp=0xc0000000b1e33380
>
> Redzone c0000000b1e34c80: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34c90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34ca0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34cb0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34cc0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34cd0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34ce0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Redzone c0000000b1e34cf0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> Object c0000000b1e34d00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34d90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34da0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34db0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34dc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34dd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34de0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34df0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34e90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object c0000000b1e34eb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b kkkkkkkkkkklkkkk
> Object c0000000b1e34ec0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [...]
> Object c0000000b1e35cf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Redzone c0000000b1e36580: bb bb bb bb bb bb bb bb ........
> Padding c0000000b1e36590: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365a0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365b0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> Padding c0000000b1e365f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> CPU: 28 PID: 2 Comm: kthreadd Tainted: G D TN 6.7.5-gentoo-P9 #1
> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> Call Trace:
> [c00000000593b890] [c000000000e8ecf8] dump_stack_lvl+0x6c/0xb0 (unreliable)
> [c00000000593b8c0] [c00000000041dad0] print_trailer+0x1e0/0x22c
> [c00000000593b940] [c0000000004155f4] check_bytes_and_report+0x224/0x240
> [c00000000593b9f0] [c00000000041596c] check_object+0x35c/0x4a0
> [c00000000593ba40] [c0000000004168dc] alloc_debug_processing+0xdc/0x270
> [c00000000593bac0] [c000000000416c8c] get_partial_node.part.0+0x21c/0x460
> [c00000000593bb80] [c000000000417148] ___slab_alloc+0x278/0xb20
> [c00000000593bc90] [c000000000417b3c] kmem_cache_alloc_node+0x14c/0x630
> [c00000000593bd20] [c000000000140618] copy_process+0x408/0x3270
> [c00000000593be00] [c0000000001435f4] kernel_clone+0xc4/0x5b0
> [c00000000593be80] [c000000000143dc4] kernel_thread+0x84/0xc0
> [c00000000593bf40] [c0000000001829bc] kthreadd+0x1ec/0x290
> [c00000000593bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> FIX task_struct: Restoring Poison 0xc0000000b1e34ebb-0xc0000000b1e34ebb=0x6b
> FIX task_struct: Marking all objects used
>
>
> Full dmesg and kernel .config of both machines attached.
>
> Regards,
> Erhard

> [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024

Is it a vanilla kernel (i.e. no patches applied)? Can you also check current
mainline (v6.8-rc5)?

Confused...

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (25.73 kB)
signature.asc (235.00 B)

2024-02-20 12:46:50

by Erhard Furtner

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On Tue, 20 Feb 2024 16:12:44 +0700
Bagas Sanjaya wrote:

> > [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
>
> Is it a vanilla kernel (i.e. no patches applied)? Can you also check current
> mainline (v6.8-rc5)?
>
> Confused...

Yes, this kernel was built from the upstream stable git sources, with no additional patches.

It's just that I use my own custom kernel .config, which is why I attached it. The kernel should run in qemu too, though.

Also the issue is reproducible on v6.8-rc5 (dmesg attached).

Additionally, I tried 'modprobe -v ttm-device-test' on v6.8-rc5 with KASAN enabled instead of KFENCE, with the same kernel .config otherwise. With KASAN I get a different dmesg and the test completes with a failure, but I don't seem to get memory corruption afterwards:

[...]
KTAP version 1
1..1
KTAP version 1
# Subtest: ttm_device
# module: ttm_device_test
1..5
ok 1 ttm_device_init_basic
# ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
num_dev == 3 (0x3)
not ok 2 ttm_device_init_multiple
ok 3 ttm_device_fini_basic
------------[ cut here ]------------
WARNING: CPU: 5 PID: 2146 at drivers/gpu/drm/ttm/ttm_device.c:206 ttm_device_init+0x23/0x281 [ttm]
Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher amdgpu wmi_bmof amd64_edac edac_mce_amd snd_hda_codec_hdmi input_leds snd_hda_intel amdxcp snd_intel_dspcfg kvm_amd snd_hda_codec snd_hwdep snd_hda_core mfd_core snd_pcm gpu_sched snd_timer video drm_suballoc_helper snd i2c_algo_bit drm_ttm_helper gpio_amdpt soundcore ttm drm_exec button drm_display_helper rapl gpio_generic wmi drm_buddy k10temp evdev joydev lz4 lz4_compress lz4_decompress sg zram nct6775 nct6775_core hwmon_vid hwmon loop configfs hid_generic usbhid hid sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel xhci_pci libaes xhci_hcd crypto_simd ccp cryptd usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
CPU: 5 PID: 2146 Comm: kunit_try_catch Tainted: G B N 6.8.0-rc5-Zen3 #3
Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
RIP: 0010:ttm_device_init+0x23/0x281 [ttm]
Code: 31 ff e9 fa e4 d5 e6 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 18 8b 44 24 50 48 89 14 24 89 44 24 0c 4d 85 c0 75 0c <0f> 0b bd ea ff ff ff e9 2f 02 00 00 48 89 fb 49 89 f7 49 89 ce 4d
RSP: 0018:ffffc9000611fcf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888190184000 RCX: ffff888100651b18
RDX: ffff88817d4a6400 RSI: ffffffffc2033d40 RDI: ffff888106abc000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888106abc000 R14: 0000000000000000 R15: ffff888100651b18
FS: 0000000000000000(0000) GS:ffff8887de880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb67e03b20 CR3: 00000001608ac000 CR4: 0000000000b50ef0
Call Trace:
<TASK>
? __warn+0x113/0x14c
? ttm_device_init+0x23/0x281 [ttm]
? report_bug+0x1b3/0x229
? ttm_device_init+0x23/0x281 [ttm]
? handle_bug+0x3c/0x7c
? exc_invalid_op+0x17/0x46
? asm_exc_invalid_op+0x1a/0x20
? ttm_device_init+0x23/0x281 [ttm]
? local_clock_noinstr+0xc/0xa8
ttm_device_kunit_init+0xf1/0x10f [ttm_kunit_helpers]
ttm_device_init_no_vma_man+0x145/0x1e7 [ttm_device_test]
? ttm_device_init_pools+0x61e/0x61e [ttm_device_test]
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? timekeeping_get_ns+0x60/0xf8
? srso_alias_return_thunk+0x5/0xfbef5
? ktime_get_ts64+0x68/0x109
kunit_try_run_case+0x269/0x3cc [kunit]
? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
? srso_alias_return_thunk+0x5/0xfbef5
? do_raw_spin_unlock+0x5d/0x1b6
? srso_alias_return_thunk+0x5/0xfbef5
? kunit_try_catch_throw+0x6a/0x6a [kunit]
? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
kunit_generic_run_threadfn_adapter+0x54/0x86 [kunit]
kthread+0x25e/0x26d
? kthread_complete_and_exit+0x1f/0x1f
ret_from_fork+0x23/0x54
? kthread_complete_and_exit+0x1f/0x1f
ret_from_fork_asm+0x11/0x20
</TASK>
---[ end trace 0000000000000000 ]---
ok 4 ttm_device_init_no_vma_man
KTAP version 1
# Subtest: ttm_device_init_pools
ok 1 No DMA allocations, no DMA32 required
ok 2 DMA allocations, DMA32 required
ok 3 No DMA allocations, DMA32 required
ok 4 DMA allocations, no DMA32 required
# ttm_device_init_pools: pass:4 fail:0 skip:0 total:4
ok 5 ttm_device_init_pools
# ttm_device: pass:4 fail:1 skip:0 total:5
# Totals: pass:7 fail:1 skip:0 total:8
not ok 1 ttm_device
[...]


Regards,
Erhard


Attachments:
(No filename) (5.01 kB)
dmesg_68-rc5_zen3 (84.53 kB)
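To help decode the x86 trace in this thread ("list_add corruption. prev->next should be next (...), but was 6b6b6b6b6b6b6b6b"), here is a userspace sketch (not kernel code) of the kind of prev/next consistency check that lib/list_debug.c performs before linking an entry, plus one hypothetical way to trip it: an entry is left on a long-lived list, and its memory is then freed and filled with 0x6b, the byte SLUB debug poisoning writes into freed objects. The helpers `list_add_valid`, `list_add_tail_checked` and `stale_entry_detected` are made up for this illustration and only approximate the kernel's checks. The sketch shows what the message means; it does not claim this is the actual bug in ttm_device_test.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* Byte SLUB writes into freed objects when poisoning is enabled. */
#define POISON_FREE 0x6b

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Approximation of the debug-list checks done before linking entry
 * between prev and next. Prints a message in the kernel's wording and
 * returns false instead of calling BUG(). */
static bool list_add_valid(const struct list_head *entry,
			   const struct list_head *prev,
			   const struct list_head *next)
{
	if (next->prev != prev) {
		printf("list_add corruption. next->prev should be prev (%p), "
		       "but was %p. (next=%p).\n",
		       (const void *)prev, (const void *)next->prev,
		       (const void *)next);
		return false;
	}
	if (prev->next != next) {
		printf("list_add corruption. prev->next should be next (%p), "
		       "but was %p. (prev=%p).\n",
		       (const void *)next, (const void *)prev->next,
		       (const void *)prev);
		return false;
	}
	if (entry == prev || entry == next) {
		printf("list_add double add.\n");
		return false;
	}
	return true;
}

/* Insert entry just before head (at the tail), like list_add_tail(). */
static bool list_add_tail_checked(struct list_head *entry,
				  struct list_head *head)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	if (!list_add_valid(entry, prev, next))
		return false;
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Hypothetical scenario: returns true if the second append is rejected
 * because the last entry on the list was "freed" without list_del(). */
static bool stale_entry_detected(void)
{
	static struct list_head global_head; /* stands in for a module-global list */
	struct list_head stale, fresh;

	init_list_head(&global_head);
	if (!list_add_tail_checked(&stale, &global_head))
		return false; /* the first append should never be rejected */

	/* Simulate kfree() with slab poisoning: no list_del(), bytes overwritten. */
	memset(&stale, POISON_FREE, sizeof(stale));

	/* global_head.prev still points at stale, whose next is now 0x6b6b... */
	return !list_add_tail_checked(&fresh, &global_head);
}
```

Calling `stale_entry_detected()` prints a line of the same shape as the one in the report and returns true; with the `memset()` line removed, the second append succeeds and it returns false. The first check (`next->prev != prev`) passes in this scenario because the list head still points at the stale entry, so it is the second check that fires, matching the "prev->next should be next" wording.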

2024-02-20 13:29:23

by Christian König

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On 20.02.24 at 10:12, Bagas Sanjaya wrote:
> On Mon, Feb 19, 2024 at 11:01:16PM +0100, Erhard Furtner wrote:
>> Greetings!
>>
>> 'modprobe -v ttm-device-test' on my Ryzen 5950X amd64 box and on my Talos II (ppc64) leads to immediate list_add corruption.
>>
>> The machines stay useable via VNC but the issue seems to cause memory corruption which shows up later on when PAGE_POISONING is enabled:
>>
>> [...]
>> KTAP version 1
>> 1..1
>> KTAP version 1
>> # Subtest: ttm_device
>> # module: ttm_device_test
>> 1..5
>> ok 1 ttm_device_init_basic
>> # ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
>> Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
>> list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
>> num_dev == 3 (0x3)
>> not ok 2 ttm_device_init_multiple
>> list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0).
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:32!
>> invalid opcode: 0000 [#1] SMP NOPTI
>> CPU: 6 PID: 2129 Comm: kunit_try_catch Tainted: G N 6.7.5-Zen3 #1
>> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
>> RIP: 0010:__list_add_valid_or_report+0x67/0x9c
>> Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
>> RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
>> RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
>> R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
>> FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
>> Call Trace:
>> <TASK>
>> ? __die_body+0x15/0x65
>> ? die+0x2f/0x48
>> ? do_trap+0x76/0x109
>> ? __list_add_valid_or_report+0x67/0x9c
>> ? __list_add_valid_or_report+0x67/0x9c
>> ? do_error_trap+0x69/0xa6
>> ? __list_add_valid_or_report+0x67/0x9c
>> ? exc_invalid_op+0x4d/0x71
>> ? __list_add_valid_or_report+0x67/0x9c
>> ? asm_exc_invalid_op+0x1a/0x20
>> ? __list_add_valid_or_report+0x67/0x9c
>> ? __list_add_valid_or_report+0x67/0x9c
>> ttm_device_init+0x10e/0x157 [ttm]
>> ttm_device_kunit_init+0x3d/0x51 [ttm_kunit_helpers]
>> ttm_device_fini_basic+0x6d/0x1b3 [ttm_device_test]
>> ? timekeeping_get_ns+0x19/0x3b
>> ? srso_alias_return_thunk+0x5/0xfbef5
>> ? ktime_get_ts64+0x40/0x92
>> kunit_try_run_case+0xaf/0x163 [kunit]
>> ? kunit_try_catch_throw+0x1b/0x1b [kunit]
>> ? kunit_try_catch_throw+0x1b/0x1b [kunit]
>> kunit_generic_run_threadfn_adapter+0x15/0x20 [kunit]
>> kthread+0xcf/0xd7
>> ? kthread_complete_and_exit+0x1a/0x1a
>> ret_from_fork+0x23/0x35
>> ? kthread_complete_and_exit+0x1a/0x1a
>> ret_from_fork_asm+0x11/0x20
>> </TASK>
>> Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:__list_add_valid_or_report+0x67/0x9c
>> Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
>> RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
>> RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
>> R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
>> FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
>> Key type dns_resolver registered
>> NFS: Registering the id_resolver key type
>> Key type id_resolver registered
>> Key type id_legacy registered
>> # ttm_device_fini_basic: try timed out
>> general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#2] SMP NOPTI
>> CPU: 26 PID: 2119 Comm: modprobe Tainted: G D N 6.7.5-Zen3 #1
>> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
>> RIP: 0010:kthread_stop+0x3c/0x78
>> Code: f0 0f c1 43 28 be 02 00 00 00 85 c0 74 0c 8d 50 01 09 c2 79 0a be 01 00 00 00 e8 f5 31 37 00 48 89 df e8 35 f1 ff ff 48 89 c5 <f0> 80 08 02 48 89 df e8 6a ff ff ff f0 80 4b 02 02 48 89 df e8 f6
>> RSP: 0018:ffffb23b01fff938 EFLAGS: 00010246
>> RAX: 6b6b6b6b6b6b6b6b RBX: ffffa0b170ab6040 RCX: 0000000000000000
>> RDX: 000000006b6b6b6f RSI: 0000000000000002 RDI: 0000000000000000
>> RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b170ab6040
>> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
>> FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
>> Call Trace:
>> <TASK>
>> ? __die_body+0x15/0x65
>> ? die_addr+0x37/0x50
>> ? exc_general_protection+0x1b6/0x1ec
>> ? asm_exc_general_protection+0x26/0x30
>> ? kthread_stop+0x3c/0x78
>> ? kthread_stop+0x39/0x78
>> kunit_try_catch_run+0xc9/0x155 [kunit]
>> kunit_run_case_catch_errors+0x3f/0x93 [kunit]
>> kunit_run_tests+0x182/0x516 [kunit]
>> ? kunit_try_run_case_cleanup+0x39/0x39 [kunit]
>> ? kunit_catch_run_case_cleanup+0x85/0x85 [kunit]
>> __kunit_test_suites_init+0x64/0x83 [kunit]
>> kunit_module_notify+0xda/0x177 [kunit]
>> notifier_call_chain+0x5a/0x92
>> blocking_notifier_call_chain+0x3e/0x60
>> do_init_module+0xcb/0x218
>> init_module_from_file+0x7a/0x99
>> __do_sys_finit_module+0x162/0x223
>> do_syscall_64+0x6e/0xd8
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>> RIP: 0033:0x7f9321f7a479
>> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 87 89 0c 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffe2e350908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> RAX: ffffffffffffffda RBX: 00005590b57cef40 RCX: 00007f9321f7a479
>> RDX: 0000000000000000 RSI: 00005590b5100c7c RDI: 0000000000000007
>> RBP: 0000000000000000 R08: 00007f9322043b20 R09: 0000000000000000
>> R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000040000
>> R13: 00005590b5100c7c R14: 00005590b57cefe0 R15: 0000000000000000
>> </TASK>
>> Modules linked in: nfsv4 dns_resolver nfs lockd grace ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:__list_add_valid_or_report+0x67/0x9c
>> Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
>> RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
>> RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
>> R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
>> FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
>> =============================================================================
>> BUG task_struct (Tainted: G D N): Poison overwritten
>> -----------------------------------------------------------------------------
>>
>> 0xffffa0b170ab6068-0xffffa0b170ab6068 @offset=24680. First byte 0x6c instead of 0x6b
>> Slab 0xffffea8944c2ac00 objects=8 used=8 fp=0x0000000000000000 flags=0x4000000000000840(slab|head|zone=1)
>> Object 0xffffa0b170ab6040 @offset=24640 fp=0x0000000000000000
>>
>> Redzone ffffa0b170ab6000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone ffffa0b170ab6010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone ffffa0b170ab6020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone ffffa0b170ab6030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Object ffffa0b170ab6040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object ffffa0b170ab6050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object ffffa0b170ab6060: 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b kkkkkkkklkkkkkkk
>> Object ffffa0b170ab6070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> [...]
>> Object ffffa0b170ab6fb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object ffffa0b170ab6fc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
>> Redzone ffffa0b170ab6fd0: bb bb bb bb bb bb bb bb ........
>> Padding ffffa0b170ab6fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding ffffa0b170ab6ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> CPU: 13 PID: 2 Comm: kthreadd Tainted: G D N 6.7.5-Zen3 #1
>> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x37/0x52
>> check_bytes_and_report+0xa7/0x107
>> check_object+0x157/0x253
>> alloc_debug_processing+0x5d/0x111
>> ___slab_alloc+0x288/0x561
>> ? copy_process+0x35f/0x2276
>> ? kthread_is_per_cpu+0x22/0x22
>> ret_from_fork+0x23/0x35
>> ? kthread_is_per_cpu+0x22/0x22
>> ret_from_fork_asm+0x11/0x20
>> </TASK>
>> FIX task_struct: Restoring Poison 0xffffa0b170ab6068-0xffffa0b170ab6068=0x6b
>> FIX task_struct: Marking all objects used
>>
>>
>> The Talos II ppc64 trace looks a bit different:
>>
>> [...]
>> KTAP version 1
>> 1..1
>> KTAP version 1
>> # Subtest: ttm_pool
>> # module: ttm_pool_test
>> 1..8
>> KTAP version 1
>> # Subtest: ttm_pool_alloc_basic
>> ok 1 One page
>> ok 2 More than one page
>> ok 3 Above the allocation limit
>> # ttm_pool_alloc_basic: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_pool_test.c:162
>> Expected err == 0, but
>> err == -12 (0xfffffffffffffff4)
>> not ok 4 One page, with coherent DMA mappings enabled
>> list_add corruption. prev->next should be next (c00800000cf64fc0), but was 0000000000000000. (prev=c0002000061a4ad0).
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:32!
>> Oops: Exception in kernel mode, sig: 5 [#1]
>> BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
>> Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
>> CPU: 29 PID: 934 Comm: kunit_try_catch Tainted: G TN 6.7.5-gentoo-P9 #1
>> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
>> NIP: c000000000864744 LR: c000000000864740 CTR: 0000000000000000
>> REGS: c000200015333a30 TRAP: 0700 Tainted: G TN (6.7.5-gentoo-P9)
>> MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000222 XER: 00000000
>> CFAR: c0000000001d5620 IRQMASK: 0
>> GPR00: 0000000000000000 c000200015333cd0 c0000000011b4700 0000000000000075
>> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR12: 0000000000000000 c0002007fa4d5e00 c000000000182548 c0002000066aa1c0
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR24: 0000000000000000 c0002000061a4010 c00800000cf64fc0 c0002000061a4020
>> GPR28: c0002000061a4ad0 c00800000cf64fa8 c00800000cf64fa0 c0002000061a4010
>> NIP [c000000000864744] __list_add_valid_or_report+0xd4/0x120
>> LR [c000000000864740] __list_add_valid_or_report+0xd0/0x120
>> Call Trace:
>> [c000200015333cd0] [c000000000864740] __list_add_valid_or_report+0xd0/0x120 (unreliable)
>> [c000200015333d30] [c00800000cf5eed8] ttm_pool_type_init+0xa0/0x120 [ttm]
>> [c000200015333d80] [c00800000cf5efec] ttm_pool_init+0x94/0x170 [ttm]
>> [c000200015333de0] [c00800000cc6b324] ttm_pool_alloc_basic+0x9c/0x670 [ttm_pool_test]
>> [c000200015333ea0] [c00800000bddf7f0] kunit_try_run_case+0xb8/0x220 [kunit]
>> [c000200015333f60] [c00800000bde27c8] kunit_generic_run_threadfn_adapter+0x30/0x50 [kunit]
>> [c000200015333f90] [c000000000182670] kthread+0x130/0x140
>> [c000200015333fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
>> Code: f8010070 4b970ea9 60000000 0fe00000 7c0802a6 3c62fff1 7d064378 7d244b78 38639600 f8010070 4b970e85 60000000 <0fe00000> 7c0802a6 3c62fff1 7ca62b78
>> ---[ end trace 0000000000000000 ]---
>>
>> note: kunit_try_catch[934] exited with irqs disabled
>> # ttm_pool_alloc_basic: try timed out
>> BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b6b
>> Faulting instruction address: 0xc000000000181ae4
>> Oops: Kernel access of bad area, sig: 11 [#2]
>> BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
>> Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
>> CPU: 17 PID: 921 Comm: modprobe Tainted: G D TN 6.7.5-gentoo-P9 #1
>> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
>> NIP: c000000000181ae4 LR: c00800000bde2a54 CTR: c000000000181a80
>> REGS: c0002000153871b0 TRAP: 0380 Tainted: G D TN (6.7.5-gentoo-P9)
>> MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 44422282 XER: 00000000
>> CFAR: c00800000bde53ec IRQMASK: 0
>> GPR00: c00800000bde2a54 c000200015387450 c0000000011b4700 c0000000b1e34d00
>> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR08: 0000000000000000 0000000000000000 000000006b6b6b6c c00800000bde53d8
>> GPR12: c000000000181a80 c0002007fa4dd600 0000000020000000 0000000020000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR20: 0000000000000002 0000000020000000 c0000000023d78f8 c0000000023d78a8
>> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR28: c0002000153876c0 6b6b6b6b6b6b6b6b c0000000b1e34d00 c0000000b1e34eb8
>> NIP [c000000000181ae4] kthread_stop+0x64/0x1c0
>> LR [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
>> Call Trace:
>> [c000200015387450] [c0000000001d5934] vprintk+0x84/0xc0 (unreliable)
>> [c000200015387490] [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
>> [c000200015387540] [c00800000bde4f14] kunit_run_case_catch_errors+0x60/0xf0 [kunit]
>> [c0002000153875a0] [c00800000bddf448] kunit_run_tests+0x560/0x680 [kunit]
>> [c0002000153878d0] [c00800000bddf614] __kunit_test_suites_init+0xac/0x160 [kunit]
>> [c000200015387970] [c00800000bde349c] kunit_exec_run_tests+0x44/0xb0 [kunit]
>> [c0002000153879f0] [c00800000bddecbc] kunit_module_notify+0x4d4/0x590 [kunit]
>> [c000200015387a90] [c0000000001842f0] notifier_call_chain+0xa0/0x190
>> [c000200015387b30] [c00000000018480c] blocking_notifier_call_chain+0x5c/0xb0
>> [c000200015387b70] [c00000000020cf64] do_init_module+0x234/0x330
>> [c000200015387bf0] [c00000000021054c] init_module_from_file+0x9c/0xf0
>> [c000200015387cc0] [c000000000210740] sys_finit_module+0x190/0x420
>> [c000200015387d80] [c00000000002b808] system_call_exception+0x1b8/0x3a0
>> [c000200015387e50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
>> --- interrupt: 3000 at 0x3fff9eb3d7c8
>> NIP: 00003fff9eb3d7c8 LR: 0000000000000000 CTR: 0000000000000000
>> REGS: c000200015387e80 TRAP: 3000 Tainted: G D TN (6.7.5-gentoo-P9)
>> MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48422244 XER: 00000000
>> IRQMASK: 0
>> GPR00: 0000000000000161 00003fffc80d3ab0 00003fff9ec37100 0000000000000007
>> GPR04: 0000000134f6df90 0000000000000000 000000000000001f 0000000000000045
>> GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> GPR12: 0000000000000000 00003fff9ef7fbe0 0000000020000000 0000000020000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000020 0000000020000000
>> GPR20: 0000000161994850 0000000020000000 0000000000000000 0000000000000000
>> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000161993f90
>> GPR28: 0000000134f6df90 0000000000040000 0000000000000000 0000000161993cc0
>> NIP [00003fff9eb3d7c8] 0x3fff9eb3d7c8
>> LR [0000000000000000] 0x0
>> --- interrupt: 3000
>> Code: 40c2fff4 2c090000 41820164 39490001 7d494b78 2c090000 418000f4 813e01a8 6d290020 79295fe2 0b090000 ebbe0738 <7d20e8a8> 61290002 7d20e9ad 40c2fff4
>> ---[ end trace 0000000000000000 ]---
>>
>> note: modprobe[921] exited with irqs disabled
>> =============================================================================
>> BUG task_struct (Tainted: G D TN): Poison overwritten
>> -----------------------------------------------------------------------------
>>
>> 0xc0000000b1e34ebb-0xc0000000b1e34ebb @offset=20155. First byte 0x6c instead of 0x6b
>> Slab 0xc00c000002c78c00 objects=5 used=4 fp=0xc0000000b1e33380 flags=0x7ffc0000000840(slab|head|node=0|zone=0|lastcpupid=0x1fff)
>> Object 0xc0000000b1e34d00 @offset=19712 fp=0xc0000000b1e33380
>>
>> Redzone c0000000b1e34c80: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34c90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34ca0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34cb0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34cc0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34cd0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34ce0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Redzone c0000000b1e34cf0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
>> Object c0000000b1e34d00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34d90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34da0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34db0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34dc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34dd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34de0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34df0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34e90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object c0000000b1e34eb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b kkkkkkkkkkklkkkk
>> Object c0000000b1e34ec0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> [...]
>> Object c0000000b1e35cf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Redzone c0000000b1e36580: bb bb bb bb bb bb bb bb ........
>> Padding c0000000b1e36590: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365a0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365b0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> Padding c0000000b1e365f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
>> CPU: 28 PID: 2 Comm: kthreadd Tainted: G D TN 6.7.5-gentoo-P9 #1
>> Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
>> Call Trace:
>> [c00000000593b890] [c000000000e8ecf8] dump_stack_lvl+0x6c/0xb0 (unreliable)
>> [c00000000593b8c0] [c00000000041dad0] print_trailer+0x1e0/0x22c
>> [c00000000593b940] [c0000000004155f4] check_bytes_and_report+0x224/0x240
>> [c00000000593b9f0] [c00000000041596c] check_object+0x35c/0x4a0
>> [c00000000593ba40] [c0000000004168dc] alloc_debug_processing+0xdc/0x270
>> [c00000000593bac0] [c000000000416c8c] get_partial_node.part.0+0x21c/0x460
>> [c00000000593bb80] [c000000000417148] ___slab_alloc+0x278/0xb20
>> [c00000000593bc90] [c000000000417b3c] kmem_cache_alloc_node+0x14c/0x630
>> [c00000000593bd20] [c000000000140618] copy_process+0x408/0x3270
>> [c00000000593be00] [c0000000001435f4] kernel_clone+0xc4/0x5b0
>> [c00000000593be80] [c000000000143dc4] kernel_thread+0x84/0xc0
>> [c00000000593bf40] [c0000000001829bc] kthreadd+0x1ec/0x290
>> [c00000000593bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
>> FIX task_struct: Restoring Poison 0xc0000000b1e34ebb-0xc0000000b1e34ebb=0x6b
>> FIX task_struct: Marking all objects used
>>
>>
>> Full dmesg and kernel .config of both machines attached.
>>
>> Regards,
>> Erhard
>> [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
> Is it vanilla kernel (i.e. no patches applied)? Can you also check current
> mainline (v6.8-rc5)?
>
> Confused...

Oh, that is most likely expected behavior.

This KUnit test is not meant to be run on real hardware, but rather just
as a stand-alone KUnit test within User Mode Linux. I was assuming that
it doesn't even compile on bare metal.

We should probably either double-check the Kconfig options to prevent
compiling it or modify the test so that it can run on real hardware as well.
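For readers following the amd64 report quoted above, here is a userspace model of the CONFIG_DEBUG_LIST check that fired. It is an illustrative sketch, not the kernel's code: `struct list_head` is redeclared locally, the check order and message text are modelled on `lib/list_debug.c`, and `list_add_valid()`, `list_add_tail_checked()`, `demo_stale_entry()` and `POISON_FREE_BYTE` are hypothetical names. It assumes one scenario consistent with the log: an entry stays linked on a global list after its memory has been freed and poisoned with 0x6b, so the next `list_add_tail()` sees `prev->next == 0x6b6b6b6b6b6b6b6b`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for the kernel's struct list_head (hypothetical redeclaration). */
struct list_head {
	struct list_head *next, *prev;
};

/* Byte the slab allocator writes into freed objects (POISON_FREE is 0x6b). */
#define POISON_FREE_BYTE 0x6b

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Model of the debug checks run before linking `entry` between `prev`
 * and `next`. Prints a message shaped like the kernel's and returns
 * false where the kernel would hit BUG().
 */
static bool list_add_valid(struct list_head *entry, struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev) {
		printf("list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
		       (void *)prev, (void *)next->prev, (void *)next);
		return false;
	}
	if (prev->next != next) {
		printf("list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
		       (void *)next, (void *)prev->next, (void *)prev);
		return false;
	}
	if (entry == prev || entry == next) {
		printf("list_add double add\n");
		return false;
	}
	return true;
}

/* Like list_add_tail(): link `entry` just before `head`, i.e. at the end. */
static bool list_add_tail_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_valid(entry, prev, head))
		return false;
	head->prev = entry;
	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Returns whether a second add is accepted after a stale, poisoned entry. */
static bool demo_stale_entry(void)
{
	static struct list_head global_list;	/* plays a module-global list head */
	struct list_head stale, fresh;

	init_list_head(&global_list);
	(void)list_add_tail_checked(&stale, &global_list);	/* succeeds: list was empty */

	/* "Free" the entry without unlinking it: poison it instead of calling
	 * free(), so this sketch never reads genuinely freed memory. */
	memset(&stale, POISON_FREE_BYTE, sizeof(stale));

	/* prev is &stale, but stale.next is now 0x6b6b..., so this is rejected. */
	return list_add_tail_checked(&fresh, &global_list);
}
```

Calling `demo_stale_entry()` prints a message of the same shape as the amd64 report, with the poison pattern as the bogus `prev->next`, and returns false; on a healthy list `list_add_tail_checked()` simply links the entry and returns true.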

Regards,
Christian.

2024-02-20 13:52:05

by Christian König

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

Hi Erhard,

On 20.02.24 at 13:45, Erhard Furtner wrote:
> On Tue, 20 Feb 2024 16:12:44 +0700
> Bagas Sanjaya <[email protected]> wrote:
>
>>> [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
>> Is it vanilla kernel (i.e. no patches applied)? Can you also check current
>> mainline (v6.8-rc5)?
>>
>> Confused...
> Yes, this kernel was built from upstream git stable sources, no additional patches.
>
> It's just that I use my own custom kernel .config, which is why I attached it. But the kernel should run in QEMU too.

Yeah, and that's probably the problem. The test is not supposed to be
compiled and executed on bare metal, but rather just as a unit test
through User Mode Linux.

We probably don't check that correctly in the Kconfig for some reason.
Can you provide your .config file?

Thanks,
Christian.

>
> Also the issue is reproducible on v6.8-rc5 (dmesg attached).
>
> Additionally, I tried 'modprobe -v ttm-device-test' on v6.8-rc5 with KASAN enabled instead of KFENCE, same kernel .config otherwise. With KASAN I get a different dmesg, and the test completes with a failure. But I don't seem to get memory corruption afterwards:
>
> [...]
> KTAP version 1
> 1..1
> KTAP version 1
> # Subtest: ttm_device
> # module: ttm_device_test
> 1..5
> ok 1 ttm_device_init_basic
> # ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
> Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
> list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
> num_dev == 3 (0x3)
> not ok 2 ttm_device_init_multiple
> ok 3 ttm_device_fini_basic
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 2146 at drivers/gpu/drm/ttm/ttm_device.c:206 ttm_device_init+0x23/0x281 [ttm]
> Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher amdgpu wmi_bmof amd64_edac edac_mce_amd snd_hda_codec_hdmi input_leds snd_hda_intel amdxcp snd_intel_dspcfg kvm_amd snd_hda_codec snd_hwdep snd_hda_core mfd_core snd_pcm gpu_sched snd_timer video drm_suballoc_helper snd i2c_algo_bit drm_ttm_helper gpio_amdpt soundcore ttm drm_exec button drm_display_helper rapl gpio_generic wmi drm_buddy k10temp evdev joydev lz4 lz4_compress lz4_decompress sg zram nct6775 nct6775_core hwmon_vid hwmon loop configfs hid_generic usbhid hid sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel xhci_pci libaes xhci_hcd crypto_simd ccp cryptd usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> CPU: 5 PID: 2146 Comm: kunit_try_catch Tainted: G B N 6.8.0-rc5-Zen3 #3
> Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> RIP: 0010:ttm_device_init+0x23/0x281 [ttm]
> Code: 31 ff e9 fa e4 d5 e6 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 53 48 83 ec 18 8b 44 24 50 48 89 14 24 89 44 24 0c 4d 85 c0 75 0c <0f> 0b bd ea ff ff ff e9 2f 02 00 00 48 89 fb 49 89 f7 49 89 ce 4d
> RSP: 0018:ffffc9000611fcf8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888190184000 RCX: ffff888100651b18
> RDX: ffff88817d4a6400 RSI: ffffffffc2033d40 RDI: ffff888106abc000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff888106abc000 R14: 0000000000000000 R15: ffff888100651b18
> FS: 0000000000000000(0000) GS:ffff8887de880000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007feb67e03b20 CR3: 00000001608ac000 CR4: 0000000000b50ef0
> Call Trace:
> <TASK>
> ? __warn+0x113/0x14c
> ? ttm_device_init+0x23/0x281 [ttm]
> ? report_bug+0x1b3/0x229
> ? ttm_device_init+0x23/0x281 [ttm]
> ? handle_bug+0x3c/0x7c
> ? exc_invalid_op+0x17/0x46
> ? asm_exc_invalid_op+0x1a/0x20
> ? ttm_device_init+0x23/0x281 [ttm]
> ? local_clock_noinstr+0xc/0xa8
> ttm_device_kunit_init+0xf1/0x10f [ttm_kunit_helpers]
> ttm_device_init_no_vma_man+0x145/0x1e7 [ttm_device_test]
> ? ttm_device_init_pools+0x61e/0x61e [ttm_device_test]
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? timekeeping_get_ns+0x60/0xf8
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? ktime_get_ts64+0x68/0x109
> kunit_try_run_case+0x269/0x3cc [kunit]
> ? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_raw_spin_unlock+0x5d/0x1b6
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? kunit_try_catch_throw+0x6a/0x6a [kunit]
> ? kunit_try_run_case_cleanup+0xc2/0xc2 [kunit]
> kunit_generic_run_threadfn_adapter+0x54/0x86 [kunit]
> kthread+0x25e/0x26d
> ? kthread_complete_and_exit+0x1f/0x1f
> ret_from_fork+0x23/0x54
> ? kthread_complete_and_exit+0x1f/0x1f
> ret_from_fork_asm+0x11/0x20
> </TASK>
> ---[ end trace 0000000000000000 ]---
> ok 4 ttm_device_init_no_vma_man
> KTAP version 1
> # Subtest: ttm_device_init_pools
> ok 1 No DMA allocations, no DMA32 required
> ok 2 DMA allocations, DMA32 required
> ok 3 No DMA allocations, DMA32 required
> ok 4 DMA allocations, no DMA32 required
> # ttm_device_init_pools: pass:4 fail:0 skip:0 total:4
> ok 5 ttm_device_init_pools
> # ttm_device: pass:4 fail:1 skip:0 total:5
> # Totals: pass:7 fail:1 skip:0 total:8
> not ok 1 ttm_device
> [...]
>
>
> Regards,
> Erhard


2024-02-20 14:57:18

by Maxime Ripard

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On Tue, Feb 20, 2024 at 02:28:53PM +0100, Christian König wrote:
> On 20.02.24 at 10:12, Bagas Sanjaya wrote:
> > On Mon, Feb 19, 2024 at 11:01:16PM +0100, Erhard Furtner wrote:
> > > Greetings!
> > >
> > > 'modprobe -v ttm-device-test' on my Ryzen 5950X amd64 box and on my Talos II (ppc64) leads to immediate list_add corruption.
> > >
> > > The machines stay useable via VNC but the issue seems to cause memory corruption which shows up later on when PAGE_POISONING is enabled:
> > >
> > > [...]
> > > KTAP version 1
> > > 1..1
> > > KTAP version 1
> > > # Subtest: ttm_device
> > > # module: ttm_device_test
> > > 1..5
> > > ok 1 ttm_device_init_basic
> > > # ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
> > > Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
> > > list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
> > > num_dev == 3 (0x3)
> > > not ok 2 ttm_device_init_multiple
> > > list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0).
> > > ------------[ cut here ]------------
> > > kernel BUG at lib/list_debug.c:32!
> > > invalid opcode: 0000 [#1] SMP NOPTI
> > > CPU: 6 PID: 2129 Comm: kunit_try_catch Tainted: G N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
> > > Call Trace:
> > > <TASK>
> > > ? __die_body+0x15/0x65
> > > ? die+0x2f/0x48
> > > ? do_trap+0x76/0x109
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? do_error_trap+0x69/0xa6
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? exc_invalid_op+0x4d/0x71
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? asm_exc_invalid_op+0x1a/0x20
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ttm_device_init+0x10e/0x157 [ttm]
> > > ttm_device_kunit_init+0x3d/0x51 [ttm_kunit_helpers]
> > > ttm_device_fini_basic+0x6d/0x1b3 [ttm_device_test]
> > > ? timekeeping_get_ns+0x19/0x3b
> > > ? srso_alias_return_thunk+0x5/0xfbef5
> > > ? ktime_get_ts64+0x40/0x92
> > > kunit_try_run_case+0xaf/0x163 [kunit]
> > > ? kunit_try_catch_throw+0x1b/0x1b [kunit]
> > > ? kunit_try_catch_throw+0x1b/0x1b [kunit]
> > > kunit_generic_run_threadfn_adapter+0x15/0x20 [kunit]
> > > kthread+0xcf/0xd7
> > > ? kthread_complete_and_exit+0x1a/0x1a
> > > ret_from_fork+0x23/0x35
> > > ? kthread_complete_and_exit+0x1a/0x1a
> > > ret_from_fork_asm+0x11/0x20
> > > </TASK>
> > > Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
> > > Key type dns_resolver registered
> > > NFS: Registering the id_resolver key type
> > > Key type id_resolver registered
> > > Key type id_legacy registered
> > > # ttm_device_fini_basic: try timed out
> > > general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#2] SMP NOPTI
> > > CPU: 26 PID: 2119 Comm: modprobe Tainted: G D N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > RIP: 0010:kthread_stop+0x3c/0x78
> > > Code: f0 0f c1 43 28 be 02 00 00 00 85 c0 74 0c 8d 50 01 09 c2 79 0a be 01 00 00 00 e8 f5 31 37 00 48 89 df e8 35 f1 ff ff 48 89 c5 <f0> 80 08 02 48 89 df e8 6a ff ff ff f0 80 4b 02 02 48 89 df e8 f6
> > > RSP: 0018:ffffb23b01fff938 EFLAGS: 00010246
> > > RAX: 6b6b6b6b6b6b6b6b RBX: ffffa0b170ab6040 RCX: 0000000000000000
> > > RDX: 000000006b6b6b6f RSI: 0000000000000002 RDI: 0000000000000000
> > > RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b170ab6040
> > > R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> > > FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
> > > Call Trace:
> > > <TASK>
> > > ? __die_body+0x15/0x65
> > > ? die_addr+0x37/0x50
> > > ? exc_general_protection+0x1b6/0x1ec
> > > ? asm_exc_general_protection+0x26/0x30
> > > ? kthread_stop+0x3c/0x78
> > > ? kthread_stop+0x39/0x78
> > > kunit_try_catch_run+0xc9/0x155 [kunit]
> > > kunit_run_case_catch_errors+0x3f/0x93 [kunit]
> > > kunit_run_tests+0x182/0x516 [kunit]
> > > ? kunit_try_run_case_cleanup+0x39/0x39 [kunit]
> > > ? kunit_catch_run_case_cleanup+0x85/0x85 [kunit]
> > > __kunit_test_suites_init+0x64/0x83 [kunit]
> > > kunit_module_notify+0xda/0x177 [kunit]
> > > notifier_call_chain+0x5a/0x92
> > > blocking_notifier_call_chain+0x3e/0x60
> > > do_init_module+0xcb/0x218
> > > init_module_from_file+0x7a/0x99
> > > __do_sys_finit_module+0x162/0x223
> > > do_syscall_64+0x6e/0xd8
> > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > RIP: 0033:0x7f9321f7a479
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 87 89 0c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffe2e350908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > RAX: ffffffffffffffda RBX: 00005590b57cef40 RCX: 00007f9321f7a479
> > > RDX: 0000000000000000 RSI: 00005590b5100c7c RDI: 0000000000000007
> > > RBP: 0000000000000000 R08: 00007f9322043b20 R09: 0000000000000000
> > > R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000040000
> > > R13: 00005590b5100c7c R14: 00005590b57cefe0 R15: 0000000000000000
> > > </TASK>
> > > Modules linked in: nfsv4 dns_resolver nfs lockd grace ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
> > > =============================================================================
> > > BUG task_struct (Tainted: G D N): Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > 0xffffa0b170ab6068-0xffffa0b170ab6068 @offset=24680. First byte 0x6c instead of 0x6b
> > > Slab 0xffffea8944c2ac00 objects=8 used=8 fp=0x0000000000000000 flags=0x4000000000000840(slab|head|zone=1)
> > > Object 0xffffa0b170ab6040 @offset=24640 fp=0x0000000000000000
> > >
> > > Redzone ffffa0b170ab6000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Object ffffa0b170ab6040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6060: 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b kkkkkkkklkkkkkkk
> > > Object ffffa0b170ab6070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [...]
> > > Object ffffa0b170ab6fb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6fc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> > > Redzone ffffa0b170ab6fd0: bb bb bb bb bb bb bb bb ........
> > > Padding ffffa0b170ab6fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding ffffa0b170ab6ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > CPU: 13 PID: 2 Comm: kthreadd Tainted: G D N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x37/0x52
> > > check_bytes_and_report+0xa7/0x107
> > > check_object+0x157/0x253
> > > alloc_debug_processing+0x5d/0x111
> > > ___slab_alloc+0x288/0x561
> > > ? copy_process+0x35f/0x2276
> > > ? kthread_is_per_cpu+0x22/0x22
> > > ret_from_fork+0x23/0x35
> > > ? kthread_is_per_cpu+0x22/0x22
> > > ret_from_fork_asm+0x11/0x20
> > > </TASK>
> > > FIX task_struct: Restoring Poison 0xffffa0b170ab6068-0xffffa0b170ab6068=0x6b
> > > FIX task_struct: Marking all objects used
> > >
> > >
> > > The Talos II ppc64 trace looks a bit different:
> > >
> > > [...]
> > > KTAP version 1
> > > 1..1
> > > KTAP version 1
> > > # Subtest: ttm_pool
> > > # module: ttm_pool_test
> > > 1..8
> > > KTAP version 1
> > > # Subtest: ttm_pool_alloc_basic
> > > ok 1 One page
> > > ok 2 More than one page
> > > ok 3 Above the allocation limit
> > > # ttm_pool_alloc_basic: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_pool_test.c:162
> > > Expected err == 0, but
> > > err == -12 (0xfffffffffffffff4)
> > > not ok 4 One page, with coherent DMA mappings enabled
> > > list_add corruption. prev->next should be next (c00800000cf64fc0), but was 0000000000000000. (prev=c0002000061a4ad0).
> > > ------------[ cut here ]------------
> > > kernel BUG at lib/list_debug.c:32!
> > > Oops: Exception in kernel mode, sig: 5 [#1]
> > > BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> > > Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> > > CPU: 29 PID: 934 Comm: kunit_try_catch Tainted: G TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > NIP: c000000000864744 LR: c000000000864740 CTR: 0000000000000000
> > > REGS: c000200015333a30 TRAP: 0700 Tainted: G TN (6.7.5-gentoo-P9)
> > > MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000222 XER: 00000000
> > > CFAR: c0000000001d5620 IRQMASK: 0
> > > GPR00: 0000000000000000 c000200015333cd0 c0000000011b4700 0000000000000075
> > > GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR12: 0000000000000000 c0002007fa4d5e00 c000000000182548 c0002000066aa1c0
> > > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR24: 0000000000000000 c0002000061a4010 c00800000cf64fc0 c0002000061a4020
> > > GPR28: c0002000061a4ad0 c00800000cf64fa8 c00800000cf64fa0 c0002000061a4010
> > > NIP [c000000000864744] __list_add_valid_or_report+0xd4/0x120
> > > LR [c000000000864740] __list_add_valid_or_report+0xd0/0x120
> > > Call Trace:
> > > [c000200015333cd0] [c000000000864740] __list_add_valid_or_report+0xd0/0x120 (unreliable)
> > > [c000200015333d30] [c00800000cf5eed8] ttm_pool_type_init+0xa0/0x120 [ttm]
> > > [c000200015333d80] [c00800000cf5efec] ttm_pool_init+0x94/0x170 [ttm]
> > > [c000200015333de0] [c00800000cc6b324] ttm_pool_alloc_basic+0x9c/0x670 [ttm_pool_test]
> > > [c000200015333ea0] [c00800000bddf7f0] kunit_try_run_case+0xb8/0x220 [kunit]
> > > [c000200015333f60] [c00800000bde27c8] kunit_generic_run_threadfn_adapter+0x30/0x50 [kunit]
> > > [c000200015333f90] [c000000000182670] kthread+0x130/0x140
> > > [c000200015333fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > > Code: f8010070 4b970ea9 60000000 0fe00000 7c0802a6 3c62fff1 7d064378 7d244b78 38639600 f8010070 4b970e85 60000000 <0fe00000> 7c0802a6 3c62fff1 7ca62b78
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > note: kunit_try_catch[934] exited with irqs disabled
> > > # ttm_pool_alloc_basic: try timed out
> > > BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b6b
> > > Faulting instruction address: 0xc000000000181ae4
> > > Oops: Kernel access of bad area, sig: 11 [#2]
> > > BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> > > Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> > > CPU: 17 PID: 921 Comm: modprobe Tainted: G D TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > NIP: c000000000181ae4 LR: c00800000bde2a54 CTR: c000000000181a80
> > > REGS: c0002000153871b0 TRAP: 0380 Tainted: G D TN (6.7.5-gentoo-P9)
> > > MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 44422282 XER: 00000000
> > > CFAR: c00800000bde53ec IRQMASK: 0
> > > GPR00: c00800000bde2a54 c000200015387450 c0000000011b4700 c0000000b1e34d00
> > > GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR08: 0000000000000000 0000000000000000 000000006b6b6b6c c00800000bde53d8
> > > GPR12: c000000000181a80 c0002007fa4dd600 0000000020000000 0000000020000000
> > > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR20: 0000000000000002 0000000020000000 c0000000023d78f8 c0000000023d78a8
> > > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR28: c0002000153876c0 6b6b6b6b6b6b6b6b c0000000b1e34d00 c0000000b1e34eb8
> > > NIP [c000000000181ae4] kthread_stop+0x64/0x1c0
> > > LR [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> > > Call Trace:
> > > [c000200015387450] [c0000000001d5934] vprintk+0x84/0xc0 (unreliable)
> > > [c000200015387490] [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> > > [c000200015387540] [c00800000bde4f14] kunit_run_case_catch_errors+0x60/0xf0 [kunit]
> > > [c0002000153875a0] [c00800000bddf448] kunit_run_tests+0x560/0x680 [kunit]
> > > [c0002000153878d0] [c00800000bddf614] __kunit_test_suites_init+0xac/0x160 [kunit]
> > > [c000200015387970] [c00800000bde349c] kunit_exec_run_tests+0x44/0xb0 [kunit]
> > > [c0002000153879f0] [c00800000bddecbc] kunit_module_notify+0x4d4/0x590 [kunit]
> > > [c000200015387a90] [c0000000001842f0] notifier_call_chain+0xa0/0x190
> > > [c000200015387b30] [c00000000018480c] blocking_notifier_call_chain+0x5c/0xb0
> > > [c000200015387b70] [c00000000020cf64] do_init_module+0x234/0x330
> > > [c000200015387bf0] [c00000000021054c] init_module_from_file+0x9c/0xf0
> > > [c000200015387cc0] [c000000000210740] sys_finit_module+0x190/0x420
> > > [c000200015387d80] [c00000000002b808] system_call_exception+0x1b8/0x3a0
> > > [c000200015387e50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
> > > --- interrupt: 3000 at 0x3fff9eb3d7c8
> > > NIP: 00003fff9eb3d7c8 LR: 0000000000000000 CTR: 0000000000000000
> > > REGS: c000200015387e80 TRAP: 3000 Tainted: G D TN (6.7.5-gentoo-P9)
> > > MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48422244 XER: 00000000
> > > IRQMASK: 0
> > > GPR00: 0000000000000161 00003fffc80d3ab0 00003fff9ec37100 0000000000000007
> > > GPR04: 0000000134f6df90 0000000000000000 000000000000001f 0000000000000045
> > > GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR12: 0000000000000000 00003fff9ef7fbe0 0000000020000000 0000000020000000
> > > GPR16: 0000000000000000 0000000000000000 0000000000000020 0000000020000000
> > > GPR20: 0000000161994850 0000000020000000 0000000000000000 0000000000000000
> > > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000161993f90
> > > GPR28: 0000000134f6df90 0000000000040000 0000000000000000 0000000161993cc0
> > > NIP [00003fff9eb3d7c8] 0x3fff9eb3d7c8
> > > LR [0000000000000000] 0x0
> > > --- interrupt: 3000
> > > Code: 40c2fff4 2c090000 41820164 39490001 7d494b78 2c090000 418000f4 813e01a8 6d290020 79295fe2 0b090000 ebbe0738 <7d20e8a8> 61290002 7d20e9ad 40c2fff4
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > note: modprobe[921] exited with irqs disabled
> > > =============================================================================
> > > BUG task_struct (Tainted: G D TN): Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > 0xc0000000b1e34ebb-0xc0000000b1e34ebb @offset=20155. First byte 0x6c instead of 0x6b
> > > Slab 0xc00c000002c78c00 objects=5 used=4 fp=0xc0000000b1e33380 flags=0x7ffc0000000840(slab|head|node=0|zone=0|lastcpupid=0x1fff)
> > > Object 0xc0000000b1e34d00 @offset=19712 fp=0xc0000000b1e33380
> > >
> > > Redzone c0000000b1e34c80: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34c90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34ca0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cb0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cc0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cd0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34ce0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cf0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Object c0000000b1e34d00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34da0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34db0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34dc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34dd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34de0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34df0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34eb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b kkkkkkkkkkklkkkk
> > > Object c0000000b1e34ec0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [...]
> > > Object c0000000b1e35cf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Redzone c0000000b1e36580: bb bb bb bb bb bb bb bb ........
> > > Padding c0000000b1e36590: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365a0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365b0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > CPU: 28 PID: 2 Comm: kthreadd Tainted: G D TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > Call Trace:
> > > [c00000000593b890] [c000000000e8ecf8] dump_stack_lvl+0x6c/0xb0 (unreliable)
> > > [c00000000593b8c0] [c00000000041dad0] print_trailer+0x1e0/0x22c
> > > [c00000000593b940] [c0000000004155f4] check_bytes_and_report+0x224/0x240
> > > [c00000000593b9f0] [c00000000041596c] check_object+0x35c/0x4a0
> > > [c00000000593ba40] [c0000000004168dc] alloc_debug_processing+0xdc/0x270
> > > [c00000000593bac0] [c000000000416c8c] get_partial_node.part.0+0x21c/0x460
> > > [c00000000593bb80] [c000000000417148] ___slab_alloc+0x278/0xb20
> > > [c00000000593bc90] [c000000000417b3c] kmem_cache_alloc_node+0x14c/0x630
> > > [c00000000593bd20] [c000000000140618] copy_process+0x408/0x3270
> > > [c00000000593be00] [c0000000001435f4] kernel_clone+0xc4/0x5b0
> > > [c00000000593be80] [c000000000143dc4] kernel_thread+0x84/0xc0
> > > [c00000000593bf40] [c0000000001829bc] kthreadd+0x1ec/0x290
> > > [c00000000593bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > > FIX task_struct: Restoring Poison 0xc0000000b1e34ebb-0xc0000000b1e34ebb=0x6b
> > > FIX task_struct: Marking all objects used
> > >
> > >
> > > Full dmesg and kernel .config of both machines attached.
> > >
> > > Regards,
> > > Erhard
> > > [ 0.000000] Linux version 6.7.5-Zen3 (root@supah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
> > Is it a vanilla kernel (i.e. no patches applied)? Can you also check current
> > mainline (v6.8-rc5)?
> >
> > Confused...
>
> Oh, that is most likely kind of expected behavior.
>
> This kunit test is not meant to be run on real hardware, but rather just as
> a standalone kunit test within User Mode Linux. I was assuming that it
> doesn't even compile on bare metal.
>
> We should probably either double check the kconfig options to prevent
> compiling it or modify the test so that it can run on real hardware as well.

I think any cross-compiled kunit run will be impossible to differentiate
from running on real hardware. We should just make it work there.

Maxime


Attachments:
(No filename) (27.86 kB)
signature.asc (235.00 B)

2024-02-20 17:43:50

by Erhard Furtner

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On Tue, 20 Feb 2024 14:50:04 +0100
Christian König <[email protected]> wrote:

> Yeah and that's probably the problem. The test is not supposed to be
> compiled and executed on bare metal, but rather just as a unit test
> through User Mode Linux.
>
> We probably don't check that correctly in the kconfig for some reason.
> Can you provide your .config file?
>

Here's my v6.8-rc5 .config attached.

Regards,
Erhard


Attachments:
(No filename) (442.00 B)
config_68-rc5_zen3+ (144.20 kB)

2024-02-21 14:12:06

by Christian König

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On 20.02.24 at 18:43, Erhard Furtner wrote:
> On Tue, 20 Feb 2024 14:50:04 +0100
> Christian König <[email protected]> wrote:
>
>> Yeah and that's probably the problem. The test is not supposed to be
>> compiled and executed on bare metal, but rather just as a unit test
>> through User Mode Linux.
>>
>> We probably don't check that correctly in the kconfig for some reason.
>> Can you provide your .config file?
>>
> Here's my v6.8-rc5 .config attached.

Thanks for that.

Unless somebody comes up with a way to run the test even when other
drivers interact with TTM, the attached patch is my best idea.

It basically disables compiling the TTM tests unless either UML or
COMPILE_TEST is set.
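A minimal C model of the build condition described above. It assumes the gate is equivalent to "depends on UML || COMPILE_TEST"; the attached patch itself is not reproduced here. The two booleans stand in for CONFIG_UML and CONFIG_COMPILE_TEST, and ttm_tests_buildable() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the Kconfig gate: the TTM KUnit tests are
 * built only when UML or COMPILE_TEST is enabled. The parameters stand
 * in for CONFIG_UML and CONFIG_COMPILE_TEST; this is not kernel code.
 */
static bool ttm_tests_buildable(bool uml, bool compile_test)
{
	return uml || compile_test;
}
```

For a bare-metal build with neither option set, the function returns false, so the test modules would not be built at all.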

Opinions?

Thanks,
Christian.

>
> Regards,
> Erhard


Attachments:
0001-drm-ttm-tests-depend-on-UML-COMPILE_TEST.patch (1.57 kB)

2024-02-21 15:09:46

by Alex Deucher

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

On Wed, Feb 21, 2024 at 9:13 AM Christian König
<[email protected]> wrote:
>
> On 20.02.24 at 18:43, Erhard Furtner wrote:
> > On Tue, 20 Feb 2024 14:50:04 +0100
> > Christian König <[email protected]> wrote:
> >
> >> Yeah and that's probably the problem. The test is not supposed to be
> >> compiled and executed on bare metal, but rather just as a unit test
> >> through User Mode Linux.
> >>
> >> We probably don't check that correctly in the kconfig for some reason.
> >> Can you provide your .config file?
> >>
> > Here's my v6.8-rc5 .config attached.
>
> Thanks for that.
>
> Unless somebody comes up with a way to run the test even when other
> drivers interact with TTM, the attached patch is my best idea.
>
> It basically disables compiling the TTM tests unless either UML or
> COMPILE_TEST is set.
>
> Opinions?

Makes sense to me.

Acked-by: Alex Deucher <[email protected]>

>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Erhard

2024-02-21 15:11:49

by Maxime Ripard

Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0) (kernel 6.7.5)

Hi Christian,

On Tue, Feb 20, 2024 at 04:03:57PM +0100, Christian König wrote:
> On 20.02.24 at 15:56, Maxime Ripard wrote:
> > On Tue, Feb 20, 2024 at 02:28:53PM +0100, Christian König wrote:
> > > [SNIP]
> > > This kunit test is not meant to be run on real hardware, but rather just as
> > > a standalone kunit test within User Mode Linux. I was assuming that it
> > > doesn't even compile on bare metal.
> > >
> > > We should probably either double check the kconfig options to prevent
> > > compiling it or modify the test so that it can run on real hardware as well.
> > I think any cross-compiled kunit run will be impossible to differentiate
> > from running on real hardware. We should just make it work there.
>
> The problem is that the unit test basically registers and destroys a
> dummy device to check that initialization and teardown of the global
> pools work correctly.
>
> If you run on real hardware and have a real device

I assume you mean a real DRM device backed by TTM here, right?

> in addition to the dummy device, the reference count of the global
> pool never drops to zero, so the pool is never torn down.
>
> So running this test just doesn't make any sense in that environment.
> Any idea how to work around that?
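A simplified userspace C model of the failure mode Christian describes. All names are hypothetical, and this is not TTM's actual code. The first device to initialize sets up a reference-counted global pool, and the last one to go away tears it down. If a real driver (amdgpu and radeon, both loaded in the reports, are TTM users) already holds a reference, the dummy device's init/fini cycle never brings the count to zero. As a result, the teardown the test expects never happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a global pool shared by all devices. */
struct global_pool {
	unsigned int users;  /* devices currently holding a reference */
	bool initialized;    /* true while the pool is set up */
};

static struct global_pool pool_model;

/* Device init: the first user sets the pool up. */
static void pool_get(void)
{
	if (pool_model.users++ == 0)
		pool_model.initialized = true;
}

/* Device fini: the last user tears the pool down. */
static void pool_put(void)
{
	assert(pool_model.users > 0);
	if (--pool_model.users == 0)
		pool_model.initialized = false;
}

/*
 * Roughly what the test relies on: after its dummy device is created
 * and destroyed, the global pool should be gone again.
 */
static bool dummy_device_cycle_tears_down_pool(void)
{
	pool_get();  /* dummy device init */
	pool_put();  /* dummy device fini */
	return !pool_model.initialized;
}
```

With no other user, dummy_device_cycle_tears_down_pool() returns true. After an extra pool_get() standing in for a real driver, it returns false.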

I've added David, Brendan and Rae to Cc.

To sum up the problem, your tests are relying on the mock device created
to run a kunit test to be the sole DRM device in the system. But if you
compile a kernel with the kunit tests enabled and boot it on real
hardware, then that assumption might not hold anymore and things
break apart. Is that a fair description?
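The same assumption plausibly explains the ttm_device_init_multiple failure in the original report, where the list count was 4 while num_dev was 3. The following is a simplified C sketch. It uses a minimal hand-rolled stand-in for struct list_head rather than the kernel's list API, and device_list and register_mock_devices() are hypothetical. Registering three mock devices on a list that already holds one real device yields four entries.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Count the entries on the list, excluding the head itself. */
static size_t count_nodes(const struct list_node *head)
{
	size_t n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Hypothetical global list that every device joins at init time. */
static struct list_node device_list;

/* Register num_dev mock devices and report the resulting list length. */
static size_t register_mock_devices(struct list_node *devs, size_t num_dev)
{
	for (size_t i = 0; i < num_dev; i++)
		list_add_tail_node(&devs[i], &device_list);
	return count_nodes(&device_list);
}
```

On an otherwise empty list the count equals num_dev. With one pre-registered real device, it is num_dev + 1.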

If so, maybe we could detect whether it's running under QEMU or UML (if
that's something we can do in the first place), and then extend
kunit_attributes to only run that test if it's in a simulated
environment.
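The proposed kunit_attributes extension can be pictured as a per-case flag that makes the runner skip the case outside a simulated environment. This is a hypothetical userspace model only: needs_simulated_env, run_case() and the result enum are invented for illustration and are not existing KUnit API. How the "simulated" flag would actually be detected (QEMU, UML) is left open, as in the mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test case with an invented "simulated only" attribute. */
struct test_case {
	const char *name;
	bool needs_simulated_env;
	bool (*run)(void);
};

enum result { RESULT_PASS, RESULT_FAIL, RESULT_SKIP };

/* Skip cases that would misbehave next to real devices. */
static enum result run_case(const struct test_case *tc, bool simulated)
{
	if (tc->needs_simulated_env && !simulated)
		return RESULT_SKIP;
	return tc->run() ? RESULT_PASS : RESULT_FAIL;
}

/* Placeholder test body used for illustration. */
static bool always_passes(void)
{
	return true;
}
```

A case flagged this way would be reported as skipped on bare metal instead of corrupting shared state. Unflagged cases run everywhere.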

Maxime


Attachments:
(No filename) (1.84 kB)
signature.asc (235.00 B)