2023-03-09 16:11:32

by Naresh Kamboju

[permalink] [raw]
Subject: selftests: gpio: crash on arm64

Following kernel warnings and crash notices on arm64 Rpi4 device while
running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.

Reported-by: Linux Kernel Functional Testing <[email protected]>

Please refer to test log links for detailed test plan and kernel crash logs.
It is reproducible on arm64 juno-r2, Rpi4 and Qualcomm dragonboard 410c
and qemu-arm64.

Test log:
-----------
kselftest: Running tests in gpio
TAP version 13
1..2
# selftests: gpio: gpio-mockup.sh
# 1. Module load tests
[ 61.176149] =============================================================================
[ 61.176802]
[ 61.176807] ======================================================
[ 61.176809] WARNING: possible circular locking dependency detected
[ 61.176811] 6.3.0-rc1-next-20230307 #1 Not tainted
[ 61.176814] ------------------------------------------------------
[ 61.176816] modprobe/510 is trying to acquire lock:
[ 61.176818] ffff80000b2284e8 (console_owner){..-.}-{0:0}, at:
console_flush_all (kernel/printk/printk.c:2879
kernel/printk/printk.c:2942)
[ 61.176846]
[ 61.176846] but task is already holding lock:
[ 61.176848] ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at:
get_partial_node.part.0 (mm/slub.c:2271)
[ 61.176861]
[ 61.176861] which lock already depends on the new lock.
[ 61.176861]
[ 61.176863]
[ 61.176863] the existing dependency chain (in reverse order) is:
[ 61.176864]
[ 61.176864] -> #2 (&n->list_lock){-.-.}-{2:2}:
[ 61.176871] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.176879] _raw_spin_lock_irqsave
(include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 61.176885] get_partial_node.part.0 (mm/slub.c:2271)
[ 61.176890] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.176894] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.176899] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442
mm/slub.c:3491)
[ 61.176903] __kmalloc (mm/slab_common.c:968 mm/slab_common.c:980)
[ 61.176908] tty_buffer_alloc (drivers/tty/tty_buffer.c:182)
[ 61.176914] __tty_buffer_request_room (drivers/tty/tty_buffer.c:279)
[ 61.176919] __tty_insert_flip_char (drivers/tty/tty_buffer.c:398)
[ 61.176924] uart_insert_char (drivers/tty/serial/serial_core.c:3341)
[ 61.176929] pl011_fifo_to_tty.isra.0 (drivers/tty/serial/amba-pl011.c:314)
[ 61.176934] pl011_int (include/linux/spinlock.h:390
drivers/tty/serial/amba-pl011.c:1396
drivers/tty/serial/amba-pl011.c:1571)
[ 61.176937] __handle_irq_event_percpu (kernel/irq/handle.c:158)
[ 61.176941] handle_irq_event (kernel/irq/handle.c:193
kernel/irq/handle.c:210)
[ 61.176944] handle_fasteoi_irq (kernel/irq/chip.c:716)
[ 61.176950] generic_handle_domain_irq (kernel/irq/irqdesc.c:652
kernel/irq/irqdesc.c:707)
[ 61.176953] gic_handle_irq (arch/arm64/include/asm/io.h:75
include/asm-generic/io.h:335 drivers/irqchip/irq-gic.c:344)
[ 61.176958] call_on_irq_stack (arch/arm64/kernel/entry.S:905)
[ 61.176962] do_interrupt_handler (arch/arm64/kernel/entry-common.c:274)
[ 61.176968] el1_interrupt (arch/arm64/kernel/entry-common.c:472
arch/arm64/kernel/entry-common.c:486)
[ 61.176971] el1h_64_irq_handler (arch/arm64/kernel/entry-common.c:492)
[ 61.176975] el1h_64_irq (arch/arm64/kernel/entry.S:587)
[ 61.176978] __kmem_cache_alloc_node (mm/slub.c:3490)
[ 61.176983] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.176986] inet6_dump_fib (net/ipv6/ip6_fib.c:657)
[ 61.176991] rtnl_dump_all (net/core/rtnetlink.c:3964)
[ 61.176997] netlink_dump (net/netlink/af_netlink.c:2296)
[ 61.177004] netlink_recvmsg (net/netlink/af_netlink.c:2024)
[ 61.177009] ____sys_recvmsg (net/socket.c:1015 net/socket.c:1036
net/socket.c:2723)
[ 61.177014] ___sys_recvmsg (net/socket.c:2765)
[ 61.177019] __sys_recvmsg (include/linux/file.h:31 net/socket.c:2797)
[ 61.177025] __arm64_sys_recvmsg (net/socket.c:2802)
[ 61.177030] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 61.177037] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177043] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177049] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 61.177052] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177055] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 61.177058]
[ 61.177058] -> #1 (&port_lock_key){-.-.}-{2:2}:
[ 61.177065] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177071] _raw_spin_lock_irqsave
(include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 61.177074] serial8250_console_write
(drivers/tty/serial/8250/8250_port.c:3394)
[ 61.177082] univ8250_console_write (drivers/tty/serial/8250/8250_core.c:585)
[ 61.177087] console_flush_all (kernel/printk/printk.c:2888
kernel/printk/printk.c:2942)
[ 61.177093] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177098] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177104] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177110] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177116] _printk (kernel/printk/printk.c:2341)
[ 61.177121] register_console (kernel/printk/printk.c:3468)
[ 61.177126] uart_add_one_port
(drivers/tty/serial/serial_core.c:2579
drivers/tty/serial/serial_core.c:3100)
[ 61.177130] serial8250_register_8250_port
(drivers/tty/serial/8250/8250_core.c:1093)
[ 61.177135] bcm2835aux_serial_probe
(drivers/tty/serial/8250/8250_bcm2835aux.c:184)
[ 61.177141] platform_probe (drivers/base/platform.c:1405)
[ 61.177148] really_probe (drivers/base/dd.c:552 drivers/base/dd.c:631)
[ 61.177152] __driver_probe_device (drivers/base/dd.c:768)
[ 61.177157] driver_probe_device (drivers/base/dd.c:798)
[ 61.177161] __driver_attach (drivers/base/dd.c:1185)
[ 61.177166] bus_for_each_dev (drivers/base/bus.c:368)
[ 61.177170] driver_attach (drivers/base/dd.c:1202)
[ 61.177173] bus_add_driver (drivers/base/bus.c:673)
[ 61.177177] driver_register (drivers/base/driver.c:246)
[ 61.177182] __platform_driver_register (drivers/base/platform.c:868)
[ 61.177188] bcm2835aux_serial_driver_init
(drivers/tty/serial/8250/8250_bcm2835aux.c:233)
[ 61.177195] do_one_initcall (init/main.c:1306)
[ 61.177199] kernel_init_freeable (init/main.c:1378 init/main.c:1395
init/main.c:1414 init/main.c:1634)
[ 61.177207] kernel_init (init/main.c:1524)
[ 61.177212] ret_from_fork (arch/arm64/kernel/entry.S:871)
[ 61.177216]
[ 61.177216] -> #0 (console_owner){..-.}-{0:0}:
[ 61.177222] __lock_acquire (kernel/locking/lockdep.c:3099
kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832
kernel/locking/lockdep.c:5056)
[ 61.177228] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40
kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671)
[ 61.177233] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177238] console_flush_all (kernel/printk/printk.c:2883
kernel/printk/printk.c:2942)
[ 61.177244] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177250] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177255] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177261] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177267] _printk (kernel/printk/printk.c:2341)
[ 61.177271] slab_bug (mm/slub.c:892)
[ 61.177274] check_bytes_and_report (mm/slub.c:1054)
[ 61.177279] check_object (mm/slub.c:1196 (discriminator 2))
[ 61.177283] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 61.177287] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 61.177291] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.177295] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.177300] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442
mm/slub.c:3491)
[ 61.177304] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.177308] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486)
[ 61.177311] platform_device_add (drivers/base/platform.c:717)
[ 61.177317] platform_device_register_full (drivers/base/platform.c:844)
[ 61.177323] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup
[ 61.177342] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 61.177352] do_one_initcall (init/main.c:1306)
[ 61.177356] do_init_module (kernel/module/main.c:2457)
[ 61.177363] load_module (kernel/module/main.c:2859)
[ 61.177369] __do_sys_finit_module (kernel/module/main.c:2961)
[ 61.177375] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 61.177381] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 61.177387] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177393] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177398] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 61.177402] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177405] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 61.177408]
[ 61.177408] other info that might help us debug this:
[ 61.177408]
[ 61.177410] Chain exists of:
[ 61.177410] console_owner --> &port_lock_key --> &n->list_lock
[ 61.177410]
[ 61.177417] Possible unsafe locking scenario:
[ 61.177417]
[ 61.177418] CPU0 CPU1
[ 61.177419] ---- ----
[ 61.177420] lock(&n->list_lock);
[ 61.177423] lock(&port_lock_key);
[ 61.177426] lock(&n->list_lock);
[ 61.177429] lock(console_owner);
[ 61.177432]
[ 61.177432] *** DEADLOCK ***
[ 61.177432]
[ 61.177434] 3 locks held by modprobe/510:
[ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at:
get_partial_node.part.0 (mm/slub.c:2271)
[ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at:
vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315)
[ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at:
console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290
kernel/printk/printk.c:2934)
[ 61.177471]
[ 61.177471] stack backtrace:
[ 61.177474] CPU: 3 PID: 510 Comm: modprobe Not tainted
6.3.0-rc1-next-20230307 #1
[ 61.177479] Hardware name: Raspberry Pi 4 Model B (DT)
[ 61.177482] Call trace:
[ 61.177483] dump_backtrace (arch/arm64/kernel/stacktrace.c:160)
[ 61.177487] show_stack (arch/arm64/kernel/stacktrace.c:167)
[ 61.177490] dump_stack_lvl (lib/dump_stack.c:107)
[ 61.177498] dump_stack (lib/dump_stack.c:114)
[ 61.177504] print_circular_bug (kernel/locking/lockdep.c:2057)
[ 61.177509] check_noncircular (kernel/locking/lockdep.c:2181)
[ 61.177514] __lock_acquire (kernel/locking/lockdep.c:3099
kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832
kernel/locking/lockdep.c:5056)
[ 61.177520] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40
kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671)
[ 61.177525] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177530] console_flush_all (kernel/printk/printk.c:2883
kernel/printk/printk.c:2942)
[ 61.177536] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177542] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177547] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177553] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177559] _printk (kernel/printk/printk.c:2341)
[ 61.177564] slab_bug (mm/slub.c:892)
[ 61.177567] check_bytes_and_report (mm/slub.c:1054)
[ 61.177571] check_object (mm/slub.c:1196 (discriminator 2))
[ 61.177575] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 61.177579] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 61.177583] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.177587] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.177592] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442
mm/slub.c:3491)
[ 61.177596] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.177600] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486)
[ 61.177603] platform_device_add (drivers/base/platform.c:717)
[ 61.177609] platform_device_register_full (drivers/base/platform.c:844)
[ 61.177615] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup
[ 61.177625] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 61.177634] do_one_initcall (init/main.c:1306)
[ 61.177638] do_init_module (kernel/module/main.c:2457)
[ 61.177644] load_module (kernel/module/main.c:2859)
[ 61.177650] __do_sys_finit_module (kernel/module/main.c:2961)
[ 61.177656] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 61.177662] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 61.177668] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177674] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177680] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 61.177683] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177686] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 62.011685] BUG kmalloc-512 (Not tainted): Poison overwritten
[ 62.017513] -----------------------------------------------------------------------------
[ 62.017513]
[ 62.027300] 0xffff00004ecb7a38-0xffff00004ecb7a47 @offset=31288.
First byte 0x6a instead of 0x6b
[ 62.036210] Allocated in swnode_register+0x40/0x218 age=808 cpu=3 pid=386
[ 62.043101] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442
mm/slub.c:3491)
[ 62.047784] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 62.051406] swnode_register (drivers/base/swnode.c:776)
[ 62.055293] fwnode_create_software_node (drivers/base/swnode.c:934
(discriminator 4))
[ 62.060238] gpio_mockup_register_chip+0x1c4/0x2b8 gpio_mockup
[ 62.066337] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 62.071551] do_one_initcall (init/main.c:1306)
[ 62.075437] do_init_module (kernel/module/main.c:2457)
[ 62.079238] load_module (kernel/module/main.c:2859)
[ 62.083037] __do_sys_finit_module (kernel/module/main.c:2961)
[ 62.087455] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 62.092048] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 62.095848] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 62.100793] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 62.104151] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 62.107244] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 62.111570] Freed in software_node_release+0xdc/0x108 age=632 cpu=0 pid=428
[ 62.118633] __kmem_cache_free (mm/slub.c:3732 mm/slub.c:3788 mm/slub.c:3800)
[ 62.122784] kfree (mm/slab_common.c:1020)
[ 62.125788] software_node_release (drivers/base/swnode.c:761)
[ 62.130204] kobject_put (lib/kobject.c:685 lib/kobject.c:712
include/linux/kref.h:65 lib/kobject.c:729)
[ 62.133739] software_node_notify_remove (drivers/base/swnode.c:1093)
[ 62.138597] device_del (drivers/base/core.c:2265 drivers/base/core.c:3778)
[ 62.142134] platform_device_del.part.0 (drivers/base/platform.c:753)
[ 62.146903] platform_device_unregister (drivers/base/platform.c:551
drivers/base/platform.c:794)
[ 62.151672] gpio_mockup_exit+0x54/0x280 gpio_mockup
[ 62.156888] __arm64_sys_delete_module (kernel/module/main.c:756
kernel/module/main.c:698 kernel/module/main.c:698)
[ 62.161745] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 62.165545] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 62.170490] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 62.173850] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 62.176941] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 62.181267] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 62.184975] Slab 0xfffffc00013b2c00 objects=21 used=7
fp=0xffff00004ecb7400
flags=0x7fffc0000010200(slab|head|node=0|zone=1|lastcpupid=0xffff)
[ 62.197943] Object 0xffff00004ecb7a00 @offset=31232 fp=0xffff00004ecb7400
[ 62.197943]
[ 62.206325] Redzone ffff00004ecb7800:

...

[ 63.089597] CPU: 3 PID: 510 Comm: modprobe Not tainted
6.3.0-rc1-next-20230307 #1
[ 63.097186] Hardware name: Raspberry Pi 4 Model B (DT)
[ 63.102392] Call trace:
[ 63.104865] dump_backtrace (arch/arm64/kernel/stacktrace.c:160)
[ 63.108665] show_stack (arch/arm64/kernel/stacktrace.c:167)
[ 63.112021] dump_stack_lvl (lib/dump_stack.c:107)
[ 63.115734] dump_stack (lib/dump_stack.c:114)
[ 63.119093] print_trailer (mm/slub.c:953)
[ 63.122892] check_bytes_and_report (mm/slub.c:1058)
[ 63.127395] check_object (mm/slub.c:1196 (discriminator 2))
[ 63.131104] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 63.135606] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 63.140286] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 63.144084] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 63.148674] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442
mm/slub.c:3491)
[ 63.153354] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 63.156974] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486)
[ 63.160508] platform_device_add (drivers/base/platform.c:717)
[ 63.164837] platform_device_register_full (drivers/base/platform.c:844)
[ 63.169959] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup
[ 63.176057] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 63.181269] do_one_initcall (init/main.c:1306)
[ 63.185155] do_init_module (kernel/module/main.c:2457)
[ 63.188956] load_module (kernel/module/main.c:2859)
[ 63.192755] __do_sys_finit_module (kernel/module/main.c:2961)
[ 63.197171] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 63.201765] invoke_syscall (arch/arm64/include/asm/current.h:19
arch/arm64/kernel/syscall.c:57)
[ 63.205565] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 63.210510] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 63.213869] el0_svc (arch/arm64/include/asm/daifflags.h:28
arch/arm64/kernel/entry-common.c:133
arch/arm64/kernel/entry-common.c:142
arch/arm64/kernel/entry-common.c:638)
[ 63.216961] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 63.221287] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 63.224998] FIX kmalloc-512: Restoring Poison
0xffff00004ecb7a38-0xffff00004ecb7a47=0x6b
[ 63.233202] FIX kmalloc-512: Marking all objects used
[ 63.399213] =============================================================================

links to the crash:
- https://lkft.validation.linaro.org/scheduler/job/6224830#L1291
- https://lkft.validation.linaro.org/scheduler/job/6224742#L1202
- https://lkft.validation.linaro.org/scheduler/job/6224784#L3415
- https://lkft.validation.linaro.org/scheduler/job/6224810#L2029

metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git_sha: 709c6adf19dc558e44ab5c01659b09a16a2d3c82
git_describe: next-20230307
kernel_version: 6.3.0-rc1
kernel-config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8dK2d7Q8/config
build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/798095907
artifact-location:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8dK2d7Q8
toolchain: gcc-11



--
Linaro LKFT
https://lkft.linaro.org


2023-03-09 17:50:48

by Andy Shevchenko

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Thu, Mar 09, 2023 at 09:33:29PM +0530, Naresh Kamboju wrote:
> Following kernel warnings and crash notices on arm64 Rpi4 device while
> running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>

...

> [ 61.177432] *** DEADLOCK ***
> [ 61.177432]
> [ 61.177434] 3 locks held by modprobe/510:
> [ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at:
> get_partial_node.part.0 (mm/slub.c:2271)
> [ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at:
> vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315)
> [ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at:
> console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290
> kernel/printk/printk.c:2934)

How is it related to the GPIO?
Can you bisect if it's a regression somewhere?

--
With Best Regards,
Andy Shevchenko



2023-04-10 09:21:46

by Naresh Kamboju

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Thu, 9 Mar 2023 at 23:20, Andy Shevchenko
<[email protected]> wrote:
>
> On Thu, Mar 09, 2023 at 09:33:29PM +0530, Naresh Kamboju wrote:
> > Following kernel warnings and crash notices on arm64 Rpi4 device while
> > running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
>
> ...
>
> > [ 61.177432] *** DEADLOCK ***
> > [ 61.177432]
> > [ 61.177434] 3 locks held by modprobe/510:
> > [ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at:
> > get_partial_node.part.0 (mm/slub.c:2271)
> > [ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at:
> > vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315)
> > [ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at:
> > console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290
> > kernel/printk/printk.c:2934)
>
> How is it related to the GPIO?
> Can you bisect if it's a regression somewhere?

The following crash on Linux mainline and next was noticed while running
selftest gpio on arm64. And this is a kselftest-merge config build.
Which means it has a large set of test configs enabled.

Anders performed bisection on this problem.
The bisection have been poing to this commit log,
first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7]
gpiolib: ensure that fwnode is properly set


Reported-by: Linux Kernel Functional Testing <[email protected]>
Reported-by: Anders Roxell <[email protected]>

Crash log:
# 2. Module load error tests
# 2.1 gpio overflow
[ 88.796755] =============================================================================
[ 88.806704] BUG kmalloc-512 (Not tainted): Poison overwritten
[ 88.812482] -----------------------------------------------------------------------------
[ 88.812482]
[ 88.822170] 0xe38-0xe47 @offset=28216. First byte 0x6a instead of 0x6b
[ 88.830994] Allocated in swnode_register+0x40/0x218 age=828 cpu=2 pid=565
[ 88.837830] __kmem_cache_alloc_node+0x11c/0x320
[ 88.842480] kmalloc_trace+0x54/0xa8
[ 88.846083] swnode_register+0x40/0x218
[ 88.849947] fwnode_create_software_node+0xcc/0x1a0
[ 88.854856] 0xffff8000018c71cc
[ 88.858029] 0xffff8000018c73b0
[ 88.861197] do_one_initcall+0x80/0x320
[ 88.865062] do_init_module+0x50/0x210
[ 88.868843] load_module+0x201c/0x2260
[ 88.872622] __do_sys_finit_module+0xb0/0x138
[ 88.877010] __arm64_sys_finit_module+0x2c/0x48
[ 88.881572] invoke_syscall+0x8c/0x120
[ 88.885352] el0_svc_common.constprop.0+0x104/0x130
[ 88.890263] do_el0_svc+0x44/0xb8
[ 88.893606] el0_svc+0x48/0xb8
[ 88.896686] el0t_64_sync_handler+0xbc/0x138
[ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683
[ 88.907899] __kmem_cache_free+0x2a4/0x2e0
[ 88.912024] kfree+0xc0/0x1a0
[ 88.915015] software_node_release+0xdc/0x108
[ 88.919402] kobject_put+0xb0/0x220
[ 88.922919] software_node_notify_remove+0x98/0xe8
[ 88.927741] device_del+0x184/0x380
[ 88.931259] platform_device_del.part.0+0x24/0xa8
[ 88.935995] platform_device_unregister+0x30/0x50
[ 88.940730] 0xffff8000018a1dd4
[ 88.943898] __arm64_sys_delete_module+0x184/0x328
[ 88.948722] invoke_syscall+0x8c/0x120
[ 88.952502] el0_svc_common.constprop.0+0x104/0x130
[ 88.957413] do_el0_svc+0x44/0xb8
[ 88.960757] el0_svc+0x48/0xb8
[ 88.963834] el0t_64_sync_handler+0xbc/0x138
[ 88.968132] el0t_64_sync+0x190/0x198
[ 88.971820] Slab 0xfffffc00209d0800 objects=21 used=7
fp=0xffff000827423200
flags=0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[ 88.984653] Object 0xe00 @offset=28160 fp=0xffff000827423200
[ 88.984653]
[ 88.992953] Redzone <Trim>
[ 89.002470] Redzone <Trim>
[ 89.011986] Redzone <Trim>
...
[ 89.278435] Redzone <Trim>
[ 89.287951] Redzone <Trim>
[ 89.297468] Object <Trim>
[ 89.306984] Object <Trim>
[ 89.316500] Object <Trim>
...
[ 89.592465] Object
[ 89.601981] Redzone
[ 89.610801] Padding
[ 89.620317] Padding


Details test and crash log:
- https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/testrun/16155019/suite/log-parser-test/test/check-kernel-exception/log
- https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/testrun/16155019/suite/log-parser-test/tests/
- https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/testrun/16155157/suite/log-parser-test/tests/

metadata:
git_ref: master
git_repo: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline
git_sha: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
git_describe: v6.3-rc6
kernel_version: 6.3.0-rc6
kernel-config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2OCXLSEmqxrg13KKFGzypngoE3K/config
build-url: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline/-/pipelines/832195018
artifact-location:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2OCXLSEmqxrg13KKFGzypngoE3K/
toolchain: gcc-11
email-notification: ''
build_name: gcc-11-lkftconfig-kselftest



--
Linaro LKFT
https://lkft.linaro.org

>
> --
> With Best Regards,
> Andy Shevchenko
>
>

2023-04-11 09:05:47

by Linus Walleij

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju
<[email protected]> wrote:
(...)
> Anders performed bisection on this problem.
> The bisection have been poing to this commit log,
> first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7]
> gpiolib: ensure that fwnode is properly set

I don't think this is the real issue.

(...)
> # 2. Module load error tests
> # 2.1 gpio overflow
(...)
> [ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683
> [ 88.907899] __kmem_cache_free+0x2a4/0x2e0
> [ 88.912024] kfree+0xc0/0x1a0
> [ 88.915015] software_node_release+0xdc/0x108
> [ 88.919402] kobject_put+0xb0/0x220
> [ 88.922919] software_node_notify_remove+0x98/0xe8
> [ 88.927741] device_del+0x184/0x380
> [ 88.931259] platform_device_del.part.0+0x24/0xa8
> [ 88.935995] platform_device_unregister+0x30/0x50

I think the refcount is wrong on the fwnode.

The chip is allocated with devm_gpiochip_add_data() which will not call
gpiochip_remove() until all references are removed by calling
devm_gpio_chip_release().

Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c
and see if the callback is even called. I think this could be the
problem: if that isn't cleaned up, there will be dangling references.

diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c
index fe9ce6b19f15..30a0622210d7 100644
--- a/drivers/gpio/gpiolib-devres.c
+++ b/drivers/gpio/gpiolib-devres.c
@@ -394,6 +394,7 @@ static void devm_gpio_chip_release(void *data)
{
struct gpio_chip *gc = data;

+ pr_info("GPIOCHIP %s WAS REMOVED BY DEVRES\n", gc->label);
gpiochip_remove(gc);
}

If this isn't working we need to figure out what is holding a reference to
the gpiochip.

I don't know how the references to the gpiochip fwnode is supposed to
drop to zero though? I didn't work with mockup much ...

What I could think of is that maybe the mockup driver need a .shutdown()
callback to forcibly call gpiochip_remove(), and in that case it should
be wrapped in a non-existining devm_gpiochip_remove() since devres
is used to register it.

Bartosz will know better though! I am pretty sure he has this working
flawlessly so the tests must be doing something weird which is leaving
references around.

Yours,
Linus Walleij

2023-04-11 13:13:03

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Tue, Apr 11, 2023 at 10:57 AM Linus Walleij <[email protected]> wrote:
>
> On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju
> <[email protected]> wrote:
> (...)
> > Anders performed bisection on this problem.
> > The bisection have been poing to this commit log,
> > first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7]
> > gpiolib: ensure that fwnode is properly set
>
> I don't think this is the real issue.
>
> (...)
> > # 2. Module load error tests
> > # 2.1 gpio overflow
> (...)
> > [ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683
> > [ 88.907899] __kmem_cache_free+0x2a4/0x2e0
> > [ 88.912024] kfree+0xc0/0x1a0
> > [ 88.915015] software_node_release+0xdc/0x108
> > [ 88.919402] kobject_put+0xb0/0x220
> > [ 88.922919] software_node_notify_remove+0x98/0xe8
> > [ 88.927741] device_del+0x184/0x380
> > [ 88.931259] platform_device_del.part.0+0x24/0xa8
> > [ 88.935995] platform_device_unregister+0x30/0x50
>
> I think the refcount is wrong on the fwnode.
>
> The chip is allocated with devm_gpiochip_add_data() which will not call
> gpiochip_remove() until all references are removed by calling
> devm_gpio_chip_release().
>
> Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c
> and see if the callback is even called. I think this could be the
> problem: if that isn't cleaned up, there will be dangling references.
>
> diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c
> index fe9ce6b19f15..30a0622210d7 100644
> --- a/drivers/gpio/gpiolib-devres.c
> +++ b/drivers/gpio/gpiolib-devres.c
> @@ -394,6 +394,7 @@ static void devm_gpio_chip_release(void *data)
> {
> struct gpio_chip *gc = data;
>
> + pr_info("GPIOCHIP %s WAS REMOVED BY DEVRES\n", gc->label);
> gpiochip_remove(gc);
> }
>
> If this isn't working we need to figure out what is holding a reference to
> the gpiochip.
>
> I don't know how the references to the gpiochip fwnode is supposed to
> drop to zero though? I didn't work with mockup much ...
>
> What I could think of is that maybe the mockup driver need a .shutdown()
> callback to forcibly call gpiochip_remove(), and in that case it should
> be wrapped in a non-existining devm_gpiochip_remove() since devres
> is used to register it.
>
> Bartosz will know better though! I am pretty sure he has this working
> flawlessly so the tests must be doing something weird which is leaving
> references around.
>
> Yours,
> Linus Walleij

Interestingly I'm not seeing this neither with gpio-sim selftests nor
with any of the libgpiod tests which suggests it's the gpio-mockup
module that's doing something wrong (or very right in which case it
uncovers some otherwise hidden bug). Anyway, I'll try to spend some
time on it and figure it out, although I'd like to be done with
gpio-mockup altogether already.

Bart

2023-06-20 17:19:35

by Andy Shevchenko

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
> On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju
> <[email protected]> wrote:

...

> Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c
> and see if the callback is even called. I think this could be the
> problem: if that isn't cleaned up, there will be dangling references.

Side note: Since we have devres tracepoints, your patch seems an overkill :-)
Just enable devres tracepoints and filter out by the function name. I believe
that should work.

--
With Best Regards,
Andy Shevchenko



2023-11-15 15:38:17

by Naresh Kamboju

[permalink] [raw]
Subject: Re: selftests: gpio: crash on arm64

On Wed, 15 Nov 2023 at 07:12, <[email protected]> wrote:
>
> Tue, Nov 07, 2023 at 07:21:32PM +0530, Naresh Kamboju kirjoitti:
> > On Tue, 20 Jun 2023 at 22:11, Andy Shevchenko
> > <[email protected]> wrote:
> > > On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
> > > > On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju
> > > > <[email protected]> wrote:
>
> ...
>
> > > > Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c
> > > > and see if the callback is even called. I think this could be the
> > > > problem: if that isn't cleaned up, there will be dangling references.
> > >
> > > Side note: Since we have devres tracepoints, your patch seems an overkill :-)
> > > Just enable devres tracepoints and filter out by the function name. I believe
> > > that should work.
> >
> > Since I have been tracking open issues on the stable-rc kernel,
> > The reported problem on stable-rc linux.6.3.y has been solved
> > on the stable-rc linux.6.6.y branch.
> >
> > Thanks for fixing this reported issue.
> >
> > Upstream links about this fix and discussion,
>
> I'm a bit lost. Is the [3] fixed? Is the fix the below mentioned commit?

As per my understanding on this,
The reported issue has been fixed with the below commit.

>
> > Commit daecca4b8433
> > gpiolib: Do not alter GPIO chip fwnode member
> >
> > [1] https://lore.kernel.org/linux-gpio/[email protected]/
> > [2] https://lore.kernel.org/linux-gpio/CAMRc=MfFEBSeJ78NO7XeuzAMJ0KezEPAYWsWnFXXaRyQPAf3dA@mail.gmail.com/
> > [3] https://lore.kernel.org/linux-gpio/CA+G9fYv94gx8+-JMzbmQaue3q3y6QdBmsGUCdD-26X5XavL3Ag@mail.gmail.com/
>
> --
> With Best Regards,
> Andy Shevchenko

- Naresh