2023-01-16 12:33:30

by Francesco Dolcini

[permalink] [raw]
Subject: spidev regression in 6.2-rc kernel

Hello,
we spotted a regression on spidev on latest 6.2-rc kernel.

[ 214.047619]
[ 214.049198] ============================================
[ 214.054533] WARNING: possible recursive locking detected
[ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
[ 214.065969] --------------------------------------------
[ 214.071290] spidev_test/1454 is trying to acquire lock:
[ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
[ 214.084164]
[ 214.084164] but task is already holding lock:
[ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[ 214.097537]
[ 214.097537] other info that might help us debug this:
[ 214.104075] Possible unsafe locking scenario:
[ 214.104075]
[ 214.110004] CPU0
[ 214.112461] ----
[ 214.114916] lock(&spidev->spi_lock);
[ 214.118687] lock(&spidev->spi_lock);
[ 214.122457]
[ 214.122457] *** DEADLOCK ***
[ 214.122457]
[ 214.128386] May be due to missing lock nesting notation
[ 214.128386]
[ 214.135183] 2 locks held by spidev_test/1454:
[ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
[ 214.155493]
[ 214.155493] stack backtrace:
[ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
[ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 214.175555] unwind_backtrace from show_stack+0x10/0x14
[ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
[ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
[ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
[ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
[ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
[ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
[ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
[ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
[ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
[ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
[ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
[ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6


This is not running the latest rc4, but on sha 97ec4d559d93 (this is
just what our CI had available when this test was run). I was not able
to bisect it, but it seems something that you could have introduced.

The log is from an apalis-imx6, but I have the same on other ARM SOC.

Can you have a look?
Thanks

Francesco




2023-01-16 13:08:01

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

On Mon, 16 Jan 2023 at 13:19, Bartosz Golaszewski <[email protected]> wrote:
>
> On Mon, Jan 16, 2023 at 1:06 PM Francesco Dolcini <[email protected]> wrote:
> >
> > Hello,
> > we spotted a regression on spidev on latest 6.2-rc kernel.
> >
> > [ 214.047619]
> > [ 214.049198] ============================================
> > [ 214.054533] WARNING: possible recursive locking detected
> > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> > [ 214.065969] --------------------------------------------
> > [ 214.071290] spidev_test/1454 is trying to acquire lock:
> > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> > [ 214.084164]
> > [ 214.084164] but task is already holding lock:
> > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [ 214.097537]
> > [ 214.097537] other info that might help us debug this:
> > [ 214.104075] Possible unsafe locking scenario:
> > [ 214.104075]
> > [ 214.110004] CPU0
> > [ 214.112461] ----
> > [ 214.114916] lock(&spidev->spi_lock);
> > [ 214.118687] lock(&spidev->spi_lock);
> > [ 214.122457]
> > [ 214.122457] *** DEADLOCK ***
> > [ 214.122457]
> > [ 214.128386] May be due to missing lock nesting notation
> > [ 214.128386]
> > [ 214.135183] 2 locks held by spidev_test/1454:
> > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> > [ 214.155493]
> > [ 214.155493] stack backtrace:
> > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> >
> >
> > This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> > just what our CI had available when this test was run). I was not able
> > to bisect it, but it seems something that you could have introduced.
> >
> > The log is from an apalis-imx6, but I have the same on other ARM SOC.
> >
> > Can you have a look?
> > Thanks
> >
> > Francesco
> >
>
> Eek! Yes it's commit 1f4d2dd45b6e ("spi: spidev: fix a race condition
> when accessing spidev->spi"): spidev_ioctl() takes the lock and in
> certain instances can end up calling spidev_compat_ioc_message() which
> takes the same lock again. I'll send a fix shortly.
>
> Bart

Seems, like that's not it... Francesco: what is the output of:

./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8

and

./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8

on the spidev.o object for that build?

Bart

2023-01-16 13:22:42

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

On Mon, Jan 16, 2023 at 1:06 PM Francesco Dolcini <[email protected]> wrote:
>
> Hello,
> we spotted a regression on spidev on latest 6.2-rc kernel.
>
> [ 214.047619]
> [ 214.049198] ============================================
> [ 214.054533] WARNING: possible recursive locking detected
> [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> [ 214.065969] --------------------------------------------
> [ 214.071290] spidev_test/1454 is trying to acquire lock:
> [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> [ 214.084164]
> [ 214.084164] but task is already holding lock:
> [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [ 214.097537]
> [ 214.097537] other info that might help us debug this:
> [ 214.104075] Possible unsafe locking scenario:
> [ 214.104075]
> [ 214.110004] CPU0
> [ 214.112461] ----
> [ 214.114916] lock(&spidev->spi_lock);
> [ 214.118687] lock(&spidev->spi_lock);
> [ 214.122457]
> [ 214.122457] *** DEADLOCK ***
> [ 214.122457]
> [ 214.128386] May be due to missing lock nesting notation
> [ 214.128386]
> [ 214.135183] 2 locks held by spidev_test/1454:
> [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> [ 214.155493]
> [ 214.155493] stack backtrace:
> [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
>
>
> This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> just what our CI had available when this test was run). I was not able
> to bisect it, but it seems something that you could have introduced.
>
> The log is from an apalis-imx6, but I have the same on other ARM SOC.
>
> Can you have a look?
> Thanks
>
> Francesco
>

Eek! Yes it's commit 1f4d2dd45b6e ("spi: spidev: fix a race condition
when accessing spidev->spi"): spidev_ioctl() takes the lock and in
certain instances can end up calling spidev_compat_ioc_message() which
takes the same lock again. I'll send a fix shortly.

Bart

2023-01-16 13:53:47

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

On Mon, Jan 16, 2023 at 2:23 PM Max Krummenacher
<[email protected]> wrote:
>
> Hi
>
> On Mon, 2023-01-16 at 13:47 +0100, Bartosz Golaszewski wrote:
> > On Mon, 16 Jan 2023 at 13:19, Bartosz Golaszewski <[email protected]> wrote:
> > >
> > > On Mon, Jan 16, 2023 at 1:06 PM Francesco Dolcini <[email protected]> wrote:
> > > >
> > > > Hello,
> > > > we spotted a regression on spidev on latest 6.2-rc kernel.
> > > >
> > > > [ 214.047619]
> > > > [ 214.049198] ============================================
> > > > [ 214.054533] WARNING: possible recursive locking detected
> > > > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> > > > [ 214.065969] --------------------------------------------
> > > > [ 214.071290] spidev_test/1454 is trying to acquire lock:
> > > > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> > > > [ 214.084164]
> > > > [ 214.084164] but task is already holding lock:
> > > > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > > > [ 214.097537]
> > > > [ 214.097537] other info that might help us debug this:
> > > > [ 214.104075] Possible unsafe locking scenario:
> > > > [ 214.104075]
> > > > [ 214.110004] CPU0
> > > > [ 214.112461] ----
> > > > [ 214.114916] lock(&spidev->spi_lock);
> > > > [ 214.118687] lock(&spidev->spi_lock);
> > > > [ 214.122457]
> > > > [ 214.122457] *** DEADLOCK ***
> > > > [ 214.122457]
> > > > [ 214.128386] May be due to missing lock nesting notation
> > > > [ 214.128386]
> > > > [ 214.135183] 2 locks held by spidev_test/1454:
> > > > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > > > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> > > > [ 214.155493]
> > > > [ 214.155493] stack backtrace:
> > > > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> > > > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> > > > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> > > > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> > > > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> > > > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> > > > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> > > > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> > > > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> > > > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> > > > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> > > > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> > > > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> > > > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> > > >
> > > >
> > > > This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> > > > just what our CI had available when this test was run). I was not able
> > > > to bisect it, but it seems something that you could have introduced.
> > > >
> > > > The log is from an apalis-imx6, but I have the same on other ARM SOC.
> > > >
> > > > Can you have a look?
> > > > Thanks
> > > >
> > > > Francesco
> > > >
> > >
> > > Eek! Yes it's commit 1f4d2dd45b6e ("spi: spidev: fix a race condition
> > > when accessing spidev->spi"): spidev_ioctl() takes the lock and in
> > > certain instances can end up calling spidev_compat_ioc_message() which
> > > takes the same lock again. I'll send a fix shortly.
> > >
> > > Bart
> >
> > Seems, like that's not it... Francesco: what is the output of:
> >
> > ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
> >
> > and
> >
> > ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
> >
> > on the spidev.o object for that build?
> >
> > Bart
> >
> I did this for Francesco.
>
> The CI build isn't configured for this:
> ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
> ERROR: CONFIG_DEBUG_INFO not enabled
>
> So I recompiled the kernel and rerun the test.
> [ 146.916497] ============================================
> [ 146.921821] WARNING: possible recursive locking detected
> [ 146.927143] 6.2.0-rc3-00349-g97ec4d559d93 #4 Not tainted
> [ 146.932467] --------------------------------------------
> [ 146.937786] spidev_test/558 is trying to acquire lock:
> [ 146.942935] c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x89c/0xa78
> [ 146.950565]
> [ 146.950565] but task is already holding lock:
> [ 146.956405] c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x40/0xa78
> [ 146.963932]
> [ 146.963932] other info that might help us debug this:
> [ 146.970467] Possible unsafe locking scenario:
> [ 146.970467]
> [ 146.976394] CPU0
> [ 146.978849] ----
> [ 146.981302] lock(&spidev->spi_lock);
> [ 146.985071] lock(&spidev->spi_lock);
> [ 146.988838]
> [ 146.988838] *** DEADLOCK ***
> [ 146.988838]
> [ 146.994765] May be due to missing lock nesting notation
> [ 146.994765]
> [ 147.001561] 2 locks held by spidev_test/558:
> [ 147.005842] #0: c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x40/0xa78
> [ 147.013808] #1: c4c79c94 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x6c/0xa78
> [ 147.021774]
> [ 147.021774] stack backtrace:
> [ 147.026141] CPU: 0 PID: 558 Comm: spidev_test Not tainted 6.2.0-rc3-00349-g97ec4d559d93 #4
> [ 147.034419] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [ 147.040180] unwind_backtrace from show_stack+0x10/0x14
> [ 147.045433] show_stack from dump_stack_lvl+0x58/0x70
> [ 147.050506] dump_stack_lvl from __lock_acquire+0x13cc/0x28e0
> [ 147.056279] __lock_acquire from lock_acquire+0xf4/0x368
> [ 147.061616] lock_acquire from __mutex_lock+0x80/0x8a0
> [ 147.066783] __mutex_lock from mutex_lock_nested+0x1c/0x24
> [ 147.072295] mutex_lock_nested from spidev_ioctl+0x89c/0xa78
> [ 147.077983] spidev_ioctl from sys_ioctl+0x540/0xddc
> [ 147.082972] sys_ioctl from ret_fast_syscall+0x0/0x1c
> [ 147.088043] Exception stack(0xf1159fa8 to 0xf1159ff0)
> [ 147.093111] 9fa0: 00000000 00001000 00000003 40206b00 befe3a28 befe3a20
> [ 147.101303] 9fc0: 00000000 00001000 014b01a0 00000036 00420040 0041fd18 00000000 00000003
> [ 147.109493] 9fe0: 00000036 befe3a08 b6e86dc5 b6dfe5f6
>
> Using the changed code layout from this warning I get:
>
> $ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x89c/0xa78
> spidev_ioctl+0x89c/0xa78:
> spidev_sync at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:99
> (inlined by) spidev_message at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:297
> (inlined by) spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:504
>
> $ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x40/0xa78
> spidev_ioctl+0x40/0xa78:
> spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:363
>
> $ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x6c/0xa78
> spidev_ioctl+0x6c/0xa78:
> spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:377
>
> Regards
> Max

Thanks Max, that makes sense now, working on a fix.

Bart

2023-01-16 14:39:37

by Max Krummenacher

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

Hi

On Mon, 2023-01-16 at 13:47 +0100, Bartosz Golaszewski wrote:
> On Mon, 16 Jan 2023 at 13:19, Bartosz Golaszewski <[email protected]> wrote:
> >
> > On Mon, Jan 16, 2023 at 1:06 PM Francesco Dolcini <[email protected]> wrote:
> > >
> > > Hello,
> > > we spotted a regression on spidev on latest 6.2-rc kernel.
> > >
> > > [ 214.047619]
> > > [ 214.049198] ============================================
> > > [ 214.054533] WARNING: possible recursive locking detected
> > > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> > > [ 214.065969] --------------------------------------------
> > > [ 214.071290] spidev_test/1454 is trying to acquire lock:
> > > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> > > [ 214.084164]
> > > [ 214.084164] but task is already holding lock:
> > > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > > [ 214.097537]
> > > [ 214.097537] other info that might help us debug this:
> > > [ 214.104075] Possible unsafe locking scenario:
> > > [ 214.104075]
> > > [ 214.110004] CPU0
> > > [ 214.112461] ----
> > > [ 214.114916] lock(&spidev->spi_lock);
> > > [ 214.118687] lock(&spidev->spi_lock);
> > > [ 214.122457]
> > > [ 214.122457] *** DEADLOCK ***
> > > [ 214.122457]
> > > [ 214.128386] May be due to missing lock nesting notation
> > > [ 214.128386]
> > > [ 214.135183] 2 locks held by spidev_test/1454:
> > > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> > > [ 214.155493]
> > > [ 214.155493] stack backtrace:
> > > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> > > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> > > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> > > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> > > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> > > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> > > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> > > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> > > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> > > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> > > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> > > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> > > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> > > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> > >
> > >
> > > This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> > > just what our CI had available when this test was run). I was not able
> > > to bisect it, but it seems something that you could have introduced.
> > >
> > > The log is from an apalis-imx6, but I have the same on other ARM SOC.
> > >
> > > Can you have a look?
> > > Thanks
> > >
> > > Francesco
> > >
> >
> > Eek! Yes it's commit 1f4d2dd45b6e ("spi: spidev: fix a race condition
> > when accessing spidev->spi"): spidev_ioctl() takes the lock and in
> > certain instances can end up calling spidev_compat_ioc_message() which
> > takes the same lock again. I'll send a fix shortly.
> >
> > Bart
>
> Seems, like that's not it... Francesco: what is the output of:
>
> ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
>
> and
>
> ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
>
> on the spidev.o object for that build?
>
> Bart
>
I did this for Francesco.

The CI build isn't configured for this:
./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x44/0xab8
ERROR: CONFIG_DEBUG_INFO not enabled

So I recompiled the kernel and rerun the test.
[ 146.916497] ============================================
[ 146.921821] WARNING: possible recursive locking detected
[ 146.927143] 6.2.0-rc3-00349-g97ec4d559d93 #4 Not tainted
[ 146.932467] --------------------------------------------
[ 146.937786] spidev_test/558 is trying to acquire lock:
[ 146.942935] c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x89c/0xa78
[ 146.950565]
[ 146.950565] but task is already holding lock:
[ 146.956405] c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x40/0xa78
[ 146.963932]
[ 146.963932] other info that might help us debug this:
[ 146.970467] Possible unsafe locking scenario:
[ 146.970467]
[ 146.976394] CPU0
[ 146.978849] ----
[ 146.981302] lock(&spidev->spi_lock);
[ 146.985071] lock(&spidev->spi_lock);
[ 146.988838]
[ 146.988838] *** DEADLOCK ***
[ 146.988838]
[ 146.994765] May be due to missing lock nesting notation
[ 146.994765]
[ 147.001561] 2 locks held by spidev_test/558:
[ 147.005842] #0: c4c79c3c (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x40/0xa78
[ 147.013808] #1: c4c79c94 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x6c/0xa78
[ 147.021774]
[ 147.021774] stack backtrace:
[ 147.026141] CPU: 0 PID: 558 Comm: spidev_test Not tainted 6.2.0-rc3-00349-g97ec4d559d93 #4
[ 147.034419] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 147.040180] unwind_backtrace from show_stack+0x10/0x14
[ 147.045433] show_stack from dump_stack_lvl+0x58/0x70
[ 147.050506] dump_stack_lvl from __lock_acquire+0x13cc/0x28e0
[ 147.056279] __lock_acquire from lock_acquire+0xf4/0x368
[ 147.061616] lock_acquire from __mutex_lock+0x80/0x8a0
[ 147.066783] __mutex_lock from mutex_lock_nested+0x1c/0x24
[ 147.072295] mutex_lock_nested from spidev_ioctl+0x89c/0xa78
[ 147.077983] spidev_ioctl from sys_ioctl+0x540/0xddc
[ 147.082972] sys_ioctl from ret_fast_syscall+0x0/0x1c
[ 147.088043] Exception stack(0xf1159fa8 to 0xf1159ff0)
[ 147.093111] 9fa0: 00000000 00001000 00000003 40206b00 befe3a28 befe3a20
[ 147.101303] 9fc0: 00000000 00001000 014b01a0 00000036 00420040 0041fd18 00000000 00000003
[ 147.109493] 9fe0: 00000036 befe3a08 b6e86dc5 b6dfe5f6

Using the changed code layout from this warning I get:

$ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x89c/0xa78
spidev_ioctl+0x89c/0xa78:
spidev_sync at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:99
(inlined by) spidev_message at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:297
(inlined by) spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:504

$ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x40/0xa78
spidev_ioctl+0x40/0xa78:
spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:363

$ ./scripts/faddr2line drivers/spi/spidev.o spidev_ioctl+0x6c/0xa78
spidev_ioctl+0x6c/0xa78:
spidev_ioctl at /home/krm/git.toradex.com/linux-toradex-mainline/drivers/spi/spidev.c:377

Regards
Max

2023-01-16 15:34:46

by Bartosz Golaszewski

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

On Mon, Jan 16, 2023 at 2:45 PM Bartosz Golaszewski <[email protected]> wrote:
>
> Thanks Max, that makes sense now, working on a fix.
>
> Bart

Max, Francesco,

I've sent out a fix, please give it a spin and leave your Tested-by if
you don't mind.

Bart

Subject: Re: spidev regression in 6.2-rc kernel

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 16.01.23 13:06, Francesco Dolcini wrote:
> Hello,
> we spotted a regression on spidev on latest 6.2-rc kernel.
>
> [ 214.047619]
> [ 214.049198] ============================================
> [ 214.054533] WARNING: possible recursive locking detected
> [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> [ 214.065969] --------------------------------------------
> [ 214.071290] spidev_test/1454 is trying to acquire lock:
> [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> [ 214.084164]
> [ 214.084164] but task is already holding lock:
> [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [ 214.097537]
> [ 214.097537] other info that might help us debug this:
> [ 214.104075] Possible unsafe locking scenario:
> [ 214.104075]
> [ 214.110004] CPU0
> [ 214.112461] ----
> [ 214.114916] lock(&spidev->spi_lock);
> [ 214.118687] lock(&spidev->spi_lock);
> [ 214.122457]
> [ 214.122457] *** DEADLOCK ***
> [ 214.122457]
> [ 214.128386] May be due to missing lock nesting notation
> [ 214.128386]
> [ 214.135183] 2 locks held by spidev_test/1454:
> [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> [ 214.155493]
> [ 214.155493] stack backtrace:
> [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
>
>
> This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> just what our CI had available when this test was run). I was not able
> to bisect it, but it seems something that you could have introduced.
>
> The log is from an apalis-imx6, but I have the same on other ARM SOC.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 1f4d2dd45b6e
#regzbot title spi: spidev: recursive locking error
#regzbot monitor:
https://lore.kernel.org/all/[email protected]/
#regzbot fix: spi: spidev: fix a recursive locking error
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-01-16 19:42:58

by Francesco Dolcini

[permalink] [raw]
Subject: Re: spidev regression in 6.2-rc kernel

On Mon, Jan 16, 2023 at 04:51:58PM +0100, Linux kernel regression tracking (#adding) wrote:
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
>
> On 16.01.23 13:06, Francesco Dolcini wrote:
> > Hello,
> > we spotted a regression on spidev on latest 6.2-rc kernel.
> >
> > [ 214.047619]
> > [ 214.049198] ============================================
> > [ 214.054533] WARNING: possible recursive locking detected
> > [ 214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> > [ 214.065969] --------------------------------------------
> > [ 214.071290] spidev_test/1454 is trying to acquire lock:
> > [ 214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> > [ 214.084164]
> > [ 214.084164] but task is already holding lock:
> > [ 214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [ 214.097537]
> > [ 214.097537] other info that might help us debug this:
> > [ 214.104075] Possible unsafe locking scenario:
> > [ 214.104075]
> > [ 214.110004] CPU0
> > [ 214.112461] ----
> > [ 214.114916] lock(&spidev->spi_lock);
> > [ 214.118687] lock(&spidev->spi_lock);
> > [ 214.122457]
> > [ 214.122457] *** DEADLOCK ***
> > [ 214.122457]
> > [ 214.128386] May be due to missing lock nesting notation
> > [ 214.128386]
> > [ 214.135183] 2 locks held by spidev_test/1454:
> > [ 214.139553] #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [ 214.147524] #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> > [ 214.155493]
> > [ 214.155493] stack backtrace:
> > [ 214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> > [ 214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > [ 214.175555] unwind_backtrace from show_stack+0x10/0x14
> > [ 214.180819] show_stack from dump_stack_lvl+0x60/0x90
> > [ 214.185900] dump_stack_lvl from __lock_acquire+0x874/0x2858
> > [ 214.191584] __lock_acquire from lock_acquire+0xfc/0x378
> > [ 214.196918] lock_acquire from __mutex_lock+0x9c/0x8a8
> > [ 214.202083] __mutex_lock from mutex_lock_nested+0x1c/0x24
> > [ 214.207597] mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> > [ 214.213284] spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> > [ 214.218277] sys_ioctl from ret_fast_syscall+0x0/0x1c
> > [ 214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> > [ 214.228422] dfa0: 00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> > [ 214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> > [ 214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> >
> >
> > This is not running the latest rc4, but on sha 97ec4d559d93 (this is
> > just what our CI had available when this test was run). I was not able
> > to bisect it, but it seems something that you could have introduced.
> >
> > The log is from an apalis-imx6, but I have the same on other ARM SOC.
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced 1f4d2dd45b6e
> #regzbot title spi: spidev: recursive locking error
> #regzbot monitor:
> https://lore.kernel.org/all/[email protected]/
> #regzbot fix: spi: spidev: fix a recursive locking error
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.

The issue is in mainline, starting from v6.2-rc4, not in next.

#regzbot ^introduced a720416d94634068951773cb9e9d6f1b73769e5b

Francesco