2022-04-21 19:25:55

by Naresh Kamboju

[permalink] [raw]
Subject: Re: [PATCH 4.19 000/338] 4.19.238-rc1 review

On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <[email protected]> wrote:
>
> On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > This is the start of the stable review cycle for the 4.19.238 release.
> > There are 338 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
>
>
> Following kernel warning noticed on arm64 Juno-r2 while booting
> stable-rc 4.19.238. Here is the full test log link [1].
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
> 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
> [ 0.000000] Machine model: ARM Juno development board (r2)
> <trim>
> [ 18.499895] ================================
> [ 18.504172] WARNING: inconsistent lock state
> [ 18.508451] 4.19.238 #1 Not tainted
> [ 18.511944] --------------------------------
> [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 18.527826] (____ptrval____)
> (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
> [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
> [ 18.541543] lock_acquire+0xc8/0x23c
> [ 18.545216] _raw_spin_lock+0x50/0x64
> [ 18.548973] xs_tcp_state_change+0x1b4/0x440
> [ 18.553343] tcp_rcv_state_process+0x684/0x1300
> [ 18.557972] tcp_v4_do_rcv+0x70/0x290
> [ 18.561731] tcp_v4_rcv+0xc34/0xda0
> [ 18.565316] ip_local_deliver_finish+0x16c/0x3c0
> [ 18.570032] ip_local_deliver+0x6c/0x240
> [ 18.574051] ip_rcv_finish+0x98/0xe4
> [ 18.577722] ip_rcv+0x68/0x210
> [ 18.580871] __netif_receive_skb_one_core+0x6c/0x9c
> [ 18.585847] __netif_receive_skb+0x2c/0x74
> [ 18.590039] netif_receive_skb_internal+0x88/0x20c
> [ 18.594928] netif_receive_skb+0x68/0x1a0
> [ 18.599036] smsc911x_poll+0x104/0x290
> [ 18.602881] net_rx_action+0x124/0x4bc
> [ 18.606727] __do_softirq+0x1d0/0x524
> [ 18.610484] irq_exit+0x11c/0x144
> [ 18.613894] __handle_domain_irq+0x84/0xe0
> [ 18.618086] gic_handle_irq+0x5c/0xb0
> [ 18.621843] el1_irq+0xb4/0x130
> [ 18.625081] cpuidle_enter_state+0xc0/0x3ec
> [ 18.629361] cpuidle_enter+0x38/0x4c
> [ 18.633032] do_idle+0x200/0x2c0
> [ 18.636353] cpu_startup_entry+0x30/0x50
> [ 18.640372] rest_init+0x260/0x270
> [ 18.643870] start_kernel+0x45c/0x490
> [ 18.647625] irq event stamp: 18931
> [ 18.651037] hardirqs last enabled at (18931): [<ffff00000832e800>]
> kfree+0xe0/0x370
> [ 18.658799] hardirqs last disabled at (18930): [<ffff00000832e7ec>]
> kfree+0xcc/0x370
> [ 18.666564] softirqs last enabled at (18920): [<ffff000008fbce94>]
> rpc_wake_up_first_on_wq+0xb4/0x1b0
> [ 18.675893] softirqs last disabled at (18918): [<ffff000008fbce18>]
> rpc_wake_up_first_on_wq+0x38/0x1b0
> [ 18.685217]
> [ 18.685217] other info that might help us debug this:
> [ 18.691758] Possible unsafe locking scenario:
> [ 18.691758]
> [ 18.697689] CPU0
> [ 18.700137] ----
> [ 18.702586] lock(&(&xprt->transport_lock)->rlock);
> [ 18.707562] <Interrupt>
> [ 18.710184] lock(&(&xprt->transport_lock)->rlock);
> [ 18.715335]
> [ 18.715335] *** DEADLOCK ***

My bisect script pointed to the following kernel commit,

BAT BISECTION OLD: This iteration (kernel rev
2d235d26dcf81d34c93ba8616d75c804b5ee5f3f) presents old behavior.
242a3e0c75b64b4ced82e29e07a6d6d98eeec826 is the first new commit
commit 242a3e0c75b64b4ced82e29e07a6d6d98eeec826
Author: NeilBrown <[email protected]>
Date: Tue Mar 8 13:42:17 2022 +1100

SUNRPC: avoid race between mod_timer() and del_timer_sync()

commit 3848e96edf4788f772d83990022fa7023a233d83 upstream.

xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
Both xprt_unlock_connect() and xprt_release() call
->release_xprt()
which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
which calls mod_timer().

This may result in mod_timer() being called *after* del_timer_sync().
When this happens, the timer may fire long after the xprt has been freed,
and run_timer_softirq() will probably crash.

The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
always called under ->transport_lock. So if we take ->transport_lock to
call del_timer_sync(), we can be sure that mod_timer() will run first
(if it runs at all).

Cc: [email protected]
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/sunrpc/xprt.c | 7 +++++++
1 file changed, 7 insertions(+)

Reported-by: Linux Kernel Functional Testing <[email protected]>

--
Linaro LKFT
https://lkft.linaro.org


2022-04-26 07:58:58

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 4.19 000/338] 4.19.238-rc1 review

On Thu, 21 Apr 2022, Naresh Kamboju wrote:
> On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <[email protected]> wrote:
> >
> > On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
> > <[email protected]> wrote:
> > >
> > > This is the start of the stable review cycle for the 4.19.238 release.
> > > There are 338 patches in this series, all will be posted as a response
> > > to this one. If anyone has any issues with these being applied, please
> > > let me know.
> > >
> > > Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
> > > Anything received after that time might be too late.
> > >
> > > The whole patch series can be found in one patch at:
> > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
> > > or in the git tree and branch at:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> > > and the diffstat can be found below.
> > >
> > > thanks,
> > >
> > > greg k-h
> >
> >
> > Following kernel warning noticed on arm64 Juno-r2 while booting
> > stable-rc 4.19.238. Here is the full test log link [1].
> >
> > [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> > [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
> > 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
> > [ 0.000000] Machine model: ARM Juno development board (r2)
> > <trim>
> > [ 18.499895] ================================
> > [ 18.504172] WARNING: inconsistent lock state
> > [ 18.508451] 4.19.238 #1 Not tainted
> > [ 18.511944] --------------------------------
> > [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > [ 18.527826] (____ptrval____)
> > (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
> > [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
> > [ 18.541543] lock_acquire+0xc8/0x23c

Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
spin_unlock_bh().

Thanks,
NeilBrown

Subject: Re: [PATCH 4.19 000/338] 4.19.238-rc1 review

Hi Neil

On Tue, Apr 26, 2022 at 12:29:55PM +1000, NeilBrown wrote:
> On Thu, 21 Apr 2022, Naresh Kamboju wrote:
> > On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <[email protected]> wrote:
> > >
> > > On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
> > > <[email protected]> wrote:
> > > >
> > > > This is the start of the stable review cycle for the 4.19.238 release.
> > > > There are 338 patches in this series, all will be posted as a response
> > > > to this one. If anyone has any issues with these being applied, please
> > > > let me know.
> > > >
> > > > Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
> > > > Anything received after that time might be too late.
> > > >
> > > > The whole patch series can be found in one patch at:
> > > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
> > > > or in the git tree and branch at:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> > > > and the diffstat can be found below.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > >
> > > Following kernel warning noticed on arm64 Juno-r2 while booting
> > > stable-rc 4.19.238. Here is the full test log link [1].
> > >
> > > [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> > > [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
> > > 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
> > > [ 0.000000] Machine model: ARM Juno development board (r2)
> > > <trim>
> > > [ 18.499895] ================================
> > > [ 18.504172] WARNING: inconsistent lock state
> > > [ 18.508451] 4.19.238 #1 Not tainted
> > > [ 18.511944] --------------------------------
> > > [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > > [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [ 18.527826] (____ptrval____)
> > > (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
> > > [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
> > > [ 18.541543] lock_acquire+0xc8/0x23c
>
> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> spin_unlock_bh().
>

We get the same deadlock or similar one and we think that
can be connected to this thread on 4.19.243. For us is a bit
difficult to hit but we are going to apply this change

net: sunrpc: Fix deadlock in xprt_destroy

Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
spin_unlock_bh().

Signed-off-by: Michael Trimarchi <[email protected]>
---
net/sunrpc/xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d05fa7c36d00..b1abf4848bbc 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1550,9 +1550,9 @@ static void xprt_destroy(struct rpc_xprt *xprt)
* is cleared. We use ->transport_lock to ensure the mod_timer()
* can only run *before* del_time_sync(), never after.
*/
- spin_lock(&xprt->transport_lock);
+ spin_lock_bh(&xprt->transport_lock);
del_timer_sync(&xprt->timer);
- spin_unlock(&xprt->transport_lock);
+ spin_unlock_bh(&xprt->transport_lock);

/*
* Destroy sockets etc from the system workqueue so they can
--
2.37.2

> Thanks,
> NeilBrown
>

Thank you

2022-12-16 21:32:09

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 4.19 000/338] 4.19.238-rc1 review



> On Dec 16, 2022, at 13:31, Michael Trimarchi <[email protected]> wrote:
>
> [You don't often get email from [email protected]. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Hi Neil
>
> On Tue, Apr 26, 2022 at 12:29:55PM +1000, NeilBrown wrote:
>> On Thu, 21 Apr 2022, Naresh Kamboju wrote:
>>> On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <[email protected]> wrote:
>>>>
>>>> On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
>>>> <[email protected]> wrote:
>>>>>
>>>>> This is the start of the stable review cycle for the 4.19.238 release.
>>>>> There are 338 patches in this series, all will be posted as a response
>>>>> to this one. If anyone has any issues with these being applied, please
>>>>> let me know.
>>>>>
>>>>> Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
>>>>> Anything received after that time might be too late.
>>>>>
>>>>> The whole patch series can be found in one patch at:
>>>>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
>>>>> or in the git tree and branch at:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
>>>>> and the diffstat can be found below.
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>>
>>>> Following kernel warning noticed on arm64 Juno-r2 while booting
>>>> stable-rc 4.19.238. Here is the full test log link [1].
>>>>
>>>> [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
>>>> [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
>>>> 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
>>>> [ 0.000000] Machine model: ARM Juno development board (r2)
>>>> <trim>
>>>> [ 18.499895] ================================
>>>> [ 18.504172] WARNING: inconsistent lock state
>>>> [ 18.508451] 4.19.238 #1 Not tainted
>>>> [ 18.511944] --------------------------------
>>>> [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>>>> [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>>> [ 18.527826] (____ptrval____)
>>>> (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
>>>> [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
>>>> [ 18.541543] lock_acquire+0xc8/0x23c
>>
>> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
>> spin_unlock_bh().
>>
>
> We get the same deadlock or similar one and we think that
> can be connected to this thread on 4.19.243. For us is a bit
> difficult to hit but we are going to apply this change
>
> net: sunrpc: Fix deadlock in xprt_destroy
>
> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> spin_unlock_bh().
>
> Signed-off-by: Michael Trimarchi <[email protected]>
> ---
> net/sunrpc/xprt.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index d05fa7c36d00..b1abf4848bbc 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1550,9 +1550,9 @@ static void xprt_destroy(struct rpc_xprt *xprt)
> * is cleared. We use ->transport_lock to ensure the mod_timer()
> * can only run *before* del_time_sync(), never after.
> */
> - spin_lock(&xprt->transport_lock);
> + spin_lock_bh(&xprt->transport_lock);
> del_timer_sync(&xprt->timer);
> - spin_unlock(&xprt->transport_lock);
> + spin_unlock_bh(&xprt->transport_lock);
>
> /*
> * Destroy sockets etc from the system workqueue so they can
> —

Agreed. When backporting to kernels that are older than 5.3.x, the transport lock needs to be taken using the bh-safe spin lock variants.

Reviewed-by: Trond Myklebust <[email protected] <mailto:[email protected]>>

_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

Subject: Re: [PATCH 4.19 000/338] 4.19.238-rc1 review

Hi

On Fri, Dec 16, 2022 at 10:25 PM Trond Myklebust
<[email protected]> wrote:
>
>
>
> > On Dec 16, 2022, at 13:31, Michael Trimarchi <[email protected]> wrote:
> >
> > [You don't often get email from [email protected]. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > Hi Neil
> >
> > On Tue, Apr 26, 2022 at 12:29:55PM +1000, NeilBrown wrote:
> >> On Thu, 21 Apr 2022, Naresh Kamboju wrote:
> >>> On Mon, 18 Apr 2022 at 14:09, Naresh Kamboju <[email protected]> wrote:
> >>>>
> >>>> On Thu, 14 Apr 2022 at 18:45, Greg Kroah-Hartman
> >>>> <[email protected]> wrote:
> >>>>>
> >>>>> This is the start of the stable review cycle for the 4.19.238 release.
> >>>>> There are 338 patches in this series, all will be posted as a response
> >>>>> to this one. If anyone has any issues with these being applied, please
> >>>>> let me know.
> >>>>>
> >>>>> Responses should be made by Sat, 16 Apr 2022 11:07:54 +0000.
> >>>>> Anything received after that time might be too late.
> >>>>>
> >>>>> The whole patch series can be found in one patch at:
> >>>>> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.238-rc1.gz
> >>>>> or in the git tree and branch at:
> >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> >>>>> and the diffstat can be found below.
> >>>>>
> >>>>> thanks,
> >>>>>
> >>>>> greg k-h
> >>>>
> >>>>
> >>>> Following kernel warning noticed on arm64 Juno-r2 while booting
> >>>> stable-rc 4.19.238. Here is the full test log link [1].
> >>>>
> >>>> [ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033]
> >>>> [ 0.000000] Linux version 4.19.238 (tuxmake@tuxmake) (gcc version
> >>>> 11.2.0 (Debian 11.2.0-18)) #1 SMP PREEMPT @1650206156
> >>>> [ 0.000000] Machine model: ARM Juno development board (r2)
> >>>> <trim>
> >>>> [ 18.499895] ================================
> >>>> [ 18.504172] WARNING: inconsistent lock state
> >>>> [ 18.508451] 4.19.238 #1 Not tainted
> >>>> [ 18.511944] --------------------------------
> >>>> [ 18.516222] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> >>>> [ 18.522242] kworker/u12:3/60 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >>>> [ 18.527826] (____ptrval____)
> >>>> (&(&xprt->transport_lock)->rlock){+.?.}, at: xprt_destroy+0x70/0xe0
> >>>> [ 18.536648] {IN-SOFTIRQ-W} state was registered at:
> >>>> [ 18.541543] lock_acquire+0xc8/0x23c
> >>
> >> Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> >> spin_unlock_bh().
> >>
> >
> > We get the same deadlock or similar one and we think that
> > can be connected to this thread on 4.19.243. For us is a bit
> > difficult to hit but we are going to apply this change
> >
> > net: sunrpc: Fix deadlock in xprt_destroy
> >
> > Prior to Linux 5.3, ->transport_lock needs spin_lock_bh() and
> > spin_unlock_bh().
> >
> > Signed-off-by: Michael Trimarchi <[email protected]>
> > ---
> > net/sunrpc/xprt.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index d05fa7c36d00..b1abf4848bbc 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -1550,9 +1550,9 @@ static void xprt_destroy(struct rpc_xprt *xprt)
> > * is cleared. We use ->transport_lock to ensure the mod_timer()
> > * can only run *before* del_time_sync(), never after.
> > */
> > - spin_lock(&xprt->transport_lock);
> > + spin_lock_bh(&xprt->transport_lock);
> > del_timer_sync(&xprt->timer);
> > - spin_unlock(&xprt->transport_lock);
> > + spin_unlock_bh(&xprt->transport_lock);
> >
> > /*
> > * Destroy sockets etc from the system workqueue so they can
> > —
>
> Agreed. When backporting to kernels that are older than 5.3.x, the transport lock needs to be taken using the bh-safe spin lock variants.
>
> Reviewed-by: Trond Myklebust <[email protected] <mailto:[email protected]>>
>

Seems already applied, but for some reason I miss it. I will re-align
to stable again

Michael

> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>


--
Michael Nazzareno Trimarchi
Co-Founder & Chief Executive Officer
M. +39 347 913 2170
[email protected]
__________________________________

Amarula Solutions BV
Joop Geesinkweg 125, 1114 AB, Amsterdam, NL
T. +31 (0)85 111 9172
[email protected]
http://www.amarulasolutions.com