2022-06-17 14:01:25

by Mel Gorman

Subject: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

Hi Waiman,

I've received reports of lockups happening in kernels including
commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"). The exact symptoms vary but usually it's either a soft lockup
(older kernel with a backport), the task hanging and never exiting, or the
machine becoming generally unresponsive with ssh broken. The problem
started in 5.16 and was reliably bisected to commit d257cc8cb8d5. With the
patch reverted, 5.16, 5.17 and 5.18 finish the test successfully, but I
didn't test a revert on 5.19-rc2 because of other changes layered on top.

The reproducer is simple -- start pairs of CPU hogs pinned to a CPU with
different SCHED_RR priorities that run for a few seconds. It does not
hit every time but usually happens within 10 attempts. On 5.16 at least,
the tasks failed to exit and kept retrying to exit using the following path:

[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20
[<0>] preempt_schedule_thunk+0x16/0x18
[<0>] rwsem_down_write_slowpath+0x2ad/0x580
[<0>] unlink_file_vma+0x2c/0x50
[<0>] free_pgtables+0xbe/0x110
[<0>] exit_mmap+0xc1/0x220
[<0>] mmput+0x52/0x110
[<0>] do_exit+0x2ec/0xb00
[<0>] do_group_exit+0x2d/0x90
[<0>] get_signal+0xb6/0x920
[<0>] arch_do_signal_or_restart+0xba/0x700
[<0>] exit_to_user_mode_prepare+0xb7/0x230
[<0>] irqentry_exit_to_user_mode+0x5/0x20
[<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20

The C file and shell script to run it are attached.
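
For anyone reading without the attachments, the shape of the test is
roughly as follows. This is only a minimal sketch of the description
above and is not the attached fsim.c; the helper names (burn_cpu, hog,
run_pair), the priorities and the duration are assumptions picked for
illustration.

```c
/* Illustrative sketch only; this is NOT the attached fsim.c. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Spin until `seconds` of wall-clock time have passed; returns elapsed time. */
static double burn_cpu(double seconds)
{
	struct timespec start, now;
	double elapsed = 0.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (elapsed < seconds) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		elapsed = (now.tv_sec - start.tv_sec) +
			  (now.tv_nsec - start.tv_nsec) / 1e9;
	}
	return elapsed;
}

/* Child body: pin to `cpu`, switch to SCHED_RR at `prio`, spin, exit. */
static void hog(int cpu, int prio, double seconds)
{
	cpu_set_t mask;
	struct sched_param sp = { .sched_priority = prio };

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask))
		perror("sched_setaffinity");
	/* Needs CAP_SYS_NICE; without it the hog keeps its current policy. */
	if (sched_setscheduler(0, SCHED_RR, &sp))
		perror("sched_setscheduler");
	burn_cpu(seconds);
	_exit(0);
}

/*
 * Start one pair of hogs on the same CPU with different SCHED_RR
 * priorities and wait for both. Returns how many exited cleanly.
 */
static int run_pair(int cpu, int prio_lo, int prio_hi, double seconds)
{
	int prios[2] = { prio_lo, prio_hi };
	pid_t pids[2];
	int i, status, ok = 0;

	for (i = 0; i < 2; i++) {
		pids[i] = fork();
		if (pids[i] == 0)
			hog(cpu, prios[i], seconds);
	}
	for (i = 0; i < 2; i++) {
		if (pids[i] > 0 && waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

A driver would call something like run_pair(cpu, 1, 2, 3.0) for each CPU
of interest and repeat the whole run several times, since the problem
does not hit on every attempt. On an affected kernel the symptom would
be that the waitpid() calls never return because the hogs are stuck in
the exit path.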

--
Mel Gorman
SUSE Labs


Attachments:
fsim.c (199.00 B)
run-fsim.sh (438.00 B)

2022-06-17 15:01:12

by Waiman Long

Subject: Re: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

On 6/17/22 09:43, Mel Gorman wrote:
> Hi Waiman,
>
> I've received reports of lockups happening in kernels including
> commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
> consistent"). The exact symptoms vary but usually it's either a soft lockup
> (older kernel with a backport), the task hanging and never exiting or the
> machine becomes generally unresponsive and ssh is broken. The problem
> started in 5.16 and reliably bisected to commit d257cc8cb8d5. Reverting
> the patch in 5.16, 5.17 and 5.18 finish the test successfully but I didn't
> test a revert on 5.19-rc2 because of other changes layered on top.
>
> The reproducer is simple -- start pairs of CPU hogs pinned to a CPU with
> different SCHED_RR priorities that run for a few seconds. It does not
> hit every time but usually happens within 10 attempts. On 5.16 at least,
> the tasks failed to exit and kept retrying to exit using the following path
>
> [<0>] rwsem_down_write_slowpath+0x2ad/0x580
> [<0>] unlink_file_vma+0x2c/0x50
> [<0>] free_pgtables+0xbe/0x110
> [<0>] exit_mmap+0xc1/0x220
> [<0>] mmput+0x52/0x110
> [<0>] do_exit+0x2ec/0xb00
> [<0>] do_group_exit+0x2d/0x90
> [<0>] get_signal+0xb6/0x920
> [<0>] arch_do_signal_or_restart+0xba/0x700
> [<0>] exit_to_user_mode_prepare+0xb7/0x230
> [<0>] irqentry_exit_to_user_mode+0x5/0x20
> [<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20
> [<0>] preempt_schedule_thunk+0x16/0x18
> [<0>] rwsem_down_write_slowpath+0x2ad/0x580
> [<0>] unlink_file_vma+0x2c/0x50
> [<0>] free_pgtables+0xbe/0x110
> [<0>] exit_mmap+0xc1/0x220
> [<0>] mmput+0x52/0x110
> [<0>] do_exit+0x2ec/0xb00
> [<0>] do_group_exit+0x2d/0x90
> [<0>] get_signal+0xb6/0x920
> [<0>] arch_do_signal_or_restart+0xba/0x700
> [<0>] exit_to_user_mode_prepare+0xb7/0x230
> [<0>] irqentry_exit_to_user_mode+0x5/0x20
> [<0>] asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> The C file and shell script to run it are attached.
>
Thanks for the reproducer. I will try to reproduce it locally.

It is a known issue; I have received a similar report from an Oracle
engineer. That is the reason I posted commit 1ee326196c66
("locking/rwsem: Always try to wake waiters in out_nolock path") that
was merged in v5.19. I believe it helps, but it may not be able to
eliminate all possible race conditions. To make rwsem behave more like
it did before commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit
handling more consistent"), I posted a follow-up patch

https://lore.kernel.org/lkml/[email protected]/

But it hasn't been reviewed yet.

I will try your reproducer to see if these patches are able to address
the lockup problem.

Thanks,
Longman


2022-06-20 14:56:47

by Mel Gorman

Subject: Re: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

On Fri, Jun 17, 2022 at 10:29:20AM -0400, Waiman Long wrote:
> > The C file and shell script to run it are attached.
> >
> Thanks for the reproducer and I will try to reproduce it locally.
>
> It is a known issue that I have receive similar report from an Oracle
> engineer. That is the reason I posted commit 1ee326196c66 ("locking/rwsem:
> Always try to wake waiters in out_nolock path") that was merged in v5.19. I
> believe it helps but it may not be able to eliminate all possible race
> conditions. To make rwsem behave more like before commit d257cc8cb8d5
> ("locking/rwsem: Make handoff bit handling more consistent"), I posted a
> follow-up patch
>
> https://lore.kernel.org/lkml/[email protected]/
>
> But it hasn't gotten review yet.
>

FWIW, the patch passed the test case when applied to both 5.18 and
5.19-rc3.

--
Mel Gorman
SUSE Labs

2022-06-22 01:53:52

by Waiman Long

Subject: Re: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

On 6/20/22 10:09, Mel Gorman wrote:
> On Fri, Jun 17, 2022 at 10:29:20AM -0400, Waiman Long wrote:
>>> The C file and shell script to run it are attached.
>>>
>> Thanks for the reproducer and I will try to reproduce it locally.
>>
>> It is a known issue that I have receive similar report from an Oracle
>> engineer. That is the reason I posted commit 1ee326196c66 ("locking/rwsem:
>> Always try to wake waiters in out_nolock path") that was merged in v5.19. I
>> believe it helps but it may not be able to eliminate all possible race
>> conditions. To make rwsem behave more like before commit d257cc8cb8d5
>> ("locking/rwsem: Make handoff bit handling more consistent"), I posted a
>> follow-up patch
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> But it hasn't gotten review yet.
>>
> FWIW, the patch passed the test case when applied to both 5.18 and
> 5.19-rc3.

Thanks for running the test. Do you mean that both 5.18 and 5.19-rc3
fail the test and they pass only after applying the patch?

Anyway, I am not able to reproduce the failure on either 5.18 or
5.19-rc3. Perhaps it is due to a difference in the running
environment, i.e. gcc, glibc, etc. What operating environment (SuSE
version) do you use to reproduce the failure? I used RHEL8, which is the
most convenient one for me.

BTW, do you mind if I add your name with a "Tested-by:" tag to the
patch?

Thanks,
Longman

2022-06-22 15:36:39

by Mel Gorman

Subject: Re: Lockups due to "locking/rwsem: Make handoff bit handling more consistent"

On Tue, Jun 21, 2022 at 09:32:12PM -0400, Waiman Long wrote:
> On 6/20/22 10:09, Mel Gorman wrote:
> > On Fri, Jun 17, 2022 at 10:29:20AM -0400, Waiman Long wrote:
> > > > The C file and shell script to run it are attached.
> > > >
> > > Thanks for the reproducer and I will try to reproduce it locally.
> > >
> > > It is a known issue that I have receive similar report from an Oracle
> > > engineer. That is the reason I posted commit 1ee326196c66 ("locking/rwsem:
> > > Always try to wake waiters in out_nolock path") that was merged in v5.19. I
> > > believe it helps but it may not be able to eliminate all possible race
> > > conditions. To make rwsem behave more like before commit d257cc8cb8d5
> > > ("locking/rwsem: Make handoff bit handling more consistent"), I posted a
> > > follow-up patch
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > But it hasn't gotten review yet.
> > >
> > FWIW, the patch passed the test case when applied to both 5.18 and
> > 5.19-rc3.
>
> Thanks for running the test. Do you mean that both 5.18 and 5.19-rc3 fail
> the test and they pass only after applying the patch?
>

Yes.

> Anyway, I am not able to reproduce the failure in both 5.18 and 5.19-rc3.
> Perhaps it is due to the difference in the running environment, i.e. gcc,
> glibc, etc. What operating environment (SuSE version) do you use to
> reproduce the failure? I used RHEL8 which is the most convenient one for me.
>

It was reproduced on Leap 15.4 on a 2-socket machine with 40 cores
(SMT-2). The kernel config was based on the distribution config. The gcc
version was based on 7.5.0.

> BTW, do you mind if I put down your name with a "Tested-by:" tag to the
> patch?
>

No problem.

--
Mel Gorman
SUSE Labs