2023-03-20 19:42:46

by Joe Korty

Subject: [PATCH 5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c

In the transition from 5.10.158-rt77 to 5.10.162-rt78,
the initialization of task_struct::wake_q_sleeper.next
in dup_task_struct() was dropped. Restore it.

This appears to be a problem only in 5.10. 5.15 does not
have wake_q_sleeper, and 4.19, which does have it, still
has its initialization.

The 5.10.162-rt78 patch that damaged fork.c is:

0170-locking-rtmutex-add-sleeping-lock-implementation.patch

I do not have a simple test that reproduces this problem.
My test consists of a shell script and eight binaries,
all written in Ada. strace shows that it does a few
thousand forks in rapid succession. One of the forks
stalls, and no fork after that one returns. Eventually
the 122-second stall occurs, and a large number of
threads are shown waiting for tasklist_lock in either
do_exit() or copy_process(). The kernel .config has RT
and many debug features enabled, including lockdep.

Signed-off-by: Joe Korty <[email protected]>

Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -960,6 +960,7 @@ static struct task_struct *dup_task_stru
tsk->splice_pipe = NULL;
tsk->task_frag.page = NULL;
tsk->wake_q.next = NULL;
+ tsk->wake_q_sleeper.next = NULL;
tsk->pf_io_worker = NULL;

account_kernel_stack(tsk, 1);


by Luis Claudio R. Goncalves

Subject: Re: [PATCH 5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c

On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> In the transition from 5.10.158-rt77 to 5.10.162-rt78,
> the initialization of task_struct::wake_q_sleeper.next
> was dropped. Restore it.
>
> This appears to be only a problem in 5.10. 5.15 does not
> have wake_q_sleeper; 4.19 does have it but its initialization
> there is still present.
>
> The 5.10.162-rt78 patch that damaged fork.c is:
>
> 0170-locking-rtmutex-add-sleeping-lock-implementation.patch
>
> I do not have a simple test that brings out this problem.
> My test consists of a shell script and eight binaries,
> all of which were written in Ada. strace shows that it
> does a few thousand forks in rapid succession. One of the
> forks stalls out, after which no fork after that returns.
> Eventually the 122 second stallout occurs and a large
> number of threads are shown to be waiting for tasklist
> lock, either in do_exit or in copy_process. The kernel
> .config has rt and many debug features enabled, lockdep
> included.

Joe, thank you for investigating that problem and for writing a patch.

Earlier today Steffen Dirkwinkel sent a similar patch:

https://lore.kernel.org/all/[email protected]/

Would you mind giving your ACK to his patch? I have that patch queued for
my next build already.

Thank you,
Luis

---end quoted text---


2023-03-20 20:04:38

by Joe Korty

Subject: Re: [PATCH 5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c

On Mon, Mar 20, 2023 at 05:00:13PM -0300, Luis Claudio R. Goncalves wrote:
> On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> > [...]
>
> Joe, thank you for investigating that problem and for writing a patch.
>
> Earlier today Steffen Dirkwinkel sent a similar patch:
>
> https://lore.kernel.org/all/[email protected]/
>
> Would you mind giving your ACK to his patch? I have that patch queued for
> my next build already.

Acked-by: Joe Korty <[email protected]>