2022-02-09 08:39:17

by Waiman Long

[permalink] [raw]
Subject: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section

I was made aware of the following lockdep splat:

[ 2516.308763] =====================================================
[ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
[ 2516.309703] -----------------------------------------------------
[ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
[ 2516.310944]
and this task is already holding:
[ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
[ 2516.311804] which would create a new lock dependency:
[ 2516.312066] (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
[ 2516.312446]
but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2516.312983] (&sighand->siglock){-.-.}-{2:2}
:
[ 2516.330700] Possible interrupt unsafe locking scenario:

[ 2516.331075] CPU0 CPU1
[ 2516.331328] ---- ----
[ 2516.331580] lock(&newf->file_lock);
[ 2516.331790] local_irq_disable();
[ 2516.332231] lock(&sighand->siglock);
[ 2516.332579] lock(&newf->file_lock);
[ 2516.332922] <Interrupt>
[ 2516.333069] lock(&sighand->siglock);
[ 2516.333291]
*** DEADLOCK ***
[ 2516.389845]
stack backtrace:
[ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
[ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2516.391155] Call trace:
[ 2516.391302] dump_backtrace+0x0/0x3e0
[ 2516.391518] show_stack+0x24/0x30
[ 2516.391717] dump_stack_lvl+0x9c/0xd8
[ 2516.391938] dump_stack+0x1c/0x38
[ 2516.392247] print_bad_irq_dependency+0x620/0x710
[ 2516.392525] check_irq_usage+0x4fc/0x86c
[ 2516.392756] check_prev_add+0x180/0x1d90
[ 2516.392988] validate_chain+0x8e0/0xee0
[ 2516.393215] __lock_acquire+0x97c/0x1e40
[ 2516.393449] lock_acquire.part.0+0x240/0x570
[ 2516.393814] lock_acquire+0x90/0xb4
[ 2516.394021] _raw_spin_lock+0xe8/0x154
[ 2516.394244] fd_install+0x368/0x4f0
[ 2516.394451] copy_process+0x1f5c/0x3e80
[ 2516.394678] kernel_clone+0x134/0x660
[ 2516.394895] __do_sys_clone3+0x130/0x1f4
[ 2516.395128] __arm64_sys_clone3+0x5c/0x7c
[ 2516.395478] invoke_syscall.constprop.0+0x78/0x1f0
[ 2516.395762] el0_svc_common.constprop.0+0x22c/0x2c4
[ 2516.396050] do_el0_svc+0xb0/0x10c
[ 2516.396252] el0_svc+0x24/0x34
[ 2516.396436] el0t_64_sync_handler+0xa4/0x12c
[ 2516.396688] el0t_64_sync+0x198/0x19c
[ 2517.491197] NET: Registered PF_ATMPVC protocol family
[ 2517.491524] NET: Registered PF_ATMSVC protocol family
[ 2591.991877] sched: RT throttling activated

One way to solve this problem is to move the fd_install() call out of
the sighand->siglock critical section.

Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
on cleanups"), the pidfd installation was done without holding both
the task_list lock and the sighand->siglock. Obviously, holding these
two locks are not really needed to protect the fd_install() call.
So move the fd_install() call down to after the releases of both locks.

Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/fork.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..007af7fb47c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2323,10 +2323,6 @@ static __latent_entropy struct task_struct *copy_process(
goto bad_fork_cancel_cgroup;
}

- /* past the last point of failure */
- if (pidfile)
- fd_install(pidfd, pidfile);
-
init_task_pid_links(p);
if (likely(p->pid)) {
ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
@@ -2375,6 +2371,9 @@ static __latent_entropy struct task_struct *copy_process(
syscall_tracepoint_update(p);
write_unlock_irq(&tasklist_lock);

+ if (pidfile)
+ fd_install(pidfd, pidfile);
+
proc_fork_connector(p);
sched_post_fork(p, args);
cgroup_post_fork(p, args);
--
2.27.0



2022-02-09 10:00:48

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section

On Tue, Feb 08, 2022 at 03:59:06PM -0600, Eric W. Biederman wrote:

> The fd is being installed in the fdtable of the parent process,
> and the siglock and tasklist_lock are held to protect the child.
>
>
> Further fd_install is exposing the fd to userspace where it can be used
> by the process_madvise and the process_mrelease system calls, from
> anything that shares the fdtable of the parent thread. Which means it
> needs to be guaranteed that kernel_clone will call wake_up_process
> before it is safe to call fd_install.

You mean "no calling fd_install() until after we are past the last possible
failure exit, by which point we know that wake_up_process() will eventually
be called", hopefully? If so (as I assumed all along), anything downstream
of
if (fatal_signal_pending(current)) {
retval = -EINTR;
goto bad_fork_cancel_cgroup;
}

should be fine...

2022-02-09 11:22:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section

Al Viro <[email protected]> writes:

> On Tue, Feb 08, 2022 at 03:59:06PM -0600, Eric W. Biederman wrote:
>
>> The fd is being installed in the fdtable of the parent process,
>> and the siglock and tasklist_lock are held to protect the child.
>>
>>
>> Further fd_install is exposing the fd to userspace where it can be used
>> by the process_madvise and the process_mrelease system calls, from
>> anything that shares the fdtable of the parent thread. Which means it
>> needs to be guaranteed that kernel_clone will call wake_up_process
>> before it is safe to call fd_install.
>
> You mean "no calling fd_install() until after we are past the last possible
> failure exit, by which point we know that wake_up_process() will eventually
> be called", hopefully? If so (as I assumed all along), anything downstream
> of
> if (fatal_signal_pending(current)) {
> retval = -EINTR;
> goto bad_fork_cancel_cgroup;
> }
>
> should be fine...

Except for the problems of calling fd_install under siglock, and
tasklist_lock, which protect nothing and cause lockdep splats.

There may also be assumptions on the task actually being fully setup,
if not today then in a future use pidfd. So I am not particularly
comfortable with fd_install coming before we drop tasklist_lock.

I was pointing out that to resolve the locking issue we fundamentally
can not move the fd_install earlier, to resolve the locking issues.

Eric

2022-02-09 23:24:46

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] copy_process(): Move fd_install() out of sighand->siglock critical section

Waiman Long <[email protected]> writes:

> I was made aware of the following lockdep splat:
>
> [ 2516.308763] =====================================================
> [ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
> [ 2516.309703] -----------------------------------------------------
> [ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
> [ 2516.310944]
> and this task is already holding:
> [ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
> [ 2516.311804] which would create a new lock dependency:
> [ 2516.312066] (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
> [ 2516.312446]
> but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 2516.312983] (&sighand->siglock){-.-.}-{2:2}
> :
> [ 2516.330700] Possible interrupt unsafe locking scenario:
>
> [ 2516.331075] CPU0 CPU1
> [ 2516.331328] ---- ----
> [ 2516.331580] lock(&newf->file_lock);
> [ 2516.331790] local_irq_disable();
> [ 2516.332231] lock(&sighand->siglock);
> [ 2516.332579] lock(&newf->file_lock);
> [ 2516.332922] <Interrupt>
> [ 2516.333069] lock(&sighand->siglock);
> [ 2516.333291]
> *** DEADLOCK ***
> [ 2516.389845]
> stack backtrace:
> [ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
> [ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
> [ 2516.391155] Call trace:
> [ 2516.391302] dump_backtrace+0x0/0x3e0
> [ 2516.391518] show_stack+0x24/0x30
> [ 2516.391717] dump_stack_lvl+0x9c/0xd8
> [ 2516.391938] dump_stack+0x1c/0x38
> [ 2516.392247] print_bad_irq_dependency+0x620/0x710
> [ 2516.392525] check_irq_usage+0x4fc/0x86c
> [ 2516.392756] check_prev_add+0x180/0x1d90
> [ 2516.392988] validate_chain+0x8e0/0xee0
> [ 2516.393215] __lock_acquire+0x97c/0x1e40
> [ 2516.393449] lock_acquire.part.0+0x240/0x570
> [ 2516.393814] lock_acquire+0x90/0xb4
> [ 2516.394021] _raw_spin_lock+0xe8/0x154
> [ 2516.394244] fd_install+0x368/0x4f0
> [ 2516.394451] copy_process+0x1f5c/0x3e80
> [ 2516.394678] kernel_clone+0x134/0x660
> [ 2516.394895] __do_sys_clone3+0x130/0x1f4
> [ 2516.395128] __arm64_sys_clone3+0x5c/0x7c
> [ 2516.395478] invoke_syscall.constprop.0+0x78/0x1f0
> [ 2516.395762] el0_svc_common.constprop.0+0x22c/0x2c4
> [ 2516.396050] do_el0_svc+0xb0/0x10c
> [ 2516.396252] el0_svc+0x24/0x34
> [ 2516.396436] el0t_64_sync_handler+0xa4/0x12c
> [ 2516.396688] el0t_64_sync+0x198/0x19c
> [ 2517.491197] NET: Registered PF_ATMPVC protocol family
> [ 2517.491524] NET: Registered PF_ATMSVC protocol family
> [ 2591.991877] sched: RT throttling activated
>
> One way to solve this problem is to move the fd_install() call out of
> the sighand->siglock critical section.
>
> Before commit 6fd2fe494b17 ("copy_process(): don't use ksys_close()
> on cleanups"), the pidfd installation was done without holding both
> the task_list lock and the sighand->siglock. Obviously, holding these
> two locks are not really needed to protect the fd_install() call.
> So move the fd_install() call down to after the releases of both
> locks.
>
> Fixes: 6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")
> Signed-off-by: Waiman Long <[email protected]>

The fd_install can not be moved earlier and it does need to be moved
outside the locks.

Reviewed-by: "Eric W. Biederman" <[email protected]>

> ---
> kernel/fork.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d75a528f7b21..007af7fb47c7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2323,10 +2323,6 @@ static __latent_entropy struct task_struct *copy_process(
> goto bad_fork_cancel_cgroup;
> }
>
> - /* past the last point of failure */
> - if (pidfile)
> - fd_install(pidfd, pidfile);
> -
> init_task_pid_links(p);
> if (likely(p->pid)) {
> ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
> @@ -2375,6 +2371,9 @@ static __latent_entropy struct task_struct *copy_process(
> syscall_tracepoint_update(p);
> write_unlock_irq(&tasklist_lock);
>
> + if (pidfile)
> + fd_install(pidfd, pidfile);
> +
> proc_fork_connector(p);
> sched_post_fork(p, args);
> cgroup_post_fork(p, args);