Received-SPF: pass (google.com: domain of linux-kernel+bounces-15630-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1;
Precedence: bulk
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=UTF-8
Date: Wed, 03 Jan 2024 16:04:54 +0200
Message-Id: <CY54MOETXVFI.1102C6BQTO36@suppilovahvero>
Cc: <kernel@quicinc.com>, <quic_pkondeti@quicinc.com>,
 <keescook@chromium.or>, <viro@zeniv.linux.org.uk>, <brauner@kernel.org>,
 <oleg@redhat.com>, <dhowells@redhat.com>, <paul@paul-moore.com>,
 <jmorris@namei.org>, <serge@hallyn.com>, <linux-mm@kvack.org>,
 <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
 <keyrings@vger.kernel.org>, <linux-security-module@vger.kernel.org>,
 <linux-arm-msm@vger.kernel.org>
Subject: Re: [PATCH] kernel: Introduce a write lock/unlock wrapper for
 tasklist_lock
From: "Jarkko Sakkinen" <jarkko@kernel.org>
To: "Maria Yu" <quic_aiquny@quicinc.com>, <ebiederm@xmission.com>
References: <20231225081932.17752-1-quic_aiquny@quicinc.com>
In-Reply-To: <20231225081932.17752-1-quic_aiquny@quicinc.com>

On Mon Dec 25, 2023 at 10:19 AM EET, Maria Yu wrote:
> As a rwlock for tasklist_lock, there are multiple scenarios to acquire
> read lock which write lock needed to be waiting for.
> In freeze_process/thaw_processes it can take about 200+ms for holding rea=
d
> lock of tasklist_lock by walking and freezing/thawing tasks in commercial
> devices. And write_lock_irq will have preempt disabled and local irq
> disabled to spin until the tasklist_lock can be acquired. This leading to
> a bad responsive performance of current system.
> Take an example:
> 1. cpu0 is holding read lock of tasklist_lock to thaw_processes.
> 2. cpu1 is waiting write lock of tasklist_lock to exec a new thread with
>    preempt_disabled and local irq disabled.
> 3. cpu2 is waiting write lock of tasklist_lock to do_exit with
>    preempt_disabled and local irq disabled.
> 4. cpu3 is waiting write lock of tasklist_lock to do_exit with
>    preempt_disabled and local irq disabled.
> So introduce a write lock/unlock wrapper for tasklist_lock specificly.
> The current taskslist_lock writers all have write_lock_irq to hold
> tasklist_lock, and write_unlock_irq to release tasklist_lock, that means
> the writers are not suitable or workable to wait on tasklist_lock in irq
> disabled scenarios. So the write lock/unlock wrapper here only follow the
> current design of directly use local_irq_disable and local_irq_enable,
> and not take already irq disabled writer callers into account.
> Use write_trylock in the loop and enabled irq for cpu to repsond if lock
> cannot be taken.
>
> Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
> ---
>  fs/exec.c                  | 10 +++++-----
>  include/linux/sched/task.h | 29 +++++++++++++++++++++++++++++
>  kernel/exit.c              | 16 ++++++++--------
>  kernel/fork.c              |  6 +++---
>  kernel/ptrace.c            | 12 ++++++------
>  kernel/sys.c               |  8 ++++----
>  security/keys/keyctl.c     |  4 ++--
>  7 files changed, 57 insertions(+), 28 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 4aa19b24f281..030eef6852eb 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1086,7 +1086,7 @@ static int de_thread(struct task_struct *tsk)
> =20
>  		for (;;) {
>  			cgroup_threadgroup_change_begin(tsk);
> -			write_lock_irq(&tasklist_lock);
> +			write_lock_tasklist_lock();
>  			/*
>  			 * Do this under tasklist_lock to ensure that
>  			 * exit_notify() can't miss ->group_exec_task
> @@ -1095,7 +1095,7 @@ static int de_thread(struct task_struct *tsk)
>  			if (likely(leader->exit_state))
>  				break;
>  			__set_current_state(TASK_KILLABLE);
> -			write_unlock_irq(&tasklist_lock);
> +			write_unlock_tasklist_lock();
>  			cgroup_threadgroup_change_end(tsk);
>  			schedule();
>  			if (__fatal_signal_pending(tsk))
> @@ -1150,7 +1150,7 @@ static int de_thread(struct task_struct *tsk)
>  		 */
>  		if (unlikely(leader->ptrace))
>  			__wake_up_parent(leader, leader->parent);
> -		write_unlock_irq(&tasklist_lock);
> +		write_unlock_tasklist_lock();
>  		cgroup_threadgroup_change_end(tsk);
> =20
>  		release_task(leader);
> @@ -1198,13 +1198,13 @@ static int unshare_sighand(struct task_struct *me=
)
> =20
>  		refcount_set(&newsighand->count, 1);
> =20
> -		write_lock_irq(&tasklist_lock);
> +		write_lock_tasklist_lock();
>  		spin_lock(&oldsighand->siglock);
>  		memcpy(newsighand->action, oldsighand->action,
>  		       sizeof(newsighand->action));
>  		rcu_assign_pointer(me->sighand, newsighand);
>  		spin_unlock(&oldsighand->siglock);
> -		write_unlock_irq(&tasklist_lock);
> +		write_unlock_tasklist_lock();
> =20
>  		__cleanup_sighand(oldsighand);
>  	}
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index a23af225c898..6f69d9a3c868 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -50,6 +50,35 @@ struct kernel_clone_args {
>   * a separate lock).
>   */
>  extern rwlock_t tasklist_lock;
> +
> +/*
> + * Tasklist_lock is a special lock, it takes a good amount of time of
> + * taskslist_lock readers to finish, and the pure write_irq_lock api
> + * will do local_irq_disable at the very first, and put the current cpu
> + * waiting for the lock while is non-responsive for interrupts.
> + *
> + * The current taskslist_lock writers all have write_lock_irq to hold
> + * tasklist_lock, and write_unlock_irq to release tasklist_lock, that
> + * means the writers are not suitable or workable to wait on
> + * tasklist_lock in irq disabled scenarios. So the write lock/unlock
> + * wrapper here only follow the current design of directly use
> + * local_irq_disable and local_irq_enable.
> + */
> +static inline void write_lock_tasklist_lock(void)
> +{
> +	while (1) {
> +		local_irq_disable();
> +		if (write_trylock(&tasklist_lock))
> +			break;
> +		local_irq_enable();
> +		cpu_relax();
> +	}

Maybe:

	local_irq_disable();
	while (!write_trylock(&tasklist_lock)) {
		local_irq_enable();
		cpu_relax();
		local_irq_disable();
	}

BR, Jarkko