Date: Mon, 14 Mar 2022 10:27:37 +0100
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar,
    Juri Lelli, Mel Gorman, Oleg Nesterov, Peter Zijlstra, Steven Rostedt,
    Thomas Gleixner, Vincent Guittot
Subject: Re: [PATCH] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.

On 2022-03-02 22:04:25 [+0100], To linux-kernel@vger.kernel.org wrote:
> As explained by Alexander Fyodorov:
>
> |read_lock(&tasklist_lock) in ptrace_stop() is converted to sleeping
> |lock on a PREEMPT_RT kernel, and it can remove __TASK_TRACED from
> |task->state (by moving it to task->saved_state). If parent does
> |wait() on child followed by a sys_ptrace call, the following race can
> |happen:
> |
> |- child sets __TASK_TRACED in ptrace_stop()
> |- parent does wait() which eventually calls wait_task_stopped() and returns
> |  child's pid
> |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
> |  __TASK_TRACED flag to saved_state
> |- parent calls sys_ptrace, which calls ptrace_check_attach() and
> |  wait_task_inactive()
>
> The patch is based on his initial patch where an additional check is
> added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
> taken in case the caller is interrupted between looking into ->state and
> ->saved_state.
>
> [ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ]
>
> Signed-off-by: Sebastian Andrzej Siewior

ping.

> ---
>
> I redid the state matching part compared to what I had in my queue so it
> hopefully looks better.
>
>  include/linux/sched.h | 127 ++++++++++++++++++++++++++++++++++++++++--
>  kernel/ptrace.c       |  25 +++++----
>  kernel/sched/core.c   |   5 +-
>  3 files changed, 140 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 75ba8aa60248b..73109ce7ce789 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -118,12 +118,8 @@ struct task_group;
>  
>  #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
>  
> -#define task_is_traced(task)		((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
> -
>  #define task_is_stopped(task)		((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
>  
> -#define task_is_stopped_or_traced(task)	((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
> -
>  /*
>   * Special states are those that do not use the normal wait-loop pattern. See
>   * the comment with set_special_state().
> @@ -2006,6 +2002,129 @@ static inline int test_tsk_need_resched(struct task_struct *tsk)
>  	return unlikely(test_tsk_thread_flag(tsk,TIF_NEED_RESCHED));
>  }
>  
> +#ifdef CONFIG_PREEMPT_RT
> +
> +static inline bool task_state_match_and(struct task_struct *tsk, long state)
> +{
> +	unsigned long flags;
> +	bool match = false;
> +
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> +	if (READ_ONCE(tsk->__state) & state)
> +		match = true;
> +	else if (tsk->saved_state & state)
> +		match = true;
> +	raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> +	return match;
> +}
> +
> +static inline bool __task_state_match_eq(struct task_struct *tsk, long state)
> +{
> +	bool match = false;
> +
> +	if (READ_ONCE(tsk->__state) == state)
> +		match = true;
> +	else if (tsk->saved_state == state)
> +		match = true;
> +	return match;
> +}
> +
> +static inline bool task_state_match_eq(struct task_struct *tsk, long state)
> +{
> +	unsigned long flags;
> +	bool match;
> +
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> +	match = __task_state_match_eq(tsk, state);
> +	raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> +	return match;
> +}
> +
> +static inline bool task_state_match_and_set(struct task_struct *tsk, long state,
> +					    long new_state)
> +{
> +	unsigned long flags;
> +	bool match = false;
> +
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> +	if (READ_ONCE(tsk->__state) & state) {
> +		WRITE_ONCE(tsk->__state, new_state);
> +		match = true;
> +	} else if (tsk->saved_state & state) {
> +		tsk->saved_state = new_state;
> +		match = true;
> +	}
> +	raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> +	return match;
> +}
> +
> +static inline bool task_state_match_eq_set(struct task_struct *tsk, long state,
> +					   long new_state)
> +{
> +	unsigned long flags;
> +	bool match = false;
> +
> +	raw_spin_lock_irqsave(&tsk->pi_lock, flags);
> +	if (READ_ONCE(tsk->__state) == state) {
> +		WRITE_ONCE(tsk->__state, new_state);
> +		match = true;
> +	} else if (tsk->saved_state == state) {
> +		tsk->saved_state = new_state;
> +		match = true;
> +	}
> +	raw_spin_unlock_irqrestore(&tsk->pi_lock, flags);
> +	return match;
> +}
> +
> +#else
> +
> +static inline bool task_state_match_and(struct task_struct *tsk, long state)
> +{
> +	return READ_ONCE(tsk->__state) & state;
> +}
> +
> +static inline bool __task_state_match_eq(struct task_struct *tsk, long state)
> +{
> +	return READ_ONCE(tsk->__state) == state;
> +}
> +
> +static inline bool task_state_match_eq(struct task_struct *tsk, long state)
> +{
> +	return __task_state_match_eq(tsk, state);
> +}
> +
> +static inline bool task_state_match_and_set(struct task_struct *tsk, long state,
> +					    long new_state)
> +{
> +	if (READ_ONCE(tsk->__state) & state) {
> +		WRITE_ONCE(tsk->__state, new_state);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline bool task_state_match_eq_set(struct task_struct *tsk, long state,
> +					   long new_state)
> +{
> +	if (READ_ONCE(tsk->__state) == state) {
> +		WRITE_ONCE(tsk->__state, new_state);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +#endif
> +
> +static inline bool task_is_traced(struct task_struct *tsk)
> +{
> +	return task_state_match_and(tsk, __TASK_TRACED);
> +}
> +
> +static inline bool task_is_stopped_or_traced(struct task_struct *tsk)
> +{
> +	return task_state_match_and(tsk, __TASK_STOPPED | __TASK_TRACED);
> +}
> +
>  /*
>   * cond_resched() and cond_resched_lock(): latency reduction via
>   * explicit rescheduling in places that are safe. The return
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index eea265082e975..5ce0948c0c0a7 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -195,10 +195,10 @@ static bool ptrace_freeze_traced(struct task_struct *task)
>  		return ret;
>  
>  	spin_lock_irq(&task->sighand->siglock);
> -	if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
> -	    !__fatal_signal_pending(task)) {
> -		WRITE_ONCE(task->__state, __TASK_TRACED);
> -		ret = true;
> +	if (!looks_like_a_spurious_pid(task) && !__fatal_signal_pending(task)) {
> +
> +		ret = task_state_match_and_set(task, __TASK_TRACED,
> +					       __TASK_TRACED);
>  	}
>  	spin_unlock_irq(&task->sighand->siglock);
>  
> @@ -207,7 +207,10 @@ static bool ptrace_freeze_traced(struct task_struct *task)
>  
>  static void ptrace_unfreeze_traced(struct task_struct *task)
>  {
> -	if (READ_ONCE(task->__state) != __TASK_TRACED)
> +	bool frozen;
> +
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
> +	    READ_ONCE(task->__state) != __TASK_TRACED)
>  		return;
>  
>  	WARN_ON(!task->ptrace || task->parent != current);
> @@ -217,12 +220,12 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>  	 * Recheck state under the lock to close this race.
>  	 */
>  	spin_lock_irq(&task->sighand->siglock);
> -	if (READ_ONCE(task->__state) == __TASK_TRACED) {
> -		if (__fatal_signal_pending(task))
> -			wake_up_state(task, __TASK_TRACED);
> -		else
> -			WRITE_ONCE(task->__state, TASK_TRACED);
> -	}
> +
> +	frozen = task_state_match_eq_set(task, __TASK_TRACED, TASK_TRACED);
> +
> +	if (frozen && __fatal_signal_pending(task))
> +		wake_up_state(task, __TASK_TRACED);
> +
>  	spin_unlock_irq(&task->sighand->siglock);
>  }
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9745613d531ce..a44414946de3d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3239,7 +3239,8 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
>  		 * is actually now running somewhere else!
>  		 */
>  		while (task_running(rq, p)) {
> -			if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> +			if (match_state &&
> +			    unlikely(!task_state_match_eq(p, match_state)))
>  				return 0;
>  			cpu_relax();
>  		}
> @@ -3254,7 +3255,7 @@ unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
>  		running = task_running(rq, p);
>  		queued = task_on_rq_queued(p);
>  		ncsw = 0;
> -		if (!match_state || READ_ONCE(p->__state) == match_state)
> +		if (!match_state || __task_state_match_eq(p, match_state))
>  			ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
>  		task_rq_unlock(rq, p, &rf);
>  
> -- 
> 2.35.1
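
For readers following along, here is a minimal user-space sketch (illustrative
only, not part of the patch) of the tracer/tracee sequence described in the
quoted race: the parent wait()s for the traced child and immediately issues a
ptrace request, which is the path that goes through ptrace_check_attach() and
ptrace_freeze_traced() while, on a PREEMPT_RT kernel, the child may still be
blocked on read_lock(&tasklist_lock) inside ptrace_stop() with __TASK_TRACED
parked in ->saved_state. It uses only the standard ptrace()/waitpid() API:

#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	siginfo_t si;
	pid_t pid = fork();

	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		/* Child: become a tracee and stop itself via ptrace_stop(). */
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);
		_exit(0);
	}

	/*
	 * Parent: wait_task_stopped() can report the child as soon as it has
	 * set __TASK_TRACED, possibly before the child has left ptrace_stop().
	 */
	if (waitpid(pid, NULL, 0) != pid) {
		perror("waitpid");
		return 1;
	}

	/* Any "real" ptrace request goes through ptrace_check_attach(). */
	if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &si) == -1)
		perror("ptrace(PTRACE_GETSIGINFO)");

	ptrace(PTRACE_DETACH, pid, NULL, NULL);
	waitpid(pid, NULL, 0);
	return 0;
}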