Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp772271pxb; Tue, 5 Apr 2022 22:40:56 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwvT6+n0BrmzkuMDtquPDhb8FE+VpOv5Iam6k+94bcK3NJ//K51i4CdiIlXAgOpbShJib2H X-Received: by 2002:a17:902:ec89:b0:156:c5db:356d with SMTP id x9-20020a170902ec8900b00156c5db356dmr6883477plg.74.1649223656509; Tue, 05 Apr 2022 22:40:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649223656; cv=none; d=google.com; s=arc-20160816; b=LvWyhPd7NBw7FC0ac4364jLxMWhKZ5y8ZhsbGRA882ywx0wCtyw4nwZOb75u6DWaeo aclgBSkri+UGe/i/5lv5kbWeamg64dBEMmXTIDxQ6WTQnfJrzlA+NWHP3do476hZ2DgE AE1U0061jEVtlenZoGGHwpeNY1Q39imkLboK5QoXI3NWLJ79ciD+6bS3Q8/eQ0KLJQqp wCV7vkki1MJo8VvLMVsNkV/z1knycwur9N+U8XTEIskoOsfp4Jrt/Z+HMeUOLvHkhM7P rJ56T3AJMe5xYqhXkOwjBnEuTgHYUxgpNzPRVdhlJU1hOwL2VjYlc3w2JJ7/vWBVu3fV Aw9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=JRCtPql37TI0QCAvo5gW2Zck8T0P/CqvA5SiZrYJuYk=; b=kWfYH4XNR5YsRb3Ucy7ApGh9s3dBm2l6U1NQPHjWdgVwCKBbtLDbvMob/Xyo+Mxo6d Nki+uSeYH92mIZJUMFug/4xuI+pRz4rNjg7E7JkIVLjgDrY431qgSSJTnHIkNC22gv5e CvhouH5LBVLm35p9ktigzDGLzMDKmB7eVjJ93qCrOpX9e5t25FvZMiZTYp9uheUmVFDR kcxuJqiy4g7diKzY5M06DB6mL2kThCo9ipVCKZxAI3R47rQMI2C4xcy3dzXWPuHJGa9g glhFVp1F/oKIjUMUwLdK7iPNc2EB56Vg3NETJ0HKPsNjTDSb4qnTVJ/KSn3lRqR4K15/ rX4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=P8Fdu3E4; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id rj6-20020a17090b3e8600b001c976ed0feesi4356431pjb.102.2022.04.05.22.40.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Apr 2022 22:40:56 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=casper.20170209 header.b=P8Fdu3E4; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id C8CE5424F1E; Tue, 5 Apr 2022 21:46:30 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1578127AbiDEXSd (ORCPT + 99 others); Tue, 5 Apr 2022 19:18:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50656 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1357777AbiDEK1M (ORCPT ); Tue, 5 Apr 2022 06:27:12 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5113339BB9 for ; Tue, 5 Apr 2022 03:10:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Sender:Reply-To:Content-ID:Content-Description; bh=JRCtPql37TI0QCAvo5gW2Zck8T0P/CqvA5SiZrYJuYk=; b=P8Fdu3E46UH/j9cbM0vhZ3I4j4 P3hCnKljTXjPpOhVbNoxZbPnnfk27KD6EdpsVXRQHssmrcI+3FVTxjnsBl9pBUDZt3LfpTBUYhRgw KRWmyMkFq27QsAgh6ZOFp1I3FmgmAJMyXXs2S6DZE2tXKeP2B86RgCPjfbwbNarCOv2ICByaJtg1Y U/Mi1uoPFm2LQAl8MyKlTW3fGanNx840M0f9u83HcYFAhS2nYyK/obT1nkuz/js72ov5v7Ls2LTBE C+lwefHBW8UpMOscSP7shKLI2UpkNPLXDG18jcRLnaly9S9wPNLZW62aG/T/WOFwB4xmpUqwpHg2z u+iq5xAg==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=worktop.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nbg8q-006aqi-SA; Tue, 05 Apr 2022 10:10:28 +0000 Received: by worktop.programming.kicks-ass.net (Postfix, from userid 1000) id 03E57986B5B; Tue, 5 Apr 2022 12:10:26 +0200 (CEST) Date: Tue, 5 Apr 2022 12:10:26 +0200 From: Peter Zijlstra To: Sebastian Andrzej Siewior Cc: Oleg Nesterov , linux-kernel@vger.kernel.org, Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Ingo Molnar , Juri Lelli , Mel Gorman , Steven Rostedt , Thomas Gleixner , Vincent Guittot Subject: Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT. Message-ID: <20220405101026.GB34954@worktop.programming.kicks-ass.net> References: <20220314185429.GA30364@redhat.com> <20220315142944.GA22670@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 31, 2022 at 04:25:42PM +0200, Sebastian Andrzej Siewior wrote: > As explained by Alexander Fyodorov : > > |read_lock(&tasklist_lock) in ptrace_stop() is converted to sleeping > |lock on a PREEMPT_RT kernel, and it can remove __TASK_TRACED from > |task->__state (by moving it to task->saved_state). If parent does > |wait() on child followed by a sys_ptrace call, the following race can > |happen: > | > |- child sets __TASK_TRACED in ptrace_stop() > |- parent does wait() which eventually calls wait_task_stopped() and returns > | child's pid > |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves > | __TASK_TRACED flag to saved_state > |- parent calls sys_ptrace, which calls ptrace_check_attach() and > | wait_task_inactive() > > The patch is based on his initial patch where an additional check is > added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is > acquired to have stable view on ->__state and ->saved_state. > > wait_task_inactive() needs to check both task states while waiting for the > expected task state. Should the expected task state be in ->saved_state then > the task is blocked on a sleeping lock. In this case wait_task_inactive() needs > to wait until the lock situtation has been resolved (the expected state is in > ->__state). This ensures that the task is idle and does not wakeup as part of > lock resolving and races for instance with __switch_to_xtra() while the > debugger clears TIF_BLOCKSTEP() (noted by Oleg Nesterov). > > [ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ] > > Signed-off-by: Sebastian Andrzej Siewior > --- > v1…v2: > - Use also ->saved_state in task_state_match_and_set(). > - Wait in wait_task_inactive() until the desired task state is in > ->__state so that the task won't wake up a as part of lock > resolving. Pointed out by Oleg Nesterov. > > include/linux/sched.h | 128 ++++++++++++++++++++++++++++++++++++++++++++++++-- > kernel/ptrace.c | 25 +++++---- > kernel/sched/core.c | 11 +++- > 3 files changed, 146 insertions(+), 18 deletions(-) > > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -118,12 +118,8 @@ struct task_group; > > #define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING) > > -#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0) > - > #define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != 0) > > -#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0) > - > /* > * Special states are those that do not use the normal wait-loop pattern. See > * the comment with set_special_state(). Urgh, so I have reworking all this somewhere on my todo list as well. Except I mean to move it away from using p->__state entirely. We should not be keeping canonical state in there. As is, I think we can write task_is_stopped() like: #define task_is_stopped(task) ((task)->jobctl & JOBCTL_STOP_PENDING) Because jobctl is in fact the canonical state. I'm still not sure if we can do the same with task_is_traced(), ideally that would be expressed in terms of (task)->ptrace. But ptrace_stop() hurts my brain. All that stuff is entirely to involved. Anyway, let me see if I can page some of that back..