From: "Eric W. Biederman"
To: Peter Zijlstra
Cc: Oleg Nesterov, Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Steven Rostedt, Thomas Gleixner, Vincent Guittot
Date: Fri, 08 Apr 2022 14:40:42 -0500
In-Reply-To: <20220408090908.GO2731@worktop.programming.kicks-ass.net> (Peter Zijlstra's message of "Fri, 8 Apr 2022 11:09:08 +0200")
Message-ID: <874k332wjp.fsf@email.froward.int.ebiederm.org>
Subject: Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.

Peter Zijlstra writes:

> On Thu, Apr 07, 2022 at 05:50:39PM -0500, Eric W. Biederman wrote:
>> Given that fundamentally TASK_WAKEKILL must be added in ptrace_stop and
>> removed in ptrace_attach I don't see your proposed usage of jobctl helps
>> anything fundamental.
>>
>> I suspect somewhere there is a deep trade-off between complicating
>> the scheduler to have a very special case for what is now
>> TASK_RTLOCK_WAIT, and complicating the rest of the code with having
>> TASK_RTLOCK_WAIT in __state and the values that should be in state
>> stored somewhere else.
>
> The thing is; ptrace is a special case. I feel very strongly we should
> not complicate the scheduler/wakeup path for something that 'never'
> happens.

I was going to comment that I could not understand how the saved_state
mechanism under PREEMPT_RT works.  Then I realized that wake_up_process
and wake_up_state call try_to_wake_up, which calls ttwu_state_match,
which can modify saved_state.

The options appear to be either that ptrace_freeze_traced modifies
__state/state to remove TASK_KILLABLE, or that something clever happens
in ptrace_freeze_traced that guarantees the task does not wake up:
something living in kernel/sched/*, like wait_task_inactive.

I can imagine adding a loop around freezable_schedule in ptrace_stop.
Something like:

	do {
		freezable_schedule();
	} while (current->jobctl & JOBCTL_PTRACE_FREEZE);

Unfortunately, once a SIGKILL has been delivered the process will never
sleep unless there is a higher priority process to preempt it, since
__schedule will see the pending fatal signal and leave it runnable.  So
I don't think that is a viable solution.

What ptrace_freeze_traced and ptrace_unfreeze_traced fundamentally need
is for the process not to do anything interesting, so that the tracer
can modify the process and its task_struct.  That need is the entire
reason ptrace does questionable things with __state.  So if we can do
something better, perhaps with a rewritten freezer, it would be a
general code improvement.

The ptrace code really does want TASK_KILLABLE semantics the entire
time a task is being manipulated by the ptrace system call.  The code
in ptrace_unfreeze_traced goes through some gymnastics to detect if a
process was killed while traced (AKA to detect a missed SIGKILL) and to
use wake_up_state to make the task runnable instead of putting it back
in TASK_TRACED.

So really all that is required is a way to ask the scheduler to just
not schedule the process until the ptrace syscall completes and calls
ptrace_unfreeze_traced.

Eric