From: "Eric W. Biederman"
To: Alexander Gordeev
Cc: linux-kernel@vger.kernel.org, rjw@rjwysocki.net, Oleg Nesterov,
    mingo@kernel.org, vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, mgorman@suse.de, bigeasy@linutronix.de, Will Deacon,
    tj@kernel.org, linux-pm@vger.kernel.org, Peter Zijlstra,
    Richard Weinberger, Anton Ivanov, Johannes Berg,
    linux-um@lists.infradead.org, Chris Zankel, Max Filippov,
    linux-xtensa@linux-xtensa.org, Kees Cook, Jann Horn,
    linux-ia64@vger.kernel.org
Subject: Re: [PATCH v4 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
Date: Tue, 21 Jun 2022 12:47:30 -0500
Message-ID: <87bkulgb7x.fsf@email.froward.int.ebiederm.org>
In-Reply-To: (Alexander Gordeev's message of "Tue, 21 Jun 2022 17:15:47 +0200")
References: <87a6bv6dl6.fsf_-_@email.froward.int.ebiederm.org>
    <20220505182645.497868-12-ebiederm@xmission.com>
    <877d5ajesi.fsf@email.froward.int.ebiederm.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Alexander Gordeev writes:

> On Tue, Jun 21, 2022 at 09:02:05AM -0500, Eric W. Biederman wrote:
>> Alexander Gordeev writes:
>>
>> > On Thu, May 05, 2022 at 01:26:45PM -0500, Eric W. Biederman wrote:
>> >> From: Peter Zijlstra
>> >>
>> >> Currently ptrace_stop() / do_signal_stop() rely on the special states
>> >> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
>> >> state exists only in task->__state and nowhere else.
>> >>
>> >> There's two spots of bother with this:
>> >>
>> >>  - PREEMPT_RT has task->saved_state which complicates matters,
>> >>    meaning task_is_{traced,stopped}() needs to check an additional
>> >>    variable.
>> >>
>> >>  - An alternative freezer implementation that itself relies on a
>> >>    special TASK state would loose TASK_TRACED/TASK_STOPPED and will
>> >>    result in misbehaviour.
>> >>
>> >> As such, add additional state to task->jobctl to track this state
>> >> outside of task->__state.
>> >>
>> >> NOTE: this doesn't actually fix anything yet, just adds extra state.
>> >>
>> >> --EWB
>> >>   * didn't add a unnecessary newline in signal.h
>> >>   * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
>> >>     instead of in signal_wake_up_state. This prevents the clearing
>> >>     of TASK_STOPPED and TASK_TRACED from getting lost.
>> >>   * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>> >
>> > Hi Eric, Peter,
>> >
>> > On s390 this patch triggers warning at kernel/ptrace.c:272 when
>> > kill_child testcase from strace tool is repeatedly used (the source
>> > is attached for reference):
>> >
>> > while :; do
>> >     strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child
>> > done
>> >
>> > It normally takes few minutes to cause the warning in -rc3, but FWIW
>> > it hits almost immediately for ptrace_stop-cleanup-for-v5.19 tag of
>> > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.
>> >
>> > Commit 7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't
>> > fail") suggests this WARN_ON_ONCE() is not really expected, yet we
>> > observe a child in __TASK_TRACED state. Could you please comment here?
>> >
>>
>> For clarity the warning is that the child is not in __TASK_TRACED state.
>>
>> The code is waiting for the code to stop in the scheduler in the
>> __TASK_TRACED state so that it can safely read and change the
>> processes state. Some of that state is not even saved until the
>> process is scheduled out so we have to wait until the process
>> is stopped in the scheduler.
>
> So I assume (checked actually) the return 0 below from kernel/sched/core.c:
> wait_task_inactive() is where it bails out:
>
> 3303         while (task_running(rq, p)) {
> 3304                 if (match_state && unlikely(READ_ONCE(p->__state) != match_state))
> 3305                         return 0;
> 3306                 cpu_relax();
> 3307         }
>
> Yet, the child task is always found in __TASK_TRACED state (as seen
> in crash dumps):
>
>> 101447   11342  13  ce3a8100  RU  0.0  10040  4412  strace
>  101450  101447   0  bb04b200  TR  0.0   2272  1136  kill_child
>  108261  101447   2  d0b10100  TR  0.0   2272   532  kill_child
> crash> task bb04b200 __state
> PID: 101450  TASK: bb04b200  CPU: 0  COMMAND: "kill_child"
>   __state = 8,
>
> crash> task d0b10100 __state
> PID: 108261  TASK: d0b10100  CPU: 2  COMMAND: "kill_child"
>   __state = 8,

That is weird.

>> At least on s390 it looks like there is a race between SIGKILL and
>> ptrace_check_attach. That isn't good.
>>
>> Reading the code below there is something missing because I don't see
>> anything making ptrace calls, and ptrace_check_attach (which contains
>> the warning) only happens in the ptrace syscall.
>
> That is what I believe strace does when calling that code:
>
> strace -f -qq -e signal=none -e trace=sched_yield,/kill ./kill_child

Thank you. That was my braino.

I will have to see if it reproduces for me on x86 (I don't have an s390).
Perhaps if I can reproduce it I can guess what is going wrong.

So far it appears WARN_ON_ONCE has nothing to warn about yet it is
warning.

Eric
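
For readers following the thread, the bail-out path quoted above from
wait_task_inactive() can be modelled in user space. This is only a
minimal sketch, not kernel code: struct model_sample, model_wait_loop()
and MODEL_TASK_TRACED are hypothetical names introduced for illustration,
and the value 8 is taken from the "__state = 8" lines in the crash output
above. The model covers nothing beyond the five quoted lines.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; 8 matches "__state = 8" in the crash output. */
#define MODEL_TASK_TRACED 0x0008u

/* One observation of the task per iteration of the polling loop. */
struct model_sample {
    bool running;        /* models task_running(rq, p) */
    unsigned int state;  /* models READ_ONCE(p->__state) */
};

/*
 * Models only the quoted loop:
 *
 *     while (task_running(rq, p)) {
 *             if (match_state && p->__state != match_state)
 *                     return 0;
 *             cpu_relax();
 *     }
 *
 * Returns 0 if a mismatching state is observed while the task is still
 * running, and 1 once a sample shows the task off the CPU (or the
 * samples run out). The real function does more work after the loop.
 */
int model_wait_loop(const struct model_sample *samples, size_t n,
                    unsigned int match_state)
{
    for (size_t i = 0; i < n && samples[i].running; i++) {
        if (match_state && samples[i].state != match_state)
            return 0;  /* the bail-out path */
        /* cpu_relax() would go here; the model takes the next sample */
    }
    return 1;
}
```

Given a sequence in which the task is once observed running with a state
other than 8, the model returns 0 even if the last sample has state 8. It
shows only that the quoted return path and a later __TASK_TRACED
observation are not mutually exclusive; it does not diagnose the s390
report.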