Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp2861814pxf; Sun, 21 Mar 2021 09:23:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJztuTE4oNhq21TLLgKi575vKp5i+zctDwwVkBlItOlzwh75/3BENhyIotzj/CapBXBWhJLQ X-Received: by 2002:a17:906:c051:: with SMTP id bm17mr14892084ejb.21.1616343784394; Sun, 21 Mar 2021 09:23:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616343784; cv=none; d=google.com; s=arc-20160816; b=hKMg9rgvfkSZdU9PC7X/7lQRGJEdtNuJjJ2VS2OTeAmhcxq/wCvLkwLdYH40ZkazNC eUbWytwmy43C5/FpO/h3lXEOp3YFuKaCLPCFyXYYCqZnisXXeFDVECs1Lxn/iBJGZ2PP gOp5Kkvvg1t8RSR1CEcoeHF1qJJojBjMw3KfcyxXO87f+Own62v/P0svRrD3Qmt5hIfX RPPFDxLKlW5hha3kUQZSxh4kKAyHopf6W4bLvZSs7b4L2IItV1uVdwIrb9Ab/UOo5g16 1KCHbbjXSCApmSMXa+zmuJQUiRU5mMgMUfpKt0nRy/Lxv28GeadH1cZmLPQWushwSz7i WJRA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=lfjjDnUJu9opPPSvQVQubITocISsFGgS1UWLUqrIkKA=; b=iPiHnZLnxvemS/g4tR/CFhFR69KjIOlbij/DxMD1Q1t8kVc8JYKAewU8WYnRA+o4nB HaHWVBGRy2XukMPieTm4HNTQiscxc5c3P/MzwLBwSgNIpNPSad259Vu6/Y4OVHgp+Cc4 4L/HbTLpkKqycB144Av0ZPvovcButqRlXQ5o9zkkt2XRtm3Pr7YS1LcbLXsDvzW+GH7I e96md5hpf7WZB/J8RDUaE6EBddQ2l2HbfhuFNIr8HBoXhgLcKRIbAfgNCKEqRP9+Hxll guscCdyMNC5a//o5pJREiG3ydQJweA7JJX1JOZI5kNH//F+x8+pqpbaIEfGQ0b2Ofdmg ROVA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=2Q4EYjPP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c7si9890937ejp.450.2021.03.21.09.22.41; Sun, 21 Mar 2021 09:23:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel-dk.20150623.gappssmtp.com header.s=20150623 header.b=2Q4EYjPP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230112AbhCUPnH (ORCPT + 99 others); Sun, 21 Mar 2021 11:43:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230115AbhCUPmj (ORCPT ); Sun, 21 Mar 2021 11:42:39 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4AB14C061762 for ; Sun, 21 Mar 2021 08:42:39 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id q11so5273655pld.11 for ; Sun, 21 Mar 2021 08:42:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=lfjjDnUJu9opPPSvQVQubITocISsFGgS1UWLUqrIkKA=; b=2Q4EYjPPArir5nkhENmAqet6f7qHsv+Fd0FeFVmw9hhZRfkEXbnUUQjDkz2ovHu3Fv CLgenqXPoHqF/MtNiJ/GCf1NUNW/yMsohywIk0spdMmjpgcq4FG28nDcOQCPqkc6nuOA 44YxEIUkOavNg+lt0Jn7q5R0u5yyW0pZPo/z9dy50yllOMZcImxM0xgWn0eqSTWoWPiT ot5sn5G07FOsMrUgli1JPU2WEdPpQa231UTywAfBwcg2rl8Hd7YxR3E0kedGBhcq0mFp whHxeDJacp/3JHCR3tGVWe7WuQhTixcw6HSYq055Qdi6XS7kHRhKnKNq8FNEUiHdoVdg ZaEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=lfjjDnUJu9opPPSvQVQubITocISsFGgS1UWLUqrIkKA=; b=GTl4udfGJiF/5PdCIRw9qY28aU7mmmniqfX8Lqk9Ii8jJQ4kgyb8wi257uzZgb1K8a lXmREsK1OTl7RHshMn7M3qeH2HUUS/HkLvAbiMHuI5Qp+F6FnsOddqP8tqTjS3C+WYXC XJysd4oAACW0eVhr9g1JktGWTqMoCISaOCZkaxddzcHQdNZquQUG9+eR04PXnor0NMjB DcAe1QqtiqFHiujxWk97qY3uUgtCdIA7IacavhjeDHd/1bFBOVxDKfKHB+z7lqOdHNaF 7fctkmWUSCr39UCoW7YrsAoeqsiDBBnrVnSGv3gXko4kpRZLsfMnFi61Ba7YP5X+mxMe +AIg== X-Gm-Message-State: AOAM5316zizMoIkZ49ZAorg7WeVDZSosz68mCUCszVA30IxsuDuys/9h sBomGBNiAJJNu7ZpIr/l8dd6Vg== X-Received: by 2002:a17:90a:b392:: with SMTP id e18mr8771052pjr.66.1616341358593; Sun, 21 Mar 2021 08:42:38 -0700 (PDT) Received: from [192.168.1.134] ([66.219.217.173]) by smtp.gmail.com with ESMTPSA id w2sm10522754pgh.54.2021.03.21.08.42.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 21 Mar 2021 08:42:38 -0700 (PDT) Subject: Re: [PATCHSET 0/2] PF_IO_WORKER signal tweaks To: "Eric W. Biederman" Cc: Linus Torvalds , io-uring , Linux Kernel Mailing List , Oleg Nesterov , criu@openvz.org References: <20210320153832.1033687-1-axboe@kernel.dk> <907b36b6-a022-019a-34ea-58ce46dc2d12@kernel.dk> From: Jens Axboe Message-ID: <718e91db-5f0b-9aed-7b65-9d41c9f9f8f4@kernel.dk> Date: Sun, 21 Mar 2021 09:42:36 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 3/21/21 9:18 AM, Eric W. Biederman wrote: > Jens Axboe writes: > >> On 3/20/21 4:08 PM, Eric W. Biederman wrote: >>> >>> Added criu because I just realized that io_uring (which can open files >>> from an io worker thread) looks to require some special handling for >>> stopping and freezing processes. If not in the SIGSTOP case in the >>> related cgroup freezer case. >>> >>> Linus Torvalds writes: >>> >>>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds >>>> wrote: >>>>> >>>>> Alternatively, make it not use >>>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it >>>>> unnecessarily allocate its own signal state, so that's "cleaner" but >>>>> not great either. >>>> >>>> Thinking some more about that, it would be problematic for things like >>>> the resource counters too. They'd be much better shared. >>>> >>>> Not adding it to the thread list etc might be clever, but feels a bit too scary. >>>> >>>> So on the whole I think Jens' minor patches to just not have IO helper >>>> threads accept signals are probably the right thing to do. >>> >>> The way I see it we have two options: >>> >>> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in >>> task_join_group_stop. >>> >>> The easiest comprehensive implementation looks like just >>> updating task_set_jobctl_pending to treat PF_IO_WORKER >>> as it treats PF_EXITING. >>> >>> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING >>> and call into do_signal_stop. >>> >>> It is a wee bit trickier to modify the io_workers to stop, but it does >>> not look prohibitively difficult. >>> >>> All of the work performed by the io worker is work scheduled via >>> io_uring by the process being stopped. >>> >>> - Is the amount of work performed by the io worker thread sufficiently >>> negligible that we don't care? >>> >>> - Or is the amount of work performed by the io worker so great that it >>> becomes a way for an errant process to escape SIGSTOP? >>> >>> As the code is all intermingled with the cgroup_freezer. I am also >>> wondering creating checkpoints needs additional stopping guarantees. >> >> The work done is the same a syscall, basically. So it could be long >> running and essentially not doing anything (eg read from a socket, no >> data is there), or it's pretty short lived (eg read from a file, just >> waiting on DMA). >> >> This is outside of my domain of expertise, which is exactly why I added >> you and Linus to make some calls on what the best approach here would >> be. My two patches obviously go route #1 in terms of STOP. And fwiw, >> I tested this: >> >>> To solve the issue that SIGSTOP is simply broken right now I am totally >>> fine with something like: >>> >>> diff --git a/kernel/signal.c b/kernel/signal.c >>> index ba4d1ef39a9e..cb9acdfb32fa 100644 >>> --- a/kernel/signal.c >>> +++ b/kernel/signal.c >>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask) >>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING)); >>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK)); >>> >>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING))) >>> + if (unlikely(fatal_signal_pending(task) || >>> + (task->flags & (PF_EXITING | PF_IO_WORKER)))) >>> return false; >>> >>> if (mask & JOBCTL_STOP_SIGMASK) >> >> and can confirm it works fine for me with 2/2 reverted and this applied >> instead. >> >>> Which just keeps from creating unstoppable processes today. I am just >>> not convinced that is what we want as a long term solution. >> >> How about we go with either my 2/2 or yours above to at least ensure we >> don't leave workers looping as schedule() is a nop with sigpending? If >> there's a longer timeline concern that "evading" SIGSTOP is a concern, I >> have absolutely no qualms with making the IO threads participate. But >> since it seems conceptually simple but with potentially lurking minor >> issues, probably not the ideal approach for right now. > > > Here is the signoff for mine. > > Signed-off-by: "Eric W. Biederman" > > Yours misses the joining of group stop during fork. So we better use > mine. I've updated it and attributed it to you, here is is: https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=4db4b1a0d1779dc159f7b87feb97030ec0b12597 > As far as I can see that fixes the outstanding bugs. Great! > Jens can you make a proper patch out of it and send it to Linus for > -rc4? I unfortunately have other commitments and this is all I can do > for today. Will do - I'm going to sanity run the current branch and do a followup pull request for Linus once I've verified everything is still sane. -- Jens Axboe