Date: Thu, 21 Feb 2019 17:29:24 +0100
From: Oleg Nesterov
To: Roman Gushchin
Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v8 0/7] freezer for cgroup v2
Message-ID: <20190221162923.GA26064@redhat.com>
In-Reply-To: <20190220220020.GA16335@castle.DHCP.thefacebook.com>

On 02/20, Roman Gushchin wrote:
>
> On Wed, Feb 20, 2019 at 03:37:48PM +0100, Oleg Nesterov wrote:
> >
> > I tried not to argue with the intent, but to be honest I am more and more
> > sceptical... Let's forget about ptrace for the moment.
> >
> > Once again, why do we want a killable freezer?
> >
> > If a user wants to kill a frozen task from a CGRP_FROZEN cgroup he can simply
> >
> >         1. send SIGKILL to that task
> >
> >         2. migrate it to the root cgroup.
> >
> > Why doesn't / can't this work?
>
> It does work, but it doesn't look like a nice interface to take into
> the cgroup v2 world.
>
> It's just not clear why killing a frozen task requires some cgroup-level
> operations. It doesn't add anything except some additional complexity
> for userspace.

Yes. But to me this is a reasonable trade-off, because this way we do not add
additional complexity to the kernel.

Actually, "killable" is not that difficult afaics. "ptraceable" looks more
problematic to me. Again, user-space can do

        1. PTRACE_SEIZE
        2. move the tracee to the root cgroup
        3. do anything with the tracee
        4. move it back

> Generally speaking, any process hanging in D-state
> for a long time isn't the nicest object from userspace's point of view.

Roman, this is an unfair comparison ;)

> Exactly as a SIGSTOPped process can be killed without sending SIGCONT,
> why would a frozen task require some additional operations?

This too.

> And I'm not talking about the case when the process which is sending
> SIGKILL has no write access to cgroupfs.

True. But there is another case.
If an admin wants to freeze a cgroup, then it is not clear why a user who can
send SIGKILL to a frozen process should be able to wake it up.

------------------------------------------------------------------------------
Again, it is not that I hate the idea of a killable/ptraceable freezer. I just
personally think it's not worth the trouble. Perhaps I am wrong, but so far
I do not see a good implementation...

And, apart from reading/writing the registers, what can ptrace do with a
frozen tracee? This doesn't look like a "must have" feature to me.

At least, may I ask you again to make (if possible) a separate patch which
adds the ability to kill/ptrace?
------------------------------------------------------------------------------

> > Why I am starting to argue... The ability to kill a frozen task complicates
> > the code, and since cgroup_enter_stopped() (in this version at least) doesn't
> > properly interact with freezable_schedule(), it leads to other problems.
> >
> > From 7/7:
> >
> > +  cgroup.freeze
> > +        A read-write single value file which exists on non-root cgroups.
> > +        Allowed values are "0" and "1". The default is "0".
> > +
> > +        Writing "1" to the file causes freezing of the cgroup and all
> > +        descendant cgroups. This means that all belonging processes will
> > +        be stopped and will not run until the cgroup will be explicitly
> > +        unfrozen. Freezing of the cgroup may take some time;
> >                                           ^^^^^^^^^^^^^^^^^^
> > it may take infinite time.
> >
> > Just suppose that a task does vfork() and this races with cgroup_do_freeze(true).
> > If the new child notices JOBCTL_TRAP_FREEZE before exit/exec, the cgroup will
> > never be frozen.
>
> Hm, why? cgroup_update_frozen() called from cgroup_post_fork() should bring
> the cgroup into the frozen state. If it's not true (I'm missing some race here),
> it's a bug, but I don't see why it's not possible in general.

A task P calls vfork() and creates the new child C.
Now, how can the parent P (which sleeps in TASK_KILLABLE) call
cgroup_enter_stopped()? It can't until C exits or execs.

C can't exit or exec because it is frozen. So P never enters the frozen
state, and the cgroup never becomes frozen.

> > If I read the current kernel/cgroup/freezer.c correctly, CGROUP_FREEZING should
> > "always" work (unless a task hangs in D state) and to me this looks more important
> > than kill/ptrace support...
>
> Again, I don't see a case when the cgroup v1 freezer will work and the proposed
> v2 freezer won't work in general.

See above.

Oleg.
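For reference, the user-space alternative Oleg argues for instead of a killable freezer (send SIGKILL to the frozen task, then migrate it to the root cgroup) can be sketched in C. This is a minimal, non-authoritative sketch under stated assumptions. The caller passes the cgroup v2 root as a directory path. That is often /sys/fs/cgroup, but the mount point is an assumption. The helper names cgroup_write(), cgroup_set_frozen(), cgroup_move() and kill_frozen_task() are hypothetical. cgroup_set_frozen() simply writes the "0"/"1" values documented for cgroup.freeze in the quoted 7/7 text. The sketch otherwise relies only on the cgroup v2 convention that writing a PID to cgroup.procs migrates that process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write VALUE to <cgroup_dir>/<file>, e.g.
 * "cgroup.procs" or "cgroup.freeze". Returns 0, or -1 with errno set. */
static int cgroup_write(const char *cgroup_dir, const char *file,
                        const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    /* fclose() flushes the stdio buffer, so on cgroupfs a write the
     * kernel rejects (EBUSY, ESRCH, ...) is reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical wrapper for the cgroup.freeze file described in 7/7:
 * "1" freezes the cgroup and its descendants, "0" unfreezes it. */
static int cgroup_set_frozen(const char *cgroup_dir, int frozen)
{
    return cgroup_write(cgroup_dir, "cgroup.freeze", frozen ? "1\n" : "0\n");
}

/* Writing a PID to cgroup.procs migrates the whole process to that cgroup. */
static int cgroup_move(const char *cgroup_dir, pid_t pid)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
    return cgroup_write(cgroup_dir, "cgroup.procs", buf);
}

/* Sketch of the two steps from the thread: 1. queue SIGKILL, 2. move the
 * task out of the frozen cgroup. Under the semantics Oleg argues for, the
 * task stays frozen with the signal pending until it is migrated. */
static int kill_frozen_task(const char *root_cgroup, pid_t pid)
{
    if (kill(pid, SIGKILL) != 0)
        return -1;
    return cgroup_move(root_cgroup, pid);
}
```

For example, kill_frozen_task("/sys/fs/cgroup", pid) would queue SIGKILL and then move the process out of its frozen cgroup. The second step needs write access to cgroupfs, which is exactly the limitation Roman points out. The ptrace sequence Oleg lists (PTRACE_SEIZE, move to the root cgroup, inspect, move back) would reuse cgroup_move() the same way.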