Date: Wed, 12 May 2021 11:05:56 +0200
From: Peter Zijlstra
To: Chris Hyser
Cc: Joel Fernandes, Josh Don, Ingo Molnar, Vincent Guittot, Valentin Schneider, Mel Gorman, LKML, Thomas Gleixner
Subject: Re: [PATCH 17/19] sched: Inherit task cookie on fork()

On Mon, May 10, 2021 at 05:38:18PM -0400, Chris Hyser wrote:
> On 5/10/21 4:47 PM, Joel Fernandes wrote:
> > On Mon, May 10, 2021 at 12:23 PM Chris Hyser wrote:
> > > > > +void sched_core_fork(struct task_struct *p)
> > > > > +{
> > > > > +	RB_CLEAR_NODE(&p->core_node);
> > > > > +	p->core_cookie = sched_core_clone_cookie(current);
> > > >
> > > > Does this make sense also for !CLONE_THREAD forks?
> > >
> > > Yes. Given the absence of a cgroup interface, fork inheritance
> > > (clone the cookie) is the best way to create shared cookie
> > > hierarchies. The security issue you mentioned was handled in my
> > > original code by setting a unique cookie on 'exec', but Peter took
> > > that out for the reason mentioned above. It was part of the "let's
> > > get this in" compromise effort.

Right, not only that, but all of this is moot when parent and child
have the same PTRACE permissions: if they do, they can inspect one
another's innards anyway. exec()/fork() just fundamentally isn't a
magical boundary we should care about.

The only special case there is SUID exec(), because in that case the
actual credentials change, and with them the PTRACE permissions. I
sorta had a patch to do that, but it's yuck, because that cred change
happens after the point of no return and we need an allocation for the
new cookie. Now, we could rely on the fact that a task-context
allocation (GFP_KERNEL) for something as small as our cookie will never
fail, and hence not be bothered by it, but really we should do the
error path, and that's yuck.

> > Thanks for sharing the history of it. I guess one can argue that this
> > policy is better hardcoded in userspace, since core-scheduling
> > can be used for non-security use cases as well. Maybe one could simply
> > call prctl(2) from userspace if they so desire, before calling
> > exec()?
>
> I think the defining use case is a container's init. If the cookie is set
> for it by the container creator, then without any other user code knowing
> about core_sched, every descendant spawned will have the same cookie and be
> in the same core_sched group, much like the cgroup interface had provided.
> If we create a unique cookie in the kernel on either fork or exec, we are
> secure, but we will now have thousands of core sched groups.
>
> CLEAR was also removed (temporarily, I hope) because a core_sched
> knowledgeable program in the example core_sched container group should not
> be able to remove itself from _all_ core sched groups. It can modify its
> cookie, but that is no different than the normal case.

Note that much of what CLEAR did is possible by using SHARE_FROM on your
parent to reset the cookie.

> Both of these beg for a kernel policy, but that discussion was TBD.

Right, I need a Champion who actually cares about cgroups and has use
cases to go argue with TJ on this. I've proposed code that I think has
sane semantics, but I'm not in a position to argue for it, given that I
think a world without cgroups is a better world :-)))