Date: Wed, 28 Aug 2019 11:30:34 -0400
From: Phil Auld
To: Peter Zijlstra
Cc: Matthew Garrett, Vineeth Remanan Pillai, Nishanth Aravamudan,
    Julien Desfossez, Tim Chen, mingo@kernel.org, tglx@linutronix.de,
    pjt@google.com, torvalds@linux-foundation.org,
    linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com,
    fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
    Aaron Lu, Aubrey Li, Valentin Schneider, Mel Gorman, Pawan Gupta,
    Paolo Bonzini
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
Message-ID: <20190828153033.GA15512@pauld.bos.csb>
References: <20190827211417.snpwgnhsu5t6u52y@srcf.ucam.org>
    <20190827215035.GH2332@hirez.programming.kicks-ass.net>
In-Reply-To: <20190827215035.GH2332@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 27, 2019 at 11:50:35PM +0200 Peter Zijlstra wrote:
> On Tue, Aug 27, 2019 at 10:14:17PM +0100, Matthew Garrett wrote:
> > Apple have provided a sysctl that allows applications to indicate that
> > specific threads should make use of core isolation while allowing
> > the rest of the system to make use of SMT, and browsers (Safari, Firefox
> > and Chrome, at least) are now making use of this. Trying to do something
> > similar using cgroups seems a bit awkward. Would something like this be
> > reasonable?
>
> Sure; like I wrote earlier; I only did the cgroup thing because I was
> lazy and it was the easiest interface to hack on in a hurry.
>
> The rest of the ABI nonsense can 'trivially' be done later; if when we
> decide to actually do this.

I think something that allows the tag to be set may be needed. One of
the use cases for this is virtualization stacks, where you really want
to be able to keep the higher CPU count and to set up the isolation
from management processes on the host.

The current cgroup interface doesn't work for that because it doesn't
apply the tag to children. We've been unable to fully test it in a virt
setup because our VMs are made of a child cgroup per vcpu.

>
> And given MDS, I'm still not entirely convinced it all makes sense. If
> it were just L1TF, then yes, but now...

I was thinking MDS is really the reason for this.
L1TF has mitigations, but the only current mitigation for MDS with SMT
is ... nosmt.

The current core scheduler implementation, I believe, still has
(theoretical?) holes involving interrupts; once (or if) those are closed
it may be even less attractive.

>
> > Having spoken to the Chrome team, I believe that the
> > semantics we want are:
> >
> > 1) A thread to be able to indicate that it should not run on the same
> > core as anything not in possession of the same cookie
> > 2) Descendants of that thread to (by default) have the same cookie
> > 3) No other thread be able to obtain the same cookie
> > 4) Threads not be able to rejoin the global group (ie, threads can
> > segregate themselves from their parent and peers, but can never rejoin
> > that group once segregated)
> >
> > but don't know if that's what everyone else would want.
> >
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..5d411246d4d5 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,5 @@ struct prctl_mm_map {
> >  # define PR_PAC_APDBKEY			(1UL << 3)
> >  # define PR_PAC_APGAKEY			(1UL << 4)
> >  
> > +#define PR_CORE_ISOLATE			55
> >  #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..a054cfcca511 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2486,6 +2486,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> >  			return -EINVAL;
> >  		error = PAC_RESET_KEYS(me, arg2);
> >  		break;
> > +	case PR_CORE_ISOLATE:
> > +#ifdef CONFIG_SCHED_CORE
> > +		current->core_cookie = (unsigned long)current;
>
> This needs to then also force a reschedule of current. And there's the
> little issue of what happens if 'current' dies while its children live
> on, and current gets re-used for a new process and does this again.

sched_core_get() too?


Cheers,
Phil

>
> > +#else
> > +		result = -EINVAL;
> > +#endif
> > +		break;
> >  	default:
> >  		error = -EINVAL;
> >  		break;
> >
> >
> > --
> > Matthew Garrett | mjg59@srcf.ucam.org

--
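
For illustration only, the four cookie semantics listed in the quoted
proposal can be modelled in plain userspace C. This is a rough sketch, not
kernel code: struct task, task_core_isolate(), task_fork() and
may_share_core() are hypothetical names that appear nowhere in the patch.
It also assumes cookies come from a monotonic counter instead of the
address of 'current', which is one possible way around the reuse problem
raised above, not something the thread has agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a task; not the kernel's task_struct. */
struct task {
	uint64_t core_cookie;	/* 0 means the global, untagged group */
};

/*
 * Assumption: cookies come from a monotonic counter rather than from the
 * task's address, so a value is never handed out twice even if the memory
 * of a dead task is reused for a new one.
 */
static uint64_t next_cookie = 1;

/* Semantics 1 and 3: give the task a fresh cookie that nobody else holds. */
void task_core_isolate(struct task *t)
{
	assert(t != NULL);
	t->core_cookie = next_cookie++;
}

/* Semantic 2: a child starts out with its parent's cookie. */
void task_fork(const struct task *parent, struct task *child)
{
	assert(parent != NULL && child != NULL);
	child->core_cookie = parent->core_cookie;
}

/* Semantic 1: two tasks may share a core only if their cookies match. */
bool may_share_core(const struct task *a, const struct task *b)
{
	assert(a != NULL && b != NULL);
	return a->core_cookie == b->core_cookie;
}

/* Semantic 4: there is deliberately no call that resets a cookie to 0. */
```

In this model a task that calls task_core_isolate() and then forks can
share a core with its child but not with any untagged task, and a second
task that isolates itself later gets a different cookie. The forced
reschedule that Peter asks for has no equivalent here.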