Date: Fri, 22 Feb 2019 15:10:35 +0100
From: Peter Zijlstra
To: Greg Kerr
Cc: Greg Kerr, mingo@kernel.org, tglx@linutronix.de, Paul Turner,
    tim.c.chen@linux.intel.com, torvalds@linux-foundation.org,
    linux-kernel@vger.kernel.org, subhra.mazumdar@oracle.com,
    fweisbec@gmail.com, keescook@chromium.org
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Message-ID: <20190222141035.GZ32494@hirez.programming.kicks-ass.net>
References: <20190218165620.383905466@infradead.org>
 <20190220094255.GE32494@hirez.programming.kicks-ass.net>
 <20190220183355.GA213003@kerrnel.com>
In-Reply-To: <20190220183355.GA213003@kerrnel.com>

On Wed, Feb 20, 2019 at 10:33:55AM -0800, Greg Kerr wrote:
> > On Tue, Feb 19, 2019 at 02:07:01PM -0800, Greg Kerr wrote:
>
> Using cgroups could imply that a privileged user is meant to create and
> track all the core scheduling groups. It sounds like you picked cgroups
> out of ease of prototyping and not for the specific behavior?

Yep. Where a prctl() patch would've been similarly simple, the userspace
part would've been more annoying. The cgroup thing I can just echo into.

> > As it happens; there is actually a bug in that very cgroup patch that
> > can cause undesired scheduling. Try spotting and fixing that.
>
> This is where I think the high level properties of core scheduling are
> relevant. I'm not sure what bug is in the existing patch, but it's hard
> for me to tell if the existing code behaves correctly without answering
> questions such as: "Should processes from two separate parents be
> allowed to co-execute?"

Sure, why not.

The bug is that we set the cookie but don't force a reschedule. This
allows the existing task selection to continue, which might not adhere
to the (new) cookie constraints. It is a transient state though; as soon
as we reschedule, this gets corrected automagically.

A second bug is that we leak the cgroup tag state on destroy.

A third bug would be that it is not hierarchical -- but at this point,
meh.

> > Another question is if we want to be L1TF complete (and how strict) or
> > not, and if so, build the missing pieces (for instance we currently
> > don't kick siblings on IRQ/trap/exception entry -- and yes that's nasty
> > and horrible code and missing for that reason).
>
> I assumed from the beginning that this should be safe across exceptions.
> Is there a mitigating reason that it shouldn't be?

I'm not entirely sure what you mean, so let me expound -- L1TF is public
now, after all.

The basic problem is that a malicious guest can read the entire L1,
right? L1 is shared between SMT siblings. So if one sibling takes a host
interrupt and populates L1 with host data, the other thread can read
that data from the guest. This is why my old patches (which Tim has on
github _somewhere_) also have hooks in irq_enter/irq_exit.

The big question, of course, is whether any data touched by interrupts
is worth the pain.
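To make the irq_enter/irq_exit hook idea concrete, here is a minimal
sketch of kicking the SMT sibling on host interrupt entry. It is a
sketch only: sched_core_irq_enter() is an assumed name, and the actual
patches on Tim's github will differ in detail.

#include <linux/smp.h>
#include <linux/topology.h>

/*
 * Minimal sketch, not from the actual patches: kick the SMT sibling
 * on host interrupt entry, so a guest running there cannot snoop the
 * host data this interrupt is about to pull into the shared L1.
 */
static inline void sched_core_irq_enter(void)
{
	int cpu = smp_processor_id();
	int sibling;

	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
		if (sibling != cpu)
			smp_send_reschedule(sibling); /* IPI forces a VM exit */
	}
}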
> > So first; does this provide what we need? If that's sorted we can
> > bike-shed on uapi/abi.
>
> I agree on not bike-shedding about the API, but can we agree on some of
> the high-level properties? For example, who generates the core
> scheduling ids, what properties about them are enforced, etc.?

It's an opaque cookie; the scheduler really doesn't care. All it does is
ensure that tasks either match or force idle within a core (see the
sketches at the end of this mail).

My previous patches got the cookie from a modified
preempt_notifier_register/unregister() which passed the vcpu->kvm
pointer into it from vcpu_load/put. This auto-grouped VMs. It was also
found to be somewhat annoying, because apparently KVM does a lot of
userspace assist for all sorts of nonsense, and it would leave and
re-join the cookie group for every single assist, causing tons of
rescheduling.

I'm fine with having all these interfaces -- kvm, prctl and cgroup --
and I don't care about conflict resolution; that's the tedious part of
the bike-shed :-)

The far more important question is whether there are enough workloads
where this can be made useful. If not, none of that interface crud
matters one whit; we can file these here patches in the bit-bucket and
happily go spend our time elsewhere.
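As a reference for the "match or force idle" rule above, a boiled-down
sketch of the per-sibling constraint. The names (pick_for_sibling(),
the exact use of core_cookie) are illustrative, not lifted from the
RFC's actual selection code.

/*
 * Boiled-down illustration of "tasks match or force idle within a
 * core". Once one sibling has committed to @core_pick, the other
 * sibling may only run a task whose cookie is equal -- including
 * both being untagged (cookie 0) -- otherwise it runs idle.
 */
static struct task_struct *
pick_for_sibling(struct task_struct *core_pick,
		 struct task_struct *candidate,
		 struct task_struct *idle)
{
	/* Equal cookies (including both untagged) may share the core. */
	if (candidate->core_cookie == core_pick->core_cookie)
		return candidate;

	/* Mismatch: force-idle this sibling rather than risk L1 leaks. */
	return idle;
}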
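And the shape of the preempt-notifier auto-grouping described above,
sketched from the description. preempt_notifier_register_cookie() is a
made-up stand-in for the modified registration call; mainline
vcpu_load() uses the plain preempt_notifier_register().

/*
 * Sketch of auto-grouping VMs: at vcpu_load() time the vcpu->kvm
 * pointer doubles as the scheduling cookie, so every vcpu thread of
 * one VM lands in the same cookie group.
 * preempt_notifier_register_cookie() is a made-up stand-in for the
 * modified preempt_notifier_register(); it is not a mainline API.
 */
void vcpu_load(struct kvm_vcpu *vcpu)
{
	int cpu = get_cpu();

	preempt_notifier_register_cookie(&vcpu->preempt_notifier,
					 (unsigned long)vcpu->kvm);
	kvm_arch_vcpu_load(vcpu, cpu);
	put_cpu();
}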