Date: Wed, 25 Nov 2020 10:37:00 +0100
From: Peter Zijlstra
To: "Joel Fernandes (Google)"
Cc: Nishanth Aravamudan, Julien Desfossez, Tim Chen, Vineeth Pillai,
    Aaron Lu, Aubrey Li, tglx@linutronix.de, linux-kernel@vger.kernel.org,
    mingo@kernel.org, torvalds@linux-foundation.org, fweisbec@gmail.com,
    keescook@chromium.org, kerrnel@google.com, Phil Auld,
    Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini,
    vineeth@bitbyteword.org, Chen Yu, Christian Brauner, Agata Gruza,
    Antonio Gomez Iglesias, graf@amazon.com, konrad.wilk@oracle.com,
    dfaggioli@suse.com, pjt@google.com, rostedt@goodmis.org,
    derkling@google.com, benbjiang@tencent.com, Alexandre Chartre,
    James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
    Dhaval Giani, Junaid Shahid, jsbarnes@google.com,
    chris.hyser@oracle.com, Ben Segall, Josh Don, Hao Luo, Tom Lendacky,
    "Paul E. McKenney"
Subject: Re: [PATCH -tip 18/32] kernel/entry: Add support for core-wide protection of kernel-mode
Message-ID: <20201125093700.GP2414@hirez.programming.kicks-ass.net>
References: <20201117232003.3580179-1-joel@joelfernandes.org>
    <20201117232003.3580179-19-joel@joelfernandes.org>
In-Reply-To: <20201117232003.3580179-19-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 17, 2020 at 06:19:48PM -0500, Joel Fernandes (Google) wrote:
> Core-scheduling prevents hyperthreads in usermode from attacking each
> other, but it does not do anything about one of the hyperthreads
> entering the kernel for any reason. This leaves the door open for MDS
> and L1TF attacks with concurrent execution sequences between
> hyperthreads.
>
> This patch therefore adds support for protecting all syscall and IRQ
> kernel mode entries. Care is taken to track the outermost usermode exit
> and entry using per-cpu counters. In cases where one of the hyperthreads
> enter the kernel, no additional IPIs are sent. Further, IPIs are avoided
> when not needed - example: idle and non-cookie HTs do not need to be
> forced into kernel mode.
>
> More information about attacks:
> For MDS, it is possible for syscalls, IRQ and softirq handlers to leak
> data to either host or guest attackers. For L1TF, it is possible to leak
> to guest attackers. There is no possible mitigation involving flushing
> of buffers to avoid this since the execution of attacker and victims
> happen concurrently on 2 or more HTs.
>  .../admin-guide/kernel-parameters.txt         |  11 +
>  include/linux/entry-common.h                  |  12 +-
>  include/linux/sched.h                         |  12 +
>  kernel/entry/common.c                         |  28 +-
>  kernel/sched/core.c                           | 241 ++++++++++++++++++
>  kernel/sched/sched.h                          |   3 +
>  6 files changed, 304 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bd1a5b87a5e2..b185c6ed4aba 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4678,6 +4678,17 @@
>
>  	sbni=		[NET] Granch SBNI12 leased line adapter
>
> +	sched_core_protect_kernel=
> +			[SCHED_CORE] Pause SMT siblings of a core running in
> +			user mode, if at least one of the siblings of the core
> +			is running in kernel mode. This is to guarantee that
> +			kernel data is not leaked to tasks which are not trusted
> +			by the kernel. A value of 0 disables protection, 1
> +			enables protection. The default is 1. Note that protection
> +			depends on the arch defining the _TIF_UNSAFE_RET flag.
> +			Further, for protecting VMEXIT, arch needs to call
> +			KVM entry/exit hooks.
> +
>  	sched_debug	[KNL] Enables verbose scheduler debug messages.
>
>  	schedstats=	[KNL,X86] Enable or disable scheduled statistics.

So I don't like the parameter name, it's too long. Also I don't like it
because it's a boolean.

You're adding syscall,irq,kvm under a single knob where they're all due
to different flavours of broken. Different hardware might want/need
different combinations.

Hardware without MDS but with L1TF wouldn't need the syscall hook, but
you're not giving a choice here. And this is generic code, you can't
assume stuff like this.
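
To make the objection concrete, here is a minimal userspace sketch in C of
what a per-entry-point knob could look like in place of the boolean. It is
an illustration only, not code from the patch or from the kernel: the option
names ("syscall", "irq", "kvm"), the PROTECT_* flags and parse_protect() are
all hypothetical, and a real kernel parameter would be wired up through the
kernel's own parameter handling rather than plain libc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-entry-point protection bits; not names from the patch. */
enum protect_flags {
	PROTECT_SYSCALL = 1u << 0,
	PROTECT_IRQ     = 1u << 1,
	PROTECT_KVM     = 1u << 2,
};

struct protect_opt {
	const char *name;
	unsigned int flag;
};

/* Hypothetical option names accepted in the comma-separated list. */
static const struct protect_opt protect_opts[] = {
	{ "syscall", PROTECT_SYSCALL },
	{ "irq",     PROTECT_IRQ },
	{ "kvm",     PROTECT_KVM },
};

/*
 * Parse a comma-separated list such as "irq,kvm" into a bitmask.
 * An empty string yields an empty mask (nothing protected).
 * Returns 0 and stores the result in *mask on success; returns -1 on an
 * unknown or empty token and leaves *mask untouched.
 */
static int parse_protect(const char *arg, unsigned int *mask)
{
	unsigned int m = 0;

	assert(arg != NULL && mask != NULL);

	while (*arg) {
		size_t len = strcspn(arg, ",");	/* length of current token */
		size_t i;
		int found = 0;

		for (i = 0; i < sizeof(protect_opts) / sizeof(protect_opts[0]); i++) {
			if (strlen(protect_opts[i].name) == len &&
			    strncmp(arg, protect_opts[i].name, len) == 0) {
				m |= protect_opts[i].flag;
				found = 1;
				break;
			}
		}
		if (!found)
			return -1;

		arg += len;
		if (*arg == ',')
			arg++;		/* skip the separator */
	}

	*mask = m;
	return 0;
}
```

With something along these lines, a machine that only cares about the guest
case could ask for "kvm" alone, and each hook would test its own bit instead
of one global on/off switch. Which bits a given hardware bug actually needs
is a separate question that this sketch does not try to answer.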