From: Thomas Gleixner
To: "Singh, Balbir", linux-kernel@vger.kernel.org
Cc: keescook@chromium.org, "Herrenschmidt, Benjamin", x86@kernel.org
Subject: Re: [RFC PATCH] arch/x86: Optionally flush L1D on context switch
In-Reply-To: <034a2c0e2cc1bb0f4f7ff9a2c5cbdc269a483a71.camel@amazon.com>
References: <20200313220415.856-1-sblbir@amazon.com> <87imj19o13.fsf@nanos.tec.linutronix.de> <97b2bffc16257e70b8aa98ee86622dc4178154c4.camel@amazon.com> <8736a3456r.fsf@nanos.tec.linutronix.de> <034a2c0e2cc1bb0f4f7ff9a2c5cbdc269a483a71.camel@amazon.com>
Date: Sat, 21 Mar 2020 11:05:32 +0100
Message-ID: <87d096rpjn.fsf@nanos.tec.linutronix.de>
Balbir,

"Singh, Balbir" writes:
> On Fri, 2020-03-20 at 12:49 +0100, Thomas Gleixner wrote:
>> I forgot the gory details by now, but having two entry points or a
>> conditional and sharing the rest (page allocation etc.) is definitely
>> better than two slightly different implementations which basically do
>> the same thing.
>
> OK, I can try and dedup them to the extent possible, but please do
> remember that
>
> 1. KVM is usually loaded as a module
> 2. KVM is optional
>
> We can share code by putting the common bits in the core kernel.

Obviously so.

>> > 1. SWAPGS fixes/work arounds (unless I misunderstood your suggestion)
>>
>> How so? The SWAPGS mitigation does not flush L1D. It merely serializes
>> SWAPGS.
>
> Sorry, my bad, I was thinking of MDS_CLEAR (via VERW), which does flush
> out things and which I suspect should be sufficient from a return to
> user/signal handling, etc. perspective.

MDS affects store buffers, fill buffers and load ports. Different story.

> Right now, reading through
> https://software.intel.com/security-software-guidance/insights/deep-dive-snoop-assisted-l1-data-sampling
> it does seem like we need this during a context switch, specifically
> since a dirty cache line can cause snooped reads for the attacker to
> leak data. Am I missing anything?

Yes. The way this goes is:

    CPU0                 CPU1

    victim1
      store secret

    victim2

    attacker
      read secret

Now if L1D is flushed on CPU0 before the attacker reaches user space,
i.e. reaches the attack code, then there is nothing to see. From the
link:

    Similar to the L1TF VMM mitigations, snoop-assisted L1D sampling can
    be mitigated by flushing the L1D cache between when secrets are
    accessed and when possibly malicious software runs on the same core.

So the important point is to flush _before_ the attack code runs, which
involves going back to user space or guest mode.

>> Even this is uninteresting:
>>
>> victim in -> attacker in (stays in kernel, e.g. waits for data) ->
>> attacker out -> victim in
>
> Not from what I understand from the link above: the attack is a function
> of what can be snooped by another core/thread, and that is a function of
> what modified secrets are in the cache line/store buffer.

Forget HT. That's not fixable by any flushing simply because there is no
scheduling involved:

    CPU0 HT0             CPU0 HT1            CPU1

    victim1              attacker
      store secret

    victim2                read secret

> On return to user we already use VERW, but return-to-user protection
> alone is not sufficient IMHO. Based on the link above, we need to clear
> the L1D cache before it can be snooped.

Again: the flush is required between the store and the attacker running
the attack code. The attacker _cannot_ run attack code while it is in
the kernel, so flushing L1D on context switch is just voodoo.

If you want to cure the HT case with core scheduling then the scenario
looks like this:

    CPU0 HT0             CPU0 HT1            CPU1

    victim1              IDLE
      store secret
    -> IDLE              attacker in

    victim2
                           read secret

And yes, there the context switch flush on HT0 prevents it. So this can
be part of a core scheduling based mitigation or handled via a per-core
flush request.

But HT is attackable in so many ways ...

Thanks,

        tglx
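For readers following the mechanics being debated: the flush itself can be
issued either through the hardware assist (writing the L1D_FLUSH bit to
MSR_IA32_FLUSH_CMD when X86_FEATURE_FLUSH_L1D is enumerated) or through a
software fallback that reads through a dedicated buffer, which is what KVM's
vmx_l1d_flush() does before entering a guest. The sketch below only
illustrates that shape; the helper names, the TIF_SPEC_L1D_FLUSH flag and the
hook placement are assumptions made for the example, not the code from the
RFC patch under discussion.

    /*
     * Illustrative sketch only -- not the RFC patch. Helper names, the
     * thread flag and the hook placement are made up for the example.
     */
    #include <linux/sched.h>
    #include <linux/compiler.h>
    #include <linux/cache.h>
    #include <asm/cpufeature.h>
    #include <asm/msr.h>
    #include <asm/page.h>

    #define L1D_FLUSH_PAGES_ORDER	4	/* 64K buffer, same size as KVM's fallback */

    static void *l1d_flush_pages;		/* allocated once, e.g. at boot or on first use */

    static void l1d_flush(void)
    {
    	unsigned int i;

    	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
    		/* Hardware assist: microcode writes back and invalidates L1D */
    		wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
    		return;
    	}

    	/*
    	 * Software fallback: read through a dedicated buffer so that the
    	 * victim's (potentially dirty) lines are displaced from L1D.
    	 * KVM's vmx_l1d_flush() uses a tuned asm sequence for this; a
    	 * plain volatile read loop is shown here for clarity only.
    	 */
    	for (i = 0; i < (PAGE_SIZE << L1D_FLUSH_PAGES_ORDER); i += L1_CACHE_BYTES)
    		READ_ONCE(((char *)l1d_flush_pages)[i]);
    }

    /*
     * Hypothetical switch-out hook: flush only when the previous task
     * opted in, so tasks which did not ask for it pay nothing. Per the
     * point above, the flush has to sit on the path the attacker
     * traverses before it reaches its attack code in user space (or
     * guest mode).
     */
    void cond_l1d_flush(struct task_struct *prev)
    {
    	if (test_tsk_thread_flag(prev, TIF_SPEC_L1D_FLUSH))
    		l1d_flush();
    }

The deduplication discussed at the top of the mail is about where such
helpers live: the buffer allocation and flush primitives would sit in core
x86 code so that both the context-switch path and the (optional, modular)
KVM code can share them instead of carrying two slightly different copies.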