References:
 <875z98jkof.fsf@nanos.tec.linutronix.de>
 <3babf003-6854-e50a-34ca-c87ce4169c77@citrix.com>
 <20200825043959.GF15046@sjchrist-ice>
In-Reply-To: <20200825043959.GF15046@sjchrist-ice>
From: Andy Lutomirski
Date: Tue, 25 Aug 2020 09:49:05 -0700
Subject: Re: TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and
 syscall trainwreck in hardware)
To: Sean Christopherson
Cc: Andrew Cooper, Thomas Gleixner, LKML, X86 ML, Linus Torvalds,
 Tom Lendacky, Pu Wen, Stephen Hemminger, Sasha Levin, Dirk Hohndel,
 Jan Kiszka, Tony W Wang-oc, "H. Peter Anvin", Asit Mallick,
 Gordon Tetlow, David Kaplan, Tony Luck, Andy Lutomirski
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Aug 24, 2020 at 9:40 PM Sean Christopherson wrote:
>
> +Andy
>
> On Mon, Aug 24, 2020 at 02:52:01PM +0100, Andrew Cooper wrote:
> > And to help with coordination, here is something prepared (slightly)
> > earlier.
> >
> > https://docs.google.com/document/d/1hWejnyDkjRRAW-JEsRjA5c9CKLOPc6VKJQsuvODlQEI/edit?usp=sharing
> >
> > This identifies the problems from software's perspective, along with
> > proposing behaviour which ought to resolve the issues.
> >
> > It is still a work-in-progress.  The #VE section still needs updating in
> > light of the publication of the recent TDX spec.
>
> For #VE on memory accesses in the SYSCALL gap (or NMI entry), is this
> something we (Linux) as the guest kernel actually want to handle gracefully
> (where gracefully means not panicking)?  For TDX, a #VE in the SYSCALL gap
> would require one of two things:
>
>   a) The guest kernel to not accept/validate the GPA->HPA mapping for the
>      relevant pages, e.g. code or scratch data.
>
>   b) The host VMM to remap the GPA (making the GPA->HPA pending again).
>
> (a) is only possible if there's a fatal buggy guest kernel (or perhaps vBIOS).
> (b) requires either a buggy or malicious host VMM.
>
> I ask because, if the answer is "no, panic at will", then we shouldn't need
> to burn an IST for TDX #VE.  Exceptions won't morph to #VE and hitting an
> instruction based #VE in the SYSCALL gap would be a CPU bug or a kernel bug.

Or malicious hypervisor action, and that's a problem.

Suppose the hypervisor remaps a GPA used in the SYSCALL gap (e.g. the
actual SYSCALL text or the first memory it accesses -- I don't have a
TDX spec so I don't know the details).  The user does SYSCALL, the
kernel hits the funny GPA, and #VE is delivered.  The microcode will
write the IRET frame, with mostly user-controlled contents, wherever
RSP points, and RSP is also user controlled.  Calling this a "panic"
is charitable -- it's really game over against an attacker who is
moderately clever.

The kernel can't do anything about this because it's game over before
the kernel has had the chance to execute any instructions.
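
To make the failure mode concrete, here is a rough toy model in Python
of where the exception frame lands with and without an IST stack. It is
only a sketch, not kernel code and not taken from the TDX spec: the Cpu
class, deliver_exception() and every address and register value are
made up for illustration. It assumes the usual 64-bit delivery rules
(align the stack to 16 bytes, then push SS, RSP, RFLAGS, CS, RIP) and
that #VE pushes no error code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cpu:
    """Only the registers that end up in the exception frame."""
    rsp: int
    rip: int
    rflags: int
    cs: int
    ss: int


def deliver_exception(cpu: Cpu,
                      ist_stack_top: Optional[int] = None) -> Dict[int, int]:
    """Toy model of ring-0 exception delivery in 64-bit mode.

    Returns {address: value} for every quadword the CPU writes.
    Without an IST entry the frame goes wherever RSP currently points;
    with one, RSP is first replaced by the IST stack top.
    """
    saved = (cpu.ss, cpu.rsp, cpu.rflags, cpu.cs, cpu.rip)
    base = cpu.rsp if ist_stack_top is None else ist_stack_top
    sp = base & ~0xF          # stack is aligned to 16 bytes before pushing
    writes = {}
    for value in saved:       # pushed in order: SS, RSP, RFLAGS, CS, RIP
        sp -= 8
        writes[sp] = value
    cpu.rsp = sp
    return writes


# Hypothetical SYSCALL-gap state: the kernel has been entered, but RSP
# still holds whatever user space loaded into it.
attacker_rsp = 0x1238

gap = Cpu(rsp=attacker_rsp, rip=0x1000, rflags=0x2, cs=0x10, ss=0x18)
no_ist = deliver_exception(gap)
# Every write lands just below the attacker-chosen pointer.
assert all(attacker_rsp - 48 <= addr < attacker_rsp for addr in no_ist)

gap = Cpu(rsp=attacker_rsp, rip=0x1000, rflags=0x2, cs=0x10, ss=0x18)
with_ist = deliver_exception(gap, ist_stack_top=0x9000)
# With an IST stack the frame stays inside kernel-chosen memory.
assert all(0x9000 - 40 <= addr < 0x9000 for addr in with_ist)
```

In the first case the write addresses are a function of the user's RSP
alone, and the writes happen during delivery, before the handler's
first instruction runs. The second case is what burning an IST entry
for #VE would buy.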