Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp7484415pxb; Thu, 18 Feb 2021 11:14:42 -0800 (PST) X-Google-Smtp-Source: ABdhPJyy/LQ8/ClQVijnMOyfY2XbMxBXaNGLjD2S+/7sGqbrgum6COHwtJ0ViFti6ykzdhR0H0VD X-Received: by 2002:a17:906:1cc9:: with SMTP id i9mr5419226ejh.351.1613675682458; Thu, 18 Feb 2021 11:14:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613675682; cv=none; d=google.com; s=arc-20160816; b=jHH4yUAGt02LXyy5x94AVd7hjRfYk6fP4yLP496Yui0xU1T/pG/ANyQxocdjMBGed0 COlR4zL+GyYv3rcV0x1hjUt79dM7I2sDRteLQsbjtcTTtLN5ougSq46ze4mLukklxWPl Cv6HOtn8zjfr+s3b+Cjit151QP27k9BRHXuoeQxEg32AsIBMuh2GgsHeKjkBEKIui6Kn flCA/zPkj8X8ei2Ar6ArraLZf3Yrsk850eq8wGD28w6QU1q20ea/tnBXRMcBFOT6v1ZJ NK8G39zfr6tf09sJpykC1qhQB0h+ZPC8+qIWRyjvjHq28AxZCvW70L220ryj13tKlpc/ 6r2A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=5Ez2CdddfXlS3K7VPEWJ58VATwGkhjgzBnilKBdaI6o=; b=MiddqZEo9kOcT5lHeHelAsENprih4Rp9FnV1apDf9Pj543YapSZjrpFpHhKG9w2cJA wl1s7azPcnkb0ka9tenhXUOsbnm56+1MuKtj95Ckvlw+sBy7URzUDnDmr/Yl32k1iB49 CMsrYmYWEN6M21F8nCYiRP1MoENnehjeTdZ32lbibGnrIP4vq+9mH5K6fb3DrkkeB0rq REyu9M/4BDjv1Vu9hSG69h/HHvBUa7Jbbftv8FKmPq9dY0EXnYgDCWtrWkYsEzIr6XmZ 0aDLqDgil625xjhO0dT318MnNLdxmvvq3fEa5Ni1rYViCU1GEgGPosRGcaUW79s6SmSB Hr8Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="StJju/Lg"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r16si4568510ejo.467.2021.02.18.11.14.05; Thu, 18 Feb 2021 11:14:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="StJju/Lg"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234019AbhBRTNY (ORCPT + 99 others); Thu, 18 Feb 2021 14:13:24 -0500 Received: from mail.kernel.org ([198.145.29.99]:50646 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234717AbhBRRuG (ORCPT ); Thu, 18 Feb 2021 12:50:06 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 6B14964EB9 for ; Thu, 18 Feb 2021 17:49:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1613670560; bh=S9HE1UIVHUZqDSOtYFkGj6UL4dBAx+REIzCSaf9/6jc=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=StJju/LgTXGbheWbFPDAdr5Vxto51dTm+fr4SoLyxUzCJlhABRYw3lFYVmQDVGpkD eHcXSD4YTe1rRpcxU1JIgXj17APTImvGcTXtwb145ofM6HaY8WOj3nDmuaA+vzuRfw a+qosarmjSuejz6lW4s8KGGwKDnHLGzuki2JCeFDcXIk8I259GiLbRdjKyeH6UDnCX l1PJMzC7qdNltoFA5+Kesc3EHTsc7TJ/QOWVajcpA73KdI0Z3IO3AMGGXOwFaLenjs +zfF6pMos+bbh6ZDsgJru9QaOTPC1ck7IrUFdBKZgcbrP8dOipjRdWmfOfeqEC3M5m 3FBRrCj+PtT9g== Received: by mail-ej1-f49.google.com with SMTP id bl23so7084759ejb.5 for ; Thu, 18 Feb 2021 09:49:20 -0800 (PST) X-Gm-Message-State: AOAM532bl+3KNV2MvoPvfqFHdJJ3K/GfflEV597Y91HCMsUkAtrHvcY9 t9iIE+Of/F2mvq3NMbUhxRcWlQiZvbf5xMhRmzgBMw== X-Received: by 2002:a17:906:d10d:: with SMTP id b13mr4988544ejz.204.1613670557791; Thu, 18 Feb 2021 09:49:17 -0800 (PST) MIME-Version: 1.0 References: <20210217120143.6106-1-joro@8bytes.org> <20210217120143.6106-3-joro@8bytes.org> <20210218112500.GH7302@8bytes.org> In-Reply-To: <20210218112500.GH7302@8bytes.org> From: Andy Lutomirski Date: Thu, 18 Feb 2021 09:49:06 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 2/3] x86/sev-es: Check if regs->sp is trusted before adjusting #VC IST stack To: Joerg Roedel Cc: Andy Lutomirski , X86 ML , Joerg Roedel , stable , "H. Peter Anvin" , Dave Hansen , Peter Zijlstra , Jiri Slaby , Dan Williams , Tom Lendacky , Juergen Gross , Kees Cook , David Rientjes , Cfir Cohen , Erdem Aktas , Masami Hiramatsu , Mike Stunes , Sean Christopherson , Martin Radev , Arvind Sankar , LKML , kvm list , Linux Virtualization Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 18, 2021 at 3:25 AM Joerg Roedel wrote: > > Hi Andy, > > On Wed, Feb 17, 2021 at 10:09:46AM -0800, Andy Lutomirski wrote: > > Can you get rid of the linked list hack while you're at it? This code > > is unnecessarily convoluted right now, and it seems to be just asking > > for weird bugs. Just stash the old value in a local variable, please. > > Yeah, the linked list is not really necessary right now, because of the > way nested NMI handling works and given that these functions are only > used in the NMI handler right now. > The whole #VC handling code was written with future requirements in > mind, like what is needed when debugging registers get virtualized and > #HV gets enabled. > Until its clear whether __sev_es_ist_enter/exit() is needed in any of > these paths, I'd like to keep the linked list for now. It is more > complicated but allows nesting. I don't understand what this means. The whole entry mechanism on x86 is structured so that we call a C function *and return from that C function without longjmp-like magic* with the sole exception of unwind_stack_do_exit(). This means that you can match up enters and exits, and that unwind_stack_do_exit() needs to unwind correctly. In the former case, it's normal C and we can use normal local variables. In the latter case, we know exactly what state we're trying to restore and we can restore it directly without any linked lists or similar. What do you have in mind that requires a linked list? > > > Meanwhile, I'm pretty sure I can break this whole scheme if the > > hypervisor is messing with us. As a trivial example, the sequence > > SYSCALL gap -> #VC -> NMI -> #VC will go quite poorly. > > I don't see how this would break, can you elaborate? > > What I think happens is: > > SYSCALL gap (RSP is from userspace and untrusted) > > -> #VC - Handler on #VC IST stack detects that it interrupted > the SYSCALL gap and switches to the task stack. > Can you point me to exactly what code you're referring to? I spent a while digging through the code and macro tangle and I can't find this. > > -> NMI - Now running on NMI IST stack. Depending on whether the > stack switch in the #VC handler already happened, the #VC IST > entry is adjusted so that a subsequent #VC will not overwrite > the interrupted handlers stack frame. > > -> #VC - Handler runs on the adjusted #VC IST stack and switches > itself back to the NMI IST stack. This is safe wrt. nested > NMIs as long as nested NMIs itself are safe. > > As a rule of thumb, think of the #VC handler as trying to be a non-IST > handler by switching itself to the interrupted stack or the task stack. > If it detects that this is not possible (which can't happen right now, > but with SNP), it will kill the guest. I will try to think of this, but it's hard, since I can't find the code :) I found this comment: * With the current implementation it is always possible to switch to a safe * stack because #VC exceptions only happen at known places, like intercepted * instructions or accesses to MMIO areas/IO ports. They can also happen with * code instrumentation when the hypervisor intercepts #DB, but the critical * paths are forbidden to be instrumented, so #DB exceptions currently also * only happen in safe places. Unless AMD is more magic than I realize, the MOV SS bug^Wfeature means that #DB is *not* always called in safe places. But I *thnk* the code you're talking about is this: /* * If the entry is from userspace, switch stacks and treat it as * a normal entry. */ testb $3, CS-ORIG_RAX(%rsp) jnz .Lfrom_usermode_switch_stack_\@ which does not run on #VC from kernel code. > It needs to be IST, even without SNP, because #DB is IST too. When the > hypervisor intercepts #DB then any #DB exception will be turned into > #VC, so #VC needs to be handled anywhere a #DB can happen. Eww. > > And with SNP we need to be able to at least detect a malicious HV so we > can reliably kill the guest. Otherwise the HV could potentially take > control over the guest's execution flow and make it reveal its secrets. True. But is the rest of the machinery to be secure against EFLAGS.IF violations and such in place yet? > > Regards, > > Joerg