Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp504103pxk; Sun, 30 Aug 2020 11:42:35 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxoRyCg7MExhqJM4VYAa2d0vrmNemI07PcLyZ5zAYfcNJmK6UQhmpVK6/Fkw+TLOGvpyF81 X-Received: by 2002:a17:906:c1d7:: with SMTP id bw23mr9084312ejb.315.1598812954789; Sun, 30 Aug 2020 11:42:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598812954; cv=none; d=google.com; s=arc-20160816; b=Wv3Fb3H/OkRk7XqmPuZ1qCKcWzwyqx8eazzB/pHBdRS9cje1euI55/BonlsvZkAQ26 q/q0zvC52V4ph+O5bbNB2BdJ0+NZ0U/NW5sRiqf1RQnKD0lN37rYGwOF+odMWEr+OkBH 6gyadQf5RcLhX0xKR6vODq6CbspfmOQSnZoJel495iM/gdzfVOitLb+lFEQq24tsO0jZ S6eXam3JRWDquCMLybRA/lSDMEdBwS43JYADplO24wczbl37LUolImvCxsorMNWciybN xaSODgN1c+KD2OfSfDgOjIPW3pkeeIVtfz6ZLfFm1Dpa4n8sjiUKGrq9idlostLKL2vo tkrw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=+NTYppXVFlAtMJ5lMPooDjQI0LeiUfB45dzpf9LtvXk=; b=ZNmO3YhJowSnj6cphQNiwWLDn4vXMIKnyyz6ec6rxQ21EhWywRp4wBABbksFpVMc5G /WKnIbgW1r+bNcV34uAN25TYEvel4xxPnzpSN25MUzSVjBMRDSN0MpwEJy2swkaDpyCh cU3f08FuTLEVeZnctTpU1Zc7m81DspyfRnNctQZxs292ZQrlpucJly8/kT01jXam+D2s Zjz9Vd9gP8/A/QsW6VpPexYw4gF0idUG2fpqjBzTmPfEnyQdaHWweGAFv8YGROSKrbSY 1LoMEzvKYqzfdK7C/yc4/2r3VvmStzpfp5sRkGfrAnGqoYsvv5qOk75GUafKVLF8XG5B jKwg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=QQ2VZR83; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s2si4020544ejs.269.2020.08.30.11.42.11; Sun, 30 Aug 2020 11:42:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linux-foundation.org header.s=google header.b=QQ2VZR83; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726426AbgH3Shy (ORCPT + 99 others); Sun, 30 Aug 2020 14:37:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58832 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726155AbgH3Shx (ORCPT ); Sun, 30 Aug 2020 14:37:53 -0400 Received: from mail-lj1-x241.google.com (mail-lj1-x241.google.com [IPv6:2a00:1450:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 23829C061573 for ; Sun, 30 Aug 2020 11:37:53 -0700 (PDT) Received: by mail-lj1-x241.google.com with SMTP id 185so4221277ljj.7 for ; Sun, 30 Aug 2020 11:37:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=+NTYppXVFlAtMJ5lMPooDjQI0LeiUfB45dzpf9LtvXk=; b=QQ2VZR83aWyPSugNb4yjiAdB7wS5jzkBO4bRzddE6oyZ7jU9MKbbA2fLOSEqueCFFM ThGV8N5/AoI67xjlJ9dAdYvS4tYAnSfKEYAFSjX75o0R5Ejs/H99v0BY6qoBbaYjbpGK xDEJUXCBMXj5oyexpFtdYSZEqFBPnpkk0z7pk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=+NTYppXVFlAtMJ5lMPooDjQI0LeiUfB45dzpf9LtvXk=; b=JxRvL8glJ8cn64b/QsqFfYNsDxybACjajp7+BSWvzxVw5oVZg71LvRwattMoNiIr5o r8XB26FKZzRNKZr6e2EihWxFOekZGngF8U8nNe5HkAnS6fxPbgj4IFvSyfOvaifYMknU ilU5FvCeyrqmfzJAlLT+yR+bfocUNlp0UUY3qju3igFA9ZLg0JruW284Jn3pT1jtoiKr PwTyZZIcxKswaVwejmG87663JLHpQ2Yh7rhoz1fN66ptd9XCyoZ3f6HFezEX8pYg/Eob wxbtsyb50lzBlNoFesIJancX5kBD3pOOvnSXyx95pnucgm/NUIWYmZUbfl0ZEP0tg92L +0Zg== X-Gm-Message-State: AOAM5323CgRptub/KiBKUKKXoooFojXat0g0kh6ja3UxYieHUcncVGkR DKvzGnXU+3td6ay/tO3TO4zPbIsX7RoiHQ== X-Received: by 2002:a2e:8804:: with SMTP id x4mr3612011ljh.328.1598812671479; Sun, 30 Aug 2020 11:37:51 -0700 (PDT) Received: from mail-lj1-f173.google.com (mail-lj1-f173.google.com. [209.85.208.173]) by smtp.gmail.com with ESMTPSA id w6sm300155ljj.121.2020.08.30.11.37.50 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 30 Aug 2020 11:37:51 -0700 (PDT) Received: by mail-lj1-f173.google.com with SMTP id y2so4261114ljc.1 for ; Sun, 30 Aug 2020 11:37:50 -0700 (PDT) X-Received: by 2002:a2e:2e04:: with SMTP id u4mr3497698lju.102.1598812670449; Sun, 30 Aug 2020 11:37:50 -0700 (PDT) MIME-Version: 1.0 References: <875z98jkof.fsf@nanos.tec.linutronix.de> <3babf003-6854-e50a-34ca-c87ce4169c77@citrix.com> <20200825043959.GF15046@sjchrist-ice> <20200825171903.GA20660@sjchrist-ice> <20200826191644.GC21065@sjchrist-ice> In-Reply-To: From: Linus Torvalds Date: Sun, 30 Aug 2020 11:37:34 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: TDX #VE in SYSCALL gap (was: [RFD] x86: Curing the exception and syscall trainwreck in hardware) To: Andy Lutomirski Cc: Sean Christopherson , Andrew Cooper , Thomas Gleixner , LKML , X86 ML , Tom Lendacky , Pu Wen , Stephen Hemminger , Sasha Levin , Dirk Hohndel , Jan Kiszka , Tony W Wang-oc , "H. Peter Anvin" , Asit Mallick , Gordon Tetlow , David Kaplan , Tony Luck Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Aug 30, 2020 at 8:37 AM Andy Lutomirski wrote: > > There's no such thing as "just" using an IST. Using IST opens a huge > can of works due to its recursion issues. I absolutely despise all the x86 "indirect system structures". They are horrible garbage. IST is only yet another example of that kind of brokenness, and annoys me particularly because it (and swapgs) were actually making x86 _worse_. The old i386 exception model was actually better than what x86-64 did, and IST is a big part of the problem. Just have a supervisor stack, and push the state on it. Stop playing games with multiple stacks depending on some magical indirect system state. Other examples of stupid and bad indirection: - the GDT and LDT. The kernel should never have to use them. It would be much better if the segment "shadow" state would stop being shadow state, and be the REAL state that the kernel (and user space, for that matter) accesses. Yeah, we got halfway there with MSR_FS/GS_BASE, but what a complete garbage crock that was. So now we're forced to use the selector *and* the base reghister, and they may be out of sync with each other, so you have the worst of both worlds. Keep the GDT and LDT around for compatibility reasons, so that old broken programs that want to load the segment state the oldfashioned way can do so. But make it clear that that is purely for legacy, and make the modern code just save and restore the actual true non-indirect segment state. For new models, give us a way to load base/limit/permissions directly, and reset them on kernel entry. No more descriptor table indirection games. - the IDT and the TSS segment. Exact same arguments as above. Keep them around for legacy programs, but let us just set "this is the entrypoint, this the the kernel stack" as registers. Christ, we're probably better off with one single entry-point for the whole kernel (ok, give us a separate one for NMI/MCE/doublefault, since they are _so_ special, and maybe separate "CPU exceptions" from "external interrupts), together with just a register that says what the exception was. - swapgs needs to die. The kernel GS/FS segments should just be separate segment registers from user space. No "swapping" needed. In CPL0, "gs" just means something different from user space. No save/restore code for it, no swapping, no nothing. Honestly, I think %rsp/%rip could work like that too. Just make "rsp" and "rip" be a completely different register in kernel mode - rename it in the front-end of the CPU or whatever. Imagine not having to save/restore rsp/rip on kernel entry/exit at all, because returning to user more just implicitly starts using ursp/urip. And a context switch uses (fast) MSR's to save/restore the user state (or, since it's actually a real register in the register file, just a new "mov" instruction to access the user registers). Linus