From: Thomas Gleixner
To: Andy Lutomirski, Brendan Jackman
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, the arch/x86 maintainers, "H. Peter Anvin", Linux Kernel Mailing List, Lai Jiangshan, yosryahmed@google.com, reijiw@google.com, oweisse@google.com
Subject: Re: [PATCH RESEND] x86/entry: Don't write to CR3 when restoring to kernel CR3
References: <20230817121513.1382800-1-jackmanb@google.com>
Date: Tue, 19 Sep 2023 11:07:17 +0200
Message-ID: <87sf7awmyi.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Sep 18 2023 at 20:28, Andy Lutomirski wrote:
> On Thu, Aug 17, 2023, at 5:15 AM, Brendan Jackman wrote:
>> From: Lai Jiangshan
>>
>> Skip resuming KERNEL pages since it is already KERNEL CR3
>>
>> Signed-off-by: Lai Jiangshan
>> Signed-off-by: Brendan Jackman
>> ---
>>
>> While
>> staring at paranoid_exit I was confused about why we had this CR3
>> write, avoiding it seems like a free optimisation. The original commit
>> 21e94459110252 ("x86/mm: Optimize RESTORE_CR3") says "Most NMI/paranoid
>> exceptions will not in fact change pagetables" but I didn't understand
>> what the "most" was referring to. I then discovered this patch on the
>> mailing list, Andy said[1] that it looks correct so maybe now is the
>> time to merge it?
>
> I did?
>
> Looking at the link, I think I was saying that the opposite patch
> (*always* flush) looked okay.

That always-flush part was solely for the user CR3 restore path.

>> @@ -236,14 +236,13 @@ For 32-bit we have the following conventions - kernel is built with
>>  .macro RESTORE_CR3 scratch_reg:req save_reg:req
>>      ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>>
>> -    ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>> -
>>      /*
>> -     * KERNEL pages can always resume with NOFLUSH as we do
>> -     * explicit flushes.
>> +     * Skip resuming KERNEL pages since it is already KERNEL CR3.
>>       */
>>      bt $PTI_USER_PGTABLE_BIT, \save_reg
>> -    jnc .Lnoflush_\@
>> +    jnc .Lend_\@
>
> I don't get it. How do you know that the actual loaded CR3 is correct?
>
> I'm willing to believe that there is some constraint in the way the
> kernel works such that every paranoid entry will, as part of its
> execution, switch CR3 to kernel *and leave it like that* *and that
> this will be the _same_ kernel CR3 that was saved*. But I'm not
> really convinced that's true. (Can we schedule in a paranoid entry?
> Probably not. What about the weird NMI paths? What if we do
> something that switches to init mm? Hmm -- doing that in a paranoid
> context is probably not a brilliant idea.)

You have to differentiate between entry from kernel and entry from user.

Entry from user switches to the task stack, while entry from kernel
always runs on IST.
Entry from user cannot have kernel CR3 obviously, while entry from
kernel can have either kernel CR3 or user CR3. Entry from user does not
use the paranoid entry/exit paths at all, so that's a non-issue.

IST prevents the exception from scheduling, which in turn guarantees
that CR3 stays the same until it returns, unless some completely stupid
code path tries to switch to a different mm from an exception which can
hit in the middle of an mm switch. In that case the failed restore is
probably the least of our problems.

> Maybe it is true, and maybe a convincing argument could be made. But
> that seems like a lot of thinking and fragility for an optimization
> that seems pretty minor.

I don't think it's pretty minor. CR3 writes, even with the noflush bit
set, are not necessarily cheap. While most exceptions which go through
the paranoid path are not hotpath, the one which matters is #NMI due to
perf. So I think it's worth sparing the redundant CR3 switch in that
case.

Thanks,

        tglx
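
The exit-path decision discussed in this thread can be modelled in plain C.
This is only an illustrative sketch under stated assumptions, not kernel
code: the function names, the enum and SKETCH_USER_PGTABLE_BIT (set to 12
here purely for illustration) are hypothetical, and the PCID flush-mask
handling for a saved user CR3 is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for illustration only: the CR3 bit that selects the user
 * page tables. The real value comes from PTI_USER_PGTABLE_BIT.
 */
#define SKETCH_USER_PGTABLE_BIT 12

enum cr3_action {
	CR3_SKIP,	/* leave CR3 untouched */
	CR3_WRITE,	/* write the saved value back to CR3 */
};

/*
 * Model of the exit path before the patch: with PTI enabled the saved
 * value is always written back, whether it is a kernel or a user CR3.
 */
enum cr3_action restore_cr3_original(bool pti_enabled, uint64_t saved_cr3)
{
	(void)saved_cr3;	/* not consulted for the write/skip decision */
	return pti_enabled ? CR3_WRITE : CR3_SKIP;
}

/*
 * Model of the exit path with the patch: if the saved value is a kernel
 * CR3, the exception entered from kernel mode with kernel page tables,
 * and the argument in the thread is that CR3 cannot have changed since,
 * so the write is skipped.
 */
enum cr3_action restore_cr3_patched(bool pti_enabled, uint64_t saved_cr3)
{
	if (!pti_enabled)
		return CR3_SKIP;

	/* User bit clear: the saved CR3 is the kernel CR3. */
	if (!(saved_cr3 & (1ULL << SKETCH_USER_PGTABLE_BIT)))
		return CR3_SKIP;

	/* Saved CR3 points at the user page tables: must be restored. */
	return CR3_WRITE;
}
```

The two functions differ only for a saved kernel CR3 with PTI enabled,
which is the NMI-from-kernel case the reply singles out as worth optimising.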