Date: Tue, 25 Jul 2023 15:21:55 +0200
From: Peter Zijlstra
To: Dave Hansen
Cc: Valentin Schneider, Nadav Amit, Linux Kernel Mailing List,
    "linux-trace-kernel@vger.kernel.org", "linux-doc@vger.kernel.org",
    "kvm@vger.kernel.org", linux-mm, bpf, the arch/x86 maintainers,
    "rcu@vger.kernel.org", "linux-kselftest@vger.kernel.org",
    Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
    Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski,
    Frederic Weisbecker, "Paul E. McKenney", Neeraj Upadhyay,
    Joel Fernandes, Josh Triplett, Boqun Feng, Mathieu Desnoyers,
    Lai Jiangshan, Zqiang, Andrew Morton, Uladzislau Rezki,
    Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf, Jason Baron,
    Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
    Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
    Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
    "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky,
    Dionna Glaze, Thomas Weißschuh, Juri Lelli,
    Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer
    flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
Message-ID: <20230725132155.GJ3765278@hirez.programming.kicks-ass.net>
References: <20230720163056.2564824-1-vschneid@redhat.com>
    <20230720163056.2564824-21-vschneid@redhat.com>
    <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com>
    <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>
In-Reply-To: <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 24, 2023 at 10:40:04AM -0700, Dave Hansen wrote:

> TLB flushes for freed page tables are another game entirely. The CPU is
> free to cache any part of the paging hierarchy it wants at any time.
> It's also free to set accessed and dirty bits at any time, even for
> instructions that may never execute architecturally.
>
> That basically means that if you have *ANY* freed page table page
> *ANYWHERE* in the page table hierarchy of any CPU at any time ... you're
> screwed.
>
> There's no reasoning about accesses or ordering. As soon as the CPU
> does *anything*, it's out to get you.
>
> You're going to need to do something a lot more radical to deal with
> free page table pages.

Ha! IIRC the only thing we can reasonably do there is to have strict
per-cpu page-tables such that NOHZ_FULL CPUs can be isolated.
That is, as long as the per-cpu tables do not contain -- and have never
contained -- a particular table page, we can avoid flushing it. Because
if the page never was there, the CPU also couldn't have speculatively
loaded it.

Now, x86 doesn't really do per-cpu page tables easily (otherwise we'd
have done them ages ago) and doing them is going to be *major* surgery
and pain.

Other than that, we must take the TLBI-IPI when freeing
page-table-pages.

But yeah, I think Nadav is right, vmalloc.c never frees page-tables (or
at least, I couldn't find it in a hurry either), but if we're going to
be doing this, then that file must include a very prominent comment
explaining that it must never actually do so.

Not being able to free page-tables might be a 'problem' if we're going
to be doing more of HUGE_VMALLOC, because that means it becomes rather
hard to swizzle from small to large pages.
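
Illustrative sketch (not from the patch series): the "do not contain,
and have never contained" rule above can be modelled as a sticky per-CPU
history bit for each page-table page. This is a toy userspace model
under stated assumptions, not kernel code. Every name in it
(TOY_NR_CPUS, ever_linked, link_table_page and so on) is hypothetical
and does not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS        4   /* hypothetical: CPUs with private tables */
#define TOY_NR_TABLE_PAGES 64  /* hypothetical: tracked page-table pages */

/*
 * One bit per page-table page, per CPU. A bit is set the first time the
 * page is linked into that CPU's private hierarchy and is never cleared:
 * once linked, the CPU may have cached it speculatively at any time.
 */
static uint64_t ever_linked[TOY_NR_CPUS];

static void link_table_page(int cpu, int page)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	assert(page >= 0 && page < TOY_NR_TABLE_PAGES);
	ever_linked[cpu] |= UINT64_C(1) << page;
}

static void unlink_table_page(int cpu, int page)
{
	/* Deliberately leaves ever_linked untouched: history is sticky. */
	(void)cpu;
	(void)page;
}

/* True if this CPU could hold a cached copy of the table page. */
static bool must_flush_on_free(int cpu, int page)
{
	return (ever_linked[cpu] >> page) & 1;
}

/* Bitmask of CPUs that need the TLBI-IPI before the page can be freed. */
static unsigned int cpus_to_flush(int page)
{
	unsigned int mask = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (must_flush_on_free(cpu, page))
			mask |= 1u << cpu;
	return mask;
}
```

In this model a CPU whose tables never linked the page is left out of
the mask, which is the only case where skipping the IPI is safe.
Unlinking alone does not help, because the history bit stays set.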