From: Dave Hansen
Date: Tue, 25 Jul 2023 10:12:21 -0700
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer
 flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
To: Marcelo Tosatti
Cc: Valentin Schneider, Nadav Amit, Linux Kernel Mailing List,
 "linux-trace-kernel@vger.kernel.org", "linux-doc@vger.kernel.org",
 "kvm@vger.kernel.org", linux-mm, bpf, the arch/x86 maintainers,
 "rcu@vger.kernel.org", "linux-kselftest@vger.kernel.org",
 Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski,
 Peter Zijlstra, Frederic Weisbecker, "Paul E. McKenney",
 Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
 Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton,
 Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf,
 Jason Baron, Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
 Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
 Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
 "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky,
 Dionna Glaze, Thomas Weißschuh, Juri Lelli,
 Daniel Bristot de Oliveira, Yair Podemsky
References: <20230720163056.2564824-1-vschneid@redhat.com>
 <20230720163056.2564824-21-vschneid@redhat.com>
 <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com>
 <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/25/23 09:37, Marcelo Tosatti wrote:
>> TLB flushes for freed page tables are another game entirely. The CPU is
>> free to cache any part of the paging hierarchy it wants at any time.
> Depend on CONFIG_PAGE_TABLE_ISOLATION=y, which flushes TLB (and page
> table caches) on user->kernel and kernel->user context switches ?

Well, first of all, CONFIG_PAGE_TABLE_ISOLATION doesn't flush the TLB at
all on user<->kernel switches when PCIDs are enabled.

Second, even if it did, the CPU is still free to cache any portion of
the paging hierarchy at any time.  Without LASS[1], userspace can even
_compel_ walks of the kernel portion of the address space, and we don't
have any infrastructure to tell if a freed kernel page is exposed in the
user copy of the page tables with PTI.

Third, (also ignoring PCIDs) there are plenty of instructions between
kernel entry and the MOV-to-CR3 that can flush the TLB.  All those
instructions are architecturally permitted to speculatively set Accessed
or Dirty bits in any part of the address space.  If they run into a free
page table page, things get ugly.

These accesses are not _likely_.  There probably isn't a predictor out
there that's going to see a:

	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)

and go off trying to dirty memory in the vmalloc() area.  But we'd need
some backward *and* forward-looking guarantees from our intrepid CPU
designers to promise that this kind of thing is safe yesterday, today
and tomorrow.  I suspect such a guarantee is going to be hard to obtain.

1. https://lkml.kernel.org/r/20230110055204.3227669-1-yian.chen@intel.com