Date: Fri, 4 Aug 2023 15:48:05 +0800
From: Dylan Jhong <dylan@andestech.com>
To: Alexandre Ghiti
CC: Conor Dooley, Guo Ren, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Björn Töpel
Subject: Re: [PATCH -fixes] riscv: Implement flush_cache_vmap()
References: <20230725132246.817726-1-alexghiti@rivosinc.com>
	<20230803-stadium-unusable-6cf00e35ec22@wendy>

On Thu, Aug 03, 2023 at 11:48:36AM +0200, Alexandre Ghiti wrote:
> On Thu, Aug 3, 2023 at 11:25 AM Conor Dooley wrote:
> >
> > On Thu, Aug 03, 2023 at 05:14:15PM +0800, dylan wrote:
> > > On Sun, Jul 30, 2023 at 01:08:17AM -0400, Guo Ren wrote:
> > > > On Tue, Jul 25, 2023 at 9:22 AM Alexandre Ghiti wrote:
> > > > >
> > > > > The RISC-V kernel needs an sfence.vma after a page table modification:
> > > > > we used to rely on the vmalloc fault handling to emit an sfence.vma,
> > > > > but commit 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for
> > > > > vmalloc/modules area") got rid of this path for 64-bit kernels, so now
> > > > > we need to explicitly emit an sfence.vma in flush_cache_vmap().
> > > > >
> > > > > Note that we don't need to implement flush_cache_vunmap() as the
> > > > > generic code should emit a TLB flush after unmapping a vmalloc region.
> > > > >
> > > > > Fixes: 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area")
> > > > > Signed-off-by: Alexandre Ghiti
> > > > > ---
> > > > >  arch/riscv/include/asm/cacheflush.h | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> > > > > index 8091b8bf4883..b93ffddf8a61 100644
> > > > > --- a/arch/riscv/include/asm/cacheflush.h
> > > > > +++ b/arch/riscv/include/asm/cacheflush.h
> > > > > @@ -37,6 +37,10 @@ static inline void flush_dcache_page(struct page *page)
> > > > >  #define flush_icache_user_page(vma, pg, addr, len) \
> > > > >  	flush_icache_mm(vma->vm_mm, 0)
> > > > >
> > > > > +#ifdef CONFIG_64BIT
> > > > > +#define flush_cache_vmap(start, end)	flush_tlb_kernel_range(start, end)
> > > >
> > > > Sorry, I can't agree with the above on a PIPT cache machine; it's not
> > > > worth it.
> > > >
> > > > It would reduce the performance of the vmap_pages_range(),
> > > > ioremap_page_range() ... APIs, which may cause performance issues for
> > > > drivers that install/uninstall memory frequently.
> > >
> > > Hi All,
> > >
> > > I think functional correctness should be more important than system
> > > performance in this case. The "preventive" SFENCE.VMA became necessary
> > > because the RISC-V specification allows invalid entries to be cached
> > > in the TLB.
> >
> > We are at -rc4 and this stuff is broken. Taking the bigger hammer, which
> > can be reverted later when a more targeted fix shows up, to make sure
> > that v6.5 doesn't end up broken, sounds rather prudent. Otherwise, the
> > original commit should probably be reverted.
>
> The original commit that removed vmalloc_fault() is required: handling
> vmalloc faults in the page fault path is not possible (see the links in
> the description of 7d3332be011e and the example that I gave in the thread
> https://lore.kernel.org/linux-riscv/dc26625b-6658-c078-76d2-7e975a04b1d4@ghiti.fr/).
>
> I totally agree with Dylan; we'll work on the performance side of the
> problem in the next release (I'm currently working on that). We need
> correctness first, and for that we need a preventive global sfence.vma,
> as we have no means (for now) to distinguish between uarchs that do and
> do not cache invalid entries.
>
> > > The problem[1] we are currently encountering is caused by not flushing
> > > the TLB after the page table is created, and it can only be solved by
> > > flushing the TLB immediately after the page table is created.
> > >
> > > [1]: http://lists.infradead.org/pipermail/linux-riscv/2023-August/037503.html
> > >
> > > There are currently two possible approaches to flushing the TLB:
> > > 1. Flush the TLB in flush_cache_vmap()
> > > 2. Flush the TLB in arch_sync_kernel_mappings()
> > >
> > > But I'm not quite sure it's a good idea to operate on the TLB inside
> > > flush_cache_vmap(). The name of this function indicates that it should
> > > be related to cache operations; maybe it would be more appropriate to
> > > do the TLB flush in arch_sync_kernel_mappings()?
>
> TLDR: the downsides to implementing arch_sync_kernel_mappings() instead
> of flush_cache_vmap():
>
> - 2 global flushes for vunmap instead of 1 for flush_cache_vmap()
> - it flushes the TLB in the noflush-suffixed functions, so it prevents
>   any flush optimization (i.e. a loop of vmap_range_noflush() without
>   flushes and then a single flush afterwards)
>
> So I'd favour the flush_cache_vmap() implementation, which seems lighter.
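For reference, the arch_sync_kernel_mappings() approach I had in mind is
roughly the sketch below. Untested; the ARCH_PAGE_TABLE_SYNC_MASK wiring
is only my assumption of how the generic hook gets enabled (modelled on
what x86-32 does), and where the define lives is hypothetical:

	/* Hypothetical riscv wiring, e.g. in asm/pgtable.h: opt in to the
	 * generic hook for every page-table level the vmalloc code may
	 * modify (flags from include/linux/pgtable.h). */
	#define ARCH_PAGE_TABLE_SYNC_MASK \
		(PGTBL_PGD_MODIFIED | PGTBL_P4D_MODIFIED | PGTBL_PUD_MODIFIED | \
		 PGTBL_PMD_MODIFIED | PGTBL_PTE_MODIFIED)

	/* Called by the generic code after kernel page-table updates whose
	 * level matches the mask above. */
	void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
	{
		/* On riscv today this degenerates to a global sfence.vma. */
		flush_tlb_kernel_range(start, end);
	}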
> powerpc does that
> https://elixir.bootlin.com/linux/latest/source/arch/powerpc/include/asm/cacheflush.h#L27
> (but admits that it may not be the right place)
>
> Here is the long story (my raw notes):
>
> * arch_sync_kernel_mappings() is called from:
>   - _apply_to_page_range(): would only emit a global sfence.vma for
>     vmalloc addresses, I guess that's ok.
>   - __vunmap_range_noflush(): it is noted here
>     https://elixir.bootlin.com/linux/latest/source/mm/vmalloc.c#L406
>     that any caller must call flush_tlb_kernel_range(), so an
>     arch_sync_kernel_mappings() implementation would result in 2 global
>     TLB flushes.
>   - vmap_range_noflush(): does not fit well with the _noflush suffix.
>
> * flush_cache_vmap() is called from:
>   - kasan_populate_vmalloc(): legit since it bypasses the vmap API (but
>     it is called right after apply_to_page_range(), so your patch would
>     work here too).
>   - kmsan_vunmap_range_noflush(): called twice for the mappings kmsan
>     establishes, and flush_tlb_kernel_range() must be called afterwards
>     => 3 global TLB flushes, but all 3 are needed as they target
>     different addresses. Implementing only arch_sync_kernel_mappings()
>     would result in way more global flushes (see the loop here
>     https://elixir.bootlin.com/linux/latest/source/mm/kmsan/hooks.c#L151
>     where __vmap_pages_range_noflush() would result in more
>     flush_tlb_all() calls).
>   - kmsan_vmap_pages_range_noflush(): here we would flush twice, but the
>     same goes for the arch_sync_kernel_mappings() implementation.
>   - ioremap_page_range(): legit, same as the arch_sync_kernel_mappings()
>     implementation.
>   - vmap_pages_range(): legit, same as the arch_sync_kernel_mappings()
>     implementation.
>
> Let me know what you think!
>
> Alex

Hi Alex,

Thank you for the detailed explanation. It is indeed undeniable that in
certain situations the TLB might be flushed repeatedly. But I think there
are some potential problems with flush_cache_vmap().

In most cases, vmap_range_noflush() and flush_cache_vmap() appear at the
same time, so it should be no problem to pick either of them to do the
TLB flush. But flush_cache_vmap() does not cover all the places where
apply_to_page_range() appears (please correct me if I'm wrong), such as
vmap_pfn()[1]. The functions you mentioned above will each eventually
call:

  vmap_range_noflush() -> arch_sync_kernel_mappings() -> TLB flush

As for performance: because the current flush_tlb_page() needs a *vma
parameter that we cannot provide here, we can only choose flush_tlb_all().
If it can be changed to flush_tlb_page() in the future, performance
should improve.

[1]: https://elixir.bootlin.com/linux/v6.5-rc4/source/mm/vmalloc.c#L2977

Best regards,
Dylan Jhong
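P.S. To make the flush_tlb_page() remark concrete: the SMP flush_tlb_page()
takes a vma so it can target mm_cpumask(vma->vm_mm), and kernel mappings
have no vma, hence the fallback to flush_tlb_all(). A vma-less ranged
flush could look roughly like the sketch below. This is untested and only
my assumption of how it could be built on the v6.5 sbi_remote_sfence_vma()
helper from asm/sbi.h; real code would go through the tlbflush machinery
and handle the non-SBI/UP configurations:

	/* Sketch only: broadcast a ranged sfence.vma to all online harts
	 * instead of flushing everything. */
	static void flush_tlb_kernel_range_sketch(unsigned long start,
						  unsigned long end)
	{
		sbi_remote_sfence_vma(cpu_online_mask, start, end - start);
	}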