Subject: Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV
From: Alex Ghiti <alex@ghiti.fr>
To: Anup Patel, Andrew Waterman
Cc: Jiuyang Liu, Paul Walmsley, Palmer Dabbelt, Albert Ou, Atish Patra,
    Anup Patel, Andrew Morton, Mike Rapoport, Kefeng Wang, Zong Li,
    Greentime Hu, linux-riscv, "linux-kernel@vger.kernel.org List"
References: <20210316015328.13516-1-liu@jiuyang.me> <20210316034638.16276-1-liu@jiuyang.me>
Date: Tue, 16 Mar 2021 08:05:27 -0400

On 3/16/21 4:40 AM, Anup Patel wrote:
> On Tue, Mar 16, 2021 at 1:59 PM Andrew Waterman wrote:
>>
>> On Tue, Mar 16, 2021 at 12:32 AM Anup Patel wrote:
>>>
>>> On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu wrote:
>>>>
>>>>> As per my understanding, we don't need to explicitly invalidate the local
>>>>> TLB in set_pte() or set_pte_at() because generic Linux page table
>>>>> management (mm/*) will call the appropriate flush_tlb_xyz() function
>>>>> after page table updates.
>>>>
>>>> I witnessed this bug in our micro-architecture: the set_pte store is
>>>> still in the store buffer, and no function in the call stack below
>>>> inserts an SFENCE.VMA, so the TLB cannot observe the modification.
>>>> Here is my call stack:
>>>>   set_pte
>>>>   set_pte_at
>>>>   map_vm_area
>>>>   __vmalloc_area_node
>>>>   __vmalloc_node_range
>>>>   __vmalloc_node
>>>>   __vmalloc_node_flags
>>>>   vzalloc
>>>>   n_tty_open

I don't find this call stack; what I find is (the other way around):

  n_tty_open
  vzalloc
  __vmalloc_node
  __vmalloc_node_range
  __vmalloc_area_node
  map_kernel_range -> map_kernel_range_noflush
  flush_cache_vmap

Which leads to the fact that we don't have a flush_cache_vmap callback
implemented: shouldn't we add the sfence.vma there? Powerpc does something
similar with the "ptesync" instruction (see below), which seems to do the
same as sfence.vma.

ptesync: "The ptesync instruction after the Store instruction ensures that
all searches of the Page Table that are performed after the ptesync
instruction completes will use the value stored"

>>>> I think this is architecture-specific code, so mm/* should not be
>>>> modified.
>>>> And the spec requires SFENCE.VMA to be executed on each modification of
>>>> the page tables. So I added the code here.
>>>
>>> The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
>>> function defined in arch/riscv/include/asm/tlbflush.h
>>>
>>> Better to have a write-barrier in set_pte().
>>>
>>>>
>>>>> Also, just a local TLB flush is generally not sufficient because
>>>>> a lot of page tables will be used across multiple HARTs.
>>>>
>>>> Yes, this is the biggest issue. RISC-V Volume 2, Privileged Spec
>>>> v. 20190608, page 67, gives a solution:
>>>
>>> This is not an issue with the RISC-V privileged spec; rather, it is more
>>> about placing RISC-V fences at the right locations.
>>>
>>>> Consequently, other harts must be notified separately when the
>>>> memory-management data structures have been modified. One approach is
>>>> to use:
>>>> 1) a local data fence to ensure local writes are visible globally,
>>>> then 2) an interprocessor interrupt to the other thread,
>>>> then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
>>>> and finally 4) signal back to the originating thread that the operation
>>>> is complete. This is, of course, the RISC-V analog to a TLB shootdown.
>>>
>>> I would suggest trying approach #1.
>>>
>>> You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
>>> in place of the local TLB flush.
>>
>> wmb() doesn't suffice to order older stores before younger page-table
>> walks, so that might hide the problem without actually fixing it.
>
> If we treat page-table walks as reads, would mb() be more suitable in
> this case?
>
> ARM64 also has an explicit barrier in its set_pte() implementation: it
> does "dsb(ishst); isb()", which is an inner-shareable store barrier
> followed by an instruction barrier.
>
>>
>> Based upon Jiuyang's description, it does sound plausible that we are
>> missing an SFENCE.VMA (or TLB shootdown) somewhere. But I don't
>> understand the situation well enough to know where that might be, or
>> what the best fix is.
>
> Yes, I agree, but set_pte() doesn't seem to be the right place for a TLB
> shootdown, based on the set_pte() implementations of other architectures.
I agree, as "flushing" the TLB after every set_pte() would be very costly;
it's better to do it once at the end of all the updates, like in
flush_cache_vmap :)

Alex

>
> Regards,
> Anup
>
>>
>>>
>>>> In general, this patch didn't handle the G bit in the PTE; the kernel
>>>> traps it to sbi_remote_sfence_vma. Do you think I should use
>>>> flush_tlb_all?
>>>>
>>>> Jiuyang
>>>>
>>>> arch/arm/mm/mmu.c
>>>> void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>>                 pte_t *ptep, pte_t pteval)
>>>> {
>>>>         unsigned long ext = 0;
>>>>
>>>>         if (addr < TASK_SIZE && pte_valid_user(pteval)) {
>>>>                 if (!pte_special(pteval))
>>>>                         __sync_icache_dcache(pteval);
>>>>                 ext |= PTE_EXT_NG;
>>>>         }
>>>>
>>>>         set_pte_ext(ptep, pteval, ext);
>>>> }
>>>>
>>>> arch/mips/include/asm/pgtable.h
>>>> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>>>>                               pte_t *ptep, pte_t pteval)
>>>> {
>>>>         if (!pte_present(pteval))
>>>>                 goto cache_sync_done;
>>>>
>>>>         if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
>>>>                 goto cache_sync_done;
>>>>
>>>>         __update_cache(addr, pteval);
>>>> cache_sync_done:
>>>>         set_pte(ptep, pteval);
>>>> }
>>>>
>>>>> Also, just a local TLB flush is generally not sufficient because
>>>>> a lot of page tables will be used across multiple HARTs.
>>>>
>>>> On Tue, Mar 16, 2021 at 5:05 AM Anup Patel wrote:
>>>>>
>>>>> +Alex
>>>>>
>>>>> On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu wrote:
>>>>>>
>>>>>> This patch inserts SFENCE.VMA after modifying the PTE, based on the
>>>>>> RISC-V specification.
>>>>>>
>>>>>> arch/riscv/include/asm/pgtable.h:
>>>>>> 1. Implement pte_user, pte_global and pte_leaf to check the
>>>>>> corresponding attribute of a pte_t.
>>>>>
>>>>> Adding pte_user(), pte_global(), and pte_leaf() is fine.
>>>>>
>>>>>> 2. Insert SFENCE.VMA in set_pte_at, based on RISC-V Volume 2,
>>>>>> Privileged Spec v. 20190608, pages 66 and 67:
>>>>>> If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
>>>>>> rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must
>>>>>> be x0; otherwise, rs2 should be set to the ASID for which the
>>>>>> translation is being modified.
>>>>>> If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1
>>>>>> set to a virtual address within the page. If any PTE along the
>>>>>> traversal path had its G bit set, rs2 must be x0; otherwise, rs2
>>>>>> should be set to the ASID for which the translation is being modified.
>>>>>>
>>>>>> arch/riscv/include/asm/tlbflush.h:
>>>>>> 1. Implement get_current_asid to get the current program's ASID.
>>>>>> 2. Implement local_flush_tlb_asid to flush the TLB by ASID.
>>>>>
>>>>> As per my understanding, we don't need to explicitly invalidate the
>>>>> local TLB in set_pte() or set_pte_at() because generic Linux page
>>>>> table management (mm/*) will call the appropriate flush_tlb_xyz()
>>>>> function after page table updates. Also, just a local TLB flush is
>>>>> generally not sufficient because a lot of page tables will be used
>>>>> across multiple HARTs.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Jiuyang Liu
>>>>>> ---
>>>>>>  arch/riscv/include/asm/pgtable.h  | 27 +++++++++++++++++++++++++++
>>>>>>  arch/riscv/include/asm/tlbflush.h | 12 ++++++++++++
>>>>>>  2 files changed, 39 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>>>>>> index ebf817c1bdf4..5a47c60372c1 100644
>>>>>> --- a/arch/riscv/include/asm/pgtable.h
>>>>>> +++ b/arch/riscv/include/asm/pgtable.h
>>>>>> @@ -222,6 +222,16 @@ static inline int pte_write(pte_t pte)
>>>>>>         return pte_val(pte) & _PAGE_WRITE;
>>>>>>  }
>>>>>>
>>>>>> +static inline int pte_user(pte_t pte)
>>>>>> +{
>>>>>> +       return pte_val(pte) & _PAGE_USER;
>>>>>> +}
>>>>>> +
>>>>>> +static inline int pte_global(pte_t pte)
>>>>>> +{
>>>>>> +       return pte_val(pte) & _PAGE_GLOBAL;
>>>>>> +}
>>>>>> +
>>>>>>  static inline int pte_exec(pte_t pte)
>>>>>>  {
>>>>>>         return pte_val(pte) & _PAGE_EXEC;
>>>>>>
>>>>>> @@ -248,6 +258,11 @@ static inline int pte_special(pte_t pte)
>>>>>>         return pte_val(pte) & _PAGE_SPECIAL;
>>>>>>  }
>>>>>>
>>>>>> +static inline int pte_leaf(pte_t pte)
>>>>>> +{
>>>>>> +       return pte_val(pte) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC);
>>>>>> +}
>>>>>> +
>>>>>>  /* static inline pte_t pte_rdprotect(pte_t pte) */
>>>>>>
>>>>>>  static inline pte_t pte_wrprotect(pte_t pte)
>>>>>>
>>>>>> @@ -358,6 +373,18 @@ static inline void set_pte_at(struct mm_struct *mm,
>>>>>>                 flush_icache_pte(pteval);
>>>>>>
>>>>>>         set_pte(ptep, pteval);
>>>>>> +
>>>>>> +       if (pte_present(pteval)) {
>>>>>> +               if (pte_leaf(pteval)) {
>>>>>> +                       local_flush_tlb_page(addr);
>>>>>> +               } else {
>>>>>> +                       if (pte_global(pteval))
>>>>>> +                               local_flush_tlb_all();
>>>>>> +                       else
>>>>>> +                               local_flush_tlb_asid();
>>>>>> +               }
>>>>>> +       }
>>>>>>  }
>>>>>>
>>>>>>  static inline void pte_clear(struct mm_struct *mm,
>>>>>> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
>>>>>> index 394cfbccdcd9..1f9b62b3670b 100644
>>>>>> --- a/arch/riscv/include/asm/tlbflush.h
>>>>>> +++ b/arch/riscv/include/asm/tlbflush.h
>>>>>> @@ -21,6 +21,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
>>>>>>  {
>>>>>>         __asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
>>>>>>  }
>>>>>> +
>>>>>> +static inline unsigned long get_current_asid(void)
>>>>>> +{
>>>>>> +       return (csr_read(CSR_SATP) >> SATP_ASID_SHIFT) & SATP_ASID_MASK;
>>>>>> +}
>>>>>> +
>>>>>> +static inline void local_flush_tlb_asid(void)
>>>>>> +{
>>>>>> +       unsigned long asid = get_current_asid();
>>>>>> +       __asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (asid) : "memory");
>>>>>> +}
>>>>>> +
>>>>>>  #else /* CONFIG_MMU */
>>>>>>  #define local_flush_tlb_all()                   do { } while (0)
>>>>>>  #define local_flush_tlb_page(addr)              do { } while (0)
>>>>>> --
>>>>>> 2.30.2
>>>>>>
>>>>>> _______________________________________________
>>>>>> linux-riscv mailing list
>>>>>> linux-riscv@lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>
>>>>> Regards,
>>>>> Anup
>>>
>>> Regards,
>>> Anup
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv