From: Guo Ren
Date: Mon, 11 Dec 2023 16:41:28 +0800
Subject: Re: [PATCH] riscv: pgtable: Enhance set_pte to prevent OoO risk
To: Alexandre Ghiti
Cc: paul.walmsley@sifive.com, palmer@dabbelt.com, akpm@linux-foundation.org, catalin.marinas@arm.com, willy@infradead.org, david@redhat.com, muchun.song@linux.dev, will@kernel.org, peterz@infradead.org, rppt@kernel.org, paulmck@kernel.org, atishp@atishpatra.org, anup@brainfault.org, alex@ghiti.fr, mike.kravetz@oracle.com, dfustini@baylibre.com, wefu@redhat.com, jszhang@kernel.org, falcon@tinylab.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Guo Ren

On Mon, Dec 11, 2023 at 1:52 PM Alexandre Ghiti wrote:
>
> Hi Guo,
>
> On Fri, Dec 8, 2023 at 4:10 PM wrote:
> >
> > From: Guo Ren
> >
> > When changing from an invalid pte to a valid one for a kernel page,
> > there is no need for tlb_flush. This is fine under the TSO memory
> > model, but under the weak memory model there is an out-of-order (OoO)
> > risk, e.g.:
> >
> > sd t0, (a0) // a0 = pte address, pteval is changed from invalid to valid
> > ...
> > ld t1, (a1) // a1 = va of above pte
> >
> > If the ld instruction is executed speculatively before the sd
> > instruction, it would bring an invalid entry into the TLB, and when
> > the ld instruction retires, a spurious page fault occurs. Because the
> > vmemmap is ignored by vmalloc_fault, the spurious page fault would
> > cause a kernel panic.
> >
> > This patch was inspired by commit 7f0b1bf04511 ("arm64: Fix barriers
> > used for page table modifications"). For RISC-V, the spec does not
> > require that all TLB entries be valid, nor does it require the PTW to
> > filter out invalid entries. Of course, a micro-architecture could
> > provide a more robust design, but here we use a software fence to
> > provide the guarantee.
> >
> > Signed-off-by: Guo Ren
> > Signed-off-by: Guo Ren
> > ---
> >  arch/riscv/include/asm/pgtable.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 294044429e8e..2fae5a5438e0 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -511,6 +511,13 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
> >  static inline void set_pte(pte_t *ptep, pte_t pteval)
> >  {
> >  	*ptep = pteval;
> > +
> > +	/*
> > +	 * Only if the new pte is present and kernel, otherwise TLB
> > +	 * maintenance or update_mmu_cache() have the necessary barriers.
> > +	 */
> > +	if (pte_val(pteval) & (_PAGE_PRESENT | _PAGE_GLOBAL))
> > +		RISCV_FENCE(rw,rw);
>
> Only an sfence.vma can guarantee that the PTW actually sees a new
> mapping; a fence is not enough. That being said, new kernel mappings
> (vmalloc ones) are correctly handled in the kernel by using
> flush_cache_vmap(). Did you observe something that this patch fixes?

Thanks for the reply!

The sfence.vma is too expensive, so the situation is tricky. See the
arm64 commit 7f0b1bf04511 ("arm64: Fix barriers used for page table
modifications"), which addresses a similar problem. That is, Linux
assumes that an invalid pte won't get into the TLB. Think about memory
hotplug:

mm/sparse.c:
sparse_add_section()
{
...
	memmap = section_activate(nid, start_pfn, nr_pages, altmap, pgmap);
	if (IS_ERR(memmap))
		return PTR_ERR(memmap);

	/*
	 * Poison uninitialized struct pages in order to catch invalid flags
	 * combinations.
	 */
	page_init_poison(memmap, sizeof(struct page) * nr_pages);
...
}

section_activate() uses set_pte() to set up the vmemmap, and
page_init_poison() then accesses those struct pages. That means:

sd t0, (a0)    // a0 = struct page's pte address, pteval is changed from invalid to valid
...
lw/sw t1, (a1) // a1 = va of struct page

If the lw/sw instruction is executed speculatively before the store in
set_pte(), we hit the same spurious page fault, so we need a fence to
prevent this.

>
> Thanks,
>
> Alex
>
> >  }
> >
> >  void flush_icache_pte(pte_t pte);
> > --
> > 2.40.1
> >

-- 
Best Regards
 Guo Ren
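The guard that the patch adds to set_pte() tests `pte_val(pteval) & (_PAGE_PRESENT | _PAGE_GLOBAL)`, which is non-zero when either bit is set, while the comment above it describes a new pte that is both present and kernel. The standalone C sketch below contrasts the two readings. It is illustrative only: the SKETCH_* macros and both helper functions are hypothetical names, and the bit positions are assumed from the RISC-V Sv39 PTE layout (V = bit 0, U = bit 4, G = bit 5) rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel's _PAGE_* macros. Bit positions
 * are assumed from the RISC-V Sv39 PTE layout: V (valid) is bit 0,
 * U (user) is bit 4 and G (global) is bit 5.
 */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 0)
#define SKETCH_PAGE_USER    (UINT64_C(1) << 4)
#define SKETCH_PAGE_GLOBAL  (UINT64_C(1) << 5)

/* Compile-time sanity check: the two flags must be distinct bits. */
static_assert((SKETCH_PAGE_PRESENT & SKETCH_PAGE_GLOBAL) == 0,
	      "V and G must be different bits");

/* Mirrors the patch's test: true if V OR G is set in the new pte. */
static bool guard_as_written(uint64_t pteval)
{
	return (pteval & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_GLOBAL)) != 0;
}

/* Stricter reading of "present and kernel": true only if V AND G are set. */
static bool guard_present_and_global(uint64_t pteval)
{
	const uint64_t mask = SKETCH_PAGE_PRESENT | SKETCH_PAGE_GLOBAL;

	return (pteval & mask) == mask;
}
```

With these definitions, a present user pte (V and U set, G clear) passes guard_as_written() but fails guard_present_and_global(), and so does a pte with G set but V clear; only a pte with both V and G set passes both checks.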