Received: by 2002:ab2:6991:0:b0:1f7:f6c3:9cb1 with SMTP id v17csp434428lqo; Wed, 8 May 2024 04:35:37 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVmvPUZ3G1WQciLfId+9CqMzJSSS2MgH9OnjWffs4MihGKGhIokqhM1fHE4lTxHaJd77RfErhbyfyzOcmg3Sg+r273yWzkqUYFJg0boCA== X-Google-Smtp-Source: AGHT+IHgJ2OtBGT8MmUvTR5uGxCG28yy+DpffsPokJeMhttwGe676MY9bfbip0lp2BnjWf2xSyj9 X-Received: by 2002:a17:902:b208:b0:1ea:1d2:90bc with SMTP id d9443c01a7336-1eeb0593923mr21259165ad.24.1715168137187; Wed, 08 May 2024 04:35:37 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715168137; cv=pass; d=google.com; s=arc-20160816; b=fYaA4GCPZYW2Pgyw62uGj2U8G3Okk4t1BVOvE1kXQb8a+PvzEYUpbxXvzK3gvV+Dpc DN+DjmBmmNWJo4QyjN37P4jMsEEiTHx+1SR6JtcOllrhAGSRmXX9hoWhYGMLDqU4YFK4 aEwkNEZwXnhu5WOOd/owEc3SYhWZArvroTCQhqmCX3ZDVUSv/mjywVpPwr48v7uov+WO L3O6atMs96va3J/yewyv4ToXn0HixcOs7NIshjO7dwSnMs/TW8I92K3Ry8qppDN6DZ78 zHwFjfU1xlWJhlGLFwH3+jMMkbWAKyTKSKf66mJzdvtKb+JgsOHzDK+CWze5/cLl4fgU gnFw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=eptZ1TC6/9ugjFjSR/N9yJ6dEI/3FvKeew1j70YJt7s=; fh=H8n2IfXczHoRFQwtFimtpWlRE58h2+6Yru4Y+09PFvI=; b=lDDZwJot6cvOEbfJBbfF2qjmOGzjjll5Zyj7gNICttZMfi3K7f9hjq/Hpp7lipxJC7 B73z7z4GI6u/WEGcHjk7z5+E48cqclNSO3g8bAFGyDZlf7noUjmdOSW707D0GkMXiSBB ySwUfqeRyY93nkG3LEEorsEkVwZPvk5iZzP723TcRWTOdQX8l+PqVEyTdlpfCwt4Bv3d 2o5UZ+fiLH7BLxEt+x/0PJaFU24wzPIhOCGRE7y0N/SIGeDQjksTj65qZW6MIoF6K7T6 qzgCRG6Z9HI96KmIKhj2ydrM0IkpoVgwiW5jqLXKvObi55Cp+AsSZYvO1U1Mry3tem6B bwvg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=yYHCS9+d; arc=pass (i=1 spf=pass spfdomain=rivosinc.com dkim=pass dkdomain=rivosinc-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-173146-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-173146-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id im23-20020a170902bb1700b001eb5472b214si11957725plb.619.2024.05.08.04.35.36 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 May 2024 04:35:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-173146-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20230601.gappssmtp.com header.s=20230601 header.b=yYHCS9+d; arc=pass (i=1 spf=pass spfdomain=rivosinc.com dkim=pass dkdomain=rivosinc-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-173146-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-173146-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 8C090282F8A for ; Wed, 8 May 2024 11:35:36 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6D03982484; Wed, 8 May 2024 11:35:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b="yYHCS9+d" Received: from mail-lf1-f42.google.com (mail-lf1-f42.google.com [209.85.167.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61EBF45018 for ; Wed, 8 May 2024 11:35:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.42 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715168129; cv=none; b=XlWQyxuj2BEgjli+9+y2XVQGnNQqObUxlQgMqafpLx6Uev4LS3hqNPA0wa15LrMPOYumzjVN0fFp1K3sVfIYSbSI3mdP8a0+aiqKU7Ij4+VH1yHTOjM7oHkJfdeJqX2GwFxaHcH/io5uYjvfTYfI+6jm3egXi/qrFDZWm+bhKb0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715168129; c=relaxed/simple; bh=8ZxiyUP5gIiH0WRirWNIF7sS4bbZ6hp/AtvYaVBucsk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EqmbF5fiA0aGa+jD4PZ8aus8ppMLE7Bx0tBxx2MU0Q5Zsf1bPPfQv0AT/EGKdw+UIZ1aQD7f0LfkPawBOCUoQT4C0wUMZYD/NHayP3VsxwpGqaZnWFt8l6QcyGvsJS3bJ7/upsyqiY2FgYXL/dxXAcUDokvTuv0Q9ECe3DpwYgo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com; spf=pass smtp.mailfrom=rivosinc.com; dkim=pass (2048-bit key) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.b=yYHCS9+d; arc=none smtp.client-ip=209.85.167.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=rivosinc.com Received: by mail-lf1-f42.google.com with SMTP id 2adb3069b0e04-520f9d559f6so2919891e87.3 for ; Wed, 08 May 2024 04:35:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1715168125; x=1715772925; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eptZ1TC6/9ugjFjSR/N9yJ6dEI/3FvKeew1j70YJt7s=; b=yYHCS9+dShWa/gMiXUSIj42Vfz+IBGsXlLmotMSwb3jh1OJ78WB1NweiWIUWq/FlOU s6123aXt6+bBjhFtSB5jT332qBNkqChf6+/qXzXSCf/iy43wFkhU8YOxK6TJaSw4qAM9 LM0cZotoYL7jqQuCOROsQ538cisshLeE/vgqzmgCOk8o3+Cs3BjyLddrQ3syZhD6XBoO x79RYoi+Dw2ncRewQQCFv1GfqNsdPrKOwywgLLsW/8dYrW4O4vilYzGnOs8Lj8rxWEMA /0xlT1Z2HAkozdGloyil/uINwJsxq7CVqcfYvIRaRBuwoqTTQ5DVO7LO7e2V0LLzNuNq /5Uw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1715168125; x=1715772925; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eptZ1TC6/9ugjFjSR/N9yJ6dEI/3FvKeew1j70YJt7s=; b=FiDDhIfxedYL1mHzTIUksDCpyZcaA8VbIpSoWN1645OrwKAZ7ZPuUiiZh/h7vfw9w4 wIoSD92VkYCxC56CVn2XPTBipheOkXdUq8rQ0VjEJMde8zcS6OIoQ8bCADqjkJYaejcm aGsSZJOxeptF7LaiX8RAC+GQ70USEEjpGwRxB2WepbhBMCXt0/A7jhd1Skd5VkdW7fMR h9EVflF7cQBVYMrn3tM05PDDu2PnP83An+0ELTyVGW981KQyKwmxLbJ81NjmZwtjdTdK 5w0z4ARy6Y6Ds9RJINmIHearxJ3z6fHSUcZIZXAZT+PhreZo5hJHl1PlITCYA3ld3NKT 8tpg== X-Forwarded-Encrypted: i=1; AJvYcCUxNTZUu97r9X2vuj8PBs39DTi04j+V4T7rqu/WFcvKLQFvQ0nYYF8u80UmNPRoy2PwbS5IDIbjZpHyRCMLb3QZeJYsjL4hRmEi10cz X-Gm-Message-State: AOJu0YwOfEER3fXpzcci4S8f93QT2K6OQFdzbF+hmyCS34Kl2TH83VMj 62VfYAi2u1HKxEYVk7zUvpRTrKryNb3on2K7dA+7GewcVTNY4J5NHTSGYEXirXU= X-Received: by 2002:a05:6512:12c3:b0:520:9775:5d4b with SMTP id 2adb3069b0e04-5217c3714demr2111385e87.13.1715168125494; Wed, 08 May 2024 04:35:25 -0700 (PDT) Received: from alex-rivos.ba.rivosinc.com (amontpellier-656-1-456-62.w92-145.abo.wanadoo.fr. [92.145.124.62]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-41facbd295fsm1760585e9.36.2024.05.08.04.35.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 May 2024 04:35:25 -0700 (PDT) From: Alexandre Ghiti To: Catalin Marinas , Will Deacon , Ryan Roberts , Mark Rutland , Paul Walmsley , Palmer Dabbelt , Albert Ou , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, linux-mm@kvack.org Cc: Alexandre Ghiti Subject: [PATCH RESEND v2 1/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code Date: Wed, 8 May 2024 13:34:11 +0200 Message-Id: <20240508113419.18620-2-alexghiti@rivosinc.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240508113419.18620-1-alexghiti@rivosinc.com> References: <20240508113419.18620-1-alexghiti@rivosinc.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit The core mm code expects to be able to extract the pfn from a pte. NAPOT mappings work differently since its ptes actually point to the first pfn of the mapping, the other bits being used to encode the size of the mapping. So modify ptep_get() so that it returns a pte value that contains the *real* pfn (which is then different from what the HW expects) and right before storing the ptes to the page table, reset the pfn LSBs to the size of the mapping. And make sure that all NAPOT mappings are set using set_ptes(). Signed-off-by: Alexandre Ghiti --- arch/riscv/include/asm/pgtable-64.h | 11 +++ arch/riscv/include/asm/pgtable.h | 105 ++++++++++++++++++++++++++-- arch/riscv/mm/hugetlbpage.c | 38 +++++----- 3 files changed, 128 insertions(+), 26 deletions(-) diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h index 221a5c1ee287..9fe076fc503e 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -106,6 +106,17 @@ enum napot_cont_order { #define napot_cont_mask(order) (~(napot_cont_size(order) - 1UL)) #define napot_pte_num(order) BIT(order) +static inline bool is_napot_order(unsigned int order) +{ + unsigned int napot_order; + + for_each_napot_order(napot_order) + if (order == napot_order) + return true; + + return false; +} + #ifdef CONFIG_RISCV_ISA_SVNAPOT #define HUGE_MAX_HSTATE (2 + (NAPOT_ORDER_MAX - NAPOT_CONT_ORDER_BASE)) #else diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 9f8ea0e33eb1..268c828f5152 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -297,6 +297,8 @@ static inline unsigned long pte_napot(pte_t pte) return pte_val(pte) & _PAGE_NAPOT; } +#define pte_valid_napot(pte) (pte_present(pte) && pte_napot(pte)) + static inline pte_t pte_mknapot(pte_t pte, unsigned int order) { int pos = order - 1 + _PAGE_PFN_SHIFT; @@ -306,6 +308,12 @@ static inline pte_t pte_mknapot(pte_t pte, unsigned int order) return __pte((pte_val(pte) & napot_mask) | napot_bit | _PAGE_NAPOT); } +/* pte at entry must *not* encode the mapping size in the pfn LSBs. */ +static inline pte_t pte_clear_napot(pte_t pte) +{ + return __pte(pte_val(pte) & ~_PAGE_NAPOT); +} + #else static __always_inline bool has_svnapot(void) { return false; } @@ -315,17 +323,14 @@ static inline unsigned long pte_napot(pte_t pte) return 0; } +#define pte_valid_napot(pte) false + #endif /* CONFIG_RISCV_ISA_SVNAPOT */ /* Yields the page frame number (PFN) of a page table entry */ static inline unsigned long pte_pfn(pte_t pte) { - unsigned long res = __page_val_to_pfn(pte_val(pte)); - - if (has_svnapot() && pte_napot(pte)) - res = res & (res - 1UL); - - return res; + return __page_val_to_pfn(pte_val(pte)); } #define pte_page(x) pfn_to_page(pte_pfn(x)) @@ -525,9 +530,91 @@ static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval) #define PFN_PTE_SHIFT _PAGE_PFN_SHIFT +#ifdef CONFIG_RISCV_ISA_SVNAPOT +static inline int arch_contpte_get_num_contig(pte_t *ptep, unsigned long size, + size_t *pgsize) +{ + pte_t __pte; + + /* We must read the raw value of the pte to get the size of the mapping */ + __pte = READ_ONCE(*ptep); + + if (pgsize) { + if (size >= PGDIR_SIZE) + *pgsize = PGDIR_SIZE; + else if (size >= P4D_SIZE) + *pgsize = P4D_SIZE; + else if (size >= PUD_SIZE) + *pgsize = PUD_SIZE; + else if (size >= PMD_SIZE) + *pgsize = PMD_SIZE; + else + *pgsize = PAGE_SIZE; + } + + /* Make sure __pte is not a swap entry */ + if (pte_valid_napot(__pte)) + return napot_pte_num(napot_cont_order(__pte)); + + return 1; +} +#endif + +static inline pte_t ptep_get(pte_t *ptep) +{ + pte_t pte = READ_ONCE(*ptep); + +#ifdef CONFIG_RISCV_ISA_SVNAPOT + /* + * The pte we load has the N bit set and the size of the mapping in + * the pfn LSBs: keep the N bit and replace the mapping size with + * the *real* pfn since the core mm code expects to find it there. + * The mapping size will be reset just before being written to the + * page table in set_ptes(). + */ + if (unlikely(pte_valid_napot(pte))) { + unsigned int order = napot_cont_order(pte); + int pos = order - 1 + _PAGE_PFN_SHIFT; + unsigned long napot_mask = ~GENMASK(pos, _PAGE_PFN_SHIFT); + pte_t *orig_ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * napot_pte_num(order)); + + pte = __pte((pte_val(pte) & napot_mask) + ((ptep - orig_ptep) << _PAGE_PFN_SHIFT)); + } +#endif + + return pte; +} +#define ptep_get ptep_get + static inline void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval, unsigned int nr) { +#ifdef CONFIG_RISCV_ISA_SVNAPOT + if (unlikely(pte_valid_napot(pteval))) { + unsigned int order = ilog2(nr); + + if (!is_napot_order(order)) { + /* + * Something's weird, we are given a NAPOT pte but the + * size of the mapping is not a known NAPOT mapping + * size, so clear the NAPOT bit and map this without + * NAPOT support: core mm only manipulates pte with the + * real pfn so we know the pte is valid without the N + * bit. + */ + pr_err("Incorrect NAPOT mapping, resetting.\n"); + pteval = pte_clear_napot(pteval); + } else { + /* + * NAPOT ptes that arrive here only have the N bit set + * and their pfn does not contain the mapping size, so + * set that here. + */ + pteval = pte_mknapot(pteval, order); + } + } +#endif + page_table_check_ptes_set(mm, ptep, pteval, nr); for (;;) { @@ -535,6 +622,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr, if (--nr == 0) break; ptep++; + +#ifdef CONFIG_RISCV_ISA_SVNAPOT + if (unlikely(pte_valid_napot(pteval))) + continue; +#endif + pte_val(pteval) += 1 << _PAGE_PFN_SHIFT; } } diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 5ef2a6891158..fe8067ee71b4 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -256,8 +256,7 @@ void set_huge_pte_at(struct mm_struct *mm, clear_flush(mm, addr, ptep, pgsize, pte_num); - for (i = 0; i < pte_num; i++, ptep++, addr += pgsize) - set_pte_at(mm, addr, ptep, pte); + set_ptes(mm, addr, ptep, pte, pte_num); } int huge_ptep_set_access_flags(struct vm_area_struct *vma, @@ -267,16 +266,16 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, int dirty) { struct mm_struct *mm = vma->vm_mm; - unsigned long order; + size_t pgsize; pte_t orig_pte; - int i, pte_num; + int pte_num; if (!pte_napot(pte)) return ptep_set_access_flags(vma, addr, ptep, pte, dirty); - order = napot_cont_order(pte); - pte_num = napot_pte_num(order); - ptep = huge_pte_offset(mm, addr, napot_cont_size(order)); + pte_num = arch_contpte_get_num_contig(ptep, 0, &pgsize); + ptep = huge_pte_offset(mm, addr, pte_num * pgsize); + orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num); if (pte_dirty(orig_pte)) @@ -285,8 +284,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, if (pte_young(orig_pte)) pte = pte_mkyoung(pte); - for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) - set_pte_at(mm, addr, ptep, pte); + set_ptes(mm, addr, ptep, pte, pte_num); return true; } @@ -301,7 +299,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, if (!pte_napot(orig_pte)) return ptep_get_and_clear(mm, addr, ptep); - pte_num = napot_pte_num(napot_cont_order(orig_pte)); + pte_num = arch_contpte_get_num_contig(ptep, 0, NULL); return get_clear_contig(mm, addr, ptep, pte_num); } @@ -311,24 +309,23 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, pte_t *ptep) { pte_t pte = ptep_get(ptep); - unsigned long order; + size_t pgsize; pte_t orig_pte; - int i, pte_num; + int pte_num; if (!pte_napot(pte)) { ptep_set_wrprotect(mm, addr, ptep); return; } - order = napot_cont_order(pte); - pte_num = napot_pte_num(order); - ptep = huge_pte_offset(mm, addr, napot_cont_size(order)); + pte_num = arch_contpte_get_num_contig(ptep, 0, &pgsize); + ptep = huge_pte_offset(mm, addr, pte_num * pgsize); + orig_pte = get_clear_contig_flush(mm, addr, ptep, pte_num); orig_pte = pte_wrprotect(orig_pte); - for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) - set_pte_at(mm, addr, ptep, orig_pte); + set_ptes(mm, addr, ptep, orig_pte, pte_num); } pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, @@ -341,7 +338,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, if (!pte_napot(pte)) return ptep_clear_flush(vma, addr, ptep); - pte_num = napot_pte_num(napot_cont_order(pte)); + pte_num = arch_contpte_get_num_contig(ptep, 0, NULL); return get_clear_contig_flush(vma->vm_mm, addr, ptep, pte_num); } @@ -351,6 +348,7 @@ void huge_pte_clear(struct mm_struct *mm, pte_t *ptep, unsigned long sz) { + size_t pgsize; pte_t pte = ptep_get(ptep); int i, pte_num; @@ -359,8 +357,8 @@ void huge_pte_clear(struct mm_struct *mm, return; } - pte_num = napot_pte_num(napot_cont_order(pte)); - for (i = 0; i < pte_num; i++, addr += PAGE_SIZE, ptep++) + pte_num = arch_contpte_get_num_contig(ptep, 0, &pgsize); + for (i = 0; i < pte_num; i++, addr += pgsize, ptep++) pte_clear(mm, addr, ptep); } -- 2.39.2