From: Anshuman Khandual
To: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org,
    Anshuman Khandual, Mark Rutland, Marc Zyngier, Suzuki Poulose,
    linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] arm64/mm: Change THP helpers to comply with generic MM semantics
Date: Mon, 17 Aug 2020 14:49:43 +0530
Message-Id: <1597655984-15428-2-git-send-email-anshuman.khandual@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1597655984-15428-1-git-send-email-anshuman.khandual@arm.com>
References: <1597655984-15428-1-git-send-email-anshuman.khandual@arm.com>

pmd_present() and
pmd_trans_huge() are expected to behave in the following manner during
various phases of a given PMD. This is derived from a previous detailed
discussion on the topic [1] and the current THP documentation [2].

pmd_present(pmd):

- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd does not refer to system RAM - invalid pmd_page(pmd)

pmd_trans_huge(pmd):

- Returns true if pmd refers to system RAM and is a trans huge mapping

-------------------------------------------------------------------------
|	PMD states	|	pmd_present	|	pmd_trans_huge	|
-------------------------------------------------------------------------
|	Mapped		|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Splitting	|	Yes		|	Yes		|
-------------------------------------------------------------------------
|	Migration/Swap	|	No		|	No		|
-------------------------------------------------------------------------

The problem:

The PMD is first invalidated with pmdp_invalidate() before it is split.
This invalidation clears PMD_SECT_VALID as below.

PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID

Once PMD_SECT_VALID is cleared, pmd_present() returns false for the PMD
entry. Another bit, apart from PMD_SECT_VALID, is needed to keep
pmd_present() returning true during the THP split process. To comply with
the above semantics, pmd_trans_huge() should also check pmd_present()
before testing for an actual transparent huge mapping.

The solution:

Ideally PMD_TYPE_SECT should have been used here instead, but it shares
its bit position with PMD_SECT_VALID, which is cleared during THP
invalidation, and hence is not available for the pmd_present() check
after pmdp_invalidate().

A new software defined bit, PMD_PRESENT_INVALID (bit 59), is set on the
PMD entry during invalidation. It keeps pmd_present() returning true and
records the fact that the entry still points to memory.

This bit is transient. During the split process it is overridden by a
page table page representing normal pages in place of the erstwhile huge
page. Other pmdp_invalidate() callers always write a fresh PMD value on
the entry, overriding this transient PMD_PRESENT_INVALID bit, which makes
this safe.

[1]: https://lkml.org/lkml/2018/10/17/231
[2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txt

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Marc Zyngier
Cc: Suzuki Poulose
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual
---
 arch/arm64/include/asm/pgtable-prot.h |  7 ++++++
 arch/arm64/include/asm/pgtable.h      | 34 ++++++++++++++++++++++++---
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 4d867c6446c4..28792fdd9627 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -19,6 +19,13 @@
 #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
 #define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
 
+/*
+ * This helps indicate that the entry is present i.e pmd_page()
+ * still points to a valid huge page in memory even if the pmd
+ * has been invalidated.
+ */
+#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
+
 #ifndef __ASSEMBLY__
 
 #include

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d5d3fbe73953..7aa69cace784 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -145,6 +145,18 @@ static inline pte_t set_pte_bit(pte_t pte, pgprot_t prot)
 	return pte;
 }
 
+static inline pmd_t clr_pmd_bit(pmd_t pmd, pgprot_t prot)
+{
+	pmd_val(pmd) &= ~pgprot_val(prot);
+	return pmd;
+}
+
+static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
+{
+	pmd_val(pmd) |= pgprot_val(prot);
+	return pmd;
+}
+
 static inline pte_t pte_wrprotect(pte_t pte)
 {
 	pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
@@ -363,15 +375,24 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+#define pmd_present_invalid(pmd)	(!!(pmd_val(pmd) & PMD_PRESENT_INVALID))
+
+static inline int pmd_present(pmd_t pmd)
+{
+	return pte_present(pmd_pte(pmd)) || pmd_present_invalid(pmd);
+}
+
 /*
  * THP definitions.
  */
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define pmd_trans_huge(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
+static inline int pmd_trans_huge(pmd_t pmd)
+{
+	return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#define pmd_present(pmd)	pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
 #define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
@@ -381,7 +402,14 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-#define pmd_mkinvalid(pmd)	(__pmd(pmd_val(pmd) & ~PMD_SECT_VALID))
+
+static inline pmd_t pmd_mkinvalid(pmd_t pmd)
+{
+	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
+	pmd = clr_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+
+	return pmd;
+}
 
 #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
-- 
2.20.1
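
P.S. For anyone who wants to poke at the intended state machine without
building a kernel, below is a minimal stand-alone user-space sketch of the
semantics described above. It is not kernel code: the pmdval_t/pmd_t mocks,
the literal bit positions chosen for PMD_SECT_VALID and PMD_TABLE_BIT, and
the example descriptor values are assumptions made purely for illustration;
only the PMD_PRESENT_INVALID (bit 59) idea and the relationship between
pmd_present(), pmd_trans_huge() and pmd_mkinvalid() mirror the patch.

/*
 * Illustrative sketch only (user space, mocked types), not kernel code.
 * Bit positions and descriptor values are assumptions for this demo.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pmdval_t;
typedef struct { pmdval_t pmd; } pmd_t;

#define PMD_SECT_VALID		(1ULL << 0)	/* hardware valid bit (assumed position) */
#define PMD_TABLE_BIT		(1ULL << 1)	/* table (not block) descriptor (assumed position) */
#define PMD_PRESENT_INVALID	(1ULL << 59)	/* sw bit: present but invalidated */

static pmdval_t pmd_val(pmd_t pmd) { return pmd.pmd; }
static pmd_t __pmd(pmdval_t val) { return (pmd_t){ val }; }

/* pmd_present(): true for mapped and for splitting (invalidated) huge PMDs */
static bool pmd_present(pmd_t pmd)
{
	return (pmd_val(pmd) & PMD_SECT_VALID) ||
	       (pmd_val(pmd) & PMD_PRESENT_INVALID);
}

/* pmd_trans_huge(): non-empty, present, and a block (non-table) mapping */
static bool pmd_trans_huge(pmd_t pmd)
{
	return pmd_val(pmd) && pmd_present(pmd) &&
	       !(pmd_val(pmd) & PMD_TABLE_BIT);
}

/* pmd_mkinvalid(): clear the hardware valid bit, set the software one */
static pmd_t pmd_mkinvalid(pmd_t pmd)
{
	return __pmd((pmd_val(pmd) & ~PMD_SECT_VALID) | PMD_PRESENT_INVALID);
}

int main(void)
{
	/* Mapped huge PMD: block descriptor with the valid bit set */
	pmd_t mapped = __pmd(0x40000000ULL | PMD_SECT_VALID);

	/* Splitting: pmdp_invalidate() -> pmd_mkinvalid() */
	pmd_t splitting = pmd_mkinvalid(mapped);

	/* Migration/swap entry: neither valid nor marked present-invalid */
	pmd_t migration = __pmd(0x1230ULL);

	/* Rows of the state table from the commit message */
	assert(pmd_present(mapped)     && pmd_trans_huge(mapped));
	assert(pmd_present(splitting)  && pmd_trans_huge(splitting));
	assert(!pmd_present(migration) && !pmd_trans_huge(migration));

	printf("Mapped/Splitting: present + trans huge; Migration/Swap: neither\n");
	return 0;
}

The sketch builds with any C99 compiler, e.g. gcc pmd_states.c && ./pmd_states
(file name chosen here for the example).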