From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Cc: akpm@linux-foundation.org, willy@infradead.org, viro@zeniv.linux.org.uk,
	song@kernel.org, william.kucharski@oracle.com, hughd@google.com,
	shy828301@gmail.com, linmiaohe@huawei.com, peterx@redhat.com
Subject: [PATCH 1/3] mm, thp: support binaries transparent use of file THP
Date: Sat, 9 Oct 2021 17:26:56 +0800
Message-Id: <20211009092658.59665-2-rongwei.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211009092658.59665-1-rongwei.wang@linux.alibaba.com>
References: <20211009092658.59665-1-rongwei.wang@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The file THP for .text is not convenient to use at present. Applications
need to madvise(MADV_HUGEPAGE) explicitly for their .text, which is not
friendly for tasks in a production environment.

This patch extends READ_ONLY_THP_FOR_FS and introduces a new sysfs
interface, hugetext_enabled, to make read-only file-backed pages
THP-eligible proactively and transparently. Compared with the original
design, it no longer depends on madvise(). And because hugetext_enabled
is a separate knob, users are no longer limited by the 'enabled' setting
(always, madvise, never).

There are two ways to enable or disable this feature.

To enable hugetext:
1. echo 1 > /sys/kernel/mm/transparent_hugepage/hugetext_enabled
2. hugetext=1 on the boot cmdline

To disable hugetext:
1. echo 0 > /sys/kernel/mm/transparent_hugepage/hugetext_enabled
2. hugetext=0 on the boot cmdline

This feature is disabled by default.
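As a rough usage sketch (<pid> below is a placeholder, and khugepaged
needs some time after the task starts to collapse its text pages),
whether a task's text is actually backed by file THP can be checked
from smaps:

  # enable at runtime
  echo 1 > /sys/kernel/mm/transparent_hugepage/hugetext_enabled
  # a non-zero FilePmdMapped on the executable's r-xp mapping means
  # its text is mapped with PMD-sized huge pages
  grep -E 'THPeligible|FilePmdMapped' /proc/<pid>/smaps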
Signed-off-by: Gang Deng
Signed-off-by: Xu Yu
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
---
 include/linux/huge_mm.h    | 24 ++++++++++++++++
 include/linux/khugepaged.h |  9 ++++++
 mm/Kconfig                 | 11 ++++++++
 mm/huge_memory.c           | 57 ++++++++++++++++++++++++++++++++++++++
 mm/khugepaged.c            |  4 +++
 mm/memory.c                | 12 ++++++++
 6 files changed, 117 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f123e15d966e..95b718031ef3 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -87,6 +87,9 @@ enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
 	TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG,
 	TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG,
+#ifdef CONFIG_HUGETEXT
+	TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG,
+#endif
 };
 
 struct kobject;
@@ -140,6 +143,27 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
 	return true;
 }
 
+#ifdef CONFIG_HUGETEXT
+#define hugetext_enabled()					\
+	(transparent_hugepage_flags &				\
+	 (1<<TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG))
+#else
+#define hugetext_enabled() false
+#endif
+
+static inline bool vma_is_hugetext(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	if (!(vm_flags & VM_EXEC))
+		return false;
+
+	if (vma->vm_file && !inode_is_open_for_write(vma->vm_file->f_inode))
+		return IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				HPAGE_PMD_NR);
+
+	return false;
+}
+
 /*
  * to be used on vmas which are known to support THP.
  * Use transparent_hugepage_active otherwise
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 2fcc01891b47..ad56f75a2fda 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -26,10 +26,18 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm,
 }
 #endif
 
+#ifdef CONFIG_HUGETEXT
+#define khugepaged_enabled()					       \
+	(transparent_hugepage_flags &				       \
+	 ((1<<TRANSPARENT_HUGEPAGE_FLAG) |			       \
+	  (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG) |		       \
+	  (1<<TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG)))
+#else
 #define khugepaged_enabled()					       \
 	(transparent_hugepage_flags &				       \
 	 ((1<<TRANSPARENT_HUGEPAGE_FLAG) |			       \
 	  (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
+#endif
 #define khugepaged_always()				\
 	(transparent_hugepage_flags &			\
 	 (1<<TRANSPARENT_HUGEPAGE_FLAG))
@@ ... @@ static inline void khugepaged_enter(struct vm_area_struct *vma,
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
 		if ((khugepaged_always() ||
 		     (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) ||
+		     (hugetext_enabled() && vma_is_hugetext(vma, vm_flags)) ||
 		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
 		    !(vm_flags & VM_NOHUGEPAGE) &&
 		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
diff --git a/mm/Kconfig b/mm/Kconfig
index d16ba9249bc5..5aa3fa86e7b1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,6 +868,17 @@ config READ_ONLY_THP_FOR_FS
 	  support of file THPs will be developed in the next few release
 	  cycles.
 
+config HUGETEXT
+	bool "THP for text segments"
+	depends on READ_ONLY_THP_FOR_FS
+
+	help
+	  Allow khugepaged to put read-only file-backed pages, including
+	  shared libraries, as well as anonymous executable pages, into
+	  THP.
+
+	  This feature builds on and extends READ_ONLY_THP_FOR_FS.
+
 config ARCH_HAS_PTE_SPECIAL
 	bool
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5e9ef0fc261e..f6fffb5c5130 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -330,6 +330,35 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
 	__ATTR_RO(hpage_pmd_size);
 
+#ifdef CONFIG_HUGETEXT
+static ssize_t hugetext_enabled_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return single_hugepage_flag_show(kobj, attr, buf,
+			TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG);
+}
+
+static ssize_t hugetext_enabled_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	ssize_t ret = count;
+
+	ret = single_hugepage_flag_store(kobj, attr, buf, count,
+			TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG);
+
+	if (ret > 0) {
+		int err = start_stop_khugepaged();
+
+		if (err)
+			ret = err;
+	}
+
+	return ret;
+}
+struct kobj_attribute hugetext_enabled_attr =
+	__ATTR(hugetext_enabled, 0644, hugetext_enabled_show, hugetext_enabled_store);
+#endif /* CONFIG_HUGETEXT */
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -337,6 +366,9 @@ static struct attribute *hugepage_attr[] = {
 	&hpage_pmd_size_attr.attr,
 #ifdef CONFIG_SHMEM
 	&shmem_enabled_attr.attr,
+#endif
+#ifdef CONFIG_HUGETEXT
+	&hugetext_enabled_attr.attr,
 #endif
 	NULL,
 };
@@ -491,6 +523,31 @@ static int __init setup_transparent_hugepage(char *str)
 }
 __setup("transparent_hugepage=", setup_transparent_hugepage);
 
+#ifdef CONFIG_HUGETEXT
+static int __init setup_hugetext(char *str)
+{
+	int ret = 0;
+
+	if (!str)
+		goto out;
+	if (!strcmp(str, "1")) {
+		set_bit(TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG,
+			&transparent_hugepage_flags);
+		ret = 1;
+	} else if (!strcmp(str, "0")) {
+		clear_bit(TRANSPARENT_HUGEPAGE_HUGETEXT_ENABLED_FLAG,
+			  &transparent_hugepage_flags);
+		ret = 1;
+	}
+
+out:
+	if (!ret)
+		pr_warn("hugetext= cannot parse, ignored\n");
+	return ret;
+}
+__setup("hugetext=", setup_hugetext);
+#endif
+
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..2810bc1962b3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -451,6 +451,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
 				HPAGE_PMD_NR);
 	}
 
+	/* Make hugetext independent of THP settings */
+	if (hugetext_enabled() && vma_is_hugetext(vma, vm_flags))
+		return true;
+
 	/* THP settings require madvise. */
 	if (!(vm_flags & VM_HUGEPAGE) && !khugepaged_always())
 		return false;
diff --git a/mm/memory.c b/mm/memory.c
index adf9b9ef8277..b0d0889af6ab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -73,6 +73,7 @@
 #include <linux/perf_event.h>
 #include <linux/ptrace.h>
 #include <linux/vmalloc.h>
+#include <linux/khugepaged.h>
 
 #include <trace/events/kmem.h>
 
@@ -4157,6 +4158,17 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;
 
+#ifdef CONFIG_HUGETEXT
+	/* Add the candidate hugetext vma into khugepaged scan list */
+	if (pmd_none(*vmf->pmd) && hugetext_enabled()
+			&& vma_is_hugetext(vma, vma->vm_flags)) {
+		unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
+
+		if (transhuge_vma_suitable(vma, haddr))
+			khugepaged_enter(vma, vma->vm_flags);
+	}
+#endif
+
 	/*
 	 * Let's call ->map_pages() first and use ->fault() as fallback
 	 * if page by the offset is not ready to be mapped (cold cache or
-- 
2.27.0