From: cgel.zte@gmail.com
X-Google-Original-From: xu.xin16@zte.com.cn
To: akpm@linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, corbet@lwn.net, xu xin, Yang Yang, Ran Xiaokai, wangyong, Yunkai Zhang, Matthew Wilcox
Subject: [PATCH v6] mm/ksm: introduce ksm_force for each process
Date: Tue, 10 May 2022 12:22:42 +0000
Message-Id: <20220510122242.1380536-1-xu.xin16@zte.com.cn>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

To use KSM, we have to explicitly call
madvise() in application code, which means that installed applications have
to be uninstalled and their source code modified. This is inconvenient.

To change this situation, add a new proc file ksm_force under /proc//
to support turning KSM scanning of a process's mm on and off dynamically.

If ksm_force is set to 1, all anonymous and 'qualified' VMAs of this mm are
forced into KSM scanning without an explicit madvise() call marking each VMA
as MADV_MERGEABLE. This takes effect only when the knob
/sys/kernel/mm/ksm/run is set to 1.

If ksm_force is set to 0, the ksm_force feature is cancelled for this process
(falling back to the default state): merged pages belonging to VMAs that were
not madvised as MADV_MERGEABLE are unmerged, while MADV_MERGEABLE areas stay
merged.

Signed-off-by: xu xin
Reviewed-by: Yang Yang
Reviewed-by: Ran Xiaokai
Reviewed-by: wangyong
Reviewed-by: Yunkai Zhang
Suggested-by: Matthew Wilcox
---
v6:
- modify the way of "return"
- remove unnecessary words in Documentation/admin-guide/mm/ksm.rst
- add additional notes to "set 0 to ksm_force" in Documentation/../ksm.rst
  and Documentation/../proc.rst
v5:
- fix typos in Documentation/filesystems/proc.rst
v4:
- fix typos in commit log
- add interface descriptions under Documentation/
v3:
- fix compile error of mm/ksm.c
v2:
- fix a spelling error in commit log.
- remove a redundant condition check in ksm_force_write().
---
 Documentation/admin-guide/mm/ksm.rst | 19 +++++-
 Documentation/filesystems/proc.rst   | 17 +++++
 fs/proc/base.c                       | 93 ++++++++++++++++++++++++++++
 include/linux/mm_types.h             |  9 +++
 mm/ksm.c                             | 32 +++++++++-
 5 files changed, 167 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index b244f0202a03..8cabc2504005 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -32,7 +32,7 @@ are swapped back in: ksmd must rediscover their identity and merge again).
 Controlling KSM with madvise
 ============================
 
-KSM only operates on those areas of address space which an application
+KSM can operate on those areas of address space which an application
 has advised to be likely candidates for merging, by using the madvise(2)
 system call::
 
@@ -70,6 +70,23 @@ Applications should be considerate in their use of MADV_MERGEABLE,
 restricting its use to areas likely to benefit. KSM's scans may use a lot
 of processing power: some installations will disable KSM for that reason.
 
+Controlling KSM with procfs
+===========================
+
+KSM can also operate on anonymous areas of the address space of processes
+whose knob ``/proc//ksm_force`` is on, even if the application does not
+call madvise() explicitly to advise specific areas as MADV_MERGEABLE.
+
+You can set ksm_force to 1 to force all anonymous and qualified VMAs of
+this process to be involved in KSM scanning.
+	e.g. ``echo 1 > /proc//ksm_force``
+
+You can also set ksm_force to 0 to cancel the force feature for this process
+and unmerge merged pages that belong to VMAs not marked as MADV_MERGEABLE.
+Pages belonging to VMAs marked as MADV_MERGEABLE stay merged (fallback to
+the default state).
+	e.g. ``echo 0 > /proc//ksm_force``
+
 .. _ksm_sysfs:
 
 KSM daemon sysfs interface
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..8890b8b457a4 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold June 9 2009
   3.10 /proc//timerslack_ns - Task timerslack value
   3.11 /proc//patch_state - Livepatch patch operation state
   3.12 /proc//arch_status - Task architecture specific information
+  3.13 /proc//ksm_force - Setting of mandatory involvement in KSM
 
   4 Configuring procfs
   4.1 Mount options
@@ -2176,6 +2177,22 @@ AVX512_elapsed_ms
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
+3.13 /proc//ksm_force - Setting of mandatory involvement in KSM
+-----------------------------------------------------------------------
+When CONFIG_KSM is enabled, this file can be used to specify whether this
+process's anonymous memory can be involved in KSM scanning without the
+application explicitly calling madvise to mark memory as MADV_MERGEABLE.
+
+If 1 is written to this file, the kernel forces all anonymous and qualified
+memory to be involved in KSM scanning without explicitly calling madvise to
+mark memory as MADV_MERGEABLE. This takes effect only when the knob
+'/sys/kernel/mm/ksm/run' is set to 1.
+
+If 0 is written to this file, the mandatory KSM feature of this process is
+cancelled and merged pages that belong to areas not marked as
+MADV_MERGEABLE are unmerged, while pages belonging to areas marked as
+MADV_MERGEABLE stay merged (fallback to the default state).
+
 Chapter 4: Configuring procfs
 =============================
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8dfa36a99c74..d60f7342f79e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -96,6 +96,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
 #include "fd.h"
@@ -3168,6 +3169,96 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *
 
 	return 0;
 }
+
+static ssize_t ksm_force_read(struct file *file, char __user *buf, size_t count,
+				loff_t *ppos)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF];
+	ssize_t len;
+	int ret = 0;
+
+	task = get_proc_task(file_inode(file));
+	if (!task)
+		return -ESRCH;
+
+	mm = get_task_mm(task);
+	put_task_struct(task);
+	if (mm) {
+		len = snprintf(buffer, sizeof(buffer), "%d\n", mm->ksm_force);
+		ret = simple_read_from_buffer(buf, count, ppos, buffer, len);
+		mmput(mm);
+	}
+
+	return ret;
+}
+
+static ssize_t ksm_force_write(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF];
+	int force;
+	int err = 0;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = kstrtoint(strstrip(buffer), 0, &force);
+	if (err)
+		return err;
+
+	if (force != 0 && force != 1)
+		return -EINVAL;
+
+	task = get_proc_task(file_inode(file));
+	if (!task)
+		return -ESRCH;
+
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_put_task;
+
+	if (mm->ksm_force != force) {
+		if (mmap_write_lock_killable(mm)) {
+			err = -EINTR;
+			goto out_mmput;
+		}
+
+		if (force == 0)
+			mm->ksm_force = force;
+		else {
+			/*
+			 * Force anonymous pages of this mm to be involved in KSM merging
+			 * without explicitly calling madvise.
+			 */
+			if (!test_bit(MMF_VM_MERGEABLE, &mm->flags))
+				err = __ksm_enter(mm);
+			if (!err)
+				mm->ksm_force = force;
+		}
+
+		mmap_write_unlock(mm);
+	}
+
+out_mmput:
+	mmput(mm);
+out_put_task:
+	put_task_struct(task);
+
+	return err < 0 ? err : count;
+}
+
+static const struct file_operations proc_pid_ksm_force_operations = {
+	.read		= ksm_force_read,
+	.write		= ksm_force_write,
+	.llseek		= generic_file_llseek,
+};
 #endif /* CONFIG_KSM */
 
 #ifdef CONFIG_STACKLEAK_METRICS
@@ -3303,6 +3394,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 #ifdef CONFIG_KSM
 	ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages),
+	REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations),
 #endif
 };
 
@@ -3639,6 +3731,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 #ifdef CONFIG_KSM
 	ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages),
+	REG("ksm_force", S_IRUSR|S_IWUSR, proc_pid_ksm_force_operations),
 #endif
 };
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index b34ff2cdbc4f..1b1592c2f5cf 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -661,6 +661,15 @@ struct mm_struct {
 		 * merging.
 		 */
 		unsigned long ksm_merging_pages;
+		/*
+		 * If true, force anonymous pages of this mm to be involved in KSM
+		 * merging without explicitly calling madvise. It is effective only
+		 * when the knob '/sys/kernel/mm/ksm/run' is set to 1. If false,
+		 * cancel the ksm_force feature for this process and unmerge
+		 * merged pages that were not madvised as MERGEABLE in this
+		 * process, but leave MERGEABLE areas merged.
+		 */
+		bool ksm_force;
 #endif
 	} __randomize_layout;
 
diff --git a/mm/ksm.c b/mm/ksm.c
index 38360285497a..c9f672dcc72e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -334,6 +334,34 @@ static void __init ksm_slab_free(void)
 	mm_slot_cache = NULL;
 }
 
+/* Check if vma is qualified for ksmd scanning */
+static bool ksm_vma_check(struct vm_area_struct *vma)
+{
+	unsigned long vm_flags = vma->vm_flags;
+
+	if (!(vm_flags & VM_MERGEABLE) && !vma->vm_mm->ksm_force)
+		return false;
+
+	if (vm_flags & (VM_SHARED | VM_MAYSHARE |
+			VM_PFNMAP | VM_IO | VM_DONTEXPAND |
+			VM_HUGETLB | VM_MIXEDMAP))
+		return false;	/* just ignore this vma */
+
+	if (vma_is_dax(vma))
+		return false;
+
+#ifdef VM_SAO
+	if (vm_flags & VM_SAO)
+		return false;
+#endif
+#ifdef VM_SPARC_ADI
+	if (vm_flags & VM_SPARC_ADI)
+		return false;
+#endif
+
+	return true;
+}
+
 static __always_inline bool is_stable_node_chain(struct stable_node *chain)
 {
 	return chain->rmap_hlist_len == STABLE_NODE_CHAIN;
@@ -523,7 +551,7 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
 	if (ksm_test_exit(mm))
 		return NULL;
 	vma = vma_lookup(mm, addr);
-	if (!vma || !(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
+	if (!vma || !ksm_vma_check(vma) || !vma->anon_vma)
 		return NULL;
 	return vma;
 }
@@ -2297,7 +2325,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		vma = find_vma(mm, ksm_scan.address);
 
 	for (; vma; vma = vma->vm_next) {
-		if (!(vma->vm_flags & VM_MERGEABLE))
+		if (!ksm_vma_check(vma))
 			continue;
 		if (ksm_scan.address < vma->vm_start)
 			ksm_scan.address = vma->vm_start;
-- 
2.25.1
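
For illustration only, here is a minimal user-space sketch of how a management
tool might drive the knob proposed above. It is not part of the patch. The
helper names ksm_force_path() and ksm_force_set() are hypothetical, and the
sketch assumes that the elided component in /proc//ksm_force is the target
process ID. The file exists only on a kernel built with CONFIG_KSM and with
this patch applied, so on any other kernel the open is expected to fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build the path of the per-process knob.
 * Assumption: the knob lives at /proc/<pid>/ksm_force.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ksm_force_path(pid_t pid, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/proc/%ld/ksm_force", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write 0 or 1 to the knob of @pid.
 * Returns 0 on success or a negative errno value.
 */
static int ksm_force_set(pid_t pid, int force)
{
	char path[64];
	FILE *f;
	int err = 0;

	/* Mirror the range check done by ksm_force_write() in the patch. */
	if (force != 0 && force != 1)
		return -EINVAL;
	if (ksm_force_path(pid, path, sizeof(path)))
		return -ENAMETOOLONG;

	f = fopen(path, "w");
	if (!f)
		return -errno;	/* e.g. -ENOENT on a kernel without the patch */
	if (fprintf(f, "%d\n", force) < 0)
		err = -EIO;
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

A caller would invoke ksm_force_set(pid, 1) to opt a process in and
ksm_force_set(pid, 0) to fall back to the default state. As the commit message
notes, forcing has an effect only while /sys/kernel/mm/ksm/run is set to 1.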