From: cgel.zte@gmail.com
X-Google-Original-From: xu.xin16@zte.com.cn
To: akpm@linux-foundation.org
Cc: ammarfaizi2@gnuweeb.org, oleksandr@natalenko.name, willy@infradead.org, linux-mm@kvack.org, corbet@lwn.net, linux-kernel@vger.kernel.org, xu xin , Yang Yang , Ran Xiaokai , wangyong , Yunkai Zhang , Jiang Xuexin , CGEL
Subject: [PATCH] mm/ksm: introduce ksm_enabled for each process
Date: Tue, 17 May 2022 09:27:01 +0000
Message-Id: <20220517092701.1662641-1-xu.xin16@zte.com.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

From: xu xin

For now, if we want to use KSM to merge the pages of an application, we
have to call madvise() explicitly in the application code, which means
that applications already installed on the OS have to be uninstalled and
their source code modified. This is very inconvenient, because users or
application developers are sometimes unwilling, for whatever reason, to
modify their source code.

So, to use KSM more flexibly, we provide a new proc file "ksm_enabled"
under /proc//. One of the following three values can be written to it:

always:  force all anonymous and eligible VMAs of this process to be
         scanned by ksmd.
madvise: the default state; unless user code calls madvise(), ksmd does
         not scan this process.
never:   this process will never be scanned by ksmd, and none of its
         pages will be merged.

With this patch, KSM can be controlled per process through
``/proc//ksm_enabled``. For each process, KSM can be disabled entirely
(mostly for debugging purposes), enabled only inside MADV_MERGEABLE
regions (to avoid the risk of ksmd consuming more CPU resources on
scanning), or enabled for the whole process.
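As a rough illustration of the three values above, the following userspace C sketch models how a write to the file could be parsed and how the current mode is rendered on read. It is not the kernel code: `parse_ksm_mode()`, `ksm_mode_line()` and their signatures are hypothetical, and only the KSM_PROC_* values and the bracketed output strings are taken from this patch. The sketch assumes a written value may or may not end in a newline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same values as the KSM_PROC_* constants added to include/linux/ksm.h. */
#define KSM_PROC_MADVISE 0
#define KSM_PROC_ALWAYS  1
#define KSM_PROC_NEVER   2

/*
 * Hypothetical userspace model of the parsing step of a ksm_enabled write:
 * map the written bytes to a KSM_PROC_* value, strip one trailing newline
 * only if present (so both "echo" and "echo -n" writes work), and reject
 * empty or unknown input with -EINVAL.
 */
int parse_ksm_mode(const char *buf, size_t len)
{
	static const struct {
		const char *name;
		int value;
	} modes[] = {
		{ "always",  KSM_PROC_ALWAYS },
		{ "madvise", KSM_PROC_MADVISE },
		{ "never",   KSM_PROC_NEVER },
	};
	size_t i;

	assert(buf != NULL || len == 0);

	if (len > 0 && buf[len - 1] == '\n')
		len--;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		/* Exact match only: same length and same bytes. */
		if (strlen(modes[i].name) == len &&
		    memcmp(modes[i].name, buf, len) == 0)
			return modes[i].value;
	}
	return -EINVAL;
}

/* Model of the read side: the active mode is shown in brackets. */
const char *ksm_mode_line(int mode)
{
	switch (mode) {
	case KSM_PROC_ALWAYS:
		return "[always] madvise never\n";
	case KSM_PROC_MADVISE:
		return "always [madvise] never\n";
	default:
		return "always madvise [never]\n";
	}
}
```

For example, `parse_ksm_mode("always\n", 7)` returns KSM_PROC_ALWAYS, and a zero-length write is rejected with -EINVAL instead of indexing before the start of the buffer. Ignoring a single trailing newline is the same behaviour the kernel's `sysfs_streq()` helper provides for string comparisons.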
Signed-off-by: xu xin
Reviewed-by: Yang Yang
Reviewed-by: Ran Xiaokai
Reviewed-by: wangyong
Reviewed-by: Yunkai Zhang
Reviewed-by: Jiang Xuexin
Signed-off-by: CGEL
---
 Documentation/admin-guide/mm/ksm.rst |  24 ++++++-
 Documentation/filesystems/proc.rst   |  14 ++++
 fs/proc/base.c                       | 102 ++++++++++++++++++++++++++-
 include/linux/ksm.h                  |   5 ++
 include/linux/mm_types.h             |  10 +++
 mm/ksm.c                             |  35 ++++++++-
 6 files changed, 185 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index b244f0202a03..91326198e37f 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -32,7 +32,7 @@ are swapped back in: ksmd must rediscover their identity and merge again).
 Controlling KSM with madvise
 ============================

-KSM only operates on those areas of address space which an application
+KSM can operate on those areas of address space which an application
 has advised to be likely candidates for merging, by using the madvise(2)
 system call::

@@ -70,6 +70,28 @@ Applications should be considerate in their use of MADV_MERGEABLE,
 restricting its use to areas likely to benefit. KSM's scans may use a lot
 of processing power: some installations will disable KSM for that reason.

+Controlling KSM with procfs
+===========================
+KSM can also be controlled per process through ``/proc//ksm_enabled``.
+For each process, KSM can be disabled entirely (mostly for debugging
+purposes), enabled only inside MADV_MERGEABLE regions (to avoid the risk
+of ksmd consuming more CPU resources on scanning), or enabled for the
+whole process. This can be achieved with one of::
+
+	echo always > /proc//ksm_enabled
+	echo madvise > /proc//ksm_enabled
+	echo never > /proc//ksm_enabled
+
+always:
+	force all anonymous and eligible VMAs of this process to be scanned
+	by ksmd.
+madvise:
+	the default state; unless user code calls madvise(), ksmd does not
+	scan this process.
+never:
+	this process will never be scanned by ksmd, and none of its pages
+	will be merged.
+
 .. _ksm_sysfs:

 KSM daemon sysfs interface
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 1bc91fb8c321..ea7e08a1c143 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold June 9 2009
   3.10	/proc//timerslack_ns - Task timerslack value
   3.11	/proc//patch_state - Livepatch patch operation state
   3.12	/proc//arch_status - Task architecture specific information
+  3.13	/proc//ksm_enabled - Controlling KSM based on process

   4	Configuring procfs
   4.1	Mount options
@@ -2140,6 +2141,19 @@ AVX512_elapsed_ms
 the task is unlikely an AVX512 user, but depends on the workload and the
 scheduling scenario, it also could be a false negative mentioned above.

+3.13 /proc//ksm_enabled - Controlling KSM based on process
+---------------------------------------------------------------
+When CONFIG_KSM is enabled, this file can be used to specify how this
+process's anonymous memory takes part in KSM scanning.
+
+Writing "always" to this file forces all anonymous and eligible VMAs of
+this process to be scanned by ksmd.
+
+Writing "madvise" to this file restores the default state: unless user
+code calls madvise(), ksmd does not scan this process.
+
+Writing "never" to this file means this process will never be scanned by ksmd.
+
 Chapter 4: Configuring procfs
 =============================

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 617816168748..760ceeab4aa1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -96,6 +96,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "internal.h"
 #include "fd.h"
@@ -3171,7 +3172,104 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *

 	return 0;
 }
-#endif /* CONFIG_KSM */
+
+static int ksm_enabled_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct mm_struct *mm;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (!task)
+		return -ESRCH;
+
+	mm = get_task_mm(task);
+	if (mm) {
+		if (mm->ksm_enabled == KSM_PROC_ALWAYS)
+			seq_puts(m, "[always] madvise never\n");
+		else if (mm->ksm_enabled == KSM_PROC_MADVISE)
+			seq_puts(m, "always [madvise] never\n");
+		else
+			seq_puts(m, "always madvise [never]\n");
+		mmput(mm);
+	}
+
+	put_task_struct(task);
+	return 0;
+}
+
+static ssize_t ksm_enabled_write(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	char buffer[PROC_NUMBUF];
+	int value;
+	int err = 0;
+	long str_len;
+
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	str_len = strncpy_from_user(buffer, buf, count);
+	if (str_len < 0)
+		return -EFAULT;
+	buffer[str_len] = '\0';
+
+	if (sysfs_streq(buffer, "always"))
+		value = KSM_PROC_ALWAYS;
+	else if (sysfs_streq(buffer, "madvise"))
+		value = KSM_PROC_MADVISE;
+	else if (sysfs_streq(buffer, "never"))
+		value = KSM_PROC_NEVER;
+	else
+		return -EINVAL;
+
+	task = get_proc_task(file_inode(file));
+	if (!task)
+		return -ESRCH;
+	mm = get_task_mm(task);
+	if (!mm)
+		goto out_put_task;
+
+	if (mm->ksm_enabled != value) {
+		if (mmap_write_lock_killable(mm)) {
+			err = -EINTR;
+			goto out_mmput;
+		}
+		if (value == KSM_PROC_NEVER)
+			mm->ksm_enabled = value;
+		else {
+			/*
+			 * For both KSM_PROC_ALWAYS and KSM_PROC_MADVISE, recheck
+			 * mm->flags to make sure this mm is on ksmd's scan list.
+			 */
+			if (!test_bit(MMF_VM_MERGEABLE, &mm->flags))
+				err = __ksm_enter(mm);
+			if (!err)
+				mm->ksm_enabled = value;
+		}
+		mmap_write_unlock(mm);
+	}
+
+out_mmput:
+	mmput(mm);
+out_put_task:
+	put_task_struct(task);
+	return err < 0 ? err : count;
+}
+
+static int ksm_enabled_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, ksm_enabled_show, inode);
+}
+
+static const struct file_operations proc_pid_ksm_enabled_operations = {
+	.open		= ksm_enabled_open,
+	.read		= seq_read,
+	.write		= ksm_enabled_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+#endif /* CONFIG_KSM */

 #ifdef CONFIG_STACKLEAK_METRICS
 static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
@@ -3306,6 +3404,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 #ifdef CONFIG_KSM
 	ONE("ksm_merging_pages",  S_IRUSR, proc_pid_ksm_merging_pages),
+	REG("ksm_enabled", S_IRUGO|S_IWUSR, proc_pid_ksm_enabled_operations),
 #endif
 };

@@ -3642,6 +3741,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 #ifdef CONFIG_KSM
 	ONE("ksm_merging_pages",  S_IRUSR, proc_pid_ksm_merging_pages),
+	REG("ksm_enabled", S_IRUGO|S_IWUSR, proc_pid_ksm_enabled_operations),
 #endif
 };

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 0b4f17418f64..29d23d208b54 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -19,6 +19,11 @@ struct stable_node;
 struct mem_cgroup;

 #ifdef CONFIG_KSM
+
+#define KSM_PROC_MADVISE	0
+#define KSM_PROC_ALWAYS		1
+#define KSM_PROC_NEVER		2
+
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
 		unsigned long end, int advice, unsigned long *vm_flags);
 int __ksm_enter(struct mm_struct *mm);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 417ef1519475..29fd4c84d08c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -649,6 +649,16 @@ struct mm_struct {
 		 * merging.
 		 */
 		unsigned long ksm_merging_pages;
+
+		/*
+		 * Represents how this mm takes part in KSM; one of 3 states:
+		 * 1) KSM_PROC_ALWAYS: force all anonymous VMAs of this process to
+		 *    be scanned.
+		 * 2) KSM_PROC_MADVISE: the default state; unless user code calls
+		 *    madvise, don't scan this process.
+		 * 3) KSM_PROC_NEVER: never involved in KSM.
+		 */
+		int ksm_enabled;
 #endif
 	} __randomize_layout;

diff --git a/mm/ksm.c b/mm/ksm.c
index 26da7f813f23..90cc8eda8bca 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -334,6 +334,35 @@ static void __init ksm_slab_free(void)
 	mm_slot_cache = NULL;
 }

+static bool vma_scannable(struct vm_area_struct *vma)
+{
+	unsigned long vm_flags = vma->vm_flags;
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (mm->ksm_enabled == KSM_PROC_NEVER ||
+	    (!(vma->vm_flags & VM_MERGEABLE) &&
+	     mm->ksm_enabled != KSM_PROC_ALWAYS))
+		return false;
+
+	if (vm_flags & (VM_SHARED | VM_MAYSHARE |
+			VM_PFNMAP | VM_IO | VM_DONTEXPAND |
+			VM_HUGETLB | VM_MIXEDMAP))
+		return false;	/* just ignore this vma */
+
+	if (vma_is_dax(vma))
+		return false;
+#ifdef VM_SAO
+	if (vm_flags & VM_SAO)
+		return false;
+#endif
+#ifdef VM_SPARC_ADI
+	if (vm_flags & VM_SPARC_ADI)
+		return false;
+#endif
+
+	return true;
+}
+
 static __always_inline bool is_stable_node_chain(struct stable_node *chain)
 {
 	return chain->rmap_hlist_len == STABLE_NODE_CHAIN;
@@ -523,7 +552,7 @@ static struct vm_area_struct *find_mergeable_vma(struct mm_struct *mm,
 	if (ksm_test_exit(mm))
 		return NULL;
 	vma = vma_lookup(mm, addr);
-	if (!vma || !(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
+	if (!vma || !vma_scannable(vma) || !vma->anon_vma)
 		return NULL;
 	return vma;
 }
@@ -990,7 +1019,7 @@ static int unmerge_and_remove_all_rmap_items(void)
 		for_each_vma(vmi, vma) {
 			if (ksm_test_exit(mm))
 				break;
-			if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
+			if (!vma_scannable(vma) || !vma->anon_vma)
 				continue;
 			err = unmerge_ksm_pages(vma,
 						vma->vm_start, vma->vm_end);
@@ -2300,7 +2329,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
 		goto no_vmas;

 	for_each_vma(vmi, vma) {
-		if (!(vma->vm_flags & VM_MERGEABLE))
+		if (!vma_scannable(vma))
 			continue;
 		if (ksm_scan.address < vma->vm_start)
 			ksm_scan.address = vma->vm_start;
--
2.25.1