Date: Fri, 13 May 2022 13:32:10 -0700
From: Andrew Morton
To: Oleksandr Natalenko
Cc: cgel.zte@gmail.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, corbet@lwn.net, xu xin, Yang Yang, Ran Xiaokai, wangyong, Yunkai Zhang, Matthew Wilcox
Subject: Re: [PATCH v6] mm/ksm: introduce ksm_force for each process
Message-Id: <20220513133210.9dd0a4216bd8baaa1047562c@linux-foundation.org>
In-Reply-To: <1835064.A2aMcgg3dW@natalenko.name>
References: <20220510122242.1380536-1-xu.xin16@zte.com.cn> <5820954.lOV4Wx5bFT@natalenko.name> <20220512153753.6f999fa8f5519753d43b8fd5@linux-foundation.org> <1835064.A2aMcgg3dW@natalenko.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 13 May 2022 11:51:53 +0200 Oleksandr Natalenko wrote:

> Hello.
>
> On Friday, 13 May 2022 0:37:53 CEST Andrew Morton wrote:
> > On Tue, 10 May 2022 15:30:36 +0200 Oleksandr Natalenko wrote:
> >
> > > > If ksm_force is set to 1, force all anonymous and 'qualified' VMAs
> > > > of this mm to be involved in KSM scanning without explicitly calling
> > > > madvise to mark VMA as MADV_MERGEABLE. But it is effective only when
> > > > the knob of /sys/kernel/mm/ksm/run is set to 1.
> > > >
> > > > If ksm_force is set to 0, cancel the feature of ksm_force of this
> > > > process (fallback to the default state) and unmerge those merged pages
> > > > belonging to VMAs which are not madvised as MADV_MERGEABLE of this process,
> > > > but still leave MADV_MERGEABLE areas merged.
> > >
> > > To my best knowledge, last time a forcible KSM was discussed (see threads [1], [2], [3] and probably others) it was concluded that a) procfs was a horrible interface for things like this one; and b) process_madvise() syscall was among the best suggested places to implement this (which would require a more tricky handling from userspace, but still).
> > >
> > > So, what changed since that discussion?
> > >
> > > P.S. For now I do it via dedicated syscall, but I'm not trying to upstream this approach.
> >
> > Why are you patching the kernel with a new syscall rather than using
> > process_madvise()?
>
> Because I'm not sure how to use `process_madvise()` to achieve $subj properly.
>
> The objective is to mark all the eligible VMAs of the target task for KSM to consider them for merging.
>
> For that, all the eligible VMAs have to be traversed.
>
> Given `process_madvise()` has got an iovec API, this means the process that will call `process_madvise()` has to know the list of VMAs of the target process. In order to traverse them in a race-free manner the target task has to be SIGSTOP'ped or frozen, then the list of VMAs has to be obtained, then `process_madvise()` has to be called, and then the target task can continue. This is:
>
> a) superfluous (the kernel already knows the list of VMAs of the target tasks, why proxy it through the userspace then?); and
> b) may induce more latency than needed because the target task has to be stopped to avoid races.

OK. And what happens to new vmas that the target process creates after
the process_madvise()?

> OTOH, IIUC, even if `MADV_MERGEABLE` is allowed for `process_madvise()`,

Is it not?

> I cannot just call it like this:
>
> ```
> iovec.iov_base = 0;
> iovec.iov_len = ~0ULL;
> process_madvise(pidfd, &iovec, 1, MADV_MERGEABLE, 0);
> ```
>
> to cover the whole address space because iovec expects total size to be under ssize_t.
>
> Or maybe there's no need to cover the whole address space, only the lower half of it?

Call process_madvise() twice, once for each half?

> Or maybe there's another way of doing things, and I just look stupid and do not understand how this is supposed to work?..
>
> I'm more than happy to read your comments on this.
>

I see the problem. I do like the simplicity of the ksm_force concept.
Are there alternative ideas?
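
For reference, the userspace side of the process_madvise() route discussed above could start from something like the sketch below. It turns one line of /proc/<pid>/maps into an iovec entry. The helper name maps_line_to_iovec() and its filter (private mapping, inode 0, unnamed or [heap]/[stack]) are assumptions made for illustration and are not taken from the patch. The kernel's own KSM eligibility checks are stricter, and whether process_madvise() accepts MADV_MERGEABLE at all is the open question raised in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: convert one line of /proc/<pid>/maps into an iovec.
 *
 * Returns 1 and fills *iov if the line looks like a private anonymous
 * mapping (crude filter: 'p' permission flag, inode 0, and either no
 * name or "[heap]"/"[stack]"). Returns 0 for anything else, including
 * lines that do not parse.
 */
int maps_line_to_iovec(const char *line, struct iovec *iov)
{
    unsigned long start = 0, end = 0, inode = 0;
    char perms[5] = "";
    char path[16] = "";
    int n;

    assert(line != NULL && iov != NULL);

    /* Fields: start-end perms offset dev inode [pathname] */
    n = sscanf(line, "%lx-%lx %4s %*x %*x:%*x %lu %15s",
               &start, &end, perms, &inode, path);
    if (n < 4 || end <= start)
        return 0;

    /* Skip shared and file-backed mappings. */
    if (perms[3] != 'p' || inode != 0)
        return 0;

    /* Skip special regions such as [vdso] and [vvar]. */
    if (path[0] != '\0' &&
        strcmp(path, "[heap]") != 0 && strcmp(path, "[stack]") != 0)
        return 0;

    iov->iov_base = (void *)(uintptr_t)start;
    iov->iov_len = end - start;
    return 1;
}
```

A caller would read /proc/<pid>/maps line by line, collect the matching entries, and pass them to process_madvise() in batches. This still leaves both problems raised above: the target has to be stopped to get a consistent list, and VMAs created afterwards are not covered.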