Subject: Re: [PATCH v7 7/7] mm/madvise: allow KSM hints for remote API
To: Oleksandr Natalenko
Cc: Minchan Kim, Andrew Morton, LKML, linux-mm, linux-api@vger.kernel.org,
 Suren Baghdasaryan, Tim Murray, Daniel Colascione, Sandeep Patil,
 Sonny Rao, Brian Geffon, Michal Hocko, Johannes Weiner, Shakeel Butt,
 John Dias, Joel Fernandes, Jann Horn, alexander.h.duyck@linux.intel.com,
 sj38.park@gmail.com, SeongJae Park
References: <20200302193630.68771-1-minchan@kernel.org>
 <20200302193630.68771-8-minchan@kernel.org>
 <2a66abd8-4103-f11b-06d1-07762667eee6@suse.cz>
 <20200306134146.mqiyvsdnqty7so53@butterfly.localdomain>
From: Vlastimil Babka
Date: Fri, 6 Mar 2020 17:08:18 +0100
In-Reply-To: <20200306134146.mqiyvsdnqty7so53@butterfly.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/6/20 2:41 PM, Oleksandr Natalenko wrote:
> On Fri, Mar 06, 2020 at 02:13:49PM +0100, Vlastimil Babka wrote:
>> On 3/2/20 8:36 PM, Minchan Kim wrote:
>> > From: Oleksandr Natalenko
>> >
>> > It all began with the fact that KSM works only on memory that is marked
>> > by madvise(). And the only way to get around that is to either:
>> >
>> > * use LD_PRELOAD; or
>> > * patch the kernel with something like UKSM or PKSM.
>> >
>> > (i skip ptrace can of worms here intentionally)
>> >
>> > To overcome this restriction, lets employ a new remote madvise API. This
>> > can be used by some small userspace helper daemon that will do auto-KSM
>> > job for us.
>> >
>> > I think of two major consumers of remote KSM hints:
>> >
>> > * hosts, that run containers, especially similar ones and especially in
>> > a trusted environment, sharing the same runtime like Node.js;

Ah, I forgot to ask, given the discussion of races in patch 2 (Question 2),
where Android can stop the tasks to apply the madvise hints in a race-free
manner, how does that work for remote KSM hints in your scenarios, especially
the one above?

>> >
>> > * heavy applications, that can be run in multiple instances, not
>> > limited to opensource ones like Firefox, but also those that cannot be
>> > modified since they are binary-only and, maybe, statically linked.
>> >
>> > Speaking of statistics, more numbers can be found in the very first
>> > submission, that is related to this one [1]. For my current setup with
>> > two Firefox instances I get 100 to 200 MiB saved for the second instance
>> > depending on the amount of tabs.
>> >
>> > 1 FF instance with 15 tabs:
>> >
>> > $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
>> > 410
>> >
>> > 2 FF instances, second one has 12 tabs (all the tabs are different):
>> >
>> > $ echo "$(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024" | bc
>> > 592
>> >
>> > At the very moment I do not have specific numbers for containerised
>> > workload, but those should be comparable in case the containers share
>> > similar/same runtime.
>> >
>> > [1] https://lore.kernel.org/patchwork/patch/1012142/
>> >
>> > Reviewed-by: SeongJae Park
>> > Signed-off-by: Oleksandr Natalenko
>> > Signed-off-by: Minchan Kim
>>
>> This will lead to one process calling unmerge_ksm_pages() of another. There's a
>> (signal_pending(current)) test there, should it check also the other task,
>> analogically to task 3?
>
> Do we care about current there then? Shall we just pass mm into
> unmerge_ksm_pages and check the signals of the target task only, be it
> current or something else?

Dunno, it's nice to react to signals quickly, for any process that gets them, no?

>> Then break_ksm() is fine as it is, as ksmd also calls it, right?
>
> I think break_ksm() cares only about mmap_sem protection, so we should
> be fine here.
>
>>
>> > ---
>> >  mm/madvise.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/mm/madvise.c b/mm/madvise.c
>> > index e77c6c1fad34..f4fa962ee74d 100644
>> > --- a/mm/madvise.c
>> > +++ b/mm/madvise.c
>> > @@ -1005,6 +1005,10 @@ process_madvise_behavior_valid(int behavior)
>> >  	switch (behavior) {
>> >  	case MADV_COLD:
>> >  	case MADV_PAGEOUT:
>> > +#ifdef CONFIG_KSM
>> > +	case MADV_MERGEABLE:
>> > +	case MADV_UNMERGEABLE:
>> > +#endif
>> >  		return true;
>> >  	default:
>> >  		return false;
>> >
>> >
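
For reference, the arithmetic behind the quoted `pages_sharing * 4 / 1024`
one-liner can be sketched in C as below. This is only an illustration, not
part of the patch: the helper names ksm_pages_to_mib() and read_counter() are
hypothetical, and the factor 4 assumes 4 KiB pages, as the quoted bc
expression does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Convert a KSM page count to MiB using integer division, mirroring the
 * quoted bc expression. page_size_kib is an assumption supplied by the
 * caller; 4 matches the quoted one-liner.
 */
unsigned long long ksm_pages_to_mib(unsigned long long pages,
				    unsigned long long page_size_kib)
{
	assert(page_size_kib > 0);
	return pages * page_size_kib / 1024;
}

/*
 * Read one unsigned decimal counter from a sysfs-style file, for example
 * /sys/kernel/mm/ksm/pages_sharing.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_counter(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass the value read from /sys/kernel/mm/ksm/pages_sharing to
ksm_pages_to_mib() with a page size of 4. On systems with a different base
page size the factor would need to come from sysconf(_SC_PAGESIZE) instead.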