Subject: Re: [PATCH RFC 0/4] mm/ksm: add option to automerge VMAs
From: Kirill Tkhai
To: Oleksandr Natalenko, linux-kernel@vger.kernel.org
Cc: Vlastimil Babka, Michal Hocko, Matthew Wilcox, Pavel Tatashin,
    Timofey Titovets, Aaron Tomlin, linux-mm@kvack.org
Date: Mon, 13 May 2019 13:38:43 +0300
Message-ID: <36a71f93-5a32-b154-b01d-2a420bca2679@virtuozzo.com>
In-Reply-To: <20190510072125.18059-1-oleksandr@redhat.com>
References: <20190510072125.18059-1-oleksandr@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Oleksandr,

On 10.05.2019 10:21, Oleksandr Natalenko wrote:
> By default, KSM works only on memory that is marked by madvise().
> And the only way to get around that is to either:
>
>   * use LD_PRELOAD; or
>   * patch the kernel with something like UKSM or PKSM.
>
> Instead, lets implement a so-called "always" mode, which allows marking
> VMAs as mergeable on do_anonymous_page() call automatically.
>
> The submission introduces a new sysctl knob as well as kernel cmdline option
> to control which mode to use. The default mode is to maintain old
> (madvise-based) behaviour.
>
> Due to security concerns, this submission also introduces VM_UNMERGEABLE
> vmaflag for apps to explicitly opt out of automerging. Because of adding
> a new vmaflag, the whole work is available for 64-bit architectures only.
>
> This patchset is based on earlier Timofey's submission [1], but it doesn't
> use dedicated kthread to walk through the list of tasks/VMAs.
>
> For my laptop it saves up to 300 MiB of RAM for usual workflow (browser,
> terminal, player, chats etc). Timofey's submission also mentions
> containerised workload that benefits from automerging too.

This whole approach looks complicated to me, and I'm not sure the gain
shown for the desktop is big enough to justify introducing contradictory
vma flags, a boot option, and a more complicated page fault handler.
Also, the 32/64-bit defines do not look good to me. I tried something
like this on my laptop some time ago, and the result was poor even in
absolute terms (not only as a percentage of memory).

Isn't the LD_PRELOAD trick enough for the desktop? Your workload is the
same all the time, so you could statically add the right preload to
/etc/profile and replace your mmap for good.

As for containers, something like this may make sense, I think. The
probability that several containers have identical pages is higher than
the probability that desktop applications do; also, LD_PRELOAD is not
applicable to containers.
But 1) this could be done for trusted containers only (are there issues
with KSM similar to hardware side-channel attacks?!); 2) in my
experience, the most shared data among containers is file cache, which
KSM does not support.

There are good results at link [1], but it's difficult to analyze them
without knowing what happens inside the tests. Some of the tests have a
"VM" prefix. Why doesn't the hypervisor mark its VMAs as mergeable?
Can't this be fixed in the hypervisor? What is the general reason that
the VMAs are not marked in all the tests?

If there is a fundamental problem with calling madvise, can't we just
implement an easier workaround, like a new write-only file:

# echo $task > /sys/kernel/mm/ksm/force_madvise

which will mark all anon VMAs as mergeable for the passed task's mm?
A small userspace daemon could write mergeable tasks there from time
to time.

Then we won't need to introduce additional vm flags or change the anon
page fault handler; the changes will be small, confined to mm/ksm.c,
and good enough for both 32-bit and 64-bit machines.

Thanks,
Kirill
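
To make the daemon side of the force_madvise idea more concrete, here is
a rough sketch in C. It is an illustration only:
/sys/kernel/mm/ksm/force_madvise is the file proposed in this mail and
does not exist in any kernel, and the helper names ksm_force_madvise()
and ksm_force_madvise_all() are made up for the example. The assumed
format (one decimal pid per write, followed by a newline) simply mirrors
the echo command above.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical control file: proposed in this thread, not a real kernel ABI. */
#define FORCE_MADVISE_PATH "/sys/kernel/mm/ksm/force_madvise"

/*
 * Write one pid as decimal text plus a newline to the file at 'path',
 * the same thing "echo $task > path" does from a shell.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int ksm_force_madvise(const char *path, pid_t pid)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%ld\n", (long)pid) > 0;

    /* fclose() flushes the buffer, so a failed write shows up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/*
 * Submit every pid in 'pids', one write per pid.
 * Returns the number of pids that were written successfully.
 */
size_t ksm_force_madvise_all(const char *path, const pid_t *pids, size_t count)
{
    size_t marked = 0;

    for (size_t i = 0; i < count; i++) {
        if (ksm_force_madvise(path, pids[i]) == 0)
            marked++;
    }
    return marked;
}
```

A daemon would collect the pids it considers mergeable by whatever
policy it likes and call ksm_force_madvise_all(FORCE_MADVISE_PATH, pids,
n) from time to time. The path is a parameter so that the helpers can be
exercised against an ordinary file.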