Date: Wed, 12 Jun 2019 13:19:20 +0200
From: Oleksandr Natalenko
To: Pavel Machek
Cc: Minchan Kim, Andrew Morton, linux-mm, LKML, linux-api@vger.kernel.org,
    Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes,
    Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao,
    Brian Geffon, jannh@google.com, oleg@redhat.com, christian@brauner.io,
    hdanton@sina.com, lizeb@google.com
Subject: Re: [PATCH v2 0/5] Introduce MADV_COLD and MADV_PAGEOUT
Message-ID: <20190612111920.evedpmre63ivnxkz@butterfly.localdomain>
References: <20190610111252.239156-1-minchan@kernel.org> <20190612105945.GA16442@amd>
In-Reply-To: <20190612105945.GA16442@amd>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 12, 2019 at 12:59:45PM +0200, Pavel Machek wrote:
> > - Problem
> >
> > Naturally, cached apps were dominant consumers of memory on the system.
> > However, they were not significant consumers of swap even though they are
> > good candidates for swap. Under investigation, swapping out only begins
> > once the low zone watermark is hit and kswapd wakes up, but the overall
> > allocation rate in the system might trip lmkd thresholds and cause a cached
> > process to be killed (we measured performance swapping out vs. zapping the
> > memory by killing a process. Unsurprisingly, zapping is 10x faster
> > even though we use zram which is much faster than real storage) so kill
> > from lmkd will often satisfy the high zone watermark, resulting in very
> > few pages actually being moved to swap.
>
> Is it still faster to swap-in the application than to restart it?

It's the same type of question I was addressing earlier in the remote KSM
discussion: either make applications aware of all the memory management
stuff, or delegate the decision to some supervising task.

In this case, we cannot rewrite all applications to handle an imaginary
SIGRESTART (or whatever you invent to handle restarts gracefully). SIGTERM
may require more memory to finish stuff so as not to lose your data (and I
guess you don't want to lose your data, right?), and SIGKILL is pretty much
destructive.

Offloading proactive memory management to a process that knows how to do it
makes it possible to handle not only throwaway containers/microservices, but
also the usual desktop/mobile workflows.

> > This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> > information required to make the reclaim decision is not known to the app.
> > Instead, it is known to a centralized userspace daemon, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > To solve the concern, this patch introduces a new syscall -
> >
> > struct pr_madvise_param {
> >         int size;               /* the size of this structure */
> >         int cookie;             /* reserved to support atomicity */
> >         int nr_elem;            /* count of below array fields */
> >         int __user *hints;      /* hints for each range */
> >         /* to store result of each operation */
> >         const struct iovec __user *results;
> >         /* input address ranges */
> >         const struct iovec __user *ranges;
> > };
> >
> > int process_madvise(int pidfd, struct pr_madvise_param *u_param,
> >                     unsigned long flags);
>
> That's quite a complex interface.
>
> Could we simply have feel_free_to_swap_out(int pid) syscall? :-).

I wonder how long we'll keep adding new syscalls each time we need some
amendment to existing interfaces. Yes, clone6(), I'm looking at you :(.

In the case of process_madvise(), keep in mind that it targets not only
MADV_COLD but also, potentially, other MADV_ flags.
I can hardly imagine we'll add one syscall per flag.

-- 
  Best regards,
    Oleksandr Natalenko (post-factum)
    Senior Software Maintenance Engineer
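
For readers following the interface debate, here is a rough userspace sketch of how a supervising daemon might fill in the pr_madvise_param structure quoted above, batching several (hint, range) pairs into one request. It only builds the argument; it does not invoke any syscall, since process_madvise() in this form is a proposal under review. The struct is a local mirror of the quoted one with the __user annotations dropped. The SKETCH_HINT_* values, the sketch_range() and sketch_build_param() helpers, and the use of one result iovec per range are all assumptions made for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Placeholder hint codes for this sketch only; not the kernel's values. */
enum sketch_hint { SKETCH_HINT_COLD = 1, SKETCH_HINT_PAGEOUT = 2 };

/* Local mirror of the struct quoted in the thread, without __user. */
struct pr_madvise_param {
	int size;                    /* the size of this structure */
	int cookie;                  /* reserved to support atomicity */
	int nr_elem;                 /* count of the array fields below */
	int *hints;                  /* one hint per range */
	const struct iovec *results; /* where each operation's result goes */
	const struct iovec *ranges;  /* input address ranges */
};

/* Hypothetical helper: describe one memory region as an iovec. */
struct iovec sketch_range(void *start, size_t len)
{
	struct iovec r = { .iov_base = start, .iov_len = len };
	return r;
}

/* Hypothetical helper: bundle n (hint, range, result) triples into a request. */
struct pr_madvise_param sketch_build_param(int *hints,
					   const struct iovec *ranges,
					   const struct iovec *results,
					   int n)
{
	assert(n > 0 && hints && ranges && results);

	struct pr_madvise_param p = {
		.size = (int)sizeof(struct pr_madvise_param),
		.cookie = 0, /* reserved, left as zero in this sketch */
		.nr_elem = n,
		.hints = hints,
		.results = results,
		.ranges = ranges,
	};
	return p;
}
```

The point of the sketch is the shape of the call: one request can carry different hints for different ranges, which is what a single-purpose feel_free_to_swap_out(pid) call could not express.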