From: Yang Shi
Date: Fri, 18 Oct 2019 14:44:08 -0700
Subject: Re: [PATCH 0/4] [RFC] Migrate Pages in lieu of discard
To: Shakeel Butt
Cc: Dave Hansen, Dave Hansen, LKML, Linux MM, Dan Williams, Jonathan Adams, "Chen, Tim C"
References: <20191016221148.F9CCD155@viggo.jf.intel.com> <496566a6-2581-17f4-a4f2-e5def7f97582@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 17, 2019 at 3:58 PM Shakeel Butt wrote:
>
> On Thu, Oct 17, 2019 at 10:20 AM Yang Shi wrote:
> >
> > On Thu, Oct 17, 2019 at 7:26 AM Dave Hansen wrote:
> > >
> > > On 10/16/19 8:45 PM, Shakeel Butt wrote:
> > > > On Wed, Oct 16, 2019 at 3:49 PM Dave Hansen wrote:
> > > >> This set implements a solution to these problems. At the end of the
> > > >> reclaim process in shrink_page_list() just before the last page
> > > >> refcount is dropped, the page is migrated to persistent memory instead
> > > >> of being dropped.
> > > >
> > > > The memory cgroup part of the story is missing here. Since PMEM is
> > > > treated as slow DRAM, shouldn't its usage be accounted to the
> > > > corresponding memcg's memory/memsw counters and the migration should
> > > > not happen for memcg limit reclaim? Otherwise some jobs can hog the
> > > > whole PMEM.
> > >
> > > My expectation (and I haven't confirmed this) is that any memory use
> > > is accounted to the owning cgroup, whether it is DRAM or PMEM. memcg
> > > limit reclaim and global reclaim both end up doing migrations and
> > > neither should have a net effect on the counters.
> >
> > Yes, your expectation is correct. As long as PMEM is a NUMA node, it
> > is treated as regular memory by memcg. But I don't think memcg limit
> > reclaim should do migration, since limit reclaim is used to reduce
> > memory usage, and migration doesn't reduce usage, it just moves memory
> > from one node to the other.
> >
> > In my implementation, I just skip migration for memcg limit reclaim,
> > please see: https://lore.kernel.org/linux-mm/1560468577-101178-7-git-send-email-yang.shi@linux.alibaba.com/
> >
> > >
> > > There is certainly a problem here because DRAM is a more valuable
> > > resource vs. PMEM, and memcg accounts for them as if they were equally
> > > valuable. I really want to see memcg account for this cost discrepancy
> > > at some point, but I'm not quite sure what form it would take. Any
> > > feedback from you heavy memcg users out there would be much appreciated.
> >
> > We did have some demands to control the ratio between DRAM and PMEM, as
> > I mentioned at LSF/MM. Mel Gorman did suggest making memcg account DRAM
> > and PMEM separately, or something similar.
>
> Can you please describe how you plan to use this ratio? Are
> applications supposed to use this ratio, or will the admins be
> adjusting it? Also, should it be dynamically updated based on the
> workload, i.e. as the working set or hot pages grow we want more DRAM,
> and as cold pages grow we want more PMEM? Basically I am trying to
> see if we have something like smart auto-NUMA balancing to fulfill
> your use case.

We thought it should be controlled by admins and transparent to the end
users.
The ratio is fixed, but memory could be moved between DRAM and PMEM
dynamically, as long as it doesn't exceed the ratio, so that we could
keep warmer data in DRAM and colder data in PMEM. I talked about this at
LSF/MM; please check this out: https://lwn.net/Articles/787418/

> > > >
> > > > Also what happens when PMEM is full? Can the memory migrated to PMEM
> > > > be reclaimed (or discarded)?
> > >
> > > Yep. The "migration path" can be as long as you want, but once the data
> > > hits a "terminal node" it will stop getting migrated and normal discard
> > > at the end of reclaim happens.
> >
> > I recall I had a hallway conversation with Keith about this at
> > LSF/MM. We all agree there should not be a cycle. But, IMHO, I don't
> > think exporting the migration path to userspace (or letting the user
> > define the migration path) and having multiple migration stops are
> > good ideas in general.