From: "Huang, Ying"
To: David Rientjes
Cc: Dave Hansen, Dave Hansen
Subject: Re: [RFC][PATCH 3/8] mm/vmscan: Attempt to migrate page in lieu of discard
References: <20200629234503.749E5340@viggo.jf.intel.com> <20200629234509.8F89C4EF@viggo.jf.intel.com>
Date: Thu, 02 Jul 2020 13:02:03 +0800
In-Reply-To: (David
Rientjes's message of "Wed, 1 Jul 2020 12:25:08 -0700")
Message-ID: <87mu4ijyr8.fsf@yhuang-dev.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

David Rientjes writes:

> On Wed, 1 Jul 2020, Dave Hansen wrote:
>
>> > Could this cause us to break a user's mbind() or allow a user to
>> > circumvent their cpuset.mems?
>>
>> In its current form, yes.
>>
>> My current rationale for this is that while it's not as deferential as
>> it can be to the user/kernel ABI contract, it's good *overall* behavior.
>> The auto-migration only kicks in when the data is about to go away.  So
>> while the user's data might be slower than they like, it is *WAY* faster
>> than they deserve because it should be off on the disk.
>
> It's outside the scope of this patchset, but eventually there will be a
> promotion path that I think requires a strict 1:1 relationship between
> DRAM and PMEM nodes because otherwise mbind(), set_mempolicy(), and
> cpuset.mems become ineffective for nodes facing memory pressure.

I have posted a patchset for AutoNUMA-based promotion support:

  https://lore.kernel.org/lkml/20200218082634.1596727-1-ying.huang@intel.com/

There, a page is promoted upon a NUMA hint page fault, so all memory
policies (mbind(), set_mempolicy(), and cpuset.mems) remain available:
we can refuse to promote a page to any DRAM node that is not allowed by
the applicable memory policy.  So a 1:1 relationship isn't necessary for
promotion.

> For the purposes of this patchset, agreed that DRAM -> PMEM -> swap makes
> perfect sense.  Theoretically, I think you could have DRAM N0 and N1 and
> then a single PMEM N2 and this N2 can be the terminal node for both N0 and
> N1.
> On promotion, I think we need to rely on something stronger than
> autonuma to decide which DRAM node to promote to: specifically any user
> policy put into effect (memory tiering or autonuma shouldn't be allowed
> to subvert these user policies).
>
> As others have mentioned, we lose the allocation or process context at
> the time of demotion or promotion

As above, we do have the process context at the time of promotion.

> and any workaround for that requires some hacks, such as mapping the
> page to cpuset (what is the right solution for shared pages?) or adding
> NUMA locality handling to memcg.

It sounds natural to me to add a NUMA node restriction to memcg.

Best Regards,
Huang, Ying