Subject: Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory
To: Michal Hocko, Shakeel Butt
Cc: Yang Shi, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
    Dan Williams, David Rientjes, Linux MM, Cgroups, LKML
From: Tim Chen
Message-ID: <4a864946-a316-3d9c-8780-64c6281276d1@linux.intel.com>
Date: Thu, 15 Apr 2021 15:31:46 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/9/21 12:24 AM, Michal Hocko wrote:
> On Thu 08-04-21 13:29:08, Shakeel Butt wrote:
>> On Thu, Apr 8, 2021 at 11:01 AM Yang Shi wrote:
> [...]
>>> The low priority jobs should be able to be restricted by cpuset, for
>>> example, just keep them on second tier memory nodes. Then all the
>>> above problems are gone.
>
> Yes, if the aim is to isolate some users from certain numa node then
> cpuset is a good fit but as Shakeel says this is very likely not what
> this work is aiming for.
>
>> Yes that's an extreme way to overcome the issue but we can do less
>> extreme by just (hard) limiting the top tier usage of low priority
>> jobs.
>
> Per numa node high/hard limit would help with a more fine grained control.
> The configuration would be tricky though. All low priority memcgs would
> have to be carefully configured to leave enough for your important
> processes. That includes also memory which is not accounted to any
> memcg.
> The behavior of those limits would be quite tricky for OOM situations
> as well due to a lack of NUMA aware oom killer.

Another downside of putting limits on individual NUMA nodes is that it
limits flexibility.  For example, two memory nodes may be similar enough
in performance that you really only care about a cgroup not using more
than a threshold of the combined capacity of the two nodes.  But once
you put a hard limit on each NUMA node, you are tied to a fixed
allocation partition per node.  Some kernel resources may also be
pre-allocated primarily from one node, so a cgroup could bump into the
limit on that node and fail the allocation even though it has plenty of
slack on the other node.  This makes getting the configuration right
trickier.

There are currently some differences of opinion on whether grouping
memory nodes into tiers, and letting cgroups limit their usage per
tier, is desirable.  Many people want the management constraint placed
on individual NUMA nodes for each cgroup, instead of at the tier level.
I would appreciate feedback from folks who have insight into how such a
per-NUMA-node control interface would work, so that we can at least
agree on a direction and move forward.

Tim
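
To make the flexibility argument above concrete, here is a minimal
standalone sketch in userspace C.  The limits, usage numbers, and the
two-node layout are all made up for illustration and do not correspond
to any existing kernel or cgroup interface; the sketch only shows how a
per-node hard limit can reject an allocation pattern that a combined
per-tier limit would allow.

#include <stdio.h>
#include <stdbool.h>

#define NODES 2

/* Hypothetical limits: a 2 GB hard limit per node vs. 4 GB for the tier. */
static const long long per_node_limit[NODES] = { 2LL << 30, 2LL << 30 };
static const long long per_tier_limit = 4LL << 30;

/*
 * A cgroup whose usage happens to land mostly on node 0, e.g. because
 * some kernel resources are pre-allocated primarily from that node.
 */
static const long long usage[NODES] = { 3LL << 30, 1LL << 30 };

int main(void)
{
	long long total = 0;
	bool per_node_ok = true;

	for (int n = 0; n < NODES; n++) {
		total += usage[n];
		if (usage[n] > per_node_limit[n])
			per_node_ok = false;
	}

	/* Per-node limits: node 0 exceeds its 2 GB limit, so this fails. */
	printf("per-node limits: %s\n",
	       per_node_ok ? "within limits" : "node over limit, allocation fails");

	/* Per-tier limit: 3 GB + 1 GB = 4 GB, still within the 4 GB budget. */
	printf("per-tier limit : %s\n",
	       total <= per_tier_limit ? "within limit" : "over limit");

	return 0;
}

With these numbers the per-node check fails on node 0 while the
combined per-tier check passes, which is the slack-in-the-other-node
situation described in the reply above.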