Date: Thu, 21 May 2020 14:37:42 +0200
From: Michal Hocko
To: Chris Down
Cc: Andrew Morton, Johannes Weiner, Tejun Heo, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm, memcg: reclaim more aggressively before high allocator throttling
Message-ID: <20200521123742.GO6462@dhcp22.suse.cz>
References: <20200520143712.GA749486@chrisdown.name>
 <20200520160756.GE6462@dhcp22.suse.cz>
 <20200520202650.GB558281@chrisdown.name>
 <20200521071929.GH6462@dhcp22.suse.cz>
 <20200521112711.GA990580@chrisdown.name>
 <20200521120455.GM6462@dhcp22.suse.cz>
 <20200521122327.GB990580@chrisdown.name>
In-Reply-To: <20200521122327.GB990580@chrisdown.name>

On Thu 21-05-20 13:23:27, Chris Down wrote:
> (I'll leave the dirty throttling discussion to Johannes, because I'm not so
> familiar with that code or its history.)
>
> Michal Hocko writes:
> > > > The main problem I see with that approach is that the loop could easily
> > > > lead to reclaim unfairness when a heavy producer which doesn't leave the
> > > > kernel (e.g. a large read/write call) can keep a different task doing
> > > > all the reclaim work. The loop is effectively unbounded when there is
> > > > reclaim progress, and so the return to userspace is by no means
> > > > proportional to the requested memory/charge.
> > >
> > > It's not unbounded when there is reclaim progress; it stops when we are
> > > within the memory.high throttling grace period.
> > > Right after reclaim, we check if penalty_jiffies is less than 10ms, and
> > > abort any further reclaim or allocator throttling:
> >
> > Just imagine that you have parallel producers increasing the high limit
> > excess while somebody reclaims those. Sure, in practice the loop will be
> > bounded, but the reclaimer might perform much more work on behalf of
> > other tasks.
>
> A cgroup is a unit, and breaking it down into "reclaim fairness" for
> individual tasks like this seems suspect to me. For example, if one task in
> a cgroup is leaking unreclaimable memory like crazy, everyone in that cgroup
> is going to be penalised by allocator throttling as a result, even if they
> aren't "responsible" for that reclaim.

You are right, but that doesn't mean it is desirable that some tasks be
throttled unexpectedly long because of the others' activity. We already have
that behavior for direct reclaim, and I have to say I really hate it and have
had to spend a lot of time debugging latency issues. Our excuse has been that
the system is struggling at that time, so any quality of service is simply
out of the picture. I do not think the same argument can be applied to
memory.high, which doesn't really represent a point where the memcg is
struggling so hard that any signs of fairness should be dropped on the floor.

> So the options here are as follows when a cgroup is over memory.high and a
> single reclaim isn't enough:
>
> 1. Decline further reclaim. Instead, throttle for up to 2 seconds.
> 2. Keep on reclaiming. Only throttle if we can't get back under memory.high.
>
> The outcome of your suggestion to decline further reclaim is case #1, which
> is significantly more practically "unfair" to that task. Throttling is
> extremely disruptive to tasks and should be a last resort when we've
> exhausted all other practical options. It shouldn't be something you get
> just because you didn't try to reclaim hard enough.

I believe I have asked this in another email in this thread.
Could you explain why enforcing the requested target
(memcg_nr_pages_over_high) is insufficient for the problem you are dealing
with? That would make sense to me for large targets, while keeping a
relatively reasonable semantic for the throttling - i.e. proportional to the
memory demand rather than to the excess.
-- 
Michal Hocko
SUSE Labs
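
The penalty check Chris refers to can be sketched as a small userspace
model. The names (penalty_jiffies, the 10ms grace period, the 2s throttle
ceiling) follow the discussion above, but the quadratic penalty curve and
the constants here are illustrative assumptions, not the kernel's exact
arithmetic in mm/memcontrol.c:

```c
#include <assert.h>

#define HZ            1000         /* jiffies per second (assumed) */
#define PENALTY_GRACE (HZ / 100)   /* the 10ms grace period from the thread */
#define MAX_PENALTY   (2 * HZ)     /* throttle for at most 2 seconds */

/* Penalty, in jiffies, growing quadratically with the fractional excess
 * over memory.high (usage and high counted in pages). */
static unsigned long penalty_jiffies(unsigned long usage, unsigned long high)
{
    unsigned long overage_pct, p;

    if (usage <= high)
        return 0;
    overage_pct = (usage - high) * 100 / high;  /* percent over high */
    p = overage_pct * overage_pct * HZ / 10000; /* ~1s at 100% over */
    return p > MAX_PENALTY ? MAX_PENALTY : p;
}

/* Nonzero when the remaining excess is large enough that the allocator
 * should keep reclaiming/throttling instead of returning to userspace. */
static int should_throttle(unsigned long usage, unsigned long high)
{
    return penalty_jiffies(usage, high) > PENALTY_GRACE;
}
```

Under this model a 10% excess yields a 10ms penalty, which sits exactly at
the grace threshold and is forgiven, while a 100% excess yields a 1s
penalty and triggers throttling - illustrating why the loop terminates once
reclaim gets the cgroup close enough to memory.high.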