Date: Thu, 24 Oct 2019 10:24:40 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/2] mm: memcontrol: try harder to set a new memory.high
Message-ID: <20191024082440.GT17610@dhcp22.suse.cz>
References: <20191022201518.341216-1-hannes@cmpxchg.org>
 <20191022201518.341216-2-hannes@cmpxchg.org>
 <20191023065949.GD754@dhcp22.suse.cz>
 <20191023175724.GD366316@cmpxchg.org>
In-Reply-To: <20191023175724.GD366316@cmpxchg.org>
On Wed 23-10-19 13:57:24, Johannes Weiner wrote:
> On Wed, Oct 23, 2019 at 08:59:49AM +0200, Michal Hocko wrote:
> > On Tue 22-10-19 16:15:18, Johannes Weiner wrote:
> > > Setting a memory.high limit below the usage makes almost no effort to
> > > shrink the cgroup to the new target size.
> > >
> > > While memory.high is a "soft" limit that isn't supposed to cause OOM
> > > situations, we should still try harder to meet a user request through
> > > persistent reclaim.
> > >
> > > For example, after setting a 10M memory.high on an 800M cgroup full of
> > > file cache, the usage shrinks to about 350M:
> > >
> > > + cat /cgroup/workingset/memory.current
> > > 841568256
> > > + echo 10M
> > > + cat /cgroup/workingset/memory.current
> > > 355729408
> > >
> > > This isn't exactly what the user would expect to happen. Setting the
> > > value a few more times eventually whittles the usage down to what we
> > > are asking for:
> > >
> > > + echo 10M
> > > + cat /cgroup/workingset/memory.current
> > > 104181760
> > > + echo 10M
> > > + cat /cgroup/workingset/memory.current
> > > 31801344
> > > + echo 10M
> > > + cat /cgroup/workingset/memory.current
> > > 10440704
> > >
> > > To improve this, add reclaim retry loops to the memory.high write()
> > > callback, similar to what we do for memory.max, to make a reasonable
> > > effort that the usage meets the requested size after the call returns.
> >
> > That suggests that the reclaim couldn't meet the given reclaim target
> > but later attempts just made it through. Is this due to the amount of
> > dirty pages, or what prevented the reclaim from doing its job?
> >
> > While I am not against the reclaim retry loop, I would like to
> > understand the underlying issue. Because if this is really about dirty
> > memory then we should probably be more pro-active in flushing it.
> > Otherwise the retry might not be of any help.
>
> All the pages in my test case are clean cache. But they are active,
> and they need to go through the inactive list before reclaiming. The
> inactive list size is designed to pre-age just enough pages for
> regular reclaim targets, i.e. pages in the SWAP_CLUSTER_MAX ballpark.
>
> In this case, the reclaim goal for a single invocation is 790M and the
> inactive list is a small funnel to put all that through, and we need
> several iterations to accomplish that.

Thanks for the clarification.

> But 790M is not a reasonable reclaim target to ask of a single reclaim
> invocation. And it wouldn't be reasonable to optimize the reclaim code
> for it. So asking for the full size but retrying is not a bad choice
> here: we express our intent, and benefit if reclaim becomes better at
> handling larger requests, but we also acknowledge that some of the
> deltas we can encounter in memory_high_write() are just too
> ridiculously big for a single reclaim invocation to manage.

Yes, that makes sense and I think it should be a part of the changelog.

Acked-by: Michal Hocko

Thanks!
-- 
Michal Hocko
SUSE Labs
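
For reference, the retry pattern under discussion would look roughly like
the sketch below. It is modeled on the existing memory_max_write() loop in
mm/memcontrol.c rather than taken from the actual patch; the retry count,
the drain_all_stock() step, and the exact bail-out conditions are
assumptions about how such a loop is usually structured, not a statement
of what the final version does.

/*
 * Sketch only: a retry loop for the memory.high write callback, in the
 * style of memory_max_write(). Helper names follow mm/memcontrol.c as of
 * this thread's timeframe; details of the real patch may differ.
 */
static ssize_t memory_high_write(struct kernfs_open_file *of,
				 char *buf, size_t nbytes, loff_t off)
{
	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
	unsigned int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
	bool drained = false;
	unsigned long high;
	int err;

	buf = strstrip(buf);
	err = page_counter_memparse(buf, "max", &high);
	if (err)
		return err;

	memcg->high = high;

	for (;;) {
		unsigned long nr_pages = page_counter_read(&memcg->memory);

		if (nr_pages <= high)
			break;

		/* Let a pending signal interrupt a long reclaim effort. */
		if (signal_pending(current))
			break;

		/* Flush per-cpu charge caches once before giving up pages. */
		if (!drained) {
			drain_all_stock(memcg);
			drained = true;
			continue;
		}

		/*
		 * Ask for the full delta each time; a single invocation may
		 * only push SWAP_CLUSTER_MAX-sized chunks through the
		 * inactive list, hence the retries.
		 */
		if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
						  GFP_KERNEL, true))
			nr_retries--;

		if (!nr_retries)
			break;
	}

	memcg_wb_domain_size_changed(memcg);
	return nbytes;
}

The point of the loop is exactly what the thread describes: keep expressing
the full reclaim target, but bound the number of attempts so an
unreasonably large delta cannot pin the writer forever.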