Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S935201AbcCKIfC (ORCPT );
        Fri, 11 Mar 2016 03:35:02 -0500
Received: from mx2.parallels.com ([199.115.105.18]:42987 "EHLO
        mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S934612AbcCKIew (ORCPT );
        Fri, 11 Mar 2016 03:34:52 -0500
Date: Fri, 11 Mar 2016 11:34:40 +0300
From: Vladimir Davydov
To: Johannes Weiner
CC: Andrew Morton , Michal Hocko , , , ,
Subject: Re: [PATCH] mm: memcontrol: reclaim when shrinking memory.high below usage
Message-ID: <20160311083440.GI1946@esperanza>
References: <1457643015-8828-1-git-send-email-hannes@cmpxchg.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <1457643015-8828-1-git-send-email-hannes@cmpxchg.org>
X-ClientProxiedBy: US-EXCH.sw.swsoft.com (10.255.249.47) To US-EXCH2.sw.swsoft.com (10.255.249.46)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1359
Lines: 27

On Thu, Mar 10, 2016 at 03:50:13PM -0500, Johannes Weiner wrote:
> When setting memory.high below usage, nothing happens until the next
> charge comes along, and then it will only reclaim its own charge and
> not the now potentially huge excess of the new memory.high. This can
> cause groups to stay in excess of their memory.high indefinitely.
>
> To fix that, when shrinking memory.high, kick off a reclaim cycle that
> goes after the delta.

I agree that we should reclaim the high excess, but I don't think it's a
good idea to do it synchronously. Currently, memory.low and memory.high
knobs can be easily used by a single-threaded load manager implemented
in userspace, because it doesn't need to care about potential stalls
caused by writes to these files.
After this change it might happen that
a write to memory.high would take a long time, seconds perhaps, so in
order to react quickly to changes in other cgroups, a load manager would
have to spawn a thread for each write to memory.high, which would
complicate its implementation significantly. Since, in contrast to
memory.max, the definition of memory.high allows a cgroup to breach it,
I believe it would be better to spawn asynchronous reclaim work from the
kernel on write to memory.high instead of doing this synchronously. I
guess we could reuse mem_cgroup->high_work for that.

Thanks,
Vladimir