Date: Mon, 31 Aug 2015 18:18:14 +0300
From: Vladimir Davydov
To: Tejun Heo
Cc: Michal Hocko, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim
Subject: Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is enabled

On Mon, Aug 31, 2015 at 10:39:39AM -0400, Tejun Heo wrote:
> On Mon, Aug 31, 2015 at 05:30:08PM +0300, Vladimir Davydov wrote:
> > slab/slub can issue alloc_pages() any time with any flags they want
> > and it won't be accounted to memcg, because kmem is accounted at
> > slab/slub layer, not in buddy.
>
> Hmmm? I meant the eventual calling into try_charge w/ GFP_NOWAIT.
> Speculative usage of GFP_NOWAIT is bound to increase and we don't want
> to put on extra restrictions from memcg side.

We already put restrictions on slab/slub from the memcg side, because
kmem accounting is part of slab/slub. They have to cooperate in order to
get things working. If slab/slub wants to make a speculative allocation
for some reason, it should simply move memcg_charge out of that
speculative alloc section. That is what this patch set does.

We have to be careful about where memcg_charge is placed in slab/slub.
To see why, consider SLAB: it first tries to allocate from all nodes in
order of preference w/o __GFP_WAIT, and only if that fails does it fall
back on an allocation from any node w/ __GFP_WAIT. This is its internal
algorithm. If we blindly put memcg_charge into the alloc_slab method,
then, whenever we are near the memcg limit, we will walk all NUMA nodes
in vain and finally fall back to the __GFP_WAIT allocation, which will
get a slab from a random node. Not only do we do more work than
necessary by walking all NUMA nodes for nothing, we also break SLAB's
internal logic. And you just can't fix that in memcg, because memcg
knows nothing about SLAB's internals or how it handles NUMA nodes (see
the first sketch appended below).

SLUB has a different problem. It tries to avoid high-order allocations
if there is a risk of invoking costly memory compaction. That has
nothing to do with memcg, because memcg does not care whether the
charge is for a high-order page or not (see the second sketch below).

Thanks,
Vladimir

> For memory.high,
> punting to the return path is a pretty straightforward solution which
> should make the problem go away almost entirely.
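
For illustration only, here is a minimal sketch of the placement being
argued for in the SLAB case. It is not the actual patch;
memcg_charge_slab_page(), memcg_uncharge_slab_page() and node_order[]
are hypothetical placeholders for the real charging hook and for SLAB's
zonelist-based node preference walk:

/*
 * Sketch only -- not the real SLAB code.  The charge is taken once, up
 * front, instead of inside the per-node loop, so a memcg at its limit
 * does not force the speculative ~__GFP_WAIT walk below to fail on
 * every node before the __GFP_WAIT fallback is even reached.
 */
static struct page *slab_alloc_pages(gfp_t flags, int order,
				     const int *node_order, int nr_nodes)
{
	struct page *page;
	int i;

	if (memcg_charge_slab_page(flags, order))	/* hypothetical hook */
		return NULL;

	/* SLAB's internal logic: try preferred nodes without sleeping... */
	for (i = 0; i < nr_nodes; i++) {
		page = alloc_pages_node(node_order[i],
					flags & ~__GFP_WAIT, order);
		if (page)
			return page;
	}

	/* ...and only then fall back to a sleeping allocation from any node. */
	page = alloc_pages(flags | __GFP_WAIT, order);
	if (!page)
		memcg_uncharge_slab_page(order);	/* hypothetical hook */
	return page;
}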
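
And a similar sketch for the SLUB case: the speculative high-order
attempt (made with flags that avoid costly reclaim/compaction) remains
purely SLUB's decision, and the charge is taken only once the final
order is known, since memcg does not care which order the page is.
Again, memcg_charge_slab_page() is the same hypothetical placeholder,
not real SLUB code:

/*
 * Sketch only -- not the real SLUB code.
 */
static struct page *slub_alloc_slab(gfp_t flags, int order, int min_order)
{
	struct page *page;

	/* Speculative: try the preferred high order, but don't try hard. */
	page = alloc_pages((flags | __GFP_NOWARN | __GFP_NORETRY) &
			   ~__GFP_NOFAIL, order);
	if (!page) {
		/* Fall back to the minimum order that still fits the objects. */
		order = min_order;
		page = alloc_pages(flags, order);
		if (!page)
			return NULL;
	}

	/* Charge after the order is settled -- memcg doesn't care which. */
	if (memcg_charge_slab_page(flags, order)) {	/* hypothetical hook */
		__free_pages(page, order);
		return NULL;
	}
	return page;
}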