Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753880AbbH3TDE (ORCPT ); Sun, 30 Aug 2015 15:03:04 -0400
Received: from mx2.parallels.com ([199.115.105.18]:35974 "EHLO
	mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753809AbbH3TCh (ORCPT );
	Sun, 30 Aug 2015 15:02:37 -0400
From: Vladimir Davydov
To: Andrew Morton
CC: Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Michal Hocko , Tejun Heo , ,
Subject: [PATCH 2/2] mm/slub: do not bypass memcg reclaim for high-order page allocation
Date: Sun, 30 Aug 2015 22:02:18 +0300
Message-ID: <077206b884045ae9d82fd603fddde51d2eb630b5.1440960578.git.vdavydov@parallels.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To:
References:
MIME-Version: 1.0
Content-Type: text/plain
X-ClientProxiedBy: US-EXCH.sw.swsoft.com (10.255.249.47) To US-EXCH2.sw.swsoft.com (10.255.249.46)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3504
Lines: 93

Commit 6af3142bed1f52 ("mm/slub: don't wait for high-order page
allocation") made allocate_slab() try to allocate high order slab pages
without __GFP_WAIT in order to avoid invoking reclaim/compaction when we
can fall back on low order pages. However, it broke memcg/memory.high
logic in case kmem accounting is enabled. The memory.high threshold
works as a soft limit: an allocation does not fail if it is breached,
but we call direct reclaim to compensate for the excess. Without
__GFP_WAIT we cannot invoke reclaimer and therefore we will go on
exceeding memory.high more and more until a normal __GFP_WAIT allocation
is issued.

Since memcg reclaim never triggers compaction, we can pass __GFP_WAIT to
memcg_charge_slab() even on high order page allocations w/o any
performance impact. So let us fix this problem by excluding __GFP_WAIT
only from alloc_pages() while still forwarding it to memcg_charge_slab()
if the context allows.

Reported-by: Tejun Heo
Signed-off-by: Vladimir Davydov
---
 mm/slub.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index e180f8dcd06d..416a332277cb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1333,6 +1333,14 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	if (memcg_charge_slab(s, flags, order))
 		return NULL;
 
+	/*
+	 * Let the initial higher-order allocation fail under memory pressure
+	 * so we fall-back to the minimum order allocation.
+	 */
+	if (oo_order(oo) > oo_order(s->min))
+		flags = (flags | __GFP_NOWARN | __GFP_NOMEMALLOC) &
+			~(__GFP_NOFAIL | __GFP_WAIT);
+
 	if (node == NUMA_NO_NODE)
 		page = alloc_pages(flags, order);
 	else
@@ -1348,7 +1356,6 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
 	struct kmem_cache_order_objects oo = s->oo;
-	gfp_t alloc_gfp;
 	void *start, *p;
 	int idx, order;
 
@@ -1359,23 +1366,14 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 
 	flags |= s->allocflags;
 
-	/*
-	 * Let the initial higher-order allocation fail under memory pressure
-	 * so we fall-back to the minimum order allocation.
-	 */
-	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
-	if ((alloc_gfp & __GFP_WAIT) && oo_order(oo) > oo_order(s->min))
-		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_WAIT;
-
-	page = alloc_slab_page(s, alloc_gfp, node, oo);
+	page = alloc_slab_page(s, flags, node, oo);
 	if (unlikely(!page)) {
 		oo = s->min;
-		alloc_gfp = flags;
 		/*
 		 * Allocation may have failed due to fragmentation.
 		 * Try a lower order alloc if possible
 		 */
-		page = alloc_slab_page(s, alloc_gfp, node, oo);
+		page = alloc_slab_page(s, flags, node, oo);
 		if (unlikely(!page))
 			goto out;
 		stat(s, ORDER_FALLBACK);
@@ -1385,7 +1383,7 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	    !(s->flags & (SLAB_NOTRACK | DEBUG_DEFAULT_FLAGS))) {
 		int pages = 1 << oo_order(oo);
 
-		kmemcheck_alloc_shadow(page, oo_order(oo), alloc_gfp, node);
+		kmemcheck_alloc_shadow(page, oo_order(oo), flags, node);
 
 		/*
 		 * Objects from caches that have a constructor don't get
-- 
2.1.4
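
For readers who want to see the flag split described in the changelog in isolation, here is a minimal userspace sketch. It is not kernel code: the DEMO_* bit values, the demo_gfp_t type and both helper functions are hypothetical stand-ins chosen for illustration, and the real flags are defined in include/linux/gfp.h with different values. It only models the idea that the memcg charge sees the caller's flags unchanged, while the page allocator sees them with the wait and nofail bits masked off when the requested order is above the cache minimum.

```c
/* Hypothetical stand-ins for the kernel's gfp bits (assumed values). */
typedef unsigned int demo_gfp_t;

#define DEMO_GFP_WAIT		0x01u
#define DEMO_GFP_NOFAIL		0x02u
#define DEMO_GFP_NOWARN		0x04u
#define DEMO_GFP_NOMEMALLOC	0x08u

/*
 * Flags as seen by the memcg charge step: passed through untouched, so a
 * caller that may sleep still lets memcg reclaim run against memory.high.
 */
static demo_gfp_t demo_charge_flags(demo_gfp_t flags)
{
	return flags;
}

/*
 * Flags as seen by the page allocator: for an order above the cache
 * minimum, make the attempt cheap and quiet so the caller can fall back
 * to the minimum order instead of reclaiming or compacting.
 */
static demo_gfp_t demo_page_alloc_flags(demo_gfp_t flags, int order,
					int min_order)
{
	if (order > min_order)
		flags = (flags | DEMO_GFP_NOWARN | DEMO_GFP_NOMEMALLOC) &
			~(DEMO_GFP_NOFAIL | DEMO_GFP_WAIT);
	return flags;
}
```

With these stand-ins, a caller passing DEMO_GFP_WAIT for an order-3 request on a cache whose minimum order is 0 keeps the wait bit for the charge step and loses it only for the page allocation; at the minimum order both steps see the same flags.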