Date: Wed, 4 Sep 2019 12:54:22 -0700 (PDT)
From: David Rientjes
To: Linus Torvalds, Andrew Morton
cc: Andrea Arcangeli, Michal Hocko, Mel Gorman, Vlastimil Babka,
    "Kirill A. Shutemov", linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction
 may not succeed

Memory compaction has a couple significant drawbacks as the allocation
order increases, specifically:

 - isolate_freepages() is responsible for finding free pages to use as
   migration targets and is implemented as a linear scan of memory
   starting at the end of a zone,

 - failing order-0 watermark checks in memory compaction does not account
   for how far below the watermarks the zone actually is: to enable
   migration, there must be *some* free memory available.
   Per the above, watermarks are not always sufficient if
   isolate_freepages() cannot find the free memory but it could require
   hundreds of MBs of reclaim to even reach this threshold (read:
   potentially very expensive reclaim with no indication compaction can
   be successful), and

 - if compaction at this order has failed recently so that it does not
   even run as a result of deferred compaction, looping through reclaim
   can often be pointless.

For hugepage allocations, these are quite substantial drawbacks because
these are very high order allocations (order-9 on x86) and falling back to
doing reclaim can potentially be *very* expensive without any indication
that compaction would even be successful.

Reclaim itself is unlikely to free entire pageblocks and certainly no
reliance should be put on it to do so in isolation (recall lumpy reclaim).
This means we should avoid reclaim and simply fail hugepage allocation if
compaction is deferred.

It is also not helpful to thrash a zone by doing excessive reclaim if
compaction may not be able to access that memory.  If order-0 watermarks
fail and the allocation order is sufficiently large, it is likely better
to fail the allocation rather than thrashing the zone.

Signed-off-by: David Rientjes
---
 mm/page_alloc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4458,6 +4458,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
+		if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
+			/*
+			 * If allocating entire pageblock(s) and compaction
+			 * failed because all zones are below low watermarks
+			 * or is prohibited because it recently failed at this
+			 * order, fail immediately.
+			 *
+			 * Reclaim is
+			 * - potentially very expensive because zones are far
+			 *   below their low watermarks or this is part of very
+			 *   bursty high order allocations,
+			 * - not guaranteed to help because isolate_freepages()
+			 *   may not iterate over freed pages as part of its
+			 *   linear scan, and
+			 * - unlikely to make entire pageblocks free on its
+			 *   own.
+			 */
+			if (compact_result == COMPACT_SKIPPED ||
+			    compact_result == COMPACT_DEFERRED)
+				goto nopage;
+		}
+
 		/*
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations