Date: Thu, 5 Sep 2019 11:00:09 +0200
From: Michal Hocko
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Mel Gorman,
    Vlastimil Babka, "Kirill A. Shutemov", linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Mike Kravetz
Subject: Re: [rfc 3/4] mm, page_alloc: avoid expensive reclaim when compaction may not succeed
Message-ID: <20190905090009.GF3838@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

[Ccing Mike for checking on the hugetlb side of this change]

On Wed 04-09-19 12:54:22, David Rientjes wrote:
> Memory compaction has a couple significant drawbacks as the allocation
> order increases, specifically:
>
>  - isolate_freepages() is responsible for finding free pages to use as
>    migration targets and is implemented as a linear scan of memory
>    starting at the end of a zone,
>
>  - failing order-0 watermark checks in memory compaction does not account
>    for how far below the watermarks the zone actually is: to enable
>    migration, there must be *some* free memory available.  Per the above,
>    watermarks are not always sufficient if isolate_freepages() cannot
>    find the free memory but it could require hundreds of MBs of reclaim to
>    even reach this threshold (read: potentially very expensive reclaim with
>    no indication compaction can be successful), and
>
>  - if compaction at this order has failed recently so that it does not even
>    run as a result of deferred compaction, looping through reclaim can often
>    be pointless.
>
> For hugepage allocations, these are quite substantial drawbacks because
> these are very high order allocations (order-9 on x86) and falling back to
> doing reclaim can potentially be *very* expensive without any indication
> that compaction would even be successful.
>
> Reclaim itself is unlikely to free entire pageblocks and certainly no
> reliance should be put on it to do so in isolation (recall lumpy reclaim).
> This means we should avoid reclaim and simply fail hugepage allocation if
> compaction is deferred.
>
> It is also not helpful to thrash a zone by doing excessive reclaim if
> compaction may not be able to access that memory.  If order-0 watermarks
> fail and the allocation order is sufficiently large, it is likely better
> to fail the allocation rather than thrashing the zone.
>
> Signed-off-by: David Rientjes
> ---
>  mm/page_alloc.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4458,6 +4458,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		if (page)
>  			goto got_pg;
>  
> +		if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
> +			/*
> +			 * If allocating entire pageblock(s) and compaction
> +			 * failed because all zones are below low watermarks
> +			 * or is prohibited because it recently failed at this
> +			 * order, fail immediately.
> +			 *
> +			 * Reclaim is
> +			 *  - potentially very expensive because zones are far
> +			 *    below their low watermarks or this is part of very
> +			 *    bursty high order allocations,
> +			 *  - not guaranteed to help because isolate_freepages()
> +			 *    may not iterate over freed pages as part of its
> +			 *    linear scan, and
> +			 *  - unlikely to make entire pageblocks free on its
> +			 *    own.
> +			 */
> +			if (compact_result == COMPACT_SKIPPED ||
> +			    compact_result == COMPACT_DEFERRED)
> +				goto nopage;
> +		}
> +
>  		/*
>  		 * Checks for costly allocations with __GFP_NORETRY, which
>  		 * includes THP page fault allocations

-- 
Michal Hocko
SUSE Labs
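
For anyone following the quoted hunk, the early-exit condition it adds can be modelled in plain userspace C. This is only a minimal sketch and not kernel code: the enum values, MODEL_PAGEBLOCK_ORDER (set to 9 here, matching the order-9 x86 hugepage case mentioned in the changelog), MODEL_GFP_IO and should_fail_without_reclaim() are hypothetical stand-ins for compact_result, pageblock_order, __GFP_IO and the new "goto nopage" branch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's compaction results. */
enum model_compact_result {
	MODEL_COMPACT_SKIPPED,   /* compaction did not run: watermarks too low */
	MODEL_COMPACT_DEFERRED,  /* compaction did not run: it failed recently */
	MODEL_COMPACT_OTHER      /* any other outcome */
};

/* Assumed values for illustration only; the real ones are per-arch/config. */
#define MODEL_PAGEBLOCK_ORDER 9u
#define MODEL_GFP_IO          0x1u

/*
 * Returns true when the modelled slowpath would give up immediately
 * instead of falling back to reclaim, mirroring the condition in the
 * quoted hunk: a pageblock-sized (or larger) request that may do I/O,
 * where compaction was skipped or deferred.
 */
static bool should_fail_without_reclaim(unsigned int gfp_mask,
					unsigned int order,
					enum model_compact_result result)
{
	if (order < MODEL_PAGEBLOCK_ORDER || !(gfp_mask & MODEL_GFP_IO))
		return false;

	return result == MODEL_COMPACT_SKIPPED ||
	       result == MODEL_COMPACT_DEFERRED;
}
```

With these assumed values, an order-9 request with the I/O flag set and a deferred result fails early, while an order-3 request or any other compaction result continues on to the existing checks.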