Date: Mon, 15 Oct 2018 17:39:24 -0700 (PDT)
From: David Rientjes
To: Christopher Lameter
cc: Andrew Morton, Pekka Enberg, Joonsoo Kim, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] mm, slab: avoid high-order slab pages when it does not reduce waste

On Mon, 15 Oct 2018, Christopher Lameter wrote:

> > > If the amount of waste is the same at higher cachep->gfporder values,
> > > there is no significant benefit to allocating higher order memory.
> > > There will be fewer calls to the page allocator, but each call will require
> > > zone->lock and finding the page of best fit from the per-zone free areas.
>
> There is a benefit because the management overhead is halved.
>

It depends on (1) how difficult it is to allocate higher-order memory and
(2) the long-term effects of preferring high-order memory over order-0.

For (1), slab has no minimum-order fallback like slub does, so the
allocation either succeeds at cachep->gfporder or it fails. If memory
fragmentation is such that order-1 memory cannot be allocated, the slab
allocation currently fails unnecessarily even though it could succeed at
order-0; this patch fixes that. If order-1 memory is merely painful to
allocate, we have reclaimed and compacted unnecessarily when order-0 pages
are available from the pcp list.

For (2), high-order slab allocations increase fragmentation of the zone
under memory pressure. If the per-zone free area has no MIGRATE_UNMOVABLE
pageblocks to allocate from and must fall back, which is the case under
memory pressure, these order-1 pages can be taken from pageblocks that are
otherwise filled with movable or free memory. Hugepages then become
difficult to allocate from those pageblocks (to the extent that 1.5GB of
slab on a node is spread over 100GB of pageblocks). This occurs even though
there may be MIGRATE_UNMOVABLE pages available on the pcp lists. With this
patch, the pcp list can be backfilled up to batchcount with
MIGRATE_UNMOVABLE order-0 pages that we can subsequently allocate from and
free to; for caches like TCPv6 this results in both faster page allocation
and less slab fragmentation.

> > > Instead, it is better to allocate order-0 memory if possible so that pages
> > > can be returned from the per-cpu pagesets (pcp).
>
> Have a benchmark that shows this?
>

I'm not approaching this primarily from a performance point of view, but
rather as a means to reduce slab fragmentation when fallback to order-0
memory is prohibited, especially when that fallback would be completely
legitimate. From a performance standpoint, the result will depend
separately on fragmentation and on contention on zone->lock. Neither
applies to order-0 memory until fallback is required, and at that point
the pcp lists are filled with up to batchcount pages.

> > > There are two reasons to prefer this over allocating high order memory:
> > >
> > >  - allocating from the pcp lists does not require a per-zone lock, and
> > >
> > >  - this reduces stranding of MIGRATE_UNMOVABLE pageblocks on pcp lists
> > >    that increases slab fragmentation across a zone.
>
> The slab allocators generally buffer pages from the page allocator to
> avoid this effect given the slowness of page allocator operations anyways.
>

It is possible to buffer the same number of pages once they are allocated,
absent memory pressure, and doing so does not require high-order memory.
This seems like a separate issue.

> > > We are particularly interested in the second point to eliminate cases
> > > where all other pages on a pageblock are movable (or free) and fallback to
> > > pageblocks of other migratetypes from the per-zone free areas causes
> > > high-order slab memory to be allocated from them rather than from free
> > > MIGRATE_UNMOVABLE pages on the pcp.
>
> Well does this actually do some good?
>

Pageblocks examined via tools/vm/page-types under memory pressure that show
all B (buddy) and UlAMab (anon mapped) pages plus a single order-1 S (slab)
page suggest that such a pageblock cannot be allocated for a hugepage until
the slab is completely freed (an indeterminate amount of time), even if
there are pages on the MIGRATE_UNMOVABLE pcp list.
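To make the pinning effect concrete, here is a minimal, simplified C sketch; it is not kernel code. It assumes 4KB base pages and 2MB pageblocks (512 pages each, as on typical x86-64 configurations), and the enum and helper names are hypothetical stand-ins for the B, UlAMab and S page-types flags. A pageblock is treated as a hugepage candidate only if it holds no slab page, since compaction can migrate anonymous pages but not slab pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4KB base pages and 2MB pageblocks (512 pages each). */
#define SKETCH_PAGES_PER_PAGEBLOCK 512
#define SKETCH_PAGEBLOCK_BYTES (2ULL << 20)

/* Hypothetical, simplified page states named after page-types flags. */
enum sketch_page_kind {
	SKETCH_BUDDY, /* B: free page owned by the buddy allocator */
	SKETCH_ANON,  /* UlAMab: anonymous user page, movable by compaction */
	SKETCH_SLAB,  /* S: slab page, unmovable */
};

/*
 * Free pages are already usable and anonymous pages can be migrated away,
 * so a pageblock can still become a hugepage only if no slab page is in it.
 */
static bool pageblock_hugepage_candidate(const enum sketch_page_kind *block,
					 size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		if (block[i] == SKETCH_SLAB)
			return false; /* a single slab page pins the block */
	}
	return true;
}

/* Number of pageblocks covering 'bytes' of memory, rounded up. */
static unsigned long long pageblocks_spanned(unsigned long long bytes)
{
	return (bytes + SKETCH_PAGEBLOCK_BYTES - 1) / SKETCH_PAGEBLOCK_BYTES;
}
```

Under these assumptions, 1.5GB of slab packed densely would fit in 768 pageblocks, while 100GB spans 51200 of them, so slab spread across that range can leave far more pageblocks unusable for hugepages than its size alone would suggest.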
This change stops slab from passing over unmovable pages that are readily
available in favor of expensively allocating order-1 memory with no
reduction in waste.

For users of slab_max_order (which we are not, for obvious reasons), I can
change this to only consider this when testing gfporder == 0 if you prefer,
since that logically makes sense.
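As a rough illustration of the rule in the subject line, avoiding a higher order when it does not reduce waste, here is a hypothetical C sketch. It is not the patch itself: it assumes 4KB pages, ignores slab management overhead and the other heuristics slab applies when choosing cachep->gfporder, and the function names are made up. max_order stands in for the slab_max_order limit, and waste is compared as a fraction of the slab size.

```c
/* Assumption: 4KB base pages. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Bytes left over in a slab of 2^order pages after packing as many
 * obj_size-byte objects as fit (obj_size must be > 0; management
 * overhead is ignored).
 */
static unsigned long slab_leftover(unsigned long obj_size, unsigned int order)
{
	unsigned long slab_size = SKETCH_PAGE_SIZE << order;

	return slab_size % obj_size;
}

/*
 * Hypothetical order selection: return the lowest order in [0, max_order]
 * whose waste, as a fraction of the slab size, is minimal. A higher order
 * is chosen only if it strictly reduces that fraction, so equal waste
 * keeps the lower order.
 */
static unsigned int pick_slab_order(unsigned long obj_size,
				    unsigned int max_order)
{
	unsigned int best = 0;

	for (unsigned int order = 1; order <= max_order; order++) {
		/*
		 * left(order) / size(order) < left(best) / size(best),
		 * cross-multiplied to stay in integer arithmetic.
		 */
		unsigned long long lhs =
			(unsigned long long)slab_leftover(obj_size, order) *
			(SKETCH_PAGE_SIZE << best);
		unsigned long long rhs =
			(unsigned long long)slab_leftover(obj_size, best) *
			(SKETCH_PAGE_SIZE << order);

		if (lhs < rhs)
			best = order;
	}
	return best;
}
```

For 1000-byte objects, order-0 leaves 96 of 4096 bytes unused and order-1 leaves 192 of 8192, the same fraction, so the sketch stays at order-0. For 1500-byte objects, order-1 cuts waste from 1096 of 4096 bytes to 692 of 8192, so it picks order-1.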