Date:   Tue, 16 Oct 2018 15:17:56 +0000
From:   Christopher Lameter <cl@linux.com>
To:     David Rientjes <rientjes@google.com>
cc:     Andrew Morton <akpm@linux-foundation.org>,
        Pekka Enberg <penberg@kernel.org>,
        Joonsoo Kim <iamjoonsoo.kim@lge.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Subject: Re: [patch] mm, slab: avoid high-order slab pages when it does not
 reduce waste
In-Reply-To: <alpine.DEB.2.21.1810151715220.21338@chino.kir.corp.google.com>
Message-ID: <010001667d7476a2-f91dcf12-5e90-4ade-97e8-9fd651f7bf17-000000@email.amazonses.com>
References: <alpine.DEB.2.21.1810121424420.116562@chino.kir.corp.google.com> <20181012151341.286cd91321cdda9b6bde4de9@linux-foundation.org> <0100016679e3c96f-c78df4e2-9ab8-48db-8796-271c4b439f16-000000@email.amazonses.com>
 <alpine.DEB.2.21.1810151715220.21338@chino.kir.corp.google.com>

On Mon, 15 Oct 2018, David Rientjes wrote:

> On Mon, 15 Oct 2018, Christopher Lameter wrote:
>
> > > > If the amount of waste is the same at higher cachep->gfporder values,
> > > > there is no significant benefit to allocating higher order memory.  There
> > > > will be fewer calls to the page allocator, but each call will require
> > > > zone->lock and finding the page of best fit from the per-zone free areas.
> >
> > There is a benefit because the management overhead is halved.
> >
>
> It depends on (1) how difficult it is to allocate higher order memory and
> (2) the long term effects of preferring high order memory over order 0.

The overhead of the page allocator is orders of magnitude greater than
that of slab allocation. Higher-order allocations may even be faster
because the pcp overhead is not there. It all depends. Please come up with
some benchmarks to substantiate these ideas.
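
To make the waste vs. overhead trade-off concrete, here is a rough
userspace sketch. It is not the kernel's calculate_slab_order(): it
assumes 4 KiB pages and ignores freelist overhead, alignment and
coloring, and all function names are made up for illustration.

```python
from fractions import Fraction

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def slab_layout(obj_size, order, page_size=PAGE_SIZE):
    """Return (objects_per_slab, wasted_bytes) for a slab of 2**order pages.

    Simplified model: no on-slab freelist, no alignment, no coloring.
    """
    slab_bytes = page_size << order
    objects = slab_bytes // obj_size
    return objects, slab_bytes - objects * obj_size


def waste_fraction(obj_size, order, page_size=PAGE_SIZE):
    """Fraction of the slab that cannot hold an object (exact rational)."""
    _, wasted = slab_layout(obj_size, order, page_size)
    return Fraction(wasted, page_size << order)


def lowest_order_with_min_waste(obj_size, max_order=1, page_size=PAGE_SIZE):
    """Lowest order in 0..max_order whose waste fraction is minimal.

    Illustrates "do not go to a higher order unless it reduces waste";
    hypothetical helper, not the logic of the proposed patch.
    """
    fractions = [waste_fraction(obj_size, o, page_size)
                 for o in range(max_order + 1)]
    return fractions.index(min(fractions))  # index() returns the first match


def slabs_needed(n_objects, obj_size, order, page_size=PAGE_SIZE):
    """Number of slabs (page allocator calls) needed to hold n_objects."""
    objects, _ = slab_layout(obj_size, order, page_size)
    if objects == 0:
        raise ValueError("object does not fit in a slab of this order")
    return -(-n_objects // objects)  # ceiling division
```

Under these assumptions, 1024-byte objects waste nothing at order 0 or
order 1, so the lower order wins on waste alone, yet 1000 objects need
250 order-0 slabs versus 125 order-1 slabs. 1536-byte objects waste 25%
at order 0 and 6.25% at order 1, so there the higher order does pay off.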

>
> For (1), slab has no minimum order fallback like slub does so the
> allocation either succeeds at cachep->gfporder or it fails.  If memory
> fragmentation is such that order-1 memory is not possible, this is fixing
> an issue where the slab allocation would succeed but now fails
> unnecessarily.  If that order-1 memory is painful to allocate, we've
> reclaimed and compacted unnecessarily when order-0 pages are available
> from the pcp list.
>

OK, that sounds good, but the performance impact is still an issue. Also,
we agreed that the page allocator will provide allocations up to
PAGE_ALLOC_COSTLY_ORDER without too much fuss. Other system components may
fail if these smaller-order pages are not available.

> > Have a benchmark that shows this?
> >
>
> I'm not necessarily approaching this from a performance point of view, but
> rather as a means to reduce slab fragmentation when fallback to order-0
> memory, especially when completely legitimate, is prohibited.  From a
> performance standpoint, this will depend separately on fragmentation
> and contention on zone->lock which both don't exist for order-0 memory
> until fallback is required and then the pcp are filled with up to
> batchcount pages.

Fragmentation is a performance issue and causes Linux MM performance to
degrade over time.  There are some pretty complex mechanisms that need to
be balanced against one another.

Come up with some metrics to get meaningful data that allows us to see the
impact.

I think it would be beneficial to have a load whose performance gradually
degrades as another process causes fragmentation. Any patch like the one
proposed should then have a measurable effect on the degree of
fragmentation after a certain time.
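
One candidate metric, as a minimal sketch: sample /proc/buddyinfo over
time and compute, per zone, the fraction of free memory that sits in
blocks too small to satisfy a request of a given order. The kernel
exports a similar figure through debugfs (extfrag/unusable_index) when
compaction and debugfs are enabled. The function names below are made
up, and returning 1.0 for a zone with no free pages is a choice made for
this sketch.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo text into {(node, zone): [free blocks per order]}.

    Each line looks like: "Node 0, zone   Normal   c0 c1 c2 ...", where
    c<i> is the number of free blocks of 2**i pages.
    """
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "Node":
            continue  # skip blank or unexpected lines
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(n) for n in fields[4:]]
    return zones


def unusable_index(free_counts, order):
    """Fraction of free pages that cannot serve an allocation of `order`.

    0.0 means all free memory is usable at that order; values near 1.0
    mean the free memory is fragmented into blocks that are too small.
    """
    total = sum(n << i for i, n in enumerate(free_counts))
    if total == 0:
        return 1.0  # sketch's choice: nothing free, so nothing usable
    usable = sum(n << i for i, n in enumerate(free_counts) if i >= order)
    return (total - usable) / total


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live buddy allocator state (Linux only)."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

Logging unusable_index() for order 1 at fixed intervals while the
fragmenting process runs, with and without a patch applied, would give a
curve that can be compared directly.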

Having something like that could lead to a whole series of optimizations.
Ideally we would like to have an MM subsystem that does not degrade over
time the way it does today.