On Tue, Sep 12, 2023 at 03:47:45PM +0200, Vlastimil Babka wrote:
> On 9/11/23 21:41, Johannes Weiner wrote:
> > The idea behind the cache is to save get_pageblock_migratetype()
> > lookups during bulk freeing. A microbenchmark suggests this isn't
> > helping, though. The pcp migratetype can get stale, which means that
> > bulk freeing has an extra branch to check if the pageblock was
> > isolated while on the pcp.
> >
> > While the variance overlaps, the cache write and the branch seem to
> > make this a net negative. The following test allocates and frees
> > batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
> >
> > Before:
> > 8,668.48 msec task-clock # 99.735 CPUs utilized ( +- 2.90% )
> > 19 context-switches # 4.341 /sec ( +- 3.24% )
> > 0 cpu-migrations # 0.000 /sec
> > 17,440 page-faults # 3.984 K/sec ( +- 2.90% )
> > 41,758,692,473 cycles # 9.541 GHz ( +- 2.90% )
> > 126,201,294,231 instructions # 5.98 insn per cycle ( +- 2.90% )
> > 25,348,098,335 branches # 5.791 G/sec ( +- 2.90% )
> > 33,436,921 branch-misses # 0.26% of all branches ( +- 2.90% )
> >
> > 0.0869148 +- 0.0000302 seconds time elapsed ( +- 0.03% )
> >
> > After:
> > 8,444.81 msec task-clock # 99.726 CPUs utilized ( +- 2.90% )
> > 22 context-switches # 5.160 /sec ( +- 3.23% )
> > 0 cpu-migrations # 0.000 /sec
> > 17,443 page-faults # 4.091 K/sec ( +- 2.90% )
> > 40,616,738,355 cycles # 9.527 GHz ( +- 2.90% )
> > 126,383,351,792 instructions # 6.16 insn per cycle ( +- 2.90% )
> > 25,224,985,153 branches # 5.917 G/sec ( +- 2.90% )
> > 32,236,793 branch-misses # 0.25% of all branches ( +- 2.90% )
> >
> > 0.0846799 +- 0.0000412 seconds time elapsed ( +- 0.05% )
> >
> > A side effect is that this also ensures that pages whose pageblock
> > gets stolen while on the pcplist end up on the right freelist and we
> > don't perform potentially type-incompatible buddy merges (or skip
> > merges when we shouldn't), which is likely beneficial to long-term
> > fragmentation management, although the effects would be harder to
> > measure. Settle for simpler and faster code as justification here.
>
> Makes sense to me, so
>
> > Signed-off-by: Johannes Weiner <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>

Thanks!

> > @@ -1577,7 +1556,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> > continue;
> > del_page_from_free_list(page, zone, current_order);
> > expand(zone, page, order, current_order, migratetype);
> > - set_pcppage_migratetype(page, migratetype);
>
> Hm interesting, just noticed that __rmqueue_fallback() never did this
> AFAICS, sounds like a bug.

I don't quite follow. Which part?

Keep in mind that at this point __rmqueue_fallback() doesn't return a
page. It just moves pages to the desired freelist, and then
__rmqueue_smallest() gets called again. This changes in 5/6, but until
now at least all of the above would apply to fallback pages.

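To make the control flow described above concrete, here is a minimal user-space sketch. It is a toy model, not kernel code: the toy_* names, the two-entry migratetype enum, and the per-list page counters are all assumptions made for illustration, and the real fallback path is more selective about what it moves. The only point it shows is that the fallback step hands out nothing itself, so every page that gets returned passes through the "smallest" step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for migratetypes; not the kernel's enum. */
enum toy_migratetype { TOY_UNMOVABLE, TOY_MOVABLE, TOY_NR_TYPES };

struct toy_zone {
	int nr_free[TOY_NR_TYPES];	/* free pages on each migratetype list */
};

/* Take one page from the requested list; return -1 if that list is empty. */
int toy_rmqueue_smallest(struct toy_zone *zone, enum toy_migratetype mt)
{
	assert(mt < TOY_NR_TYPES);
	if (zone->nr_free[mt] == 0)
		return -1;
	zone->nr_free[mt]--;
	return mt;	/* the list the page was taken from */
}

/*
 * Move all pages of the first non-empty other list onto the requested
 * list. Reports whether anything moved; it never hands out a page.
 */
bool toy_rmqueue_fallback(struct toy_zone *zone, enum toy_migratetype mt)
{
	for (int other = 0; other < TOY_NR_TYPES; other++) {
		if (other == (int)mt || zone->nr_free[other] == 0)
			continue;
		zone->nr_free[mt] += zone->nr_free[other];
		zone->nr_free[other] = 0;
		return true;
	}
	return false;
}

/* Try the requested list, fall back once, then retry the requested list. */
int toy_rmqueue(struct toy_zone *zone, enum toy_migratetype mt)
{
	int got = toy_rmqueue_smallest(zone, mt);

	if (got < 0 && toy_rmqueue_fallback(zone, mt))
		got = toy_rmqueue_smallest(zone, mt);
	return got;
}
```

In this model, any per-page bookkeeping done in toy_rmqueue_smallest() also covers pages that arrived via the fallback, because the retry goes through the same function.
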
> > @@ -2145,7 +2123,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> > * pages are ordered properly.
> > */
> > list_add_tail(&page->pcp_list, list);
> > - if (is_migrate_cma(get_pcppage_migratetype(page)))
> > + if (is_migrate_cma(get_pageblock_migratetype(page)))
> > __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> > -(1 << order));
>
> This is potentially a source of overhead, I assume patch 6/6 might
> change that.

Yes, 6/6 removes it altogether.

But the test results in this patch's changelog are from this patch in
isolation, so it doesn't appear to be a concern even on its own.

> > @@ -2457,7 +2423,7 @@ void free_unref_page_list(struct list_head *list)
> > * Free isolated pages directly to the allocator, see
> > * comment in free_unref_page.
> > */
> > - migratetype = get_pcppage_migratetype(page);
> > + migratetype = get_pfnblock_migratetype(page, pfn);
> > if (unlikely(is_migrate_isolate(migratetype))) {
> > list_del(&page->lru);
> > free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
>
> I think after this change we should move the isolated pages handling to
> the second loop below, so that we wouldn't have to call
> get_pfnblock_migratetype() twice per page. Dunno yet if some later patch
> does that. It would need to unlock pcp when necessary.

That sounds like a great idea. Something like the following?

Lightly tested. If you're good with it, I'll beat some more on it and
submit it as a follow-up.

---
From 429d13322819ab38b3ba2fad6d1495997819ccc2 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <[email protected]>
Date: Tue, 12 Sep 2023 10:16:10 -0400
Subject: [PATCH] mm: page_alloc: optimize free_unref_page_list()

Move direct freeing of isolated pages to the lock-breaking block in
the second loop. This saves an unnecessary migratetype reassessment.

Minor comment and local variable scoping cleanups.

Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
---
 mm/page_alloc.c | 49 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e3f1c777feed..9cad31de1bf5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2408,48 +2408,41 @@ void free_unref_page_list(struct list_head *list)
 	struct per_cpu_pages *pcp = NULL;
 	struct zone *locked_zone = NULL;
 	int batch_count = 0;
-	int migratetype;
-
-	/* Prepare pages for freeing */
-	list_for_each_entry_safe(page, next, list, lru) {
-		unsigned long pfn = page_to_pfn(page);

-		if (!free_pages_prepare(page, 0, FPI_NONE)) {
+	list_for_each_entry_safe(page, next, list, lru)
+		if (!free_pages_prepare(page, 0, FPI_NONE))
 			list_del(&page->lru);
-			continue;
-		}
-
-		/*
-		 * Free isolated pages directly to the allocator, see
-		 * comment in free_unref_page.
-		 */
-		migratetype = get_pfnblock_migratetype(page, pfn);
-		if (unlikely(is_migrate_isolate(migratetype))) {
-			list_del(&page->lru);
-			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
-			continue;
-		}
-	}

 	list_for_each_entry_safe(page, next, list, lru) {
 		unsigned long pfn = page_to_pfn(page);
 		struct zone *zone = page_zone(page);
+		int migratetype;

 		list_del(&page->lru);
 		migratetype = get_pfnblock_migratetype(page, pfn);

 		/*
-		 * Either different zone requiring a different pcp lock or
-		 * excessive lock hold times when freeing a large list of
-		 * pages.
+		 * Zone switch, batch complete, or non-pcp freeing?
+		 * Drop the pcp lock and evaluate.
 		 */
-		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
+		if (unlikely(zone != locked_zone ||
+			     batch_count == SWAP_CLUSTER_MAX ||
+			     is_migrate_isolate(migratetype))) {
 			if (pcp) {
 				pcp_spin_unlock(pcp);
 				pcp_trylock_finish(UP_flags);
+				locked_zone = NULL;
 			}

-			batch_count = 0;
+			/*
+			 * Free isolated pages directly to the
+			 * allocator, see comment in free_unref_page.
+			 */
+			if (is_migrate_isolate(migratetype)) {
+				free_one_page(zone, page, pfn, 0,
+					      migratetype, FPI_NONE);
+				continue;
+			}

 			/*
 			 * trylock is necessary as pages may be getting freed
@@ -2459,12 +2452,12 @@ void free_unref_page_list(struct list_head *list)
 			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
-				free_one_page(zone, page, pfn,
-					      0, migratetype, FPI_NONE);
-				locked_zone = NULL;
+				free_one_page(zone, page, pfn, 0,
+					      migratetype, FPI_NONE);
 				continue;
 			}
 			locked_zone = zone;
+			batch_count = 0;
 		}

 		/*
--
2.42.0

On 9/12/23 16:50, Johannes Weiner wrote:
> On Tue, Sep 12, 2023 at 03:47:45PM +0200, Vlastimil Babka wrote:
>> On 9/11/23 21:41, Johannes Weiner wrote:
>
>> > @@ -1577,7 +1556,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>> > continue;
>> > del_page_from_free_list(page, zone, current_order);
>> > expand(zone, page, order, current_order, migratetype);
>> > - set_pcppage_migratetype(page, migratetype);
>>
>> Hm interesting, just noticed that __rmqueue_fallback() never did this
>> AFAICS, sounds like a bug.
>
> I don't quite follow. Which part?
>
> Keep in mind that at this point __rmqueue_fallback() doesn't return a
> page. It just moves pages to the desired freelist, and then
> __rmqueue_smallest() gets called again. This changes in 5/6, but until
> now at least all of the above would apply to fallback pages.

Yep, missed that "doesn't return a page", thanks.

>> > @@ -2145,7 +2123,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>> > * pages are ordered properly.
>> > */
>> > list_add_tail(&page->pcp_list, list);
>> > - if (is_migrate_cma(get_pcppage_migratetype(page)))
>> > + if (is_migrate_cma(get_pageblock_migratetype(page)))
>> > __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
>> > -(1 << order));
>>
>> This is potentially a source of overhead, I assume patch 6/6 might
>> change that.
>
> Yes, 6/6 removes it altogether.
>
> But the test results in this patch's changelog are from this patch in
> isolation, so it doesn't appear to be a concern even on its own.
>
>> > @@ -2457,7 +2423,7 @@ void free_unref_page_list(struct list_head *list)
>> > * Free isolated pages directly to the allocator, see
>> > * comment in free_unref_page.
>> > */
>> > - migratetype = get_pcppage_migratetype(page);
>> > + migratetype = get_pfnblock_migratetype(page, pfn);
>> > if (unlikely(is_migrate_isolate(migratetype))) {
>> > list_del(&page->lru);
>> > free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
>>
>> I think after this change we should move the isolated pages handling to
>> the second loop below, so that we wouldn't have to call
>> get_pfnblock_migratetype() twice per page. Dunno yet if some later patch
>> does that. It would need to unlock pcp when necessary.
>
> That sounds like a great idea. Something like the following?
>
> Lightly tested. If you're good with it, I'll beat some more on it and
> submit it as a follow-up.
>
> ---
>
> From 429d13322819ab38b3ba2fad6d1495997819ccc2 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <[email protected]>
> Date: Tue, 12 Sep 2023 10:16:10 -0400
> Subject: [PATCH] mm: page_alloc: optimize free_unref_page_list()
>
> Move direct freeing of isolated pages to the lock-breaking block in
> the second loop. This saves an unnecessary migratetype reassessment.
>
> Minor comment and local variable scoping cleanups.

Looks like batch_count and locked_zone could be moved to the loop scope as well.

>
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Johannes Weiner <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> ---
> mm/page_alloc.c | 49 +++++++++++++++++++++----------------------------
> 1 file changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e3f1c777feed..9cad31de1bf5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2408,48 +2408,41 @@ void free_unref_page_list(struct list_head *list)
> struct per_cpu_pages *pcp = NULL;
> struct zone *locked_zone = NULL;
> int batch_count = 0;
> - int migratetype;
> -
> - /* Prepare pages for freeing */
> - list_for_each_entry_safe(page, next, list, lru) {
> - unsigned long pfn = page_to_pfn(page);
>
> - if (!free_pages_prepare(page, 0, FPI_NONE)) {
> + list_for_each_entry_safe(page, next, list, lru)
> + if (!free_pages_prepare(page, 0, FPI_NONE))
> list_del(&page->lru);
> - continue;
> - }
> -
> - /*
> - * Free isolated pages directly to the allocator, see
> - * comment in free_unref_page.
> - */
> - migratetype = get_pfnblock_migratetype(page, pfn);
> - if (unlikely(is_migrate_isolate(migratetype))) {
> - list_del(&page->lru);
> - free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
> - continue;
> - }
> - }
>
> list_for_each_entry_safe(page, next, list, lru) {
> unsigned long pfn = page_to_pfn(page);
> struct zone *zone = page_zone(page);
> + int migratetype;
>
> list_del(&page->lru);
> migratetype = get_pfnblock_migratetype(page, pfn);
>
> /*
> - * Either different zone requiring a different pcp lock or
> - * excessive lock hold times when freeing a large list of
> - * pages.
> + * Zone switch, batch complete, or non-pcp freeing?
> + * Drop the pcp lock and evaluate.
> */
> - if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
> + if (unlikely(zone != locked_zone ||
> + batch_count == SWAP_CLUSTER_MAX ||
> + is_migrate_isolate(migratetype))) {
> if (pcp) {
> pcp_spin_unlock(pcp);
> pcp_trylock_finish(UP_flags);
> + locked_zone = NULL;
> }
>
> - batch_count = 0;
> + /*
> + * Free isolated pages directly to the
> + * allocator, see comment in free_unref_page.
> + */
> + if (is_migrate_isolate(migratetype)) {
> + free_one_page(zone, page, pfn, 0,
> + migratetype, FPI_NONE);
> + continue;
> + }
>
> /*
> * trylock is necessary as pages may be getting freed
> @@ -2459,12 +2452,12 @@ void free_unref_page_list(struct list_head *list)
> pcp = pcp_spin_trylock(zone->per_cpu_pageset);
> if (unlikely(!pcp)) {
> pcp_trylock_finish(UP_flags);
> - free_one_page(zone, page, pfn,
> - 0, migratetype, FPI_NONE);
> - locked_zone = NULL;
> + free_one_page(zone, page, pfn, 0,
> + migratetype, FPI_NONE);
> continue;
> }
> locked_zone = zone;
> + batch_count = 0;
> }
>
> /*

Hello Vlastimil,

On Wed, Sep 13, 2023 at 11:33:52AM +0200, Vlastimil Babka wrote:
> On 9/12/23 16:50, Johannes Weiner wrote:
> > From 429d13322819ab38b3ba2fad6d1495997819ccc2 Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <[email protected]>
> > Date: Tue, 12 Sep 2023 10:16:10 -0400
> > Subject: [PATCH] mm: page_alloc: optimize free_unref_page_list()
> >
> > Move direct freeing of isolated pages to the lock-breaking block in
> > the second loop. This saves an unnecessary migratetype reassessment.
> >
> > Minor comment and local variable scoping cleanups.
>
> Looks like batch_count and locked_zone could be moved to the loop scope as well.

Hm they both maintain values over multiple iterations, so I don't
think that's possible. Am I missing something?

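The scoping point can be illustrated with a small user-space sketch. This is a toy model under stated assumptions, not the kernel function: zones are plain non-negative ints, "taking the pcp lock" is modeled as bumping a counter, and TOY_BATCH_MAX is a hypothetical stand-in for SWAP_CLUSTER_MAX. The two functions differ only in where locked_zone and batch_count are declared.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH_MAX 32	/* hypothetical stand-in for SWAP_CLUSTER_MAX */

/* State declared outside the loop: it survives from one page to the next. */
size_t toy_locks_function_scope(const int *page_zone, size_t nr_pages)
{
	int locked_zone = -1;	/* -1 models locked_zone == NULL */
	int batch_count = 0;
	size_t locks = 0;

	assert(page_zone != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		/* Relock only on a zone switch or a completed batch. */
		if (page_zone[i] != locked_zone || batch_count == TOY_BATCH_MAX) {
			locked_zone = page_zone[i];
			batch_count = 0;
			locks++;
		}
		batch_count++;
	}
	return locks;
}

/* State declared inside the loop: it is reinitialized for every page. */
size_t toy_locks_loop_scope(const int *page_zone, size_t nr_pages)
{
	size_t locks = 0;

	assert(page_zone != NULL || nr_pages == 0);

	for (size_t i = 0; i < nr_pages; i++) {
		int locked_zone = -1;
		int batch_count = 0;

		/* Always true here: locked_zone was just reset to -1. */
		if (page_zone[i] != locked_zone || batch_count == TOY_BATCH_MAX) {
			locked_zone = page_zone[i];
			batch_count = 0;
			locks++;
		}
		batch_count++;
		(void)locked_zone;
		(void)batch_count;
	}
	return locks;
}
```

For the zone sequence {0, 0, 0, 1, 1, 0}, the function-scope variant takes the lock three times, while the loop-scope variant takes it once per page, six times, because it forgets which zone it had locked.
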
> > Suggested-by: Vlastimil Babka <[email protected]>
> > Signed-off-by: Johannes Weiner <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>

Thanks! I'll send this out properly with your tag.