On 9/11/23 21:41, Johannes Weiner wrote:
> The idea behind the cache is to save get_pageblock_migratetype()
> lookups during bulk freeing. A microbenchmark suggests this isn't
> helping, though. The pcp migratetype can get stale, which means that
> bulk freeing has an extra branch to check if the pageblock was
> isolated while on the pcp.
>
> While the variance overlaps, the cache write and the branch seem to
> make this a net negative. The following test allocates and frees
> batches of 10,000 pages (~3x the pcp high marks to trigger flushing):
>
> Before:
> 8,668.48 msec task-clock # 99.735 CPUs utilized ( +- 2.90% )
> 19 context-switches # 4.341 /sec ( +- 3.24% )
> 0 cpu-migrations # 0.000 /sec
> 17,440 page-faults # 3.984 K/sec ( +- 2.90% )
> 41,758,692,473 cycles # 9.541 GHz ( +- 2.90% )
> 126,201,294,231 instructions # 5.98 insn per cycle ( +- 2.90% )
> 25,348,098,335 branches # 5.791 G/sec ( +- 2.90% )
> 33,436,921 branch-misses # 0.26% of all branches ( +- 2.90% )
>
> 0.0869148 +- 0.0000302 seconds time elapsed ( +- 0.03% )
>
> After:
> 8,444.81 msec task-clock # 99.726 CPUs utilized ( +- 2.90% )
> 22 context-switches # 5.160 /sec ( +- 3.23% )
> 0 cpu-migrations # 0.000 /sec
> 17,443 page-faults # 4.091 K/sec ( +- 2.90% )
> 40,616,738,355 cycles # 9.527 GHz ( +- 2.90% )
> 126,383,351,792 instructions # 6.16 insn per cycle ( +- 2.90% )
> 25,224,985,153 branches # 5.917 G/sec ( +- 2.90% )
> 32,236,793 branch-misses # 0.25% of all branches ( +- 2.90% )
>
> 0.0846799 +- 0.0000412 seconds time elapsed ( +- 0.05% )
>
> A side effect is that this also ensures that pages whose pageblock
> gets stolen while on the pcplist end up on the right freelist and we
> don't perform potentially type-incompatible buddy merges (or skip
> merges when we shouldn't), which is likely beneficial to long-term
> fragmentation management, although the effects would be harder to
> measure. Settle for simpler and faster code as justification here.
Makes sense to me, so
> Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Some notes below.
> @@ -1577,7 +1556,6 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> continue;
> del_page_from_free_list(page, zone, current_order);
> expand(zone, page, order, current_order, migratetype);
> - set_pcppage_migratetype(page, migratetype);
Hm interesting, just noticed that __rmqueue_fallback() never did this
AFAICS, sounds like a bug.
> trace_mm_page_alloc_zone_locked(page, order, migratetype,
> pcp_allowed_order(order) &&
> migratetype < MIGRATE_PCPTYPES);
> @@ -2145,7 +2123,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> * pages are ordered properly.
> */
> list_add_tail(&page->pcp_list, list);
> - if (is_migrate_cma(get_pcppage_migratetype(page)))
> + if (is_migrate_cma(get_pageblock_migratetype(page)))
> __mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> -(1 << order));
This is potentially a source of overhead, I assume patch 6/6 might
change that.
> }
> @@ -2304,19 +2282,6 @@ void drain_all_pages(struct zone *zone)
> __drain_all_pages(zone, false);
> }
>
> -static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
> - unsigned int order)
> -{
> - int migratetype;
> -
> - if (!free_pages_prepare(page, order, FPI_NONE))
> - return false;
> -
> - migratetype = get_pfnblock_migratetype(page, pfn);
> - set_pcppage_migratetype(page, migratetype);
> - return true;
> -}
> -
> static int nr_pcp_free(struct per_cpu_pages *pcp, int high, bool free_high)
> {
> int min_nr_free, max_nr_free;
> @@ -2402,7 +2367,7 @@ void free_unref_page(struct page *page, unsigned int order)
> unsigned long pfn = page_to_pfn(page);
> int migratetype, pcpmigratetype;
>
> - if (!free_unref_page_prepare(page, pfn, order))
> + if (!free_pages_prepare(page, order, FPI_NONE))
> return;
>
> /*
> @@ -2412,7 +2377,7 @@ void free_unref_page(struct page *page, unsigned int order)
> * get those areas back if necessary. Otherwise, we may have to free
> * excessively into the page allocator
> */
> - migratetype = pcpmigratetype = get_pcppage_migratetype(page);
> + migratetype = pcpmigratetype = get_pfnblock_migratetype(page, pfn);
> if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
> if (unlikely(is_migrate_isolate(migratetype))) {
> free_one_page(page_zone(page), page, pfn, order, migratetype, FPI_NONE);
> @@ -2448,7 +2413,8 @@ void free_unref_page_list(struct list_head *list)
> /* Prepare pages for freeing */
> list_for_each_entry_safe(page, next, list, lru) {
> unsigned long pfn = page_to_pfn(page);
> - if (!free_unref_page_prepare(page, pfn, 0)) {
> +
> + if (!free_pages_prepare(page, 0, FPI_NONE)) {
> list_del(&page->lru);
> continue;
> }
> @@ -2457,7 +2423,7 @@ void free_unref_page_list(struct list_head *list)
> * Free isolated pages directly to the allocator, see
> * comment in free_unref_page.
> */
> - migratetype = get_pcppage_migratetype(page);
> + migratetype = get_pfnblock_migratetype(page, pfn);
> if (unlikely(is_migrate_isolate(migratetype))) {
> list_del(&page->lru);
> free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
I think after this change we should move the isolated pages handling to
the second loop below, so that we wouldn't have to call
get_pfnblock_migratetype() twice per page. Dunno yet if some later patch
does that. It would need to unlock pcp when necessary.
> @@ -2466,10 +2432,11 @@ void free_unref_page_list(struct list_head *list)
> }
>
> list_for_each_entry_safe(page, next, list, lru) {
> + unsigned long pfn = page_to_pfn(page);
> struct zone *zone = page_zone(page);
>
> list_del(&page->lru);
> - migratetype = get_pcppage_migratetype(page);
> + migratetype = get_pfnblock_migratetype(page, pfn);
>
> /*
> * Either different zone requiring a different pcp lock or
> @@ -2492,7 +2459,7 @@ void free_unref_page_list(struct list_head *list)
> pcp = pcp_spin_trylock(zone->per_cpu_pageset);
> if (unlikely(!pcp)) {
> pcp_trylock_finish(UP_flags);
> - free_one_page(zone, page, page_to_pfn(page),
> + free_one_page(zone, page, pfn,
> 0, migratetype, FPI_NONE);
> locked_zone = NULL;
> continue;
> @@ -2661,7 +2628,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> }
> }
> __mod_zone_freepage_state(zone, -(1 << order),
> - get_pcppage_migratetype(page));
> + get_pageblock_migratetype(page));
> spin_unlock_irqrestore(&zone->lock, flags);
> } while (check_new_pages(page, order));
>
On Tue, Sep 12, 2023 at 03:47:45PM +0200, Vlastimil Babka wrote:
> I think after this change we should [...]
Speaking of follow-ups, AFAICS we no longer need those either:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9cad31de1bf5..bea499fbca58 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1751,13 +1751,6 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
old_block_type = get_pageblock_migratetype(page);
- /*
- * This can happen due to races and we want to prevent broken
- * highatomic accounting.
- */
- if (is_migrate_highatomic(old_block_type))
- goto single_page;
-
/* Take ownership for orders >= pageblock_order */
if (current_order >= pageblock_order) {
change_pageblock_range(page, current_order, start_type);
@@ -1926,24 +1919,15 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
continue;
/*
- * In page freeing path, migratetype change is racy so
- * we can counter several free pages in a pageblock
- * in this loop although we changed the pageblock type
- * from highatomic to ac->migratetype. So we should
- * adjust the count once.
+ * It should never happen but changes to
+ * locking could inadvertently allow a per-cpu
+ * drain to add pages to MIGRATE_HIGHATOMIC
+ * while unreserving so be safe and watch for
+ * underflows.
*/
- if (is_migrate_highatomic_page(page)) {
- /*
- * It should never happen but changes to
- * locking could inadvertently allow a per-cpu
- * drain to add pages to MIGRATE_HIGHATOMIC
- * while unreserving so be safe and watch for
- * underflows.
- */
- zone->nr_reserved_highatomic -= min(
- pageblock_nr_pages,
- zone->nr_reserved_highatomic);
- }
+ zone->nr_reserved_highatomic -= min(
+ pageblock_nr_pages,
+ zone->nr_reserved_highatomic);
/*
* Convert to ac->migratetype and avoid the normal
I think they were only in place because we could change the highatomic
status of pages on the pcplist, and those pages would then end up on
some other freelist due to the stale pcppage cache.
I replaced them locally with WARNs and ran an hour or so of kernel
builds under pressure. It didn't trigger. So I would send a follow up
to remove them.
Unless you point me to a good reason why they're definitely still
needed - in which case this is a moot proposal - but then we should
make the comments more specific.
On 9/12/23 17:03, Johannes Weiner wrote:
> On Tue, Sep 12, 2023 at 03:47:45PM +0200, Vlastimil Babka wrote:
>> I think after this change we should [...]
>
> Speaking of follow-ups, AFAICS we no longer need those either:
Seems so, but the comments do talk about races, so once those are sorted out :)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9cad31de1bf5..bea499fbca58 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1751,13 +1751,6 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
>
> old_block_type = get_pageblock_migratetype(page);
>
> - /*
> - * This can happen due to races and we want to prevent broken
> - * highatomic accounting.
> - */
> - if (is_migrate_highatomic(old_block_type))
> - goto single_page;
> -
> /* Take ownership for orders >= pageblock_order */
> if (current_order >= pageblock_order) {
> change_pageblock_range(page, current_order, start_type);
> @@ -1926,24 +1919,15 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac,
> continue;
>
> /*
> - * In page freeing path, migratetype change is racy so
> - * we can counter several free pages in a pageblock
> - * in this loop although we changed the pageblock type
> - * from highatomic to ac->migratetype. So we should
> - * adjust the count once.
> + * It should never happen but changes to
> + * locking could inadvertently allow a per-cpu
> + * drain to add pages to MIGRATE_HIGHATOMIC
> + * while unreserving so be safe and watch for
> + * underflows.
> */
> - if (is_migrate_highatomic_page(page)) {
> - /*
> - * It should never happen but changes to
> - * locking could inadvertently allow a per-cpu
> - * drain to add pages to MIGRATE_HIGHATOMIC
> - * while unreserving so be safe and watch for
> - * underflows.
> - */
> - zone->nr_reserved_highatomic -= min(
> - pageblock_nr_pages,
> - zone->nr_reserved_highatomic);
> - }
> + zone->nr_reserved_highatomic -= min(
> + pageblock_nr_pages,
> + zone->nr_reserved_highatomic);
>
> /*
> * Convert to ac->migratetype and avoid the normal
>
> I think they were only in place because we could change the highatomic
> status of pages on the pcplist, and those pages would then end up on
> some other freelist due to the stale pcppage cache.
>
> I replaced them locally with WARNs and ran an hour or so of kernel
> builds under pressure. It didn't trigger. So I would send a follow up
> to remove them.
>
> Unless you point me to a good reason why they're definitely still
> needed - in which case this is a moot proposal - but then we should
> make the comments more specific.