2022-05-12 14:23:12

by Wonhyuk Yang

Subject: [PATCH v2] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()

Currently, the tracepoint mm_page_alloc_zone_locked() doesn't report
correct information.

First, when alloc_flags contains ALLOC_HARDER/ALLOC_CMA, a page can
be allocated from the MIGRATE_HIGHATOMIC/MIGRATE_CMA freelists.
Nevertheless, the tracepoint reports the requested migration type,
not MIGRATE_HIGHATOMIC or MIGRATE_CMA.
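
A condensed sketch of the old flow (simplified, not the exact kernel
code) shows the mismatch:

	/*
	 * Pre-patch: the page may come from the MIGRATE_CMA (or, via
	 * rmqueue(), the MIGRATE_HIGHATOMIC) freelist, but the
	 * tracepoint still logged the caller's requested migratetype.
	 */
	page = NULL;
	if (alloc_flags & ALLOC_CMA)
		page = __rmqueue_cma_fallback(zone, order); /* MIGRATE_CMA */
	if (!page)
		page = __rmqueue_smallest(zone, order, migratetype);
	if (page)
		trace_mm_page_alloc_zone_locked(page, order, migratetype);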

Second, since commit 44042b4498728 ("mm/page_alloc: allow high-order
pages to be stored on the per-cpu lists"), the per-cpu lists can
store high-order pages, but the tracepoint still decides whether an
allocation is a per-cpu list refill by checking whether the requested
order is 0.
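
The order check that replaces the order == 0 test looks roughly like
this (paraphrasing pcp_allowed_order() from mm/page_alloc.c):

	/*
	 * Per-cpu lists now hold orders up to PAGE_ALLOC_COSTLY_ORDER,
	 * plus the THP pageblock order, so order == 0 no longer implies
	 * a per-cpu list refill.
	 */
	static inline bool pcp_allowed_order(unsigned int order)
	{
		if (order <= PAGE_ALLOC_COSTLY_ORDER)
			return true;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (order == pageblock_order)
			return true;
#endif
		return false;
	}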

To fix these problems, call mm_page_alloc_zone_locked() only from
__rmqueue_smallest(), where the actual migration type is known. A new
argument, percpu_refill, roughly indicates whether the allocation is
a refill of the per-cpu lists.
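
Concretely, the flag is computed at the call site (see the
__rmqueue_smallest() hunk below) as:

	/*
	 * True only when this call can be refilling a per-cpu list:
	 * the order is pcp-eligible and the migratetype is one of the
	 * per-cpu list types (< MIGRATE_PCPTYPES).
	 */
	percpu_refill = pcp_allowed_order(order) &&
			migratetype < MIGRATE_PCPTYPES;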

Cc: Baik Song An <[email protected]>
Cc: Hong Yeon Kim <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: [email protected]
Signed-off-by: Wonhyuk Yang <[email protected]>
---
v1 -> v2: Simplify determining percpu_refill, as suggested by Mel Gorman

include/trace/events/kmem.h | 14 +++++++++-----
mm/page_alloc.c | 13 +++++--------
2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index ddc8c944f417..f89fb3afcd46 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -229,20 +229,23 @@ TRACE_EVENT(mm_page_alloc,

DECLARE_EVENT_CLASS(mm_page,

- TP_PROTO(struct page *page, unsigned int order, int migratetype),
+ TP_PROTO(struct page *page, unsigned int order, int migratetype,
+ int percpu_refill),

- TP_ARGS(page, order, migratetype),
+ TP_ARGS(page, order, migratetype, percpu_refill),

TP_STRUCT__entry(
__field( unsigned long, pfn )
__field( unsigned int, order )
__field( int, migratetype )
+ __field( int, percpu_refill )
),

TP_fast_assign(
__entry->pfn = page ? page_to_pfn(page) : -1UL;
__entry->order = order;
__entry->migratetype = migratetype;
+ __entry->percpu_refill = percpu_refill;
),

TP_printk("page=%p pfn=0x%lx order=%u migratetype=%d percpu_refill=%d",
@@ -250,14 +253,15 @@ DECLARE_EVENT_CLASS(mm_page,
__entry->pfn != -1UL ? __entry->pfn : 0,
__entry->order,
__entry->migratetype,
- __entry->order == 0)
+ __entry->percpu_refill)
);

DEFINE_EVENT(mm_page, mm_page_alloc_zone_locked,

- TP_PROTO(struct page *page, unsigned int order, int migratetype),
+ TP_PROTO(struct page *page, unsigned int order, int migratetype,
+ int percpu_refill),

- TP_ARGS(page, order, migratetype)
+ TP_ARGS(page, order, migratetype, percpu_refill)
);

TRACE_EVENT(mm_page_pcpu_drain,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c1..e906ac274586 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2476,6 +2476,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
del_page_from_free_list(page, zone, current_order);
expand(zone, page, order, current_order, migratetype);
set_pcppage_migratetype(page, migratetype);
+ trace_mm_page_alloc_zone_locked(page, order, migratetype,
+ pcp_allowed_order(order) &&
+ migratetype < MIGRATE_PCPTYPES);
return page;
}

@@ -2999,7 +3002,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
if (page)
- goto out;
+ return page;
}
}
retry:
@@ -3012,9 +3015,6 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
alloc_flags))
goto retry;
}
-out:
- if (page)
- trace_mm_page_alloc_zone_locked(page, order, migratetype);
return page;
}

@@ -3733,11 +3733,8 @@ struct page *rmqueue(struct zone *preferred_zone,
* reserved for high-order atomic allocation, so order-0
* request should skip it.
*/
- if (order > 0 && alloc_flags & ALLOC_HARDER) {
+ if (order > 0 && alloc_flags & ALLOC_HARDER)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
- if (page)
- trace_mm_page_alloc_zone_locked(page, order, migratetype);
- }
if (!page) {
page = __rmqueue(zone, order, migratetype, alloc_flags);
if (!page)
--
2.30.2



2022-05-13 03:37:57

by Mel Gorman

Subject: Re: [PATCH v2] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()

On Thu, May 12, 2022 at 11:53:07AM +0900, Wonhyuk Yang wrote:
> Currently, the tracepoint mm_page_alloc_zone_locked() doesn't report
> correct information.
>
> First, when alloc_flags contains ALLOC_HARDER/ALLOC_CMA, a page can
> be allocated from the MIGRATE_HIGHATOMIC/MIGRATE_CMA freelists.
> Nevertheless, the tracepoint reports the requested migration type,
> not MIGRATE_HIGHATOMIC or MIGRATE_CMA.
>
> Second, since commit 44042b4498728 ("mm/page_alloc: allow high-order
> pages to be stored on the per-cpu lists"), the per-cpu lists can
> store high-order pages, but the tracepoint still decides whether an
> allocation is a per-cpu list refill by checking whether the requested
> order is 0.
>
> To fix these problems, call mm_page_alloc_zone_locked() only from
> __rmqueue_smallest(), where the actual migration type is known. A new
> argument, percpu_refill, roughly indicates whether the allocation is
> a refill of the per-cpu lists.
>
> Cc: Baik Song An <[email protected]>
> Cc: Hong Yeon Kim <[email protected]>
> Cc: Taeung Song <[email protected]>
> Cc: [email protected]
> Signed-off-by: Wonhyuk Yang <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2022-05-25 03:09:42

by Steven Rostedt

Subject: Re: [PATCH v2] mm/page_alloc: Fix tracepoint mm_page_alloc_zone_locked()

Sorry for the late reply, but I finally have time to catch up on my inbox.

On Thu, 12 May 2022 11:53:07 +0900
Wonhyuk Yang <[email protected]> wrote:

> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2476,6 +2476,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> del_page_from_free_list(page, zone, current_order);
> expand(zone, page, order, current_order, migratetype);
> set_pcppage_migratetype(page, migratetype);
> + trace_mm_page_alloc_zone_locked(page, order, migratetype,
> + pcp_allowed_order(order) &&
> + migratetype < MIGRATE_PCPTYPES);

It would make more sense to put this logic into the TP_fast_assign() if
possible. This code is added at the location of execution, and even though
it may not run while tracing is disabled, it does affect icache.

Could you pass in the order (you already have migratetype) and then in the
trace event have:


TP_fast_assign(
__entry->pfn = page ? page_to_pfn(page) : -1UL;
__entry->order = order;
__entry->migratetype = migratetype;
+ __entry->percpu_refill = pcp_allowed_order(order) &&
+                          migratetype < MIGRATE_PCPTYPES;
),

??

-- Steve

> return page;
> }
>
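
For reference, a sketch of the event class with this suggestion folded
in. It assumes pcp_allowed_order() is visible to
include/trace/events/kmem.h; at the time of this thread it is static to
mm/page_alloc.c, so it would first have to move to a shared header:

/*
 * Sketch only: the predicate moves into TP_fast_assign(), so it is
 * evaluated inside the tracepoint rather than at the call site.
 * Assumes pcp_allowed_order() is reachable from this header.
 */
DECLARE_EVENT_CLASS(mm_page,

	TP_PROTO(struct page *page, unsigned int order, int migratetype),

	TP_ARGS(page, order, migratetype),

	TP_STRUCT__entry(
		__field(unsigned long,	pfn)
		__field(unsigned int,	order)
		__field(int,		migratetype)
		__field(int,		percpu_refill)
	),

	TP_fast_assign(
		__entry->pfn = page ? page_to_pfn(page) : -1UL;
		__entry->order = order;
		__entry->migratetype = migratetype;
		__entry->percpu_refill = pcp_allowed_order(order) &&
					 migratetype < MIGRATE_PCPTYPES;
	),

	TP_printk("page=%p pfn=0x%lx order=%u migratetype=%d percpu_refill=%d",
		__entry->pfn != -1UL ? pfn_to_page(__entry->pfn) : NULL,
		__entry->pfn != -1UL ? __entry->pfn : 0,
		__entry->order,
		__entry->migratetype,
		__entry->percpu_refill)
);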