2022-03-04 20:19:45

by Eric Dumazet

Subject: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

From: Eric Dumazet <[email protected]>

For high order pages not using pcp, rmqueue() is currently calling
the costly check_new_pages() while zone spinlock is held,
and hard irqs masked.

This is not needed, we can release the spinlock sooner to reduce
zone spinlock contention.

Note that after this patch, we call __mod_zone_freepage_state()
before deciding to leak the page because it is in bad state.

v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
---
mm/page_alloc.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31928f850ebe5a4015ddc40e0469f3..1804287c1b792b8aa0e964b17eb002b6b1115258 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3706,10 +3706,10 @@ struct page *rmqueue(struct zone *preferred_zone,
* allocate greater than order-1 page units with __GFP_NOFAIL.
*/
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
- spin_lock_irqsave(&zone->lock, flags);

do {
page = NULL;
+ spin_lock_irqsave(&zone->lock, flags);
/*
* order-0 request can reach here when the pcplist is skipped
* due to non-CMA allocation context. HIGHATOMIC area is
@@ -3721,15 +3721,15 @@ struct page *rmqueue(struct zone *preferred_zone,
if (page)
trace_mm_page_alloc_zone_locked(page, order, migratetype);
}
- if (!page)
+ if (!page) {
page = __rmqueue(zone, order, migratetype, alloc_flags);
- } while (page && check_new_pages(page, order));
- if (!page)
- goto failed;
-
- __mod_zone_freepage_state(zone, -(1 << order),
- get_pcppage_migratetype(page));
- spin_unlock_irqrestore(&zone->lock, flags);
+ if (!page)
+ goto failed;
+ }
+ __mod_zone_freepage_state(zone, -(1 << order),
+ get_pcppage_migratetype(page));
+ spin_unlock_irqrestore(&zone->lock, flags);
+ } while (check_new_pages(page, order));

__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
zone_statistics(preferred_zone, zone, 1);
--
2.35.1.616.g0bdcbb4464-goog
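
For readers following the diff above, this is approximately what the rmqueue()
slow path looks like once the patch is applied. It is a sketch reconstructed
from the hunks, not a verbatim copy of mm/page_alloc.c: the guard around the
HIGHATOMIC fallback and the failed: label are surrounding kernel context
recalled from v5.17 and may not match the exact source.

	do {
		page = NULL;
		spin_lock_irqsave(&zone->lock, flags);

		/* Guard condition recalled from v5.17, not part of the diff. */
		if (order > 0 && alloc_flags & ALLOC_HARDER) {
			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
			if (page)
				trace_mm_page_alloc_zone_locked(page, order, migratetype);
		}
		if (!page) {
			page = __rmqueue(zone, order, migratetype, alloc_flags);
			if (!page)
				goto failed;
		}
		/* Account the removal before dropping the lock ... */
		__mod_zone_freepage_state(zone, -(1 << order),
					  get_pcppage_migratetype(page));
		spin_unlock_irqrestore(&zone->lock, flags);
		/*
		 * ... and only then run the costly checks, with zone->lock
		 * released. A bad page is leaked and the loop retries,
		 * re-taking the lock.
		 */
	} while (check_new_pages(page, order));

The point is that check_new_pages() now runs outside the zone->lock critical
section, and __mod_zone_freepage_state() is charged even for a page that the
check later rejects, which is what the note in the changelog refers to.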


2022-03-04 20:56:07

by Shakeel Butt

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Fri, Mar 04, 2022 at 09:02:15AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>

> For high order pages not using pcp, rmqueue() is currently calling
> the costly check_new_pages() while zone spinlock is held,
> and hard irqs masked.

> This is not needed, we can release the spinlock sooner to reduce
> zone spinlock contention.

> Note that after this patch, we call __mod_zone_freepage_state()
> before deciding to leak the page because it is in bad state.

> v2: We need to keep interrupts disabled to call
> __mod_zone_freepage_state()

> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Wei Xu <[email protected]>
> Cc: Greg Thelen <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: David Rientjes <[email protected]>

Reviewed-by: Shakeel Butt <[email protected]>

2022-03-07 03:11:24

by David Rientjes

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Fri, 4 Mar 2022, Eric Dumazet wrote:

> From: Eric Dumazet <[email protected]>
>
> For high order pages not using pcp, rmqueue() is currently calling
> the costly check_new_pages() while zone spinlock is held,
> and hard irqs masked.
>
> This is not needed, we can release the spinlock sooner to reduce
> zone spinlock contention.
>
> Note that after this patch, we call __mod_zone_freepage_state()
> before deciding to leak the page because it is in bad state.
>
> v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Wei Xu <[email protected]>
> Cc: Greg Thelen <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: David Rientjes <[email protected]>

Acked-by: David Rientjes <[email protected]>

2022-03-07 09:51:50

by Mel Gorman

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Fri, Mar 04, 2022 at 09:02:15AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> For high order pages not using pcp, rmqueue() is currently calling
> the costly check_new_pages() while zone spinlock is held,
> and hard irqs masked.
>
> This is not needed, we can release the spinlock sooner to reduce
> zone spinlock contention.
>
> Note that after this patch, we call __mod_zone_freepage_state()
> before deciding to leak the page because it is in bad state.
>
> v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
>
> Signed-off-by: Eric Dumazet <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Wei Xu <[email protected]>
> Cc: Greg Thelen <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: David Rientjes <[email protected]>

Ok, this is only more expensive in the event pages on the free list have
been corrupted, which is already very unlikely, so thanks!

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2022-03-07 10:11:31

by Vlastimil Babka

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On 3/4/22 18:02, Eric Dumazet wrote:
> From: Eric Dumazet <[email protected]>
>
> For high order pages not using pcp, rmqueue() is currently calling
> the costly check_new_pages() while zone spinlock is held,
> and hard irqs masked.
>
> This is not needed, we can release the spinlock sooner to reduce
> zone spinlock contention.
>
> Note that after this patch, we call __mod_zone_freepage_state()
> before deciding to leak the page because it is in bad state.

Which is arguably an accounting fix on its own, because when we remove a page
from the free list, we should decrease the respective counter(s) even if we
find the page is in a bad state and discard (effectively leak) it.

>
> v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
>
> Signed-off-by: Eric Dumazet <[email protected]>

Reviewed-by: Vlastimil Babka <[email protected]>

> Cc: Mel Gorman <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Shakeel Butt <[email protected]>
> Cc: Wei Xu <[email protected]>
> Cc: Greg Thelen <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: David Rientjes <[email protected]>
> ---
> mm/page_alloc.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589febc6d31928f850ebe5a4015ddc40e0469f3..1804287c1b792b8aa0e964b17eb002b6b1115258 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3706,10 +3706,10 @@ struct page *rmqueue(struct zone *preferred_zone,
> * allocate greater than order-1 page units with __GFP_NOFAIL.
> */
> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> - spin_lock_irqsave(&zone->lock, flags);
>
> do {
> page = NULL;
> + spin_lock_irqsave(&zone->lock, flags);
> /*
> * order-0 request can reach here when the pcplist is skipped
> * due to non-CMA allocation context. HIGHATOMIC area is
> @@ -3721,15 +3721,15 @@ struct page *rmqueue(struct zone *preferred_zone,
> if (page)
> trace_mm_page_alloc_zone_locked(page, order, migratetype);
> }
> - if (!page)
> + if (!page) {
> page = __rmqueue(zone, order, migratetype, alloc_flags);
> - } while (page && check_new_pages(page, order));
> - if (!page)
> - goto failed;
> -
> - __mod_zone_freepage_state(zone, -(1 << order),
> - get_pcppage_migratetype(page));
> - spin_unlock_irqrestore(&zone->lock, flags);
> + if (!page)
> + goto failed;
> + }
> + __mod_zone_freepage_state(zone, -(1 << order),
> + get_pcppage_migratetype(page));
> + spin_unlock_irqrestore(&zone->lock, flags);
> + } while (check_new_pages(page, order));
>
> __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
> zone_statistics(preferred_zone, zone, 1);

2022-03-09 02:15:37

by Eric Dumazet

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Mon, Mar 7, 2022 at 1:15 AM Mel Gorman <[email protected]> wrote:
>
> On Fri, Mar 04, 2022 at 09:02:15AM -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <[email protected]>
> >
> > For high order pages not using pcp, rmqueue() is currently calling
> > the costly check_new_pages() while zone spinlock is held,
> > and hard irqs masked.
> >
> > This is not needed, we can release the spinlock sooner to reduce
> > zone spinlock contention.
> >
> > Note that after this patch, we call __mod_zone_freepage_state()
> > before deciding to leak the page because it is in bad state.
> >
> > v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
> >
> > Signed-off-by: Eric Dumazet <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Vlastimil Babka <[email protected]>
> > Cc: Michal Hocko <[email protected]>
> > Cc: Shakeel Butt <[email protected]>
> > Cc: Wei Xu <[email protected]>
> > Cc: Greg Thelen <[email protected]>
> > Cc: Hugh Dickins <[email protected]>
> > Cc: David Rientjes <[email protected]>
>
> Ok, this is only more expensive in the event pages on the free list have
> been corrupted, which is already very unlikely, so thanks!
>
> Acked-by: Mel Gorman <[email protected]>
>

One remaining question:

After your patch ("mm/page_alloc: allow high-order pages to be stored
on the per-cpu lists"), do we want to change check_pcp_refill()/check_new_pcp()
to check all pages, and not only the head?

Or was it a conscious choice of yours?
(I presume part of the performance gains came from not having to bring in
~7 cache lines per 32KB chunk on x86.)

Thanks!

2022-03-09 14:28:57

by Mel Gorman

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Tue, Mar 08, 2022 at 03:49:48PM -0800, Eric Dumazet wrote:
> On Mon, Mar 7, 2022 at 1:15 AM Mel Gorman <[email protected]> wrote:
> >
> > On Fri, Mar 04, 2022 at 09:02:15AM -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <[email protected]>
> > >
> > > For high order pages not using pcp, rmqueue() is currently calling
> > > the costly check_new_pages() while zone spinlock is held,
> > > and hard irqs masked.
> > >
> > > This is not needed, we can release the spinlock sooner to reduce
> > > zone spinlock contention.
> > >
> > > Note that after this patch, we call __mod_zone_freepage_state()
> > > before deciding to leak the page because it is in bad state.
> > >
> > > v2: We need to keep interrupts disabled to call __mod_zone_freepage_state()
> > >
> > > Signed-off-by: Eric Dumazet <[email protected]>
> > > Cc: Mel Gorman <[email protected]>
> > > Cc: Vlastimil Babka <[email protected]>
> > > Cc: Michal Hocko <[email protected]>
> > > Cc: Shakeel Butt <[email protected]>
> > > Cc: Wei Xu <[email protected]>
> > > Cc: Greg Thelen <[email protected]>
> > > Cc: Hugh Dickins <[email protected]>
> > > Cc: David Rientjes <[email protected]>
> >
> > Ok, this is only more expensive in the event pages on the free list have
> > been corrupted, which is already very unlikely, so thanks!
> >
> > Acked-by: Mel Gorman <[email protected]>
> >
>
> One remaining question is:
>
> After your patch ("mm/page_alloc: allow high-order pages to be stored
> on the per-cpu lists"),
> do we want to change check_pcp_refill()/check_new_pcp() to check all pages,
> and not only the head ?
>

We should because it was an oversight. Thanks for pointing that out.

> Or was it a conscious choice of yours ?
> (I presume part of the performance gains came from
> not having to bring ~7 cache lines per 32KB chunk on x86)
>

There will be a performance penalty due to the check but it's a correctness
vs performance issue.

This? It's boot tested only.

--8<--
mm/page_alloc: check high-order pages for corruption during PCP operations

Eric Dumazet pointed out that commit 44042b449872 ("mm/page_alloc: allow
high-order pages to be stored on the per-cpu lists") only checks the head
page during PCP refill and allocation operations. This was an oversight
and all pages should be checked. This will incur a small performance
penalty but it's necessary for correctness.

Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
---
mm/page_alloc.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..2920344fa887 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2342,23 +2342,36 @@ static inline int check_new_page(struct page *page)
return 1;
}

+static bool check_new_pages(struct page *page, unsigned int order)
+{
+ int i;
+ for (i = 0; i < (1 << order); i++) {
+ struct page *p = page + i;
+
+ if (unlikely(check_new_page(p)))
+ return true;
+ }
+
+ return false;
+}
+
#ifdef CONFIG_DEBUG_VM
/*
* With DEBUG_VM enabled, order-0 pages are checked for expected state when
* being allocated from pcp lists. With debug_pagealloc also enabled, they are
* also checked when pcp lists are refilled from the free lists.
*/
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
if (debug_pagealloc_enabled_static())
- return check_new_page(page);
+ return check_new_pages(page, order);
else
return false;
}

-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
{
- return check_new_page(page);
+ return check_new_pages(page, order);
}
#else
/*
@@ -2366,32 +2379,19 @@ static inline bool check_new_pcp(struct page *page)
* when pcp lists are being refilled from the free lists. With debug_pagealloc
* enabled, they are also checked when being allocated from the pcp lists.
*/
-static inline bool check_pcp_refill(struct page *page)
+static inline bool check_pcp_refill(struct page *page, unsigned int order)
{
- return check_new_page(page);
+ return check_new_pages(page, order);
}
-static inline bool check_new_pcp(struct page *page)
+static inline bool check_new_pcp(struct page *page, unsigned int order)
{
if (debug_pagealloc_enabled_static())
- return check_new_page(page);
+ return check_new_pages(page, order);
else
return false;
}
#endif /* CONFIG_DEBUG_VM */

-static bool check_new_pages(struct page *page, unsigned int order)
-{
- int i;
- for (i = 0; i < (1 << order); i++) {
- struct page *p = page + i;
-
- if (unlikely(check_new_page(p)))
- return true;
- }
-
- return false;
-}
-
inline void post_alloc_hook(struct page *page, unsigned int order,
gfp_t gfp_flags)
{
@@ -3037,7 +3037,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
if (unlikely(page == NULL))
break;

- if (unlikely(check_pcp_refill(page)))
+ if (unlikely(check_pcp_refill(page, order)))
continue;

/*
@@ -3641,7 +3641,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
page = list_first_entry(list, struct page, lru);
list_del(&page->lru);
pcp->count -= 1 << order;
- } while (check_new_pcp(page));
+ } while (check_new_pcp(page, order));

return page;
}

2022-03-09 19:34:54

by Eric Dumazet

Subject: Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held

On Wed, Mar 9, 2022 at 4:32 AM Mel Gorman <[email protected]> wrote:

> We should because it was an oversight. Thanks for pointing that out.
>
> > Or was it a conscious choice of yours ?
> > (I presume part of the performance gains came from
> > not having to bring ~7 cache lines per 32KB chunk on x86)
> >
>
> There will be a performance penalty due to the check but it's a correctness
> vs performance issue.
>
> This? It's boot tested only.
>
> --8<--
> mm/page_alloc: check high-order pages for corruption during PCP operations
>
> Eric Dumazet pointed out that commit 44042b449872 ("mm/page_alloc: allow
> high-order pages to be stored on the per-cpu lists") only checks the head
> page during PCP refill and allocation operations. This was an oversight
> and all pages should be checked. This will incur a small performance
> penalty but it's necessary for correctness.
>
> Fixes: 44042b449872 ("mm/page_alloc: allow high-order pages to be stored on the per-cpu lists")
> Reported-by: Eric Dumazet <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> ---

SGTM, thanks Mel !

Acked-by: Eric Dumazet <[email protected]>

2022-03-12 15:52:55

by kernel test robot

Subject: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5% improvement



Greeting,

FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:


commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: vm-scalability
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with following parameters:

runtime: 300s
size: 512G
test: anon-w-rand-hugetlb
cpufreq_governor: performance
ucode: 0xd000331

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/512G/lkp-icl-2sp5/anon-w-rand-hugetlb/vm-scalability/0xd000331

commit:
v5.17-rc7
8212a964ee ("mm/page_alloc: call check_new_pages() while zone spinlock is not held")

v5.17-rc7 8212a964ee020471104e34dce70
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.00 ± 5% -7.4% 0.00 ± 4% vm-scalability.free_time
47190 ± 2% +25.5% 59208 ± 2% vm-scalability.median
6352467 ± 2% +30.5% 8293110 ± 2% vm-scalability.throughput
218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time
218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time.max
121357 ± 7% -24.9% 91162 ± 10% vm-scalability.time.involuntary_context_switches
11226 -5.2% 10641 vm-scalability.time.percent_of_cpu_this_job_got
2311 ± 3% -35.2% 1496 ± 6% vm-scalability.time.system_time
22275 ± 2% -21.7% 17443 ± 3% vm-scalability.time.user_time
9358 ± 3% -13.1% 8130 vm-scalability.time.voluntary_context_switches
255.23 -16.1% 214.10 ± 2% uptime.boot
2593 +6.8% 2771 ± 5% vmstat.system.cs
11.51 ± 7% +4.5 16.05 ± 8% mpstat.cpu.all.idle%
8.48 ± 2% -1.6 6.84 ± 3% mpstat.cpu.all.sys%
727581 ± 12% -17.2% 602238 ± 6% numa-numastat.node1.local_node
798037 ± 8% -13.3% 691955 ± 6% numa-numastat.node1.numa_hit
5806206 ± 17% +26.7% 7356010 ± 10% turbostat.C1E
9.55 ± 26% +5.9 15.48 ± 9% turbostat.C1E%
59854751 ± 2% -17.8% 49202950 ± 3% turbostat.IRQ
42804 ± 6% -54.9% 19301 ± 21% meminfo.Active
41832 ± 7% -56.2% 18325 ± 23% meminfo.Active(anon)
63386 ± 6% -26.6% 46542 ± 3% meminfo.Mapped
137758 -25.5% 102591 ± 3% meminfo.Shmem
36980 ± 5% -62.6% 13823 ± 29% numa-meminfo.node1.Active
36495 ± 5% -63.9% 13173 ± 30% numa-meminfo.node1.Active(anon)
19454 ± 26% -57.7% 8233 ± 33% numa-meminfo.node1.Mapped
65896 ± 38% -67.8% 21189 ± 13% numa-meminfo.node1.Shmem
9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_active_anon
4769 ± 26% -54.5% 2171 ± 32% numa-vmstat.node1.nr_mapped
16462 ± 37% -68.1% 5258 ± 14% numa-vmstat.node1.nr_shmem
9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_zone_active_anon
10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_active_anon
69290 +1.3% 70203 proc-vmstat.nr_anon_pages
1717695 +4.5% 1794462 proc-vmstat.nr_dirty_background_threshold
3439592 +4.5% 3593312 proc-vmstat.nr_dirty_threshold
640952 -1.4% 632171 proc-vmstat.nr_file_pages
17356030 +4.4% 18125242 proc-vmstat.nr_free_pages
93258 -2.4% 91059 proc-vmstat.nr_inactive_anon
16187 ± 5% -26.4% 11911 ± 2% proc-vmstat.nr_mapped
34477 ± 2% -25.6% 25663 ± 4% proc-vmstat.nr_shmem
10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_zone_active_anon
93258 -2.4% 91059 proc-vmstat.nr_zone_inactive_anon
32151 ± 16% -61.0% 12542 ± 13% proc-vmstat.numa_hint_faults
21214 ± 22% -86.0% 2964 ± 45% proc-vmstat.numa_hint_faults_local
1598135 -10.9% 1423466 proc-vmstat.numa_hit
1481881 -11.8% 1307551 proc-vmstat.numa_local
117279 -1.2% 115916 proc-vmstat.numa_other
555445 ± 16% -53.2% 260178 ± 53% proc-vmstat.numa_pte_updates
93889 ± 4% -74.3% 24113 ± 7% proc-vmstat.pgactivate
1599893 -11.0% 1424527 proc-vmstat.pgalloc_normal
1594626 -14.2% 1368920 proc-vmstat.pgfault
1609987 -20.8% 1275284 proc-vmstat.pgfree
49893 -14.8% 42496 ± 5% proc-vmstat.pgreuse
15.23 ± 2% -7.8% 14.04 perf-stat.i.MPKI
1.348e+10 +22.0% 1.645e+10 ± 3% perf-stat.i.branch-instructions
6.957e+08 ± 2% +22.4% 8.517e+08 ± 3% perf-stat.i.cache-misses
7.117e+08 ± 2% +22.4% 8.71e+08 ± 3% perf-stat.i.cache-references
7.86 ± 2% -29.0% 5.58 ± 6% perf-stat.i.cpi
3.739e+11 -5.1% 3.549e+11 perf-stat.i.cpu-cycles
550.18 ± 3% -22.2% 427.87 ± 5% perf-stat.i.cycles-between-cache-misses
1.605e+10 +22.1% 1.959e+10 ± 3% perf-stat.i.dTLB-loads
0.02 ± 3% -0.0 0.01 ± 4% perf-stat.i.dTLB-store-miss-rate%
921125 ± 2% -4.6% 878569 perf-stat.i.dTLB-store-misses
5.803e+09 +22.0% 7.078e+09 ± 3% perf-stat.i.dTLB-stores
5.665e+10 +22.0% 6.911e+10 ± 3% perf-stat.i.instructions
0.16 ± 3% +26.1% 0.20 ± 3% perf-stat.i.ipc
2.92 -5.1% 2.77 perf-stat.i.metric.GHz
123.32 ± 16% +158.4% 318.61 ± 22% perf-stat.i.metric.K/sec
286.92 +21.8% 349.59 ± 3% perf-stat.i.metric.M/sec
6641 +4.8% 6957 ± 2% perf-stat.i.minor-faults
586608 ± 12% +36.4% 800024 ± 7% perf-stat.i.node-loads
26.79 ± 4% -10.5 16.31 ± 12% perf-stat.i.node-store-miss-rate%
1.785e+08 ± 2% -27.7% 1.291e+08 ± 7% perf-stat.i.node-store-misses
5.131e+08 ± 3% +39.8% 7.172e+08 ± 5% perf-stat.i.node-stores
6643 +4.8% 6959 ± 2% perf-stat.i.page-faults
0.02 ± 18% -0.0 0.01 ± 4% perf-stat.overall.branch-miss-rate%
6.66 ± 2% -22.5% 5.16 ± 3% perf-stat.overall.cpi
539.35 ± 2% -22.7% 416.69 ± 3% perf-stat.overall.cycles-between-cache-misses
0.02 ± 3% -0.0 0.01 ± 3% perf-stat.overall.dTLB-store-miss-rate%
0.15 ± 2% +29.1% 0.19 ± 3% perf-stat.overall.ipc
25.88 ± 4% -10.6 15.28 ± 10% perf-stat.overall.node-store-miss-rate%
1.325e+10 ± 2% +22.3% 1.622e+10 ± 3% perf-stat.ps.branch-instructions
6.88e+08 ± 2% +22.7% 8.444e+08 ± 3% perf-stat.ps.cache-misses
7.043e+08 ± 2% +22.7% 8.638e+08 ± 3% perf-stat.ps.cache-references
3.708e+11 -5.2% 3.515e+11 perf-stat.ps.cpu-cycles
1.577e+10 ± 2% +22.4% 1.931e+10 ± 3% perf-stat.ps.dTLB-loads
910623 ± 2% -4.6% 868700 perf-stat.ps.dTLB-store-misses
5.701e+09 ± 2% +22.3% 6.975e+09 ± 3% perf-stat.ps.dTLB-stores
5.569e+10 ± 2% +22.3% 6.813e+10 ± 3% perf-stat.ps.instructions
6716 +4.8% 7038 perf-stat.ps.minor-faults
595302 ± 11% +37.2% 816710 ± 8% perf-stat.ps.node-loads
1.769e+08 ± 2% -27.8% 1.277e+08 ± 7% perf-stat.ps.node-store-misses
5.071e+08 ± 3% +40.3% 7.113e+08 ± 5% perf-stat.ps.node-stores
6717 +4.8% 7039 perf-stat.ps.page-faults
0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page
0.00 +0.8 0.83 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages
0.00 +0.9 0.85 ± 8% perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap
60.28 ± 5% +4.7 64.98 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
0.09 ± 8% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.task_tick_fair
0.14 ± 7% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.scheduler_tick
0.20 ± 9% +0.0 0.24 ± 3% perf-profile.children.cycles-pp.tick_sched_timer
0.19 ± 9% +0.0 0.24 ± 4% perf-profile.children.cycles-pp.tick_sched_handle
0.19 ± 9% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.update_process_times
0.24 ± 8% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.40 ± 8% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.39 ± 7% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.26 ± 71% +0.6 0.86 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.__mmap
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.ksys_mmap_pgoff
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlbfs_file_mmap
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_reserve_pages
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_acct_memory
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.alloc_surplus_huge_page
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.do_mmap
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.mmap_region
0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.do_syscall_64
0.12 ± 71% +0.7 0.85 ± 8% perf-profile.children.cycles-pp.alloc_fresh_huge_page
0.03 ± 70% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.alloc_buddy_huge_page
0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.get_page_from_freelist
0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.__alloc_pages
0.00 +0.8 0.82 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
0.00 +0.8 0.83 ± 8% perf-profile.children.cycles-pp.rmqueue_bulk
0.26 ± 71% +0.6 0.86 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/[email protected]

Thanks,
Oliver Sang



2022-03-13 10:55:25

by Vlastimil Babka

Subject: Re: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5% improvement

On 3/12/22 16:43, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
>
>
> commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
> patch link: https://lore.kernel.org/lkml/[email protected]

Heh, that's weird. I would expect some improvement from Eric's patch,
but per the github url above this is actually Mel's "mm/page_alloc: check
high-order pages for corruption during PCP operations" applied directly
on 5.17-rc7. That patch was rather expected to make performance worse, if
anything, so maybe the improvement is due to some unexpected side effect of
different inlining decisions or cache alignment...

> in testcase: vm-scalability
> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
> with following parameters:
>
> runtime: 300s
> size: 512G
> test: anon-w-rand-hugetlb
> cpufreq_governor: performance
> ucode: 0xd000331
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
>
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/512G/lkp-icl-2sp5/anon-w-rand-hugetlb/vm-scalability/0xd000331
>
> commit:
> v5.17-rc7
> 8212a964ee ("mm/page_alloc: call check_new_pages() while zone spinlock is not held")
>
> v5.17-rc7 8212a964ee020471104e34dce70
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 0.00 ± 5% -7.4% 0.00 ± 4% vm-scalability.free_time
> 47190 ± 2% +25.5% 59208 ± 2% vm-scalability.median
> 6352467 ± 2% +30.5% 8293110 ± 2% vm-scalability.throughput
> 218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time
> 218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time.max
> 121357 ± 7% -24.9% 91162 ± 10% vm-scalability.time.involuntary_context_switches
> 11226 -5.2% 10641 vm-scalability.time.percent_of_cpu_this_job_got
> 2311 ± 3% -35.2% 1496 ± 6% vm-scalability.time.system_time
> 22275 ± 2% -21.7% 17443 ± 3% vm-scalability.time.user_time
> 9358 ± 3% -13.1% 8130 vm-scalability.time.voluntary_context_switches
> 255.23 -16.1% 214.10 ± 2% uptime.boot
> 2593 +6.8% 2771 ± 5% vmstat.system.cs
> 11.51 ± 7% +4.5 16.05 ± 8% mpstat.cpu.all.idle%
> 8.48 ± 2% -1.6 6.84 ± 3% mpstat.cpu.all.sys%
> 727581 ± 12% -17.2% 602238 ± 6% numa-numastat.node1.local_node
> 798037 ± 8% -13.3% 691955 ± 6% numa-numastat.node1.numa_hit
> 5806206 ± 17% +26.7% 7356010 ± 10% turbostat.C1E
> 9.55 ± 26% +5.9 15.48 ± 9% turbostat.C1E%
> 59854751 ± 2% -17.8% 49202950 ± 3% turbostat.IRQ
> 42804 ± 6% -54.9% 19301 ± 21% meminfo.Active
> 41832 ± 7% -56.2% 18325 ± 23% meminfo.Active(anon)
> 63386 ± 6% -26.6% 46542 ± 3% meminfo.Mapped
> 137758 -25.5% 102591 ± 3% meminfo.Shmem
> 36980 ± 5% -62.6% 13823 ± 29% numa-meminfo.node1.Active
> 36495 ± 5% -63.9% 13173 ± 30% numa-meminfo.node1.Active(anon)
> 19454 ± 26% -57.7% 8233 ± 33% numa-meminfo.node1.Mapped
> 65896 ± 38% -67.8% 21189 ± 13% numa-meminfo.node1.Shmem
> 9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_active_anon
> 4769 ± 26% -54.5% 2171 ± 32% numa-vmstat.node1.nr_mapped
> 16462 ± 37% -68.1% 5258 ± 14% numa-vmstat.node1.nr_shmem
> 9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_zone_active_anon
> 10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_active_anon
> 69290 +1.3% 70203 proc-vmstat.nr_anon_pages
> 1717695 +4.5% 1794462 proc-vmstat.nr_dirty_background_threshold
> 3439592 +4.5% 3593312 proc-vmstat.nr_dirty_threshold
> 640952 -1.4% 632171 proc-vmstat.nr_file_pages
> 17356030 +4.4% 18125242 proc-vmstat.nr_free_pages
> 93258 -2.4% 91059 proc-vmstat.nr_inactive_anon
> 16187 ± 5% -26.4% 11911 ± 2% proc-vmstat.nr_mapped
> 34477 ± 2% -25.6% 25663 ± 4% proc-vmstat.nr_shmem
> 10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_zone_active_anon
> 93258 -2.4% 91059 proc-vmstat.nr_zone_inactive_anon
> 32151 ± 16% -61.0% 12542 ± 13% proc-vmstat.numa_hint_faults
> 21214 ± 22% -86.0% 2964 ± 45% proc-vmstat.numa_hint_faults_local
> 1598135 -10.9% 1423466 proc-vmstat.numa_hit
> 1481881 -11.8% 1307551 proc-vmstat.numa_local
> 117279 -1.2% 115916 proc-vmstat.numa_other
> 555445 ± 16% -53.2% 260178 ± 53% proc-vmstat.numa_pte_updates
> 93889 ± 4% -74.3% 24113 ± 7% proc-vmstat.pgactivate
> 1599893 -11.0% 1424527 proc-vmstat.pgalloc_normal
> 1594626 -14.2% 1368920 proc-vmstat.pgfault
> 1609987 -20.8% 1275284 proc-vmstat.pgfree
> 49893 -14.8% 42496 ± 5% proc-vmstat.pgreuse
> 15.23 ± 2% -7.8% 14.04 perf-stat.i.MPKI
> 1.348e+10 +22.0% 1.645e+10 ± 3% perf-stat.i.branch-instructions
> 6.957e+08 ± 2% +22.4% 8.517e+08 ± 3% perf-stat.i.cache-misses
> 7.117e+08 ± 2% +22.4% 8.71e+08 ± 3% perf-stat.i.cache-references
> 7.86 ± 2% -29.0% 5.58 ± 6% perf-stat.i.cpi
> 3.739e+11 -5.1% 3.549e+11 perf-stat.i.cpu-cycles
> 550.18 ± 3% -22.2% 427.87 ± 5% perf-stat.i.cycles-between-cache-misses
> 1.605e+10 +22.1% 1.959e+10 ± 3% perf-stat.i.dTLB-loads
> 0.02 ± 3% -0.0 0.01 ± 4% perf-stat.i.dTLB-store-miss-rate%
> 921125 ± 2% -4.6% 878569 perf-stat.i.dTLB-store-misses
> 5.803e+09 +22.0% 7.078e+09 ± 3% perf-stat.i.dTLB-stores
> 5.665e+10 +22.0% 6.911e+10 ± 3% perf-stat.i.instructions
> 0.16 ± 3% +26.1% 0.20 ± 3% perf-stat.i.ipc
> 2.92 -5.1% 2.77 perf-stat.i.metric.GHz
> 123.32 ± 16% +158.4% 318.61 ± 22% perf-stat.i.metric.K/sec
> 286.92 +21.8% 349.59 ± 3% perf-stat.i.metric.M/sec
> 6641 +4.8% 6957 ± 2% perf-stat.i.minor-faults
> 586608 ± 12% +36.4% 800024 ± 7% perf-stat.i.node-loads
> 26.79 ± 4% -10.5 16.31 ± 12% perf-stat.i.node-store-miss-rate%
> 1.785e+08 ± 2% -27.7% 1.291e+08 ± 7% perf-stat.i.node-store-misses
> 5.131e+08 ± 3% +39.8% 7.172e+08 ± 5% perf-stat.i.node-stores
> 6643 +4.8% 6959 ± 2% perf-stat.i.page-faults
> 0.02 ± 18% -0.0 0.01 ± 4% perf-stat.overall.branch-miss-rate%
> 6.66 ± 2% -22.5% 5.16 ± 3% perf-stat.overall.cpi
> 539.35 ± 2% -22.7% 416.69 ± 3% perf-stat.overall.cycles-between-cache-misses
> 0.02 ± 3% -0.0 0.01 ± 3% perf-stat.overall.dTLB-store-miss-rate%
> 0.15 ± 2% +29.1% 0.19 ± 3% perf-stat.overall.ipc
> 25.88 ± 4% -10.6 15.28 ± 10% perf-stat.overall.node-store-miss-rate%
> 1.325e+10 ± 2% +22.3% 1.622e+10 ± 3% perf-stat.ps.branch-instructions
> 6.88e+08 ± 2% +22.7% 8.444e+08 ± 3% perf-stat.ps.cache-misses
> 7.043e+08 ± 2% +22.7% 8.638e+08 ± 3% perf-stat.ps.cache-references
> 3.708e+11 -5.2% 3.515e+11 perf-stat.ps.cpu-cycles
> 1.577e+10 ± 2% +22.4% 1.931e+10 ± 3% perf-stat.ps.dTLB-loads
> 910623 ± 2% -4.6% 868700 perf-stat.ps.dTLB-store-misses
> 5.701e+09 ± 2% +22.3% 6.975e+09 ± 3% perf-stat.ps.dTLB-stores
> 5.569e+10 ± 2% +22.3% 6.813e+10 ± 3% perf-stat.ps.instructions
> 6716 +4.8% 7038 perf-stat.ps.minor-faults
> 595302 ± 11% +37.2% 816710 ± 8% perf-stat.ps.node-loads
> 1.769e+08 ± 2% -27.8% 1.277e+08 ± 7% perf-stat.ps.node-store-misses
> 5.071e+08 ± 3% +40.3% 7.113e+08 ± 5% perf-stat.ps.node-stores
> 6717 +4.8% 7039 perf-stat.ps.page-faults
> 0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
> 0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page
> 0.00 +0.8 0.83 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page
> 0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory
> 0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page
> 0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages
> 0.00 +0.9 0.85 ± 8% perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.__mmap
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff
> 0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap
> 60.28 ± 5% +4.7 64.98 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
> 0.09 ± 8% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.task_tick_fair
> 0.14 ± 7% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.scheduler_tick
> 0.20 ± 9% +0.0 0.24 ± 3% perf-profile.children.cycles-pp.tick_sched_timer
> 0.19 ± 9% +0.0 0.24 ± 4% perf-profile.children.cycles-pp.tick_sched_handle
> 0.19 ± 9% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.update_process_times
> 0.24 ± 8% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 0.40 ± 8% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 0.39 ± 7% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.26 ± 71% +0.6 0.86 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.__mmap
> 0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.ksys_mmap_pgoff
> 0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlbfs_file_mmap
> 0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_reserve_pages
> 0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_acct_memory
> 0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.alloc_surplus_huge_page
> 0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.vm_mmap_pgoff
> 0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.do_mmap
> 0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.mmap_region
> 0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.do_syscall_64
> 0.12 ± 71% +0.7 0.85 ± 8% perf-profile.children.cycles-pp.alloc_fresh_huge_page
> 0.03 ± 70% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.alloc_buddy_huge_page
> 0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.get_page_from_freelist
> 0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.__alloc_pages
> 0.00 +0.8 0.82 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
> 0.00 +0.8 0.83 ± 8% perf-profile.children.cycles-pp.rmqueue_bulk
> 0.26 ± 71% +0.6 0.86 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> ---
> 0-DAY CI Kernel Test Service
> https://lists.01.org/hyperkitty/list/[email protected]
>
> Thanks,
> Oliver Sang
>

2022-03-14 12:35:57

by Eric Dumazet

Subject: Re: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5% improvement

On Sat, Mar 12, 2022 at 10:59 AM Vlastimil Babka <[email protected]> wrote:
>
> On 3/12/22 16:43, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
> > url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
> > patch link: https://lore.kernel.org/lkml/[email protected]
>
> Heh, that's weird. I would expect some improvement from Eric's patch,
> but this seems to be actually about Mel's "mm/page_alloc: check
> high-order pages for corruption during PCP operations" applied directly
> on 5.17-rc7 per the github url above. This was rather expected to make
> performance worse if anything, so maybe the improvement is due to some
> unexpected side-effect of different inlining decisions or cache alignment...
>

I doubt this has anything to do with inlining or cache alignment.

I am not familiar with the benchmark, but its name
(anon-w-rand-hugetlb) hints at hugetlb?

After Mel's fix, we go over 512 'struct page' structures to perform the sanity
checks, thus loading the corresponding 512 cache lines into the CPU caches.

This caching is done while no lock is held.

If, after this huge page allocation, some mm operation needs to access
these 512 struct pages while holding a lock, then sure, there is a huge gain.
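
To make the cache-line arithmetic in this thread concrete, here is a quick
back-of-the-envelope check. The 4KB base page, 64-byte struct page and 64-byte
cache line sizes are assumptions (typical for x86_64), not figures taken from
the report above; the program is a standalone illustration, not kernel code.

	#include <stdio.h>

	int main(void)
	{
		/* Assumed sizes, typical for x86_64. */
		const unsigned long struct_page_sz = 64;	/* bytes per struct page */
		const unsigned long cache_line_sz  = 64;	/* bytes per cache line  */

		/* order-9, i.e. a 2MB hugepage made of 512 4KB base pages */
		unsigned long pages = 1UL << 9;
		unsigned long lines = pages * struct_page_sz / cache_line_sz;
		printf("2MB hugepage: %lu struct pages -> %lu cache lines of metadata\n",
		       pages, lines);			/* 512 -> 512 */

		/* order-3, i.e. the 32KB chunk from the "~7 cache lines" estimate */
		pages = 1UL << 3;
		lines = pages * struct_page_sz / cache_line_sz;
		printf("32KB chunk: %lu struct pages -> %lu cache lines (~%lu beyond the head)\n",
		       pages, lines, lines - 1);	/* 8 -> 8, so ~7 extra */

		return 0;
	}

So checking every tail page of a 2MB hugetlb allocation touches roughly 32KB of
struct page metadata; the argument above is that this metadata is now warmed in
the CPU caches while no lock is held, which can speed up later operations that
access the same struct pages under a lock.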