2023-05-16 09:27:02

by Raghavendra K T

Subject: [RFC PATCH V2 0/1] sched/numa: Fix disjoint set vma scan regression

With the numa scan enhancements [1], only threads that have previously
accessed a VMA are allowed to scan it.

While this significantly reduced system time overhead, there are corner
cases that genuinely need some relaxation. For example, PeterZ raised a
concern that unfairness among threads belonging to disjoint sets of VMAs
can potentially amplify the side effect of some tasks' VMA regions being
left unscanned.

[1] handled that issue by unconditionally allowing the first two scans at
the mm level (mm->numa_scan_seq), but that was not enough.

One test that exercises a similar side effect is numa01_THREAD_ALLOC, where
allocation happens in the main thread and the memory is divided into 24MB
chunks that are continuously bzeroed.

(This is the default mode in LKP tests, while mmtests runs numa01 by
default, where each thread operates on the full 3GB region.)
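
For reference, below is a rough sketch of that access pattern (purely
illustrative: the names, thread count, and loop count are made up, and
this is not the benchmark's actual source):

#include <pthread.h>
#include <stdlib.h>
#include <strings.h>

#define REGION_SIZE	(3UL << 30)	/* 3GB region, allocated by main */
#define CHUNK_SIZE	(24UL << 20)	/* 24MB chunks, bzeroed by workers */
#define NR_THREADS	32		/* illustrative thread count */

static char *region;

static void *worker(void *arg)
{
	unsigned long id = (unsigned long)arg;
	unsigned long nr_chunks = REGION_SIZE / CHUNK_SIZE;

	/* Each worker continuously bzeroes its own disjoint set of chunks */
	for (int loop = 0; loop < 100; loop++)
		for (unsigned long c = id; c < nr_chunks; c += NR_THREADS)
			bzero(region + c * CHUNK_SIZE, CHUNK_SIZE);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];

	region = malloc(REGION_SIZE);	/* allocation happens in main */
	for (unsigned long i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (unsigned long i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);
	free(region);
	return 0;
}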

RFC [2] tried to address this issue, but its logic relied more on
heuristics. After [2] was posted, [3] also confirmed the same regression.

The current patch addresses the same issue more accurately, as follows:

(1) A task scanning a disjoint VMA (i.e., one it is not associated with)
is now allowed to induce prot_none faults. The total number of such
unconditional scans allowed per VMA is derived from the exact VMA size as
follows:

total scans allowed = 1/2 * vma_size / scan_size.

(2) The total number of scans already done is tracked with a per-VMA scan
counter.

(3) For very long running tasks, this scan counter is reset after every 16
scans of the task's whole mm (tracked via mm->numa_scan_seq). A simplified
sketch of the resulting check follows this list.
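
Below is a minimal sketch of the resulting allowance check (the helper
name is made up for illustration, and locking/NULL checks are omitted;
see vma_is_accessed() in the patch for the real logic). It assumes the
default 256MB numa_balancing_scan_size:

static bool disjoint_vma_scan_allowed(struct vm_area_struct *vma,
				      unsigned int scan_size)
{
	/* VMA size in MB, and scans needed to cover the whole VMA */
	unsigned int vma_size = (vma->vm_end - vma->vm_start) >> 20;
	unsigned int scan_threshold = 1 + (vma_size / scan_size) / 2;

	/*
	 * Example: a 3GB VMA with the default 256MB scan size allows
	 * 1 + (3072 / 256) / 2 = 7 unconditional scans of this VMA.
	 */
	return vma->numab_state->scan_counter <= scan_threshold;
}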

With the above patch, the reported numa01_THREAD_ALLOC regression is
resolved. Note, however, that [1] drastically decreased system time for
mmtests numa01, and this patch adds some of that system time back.

Summary: The numa scan enhancement patch [1], together with the current
patchset, improves overall system time by filtering out unnecessary numa
scanning while still retaining the necessary scanning in corner cases that
involve disjoint sets of VMAs.

(Mel, PeterZ: this patch handles the issue more precisely.)

Your comments/ideas are welcome.

Changes since V1:
1) Rewrote the entire logic based on actual VMA size rather than heuristics
2) Added Reported-by tags for the kernel test robot and the internal LKP test
3) Rebased to 6.4-rc1 (ba0ad6ed89)

Result:
SUT: Milan w/ 2 NUMA nodes, 256 CPUs

Run of numa01_THREAD_ALLOC on 6.4.0-rc1 (which has the numascan enhancement)
base-numascan base base+fix
real 1m3.025s 1m24.163s 1m3.551s
user 213m44.232s 251m3.638s 219m55.662s
sys 6m26.598s 0m13.056s 2m35.767s

numa_hit 5478165 4395752 4907431
numa_local 5478103 4395366 4907044
numa_other 62 386 387
numa_pte_updates 1989274 11606 1265014
numa_hint_faults 1756059 515 1135804
numa_hint_faults_local 971500 486 558076
numa_pages_migrated 784211 29 577728

Below is the mmtests autonuma performance:
autonuma
===========
base: 6.4.0-rc1+
base w/o numascan base (=w/ numascan) base + fix


Amean syst-NUMA01 247.46 ( 0.00%) 18.52 * 92.51%* 148.18 * 40.12%*
Amean syst-NUMA01_THREADLOCAL 0.23 ( 0.00%) 0.21 * 5.06%* 0.22 * 1.90%*
Amean syst-NUMA02 0.70 ( 0.00%) 0.70 * 1.02%* 0.73 * -3.46%*
Amean syst-NUMA02_SMT 0.59 ( 0.00%) 0.59 * 0.00%* 0.58 * 2.42%*
Amean elsp-NUMA01 309.54 ( 0.00%) 284.57 * 8.07%* 306.84 * 0.87%*
Amean elsp-NUMA01_THREADLOCAL 1.02 ( 0.00%) 1.02 * 0.42%* 1.04 * -1.53%*
Amean elsp-NUMA02 3.22 ( 0.00%) 3.55 * -10.21%* 3.32 * -3.15%*
Amean elsp-NUMA02_SMT 3.71 ( 0.00%) 3.86 * -4.08%* 3.74 * -0.69%*

Duration User 383183.43 294971.18 357446.52
Duration System 1743.53 140.85 1048.57
Duration Elapsed 2232.09 2062.33 2214.44

Ops NUMA alloc hit 57057379.00 43378289.00 51885613.00
Ops NUMA alloc local 57055256.00 43377265.00 51884407.00
Ops NUMA base-page range updates 137882746.00 25895.00 83600214.00
Ops NUMA PTE updates 137882746.00 25895.00 83600214.00
Ops NUMA hint faults 139609832.00 22651.00 84634363.00
Ops NUMA hint local faults % 113091055.00 18200.00 65809169.00
Ops NUMA hint local percent 81.01 80.35 77.76
Ops NUMA pages migrated 13415929.00 1798.00 9638327.00
Ops AutoNUMA cost 699269.24 113.47 423940.14

links:
[1] https://lore.kernel.org/lkml/[email protected]/T/#t
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://lore.kernel.org/lkml/[email protected]/T/

Note: I have updated patch 1 with the appropriate log required for the
commit, so some of the above results/info is duplicated.

Raghavendra K T (1):
sched/numa: Fix disjoint set vma scan regression

include/linux/mm_types.h | 1 +
kernel/sched/fair.c | 41 ++++++++++++++++++++++++++++++++--------
2 files changed, 34 insertions(+), 8 deletions(-)

--
2.34.1



2023-05-16 09:36:37

by Raghavendra K T

Subject: [RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression

With the numa scan enhancements [1], only threads that have previously
accessed a VMA are allowed to scan it.

While this significantly reduced system time overhead, there are corner
cases that genuinely need some relaxation. For example:

1) A concern raised by PeterZ: if there are N disjoint sets of VMAs
belonging to tasks, then unfairness in deciding which threads may scan
could potentially amplify the side effect of some of the VMAs being left
unscanned.

2) The LKP numa01 benchmark regression reports below.

Currently this is handled by unconditionally allowing the first two scans,
as indicated by mm->numa_scan_seq. This is imprecise, since for some
benchmarks VMA scanning might itself start at numa_scan_seq > 2.

Solution:
Allow unconditional scanning of a task's VMAs depending on the VMA size.
This is achieved by maintaining a per-VMA scan counter, where

 allowed_to_scan = (scan_counter <= 1 + (vma_size / scan_size) / 2)

For example, a 3GB VMA with the default 256MB scan size is allowed
1 + (3072 / 256) / 2 = 7 unconditional scans.

Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")

Result:
numa01_THREAD_ALLOC result on 6.4.0-rc1 (which has the numascan enhancement)
base-numascan base base+fix
real 1m3.025s 1m24.163s 1m3.551s
user 213m44.232s 251m3.638s 219m55.662s
sys 6m26.598s 0m13.056s 2m35.767s

numa_hit 5478165 4395752 4907431
numa_local 5478103 4395366 4907044
numa_other 62 386 387
numa_pte_updates 1989274 11606 1265014
numa_hint_faults 1756059 515 1135804
numa_hint_faults_local 971500 486 558076
numa_pages_migrated 784211 29 577728

Summary: The regression in base is recovered by allowing scanning as required.

[1] https://lore.kernel.org/lkml/[email protected]/T/#t

Reported-by: Aithal Srikanth <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/T/
Signed-off-by: Raghavendra K T <[email protected]>
---
include/linux/mm_types.h | 1 +
kernel/sched/fair.c | 41 ++++++++++++++++++++++++++++++++--------
2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 306a3d1a0fa6..992e460a713e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -479,6 +479,7 @@ struct vma_numab_state {
unsigned long next_scan;
unsigned long next_pid_reset;
unsigned long access_pids[2];
+ unsigned int scan_counter;
};

/*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..2c3e17e7fc2f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2931,20 +2931,34 @@ static void reset_ptenuma_scan(struct task_struct *p)
static bool vma_is_accessed(struct vm_area_struct *vma)
{
unsigned long pids;
+ unsigned int vma_size;
+ unsigned int scan_threshold;
+ unsigned int scan_size;
+
+ pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
+
+ if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
+ return true;
+
+ scan_size = READ_ONCE(sysctl_numa_balancing_scan_size);
+ /* vma size in MB */
+ vma_size = (vma->vm_end - vma->vm_start) >> 20;
+
+ /* Total scans needed to cover VMA */
+ scan_threshold = (vma_size / scan_size);
+
/*
- * Allow unconditional access first two times, so that all the (pages)
- * of VMAs get prot_none fault introduced irrespective of accesses.
+ * Allow the scanning of half of disjoint set's VMA to induce
+ * prot_none fault irrespective of accesses.
* This is also done to avoid any side effect of task scanning
* amplifying the unfairness of disjoint set of VMAs' access.
*/
- if (READ_ONCE(current->mm->numa_scan_seq) < 2)
- return true;
-
- pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
- return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
+ scan_threshold = 1 + (scan_threshold >> 1);
+ return (READ_ONCE(vma->numab_state->scan_counter) <= scan_threshold);
}

-#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
+#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
+#define DISJOINT_VMA_SCAN_RENEW_THRESH 16

/*
* The expensive part of numa migration is done from task_work context.
@@ -3058,6 +3072,8 @@ static void task_numa_work(struct callback_head *work)
/* Reset happens after 4 times scan delay of scan start */
vma->numab_state->next_pid_reset = vma->numab_state->next_scan +
msecs_to_jiffies(VMA_PID_RESET_PERIOD);
+
+ WRITE_ONCE(vma->numab_state->scan_counter, 0);
}

/*
@@ -3068,6 +3084,13 @@ static void task_numa_work(struct callback_head *work)
vma->numab_state->next_scan))
continue;

+ /*
+ * For long running tasks, renew the disjoint vma scanning
+ * periodically.
+ */
+ if (mm->numa_scan_seq && !(mm->numa_scan_seq % DISJOINT_VMA_SCAN_RENEW_THRESH))
+ WRITE_ONCE(vma->numab_state->scan_counter, 0);
+
/* Do not scan the VMA if task has not accessed */
if (!vma_is_accessed(vma))
continue;
@@ -3083,6 +3106,8 @@ static void task_numa_work(struct callback_head *work)
vma->numab_state->access_pids[0] = READ_ONCE(vma->numab_state->access_pids[1]);
vma->numab_state->access_pids[1] = 0;
}
+ WRITE_ONCE(vma->numab_state->scan_counter,
+ READ_ONCE(vma->numab_state->scan_counter) + 1);

do {
start = max(start, vma->vm_start);
--
2.34.1


2023-05-19 08:16:21

by Bharata B Rao

Subject: Re: [RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression

On 16-May-23 2:49 PM, Raghavendra K T wrote:
> With the numa scan enhancements [1], only threads that have previously
> accessed a VMA are allowed to scan it.
>
> While this significantly reduced system time overhead, there are corner
> cases that genuinely need some relaxation. For example:
>
> 1) A concern raised by PeterZ: if there are N disjoint sets of VMAs
> belonging to tasks, then unfairness in deciding which threads may scan
> could potentially amplify the side effect of some of the VMAs being left
> unscanned.
>
> 2) The LKP numa01 benchmark regression reports below.
>
> Currently this is handled by unconditionally allowing the first two scans,
> as indicated by mm->numa_scan_seq. This is imprecise, since for some
> benchmarks VMA scanning might itself start at numa_scan_seq > 2.
>
> Solution:
> Allow unconditional scanning of a task's VMAs depending on the VMA size.
> This is achieved by maintaining a per-VMA scan counter, where
>
> allowed_to_scan = (scan_counter <= 1 + (vma_size / scan_size) / 2)
>
> For example, a 3GB VMA with the default 256MB scan size is allowed
> 1 + (3072 / 256) / 2 = 7 unconditional scans.
>
> Fixes: fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
>
> Result:
> numa01_THREAD_ALLOC result on 6.4.0-rc1 (which has the numascan enhancement)
> base-numascan base base+fix
> real 1m3.025s 1m24.163s 1m3.551s
> user 213m44.232s 251m3.638s 219m55.662s
> sys 6m26.598s 0m13.056s 2m35.767s
>
> numa_hit 5478165 4395752 4907431
> numa_local 5478103 4395366 4907044
> numa_other 62 386 387
> numa_pte_updates 1989274 11606 1265014
> numa_hint_faults 1756059 515 1135804
> numa_hint_faults_local 971500 486 558076
> numa_pages_migrated 784211 29 577728
>
> Summary: The regression in base is recovered by allowing scanning as required.
>
> [1] https://lore.kernel.org/lkml/[email protected]/T/#t
>
> Reported-by: Aithal Srikanth <[email protected]>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/lkml/[email protected]/T/
> Signed-off-by: Raghavendra K T <[email protected]>
> ---
> include/linux/mm_types.h | 1 +
> kernel/sched/fair.c | 41 ++++++++++++++++++++++++++++++++--------
> 2 files changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 306a3d1a0fa6..992e460a713e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -479,6 +479,7 @@ struct vma_numab_state {
> unsigned long next_scan;
> unsigned long next_pid_reset;
> unsigned long access_pids[2];
> + unsigned int scan_counter;
> };
>
> /*
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 373ff5f55884..2c3e17e7fc2f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2931,20 +2931,34 @@ static void reset_ptenuma_scan(struct task_struct *p)
> static bool vma_is_accessed(struct vm_area_struct *vma)
> {
> unsigned long pids;
> + unsigned int vma_size;
> + unsigned int scan_threshold;
> + unsigned int scan_size;
> +
> + pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
> +
> + if (test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids))
> + return true;
> +
> + scan_size = READ_ONCE(sysctl_numa_balancing_scan_size);
> + /* vma size in MB */
> + vma_size = (vma->vm_end - vma->vm_start) >> 20;
> +
> + /* Total scans needed to cover VMA */
> + scan_threshold = (vma_size / scan_size);
> +
> /*
> - * Allow unconditional access first two times, so that all the (pages)
> - * of VMAs get prot_none fault introduced irrespective of accesses.
> + * Allow the scanning of half of disjoint set's VMA to induce
> + * prot_none fault irrespective of accesses.
> * This is also done to avoid any side effect of task scanning
> * amplifying the unfairness of disjoint set of VMAs' access.
> */
> - if (READ_ONCE(current->mm->numa_scan_seq) < 2)
> - return true;
> -
> - pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
> - return test_bit(hash_32(current->pid, ilog2(BITS_PER_LONG)), &pids);
> + scan_threshold = 1 + (scan_threshold >> 1);
> + return (READ_ONCE(vma->numab_state->scan_counter) <= scan_threshold);
> }
>
> -#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
> +#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
> +#define DISJOINT_VMA_SCAN_RENEW_THRESH 16
>
> /*
> * The expensive part of numa migration is done from task_work context.
> @@ -3058,6 +3072,8 @@ static void task_numa_work(struct callback_head *work)
> /* Reset happens after 4 times scan delay of scan start */
> vma->numab_state->next_pid_reset = vma->numab_state->next_scan +
> msecs_to_jiffies(VMA_PID_RESET_PERIOD);
> +
> + WRITE_ONCE(vma->numab_state->scan_counter, 0);
> }
>
> /*
> @@ -3068,6 +3084,13 @@ static void task_numa_work(struct callback_head *work)
> vma->numab_state->next_scan))
> continue;
>
> + /*
> + * For long running tasks, renew the disjoint vma scanning
> + * periodically.
> + */
> + if (mm->numa_scan_seq && !(mm->numa_scan_seq % DISJOINT_VMA_SCAN_RENEW_THRESH))

Don't you need a READ_ONCE() accessor for mm->numa_scan_seq?

Regards,
Bharata.

2023-05-19 12:18:26

by Raghavendra K T

Subject: Re: [RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression

On 5/19/2023 1:26 PM, Bharata B Rao wrote:
> On 16-May-23 2:49 PM, Raghavendra K T wrote:
>> With the numa scan enhancements [1], only threads that have previously
[...]
>> -#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
>> +#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
>> +#define DISJOINT_VMA_SCAN_RENEW_THRESH 16
>>
>> /*
>> * The expensive part of numa migration is done from task_work context.
>> @@ -3058,6 +3072,8 @@ static void task_numa_work(struct callback_head *work)
>> /* Reset happens after 4 times scan delay of scan start */
>> vma->numab_state->next_pid_reset = vma->numab_state->next_scan +
>> msecs_to_jiffies(VMA_PID_RESET_PERIOD);
>> +
>> + WRITE_ONCE(vma->numab_state->scan_counter, 0);
>> }
>>
>> /*
>> @@ -3068,6 +3084,13 @@ static void task_numa_work(struct callback_head *work)
>> vma->numab_state->next_scan))
>> continue;
>>
>> + /*
>> + * For long running tasks, renew the disjoint vma scanning
>> + * periodically.
>> + */
>> + if (mm->numa_scan_seq && !(mm->numa_scan_seq % DISJOINT_VMA_SCAN_RENEW_THRESH))
>
> Don't you need a READ_ONCE() accessor for mm->numa_scan_seq?
>

Hello Bharata,

Yes, thanks for pointing that out. I did ensure that in V1, but in V2 it
somehow got left out :(.

On the other hand, I see that vma->numab_state->scan_counter does not need
READ_ONCE/WRITE_ONCE, since it is not modified outside of this function
(i.e., it is all done after the cmpxchg above).
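
That is, roughly (an untested sketch, just to illustrate the change I have
in mind for V3):

	/* Read numa_scan_seq once; plain accesses suffice for scan_counter */
	unsigned int seq = READ_ONCE(mm->numa_scan_seq);

	if (seq && !(seq % DISJOINT_VMA_SCAN_RENEW_THRESH))
		vma->numab_state->scan_counter = 0;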

Also, thinking about it more, the DISJOINT_VMA_SCAN_RENEW_THRESH reset
change itself may need some correction, and it does not seem absolutely
necessary here. (I will post that separately, with more detail, as an
improvement for long-running benchmarks based on my experiments.)

I will wait a while for confirmation that this patch fixes the reported
regression, and/or for better ideas/acks, and then repost.

2023-05-26 01:50:19

by Oliver Sang

Subject: Re: [RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression



Hello,

kernel test robot noticed a -46.3% improvement of autonuma-benchmark.numa01.seconds on:


commit: d281d36ed007eabb243ad2d489c52c43961f8ac3 ("[RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Fix-disjoint-set-vma-scan-regression/20230516-180954
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git a6fcdd8d95f7486150b3faadfea119fc3dfc3b74
patch link: https://lore.kernel.org/all/b0a8f3490b491d4fd003c3e0493e940afaea5f2c.1684228065.git.raghavendra.kt@amd.com/
patch subject: [RFC PATCH V2 1/1] sched/numa: Fix disjoint set vma scan regression


we noticed this patch addressed the performance regression we reported at
https://lore.kernel.org/all/[email protected]/

we also noticed there is still some discussion in the thread
https://lore.kernel.org/all/[email protected]/

since we didn't see a V3 patch, we are sending out this report for your
information about its performance impact.


testcase: autonuma-benchmark
test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
parameters:

iterations: 4x
test: numa02_SMT
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-11/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/numa02_SMT/autonuma-benchmark

commit:
a6fcdd8d95 ("sched/debug: Correct printing for rq->nr_uninterruptible")
d281d36ed0 ("sched/numa: Fix disjoint set vma scan regression")

a6fcdd8d95f74861 d281d36ed007eabb243ad2d489c
---------------- ---------------------------
%stddev %change %stddev
\ | \
1899 ± 2% -37.4% 1189 ± 18% uptime.boot
1809 +3.1% 1866 vmstat.system.cs
1.685e+10 ± 3% -52.2% 8.052e+09 ± 11% cpuidle..time
17400470 ± 3% -52.3% 8308590 ± 11% cpuidle..usage
26350 ± 7% -9.2% 23932 meminfo.Active
26238 ± 7% -9.2% 23828 meminfo.Active(anon)
38666 ± 8% +21.4% 46957 ± 7% meminfo.Mapped
1996 -16.5% 1666 meminfo.Mlocked
23350 ± 56% +64.7% 38446 ± 8% numa-meminfo.node0.Mapped
5108 ± 6% +78.7% 9132 ± 43% numa-meminfo.node0.Shmem
25075 ± 8% -9.3% 22750 numa-meminfo.node1.Active
25057 ± 8% -9.5% 22681 numa-meminfo.node1.Active(anon)
2038104 ± 7% -34.9% 1327290 ± 12% numa-numastat.node0.local_node
2394647 ± 5% -30.9% 1655892 ± 15% numa-numastat.node0.numa_hit
1988880 ± 7% -27.5% 1442918 ± 12% numa-numastat.node1.local_node
2255986 ± 6% -23.0% 1737172 ± 17% numa-numastat.node1.numa_hit
10.54 ± 3% -1.7 8.83 ± 9% mpstat.cpu.all.idle%
0.00 ± 74% +0.1 0.05 ± 13% mpstat.cpu.all.iowait%
2.48 -0.9 1.57 mpstat.cpu.all.irq%
0.08 ± 2% -0.0 0.06 ± 3% mpstat.cpu.all.soft%
1.48 +0.7 2.19 ± 4% mpstat.cpu.all.sys%
427.10 -46.3% 229.25 ± 3% autonuma-benchmark.numa01.seconds
1819 -43.5% 1027 ± 3% autonuma-benchmark.time.elapsed_time
1819 -43.5% 1027 ± 3% autonuma-benchmark.time.elapsed_time.max
791068 ± 2% -43.6% 446212 ± 4% autonuma-benchmark.time.involuntary_context_switches
2089497 -16.8% 1737489 ± 2% autonuma-benchmark.time.minor_page_faults
7603 +3.2% 7848 autonuma-benchmark.time.percent_of_cpu_this_job_got
136519 -42.2% 78864 ± 3% autonuma-benchmark.time.user_time
22402 +44.8% 32429 ± 4% autonuma-benchmark.time.voluntary_context_switches
5919 ± 55% +64.7% 9747 ± 9% numa-vmstat.node0.nr_mapped
1277 ± 6% +77.6% 2268 ± 43% numa-vmstat.node0.nr_shmem
2394430 ± 5% -30.9% 1655441 ± 15% numa-vmstat.node0.numa_hit
2037887 ± 7% -34.9% 1326839 ± 12% numa-vmstat.node0.numa_local
6261 ± 8% -9.2% 5683 numa-vmstat.node1.nr_active_anon
6261 ± 8% -9.2% 5683 numa-vmstat.node1.nr_zone_active_anon
2255543 ± 6% -23.0% 1736429 ± 17% numa-vmstat.node1.numa_hit
1988436 ± 7% -27.5% 1442174 ± 12% numa-vmstat.node1.numa_local
35815 ± 5% -23.8% 27284 ± 17% turbostat.C1
0.03 ± 17% +0.0 0.07 ± 12% turbostat.C1E%
17197885 ± 3% -52.7% 8127065 ± 11% turbostat.C6
10.48 ± 3% -1.7 8.80 ± 10% turbostat.C6%
10.23 ± 3% -17.3% 8.46 ± 10% turbostat.CPU%c1
0.24 ± 7% +61.3% 0.38 ± 11% turbostat.CPU%c6
1.615e+08 -42.4% 93035289 ± 3% turbostat.IRQ
48830 ± 13% -35.2% 31632 ± 11% turbostat.POLL
0.19 ± 7% +61.1% 0.30 ± 11% turbostat.Pkg%pc2
238.01 +5.2% 250.27 turbostat.PkgWatt
22.38 +25.3% 28.03 turbostat.RAMWatt
6557 ± 7% -9.1% 5963 proc-vmstat.nr_active_anon
1539398 -4.8% 1465253 proc-vmstat.nr_anon_pages
2955 -5.8% 2785 proc-vmstat.nr_anon_transparent_hugepages
1541555 -4.7% 1468824 proc-vmstat.nr_inactive_anon
9843 ± 8% +21.4% 11949 ± 7% proc-vmstat.nr_mapped
499.00 -16.5% 416.67 proc-vmstat.nr_mlock
3896 -3.2% 3770 proc-vmstat.nr_page_table_pages
6557 ± 7% -9.1% 5963 proc-vmstat.nr_zone_active_anon
1541555 -4.7% 1468824 proc-vmstat.nr_zone_inactive_anon
30446 ± 15% +397.7% 151532 ± 4% proc-vmstat.numa_hint_faults
21562 ± 12% +312.9% 89028 ± 3% proc-vmstat.numa_hint_faults_local
4651965 -27.0% 3395711 ± 2% proc-vmstat.numa_hit
5122 ± 7% +1393.9% 76529 ± 5% proc-vmstat.numa_huge_pte_updates
4028316 -31.2% 2772852 ± 2% proc-vmstat.numa_local
1049660 +672.3% 8106150 ± 6% proc-vmstat.numa_pages_migrated
2725369 ± 7% +1343.9% 39352403 ± 5% proc-vmstat.numa_pte_updates
45132 ± 31% +31.9% 59519 proc-vmstat.pgactivate
1.816e+08 ± 2% +5.6% 1.918e+08 ± 3% proc-vmstat.pgalloc_normal
5863913 -30.2% 4092045 ± 2% proc-vmstat.pgfault
1.815e+08 ± 2% +5.7% 1.918e+08 ± 3% proc-vmstat.pgfree
1049660 +672.3% 8106150 ± 6% proc-vmstat.pgmigrate_success
264923 -35.1% 171993 ± 2% proc-vmstat.pgreuse
2037 +675.1% 15790 ± 6% proc-vmstat.thp_migration_success
13598464 -42.9% 7770880 ± 3% proc-vmstat.unevictable_pgs_scanned
3208 ± 14% +44.1% 4624 ± 12% sched_debug.cfs_rq:/.load.min
2.73 ± 16% +52.6% 4.16 ± 14% sched_debug.cfs_rq:/.load_avg.min
94294753 ± 2% -45.7% 51173318 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
98586361 ± 2% -46.3% 52983552 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
85615972 ± 2% -45.4% 46737672 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
2806211 ± 7% -53.1% 1314959 ± 6% sched_debug.cfs_rq:/.min_vruntime.stddev
2.63 ± 23% +65.8% 4.36 ± 24% sched_debug.cfs_rq:/.removed.load_avg.avg
1.08 ± 23% +55.4% 1.68 ± 19% sched_debug.cfs_rq:/.removed.runnable_avg.avg
1.07 ± 24% +56.7% 1.68 ± 19% sched_debug.cfs_rq:/.removed.util_avg.avg
7252565 ± 13% -46.6% 3874343 ± 15% sched_debug.cfs_rq:/.spread0.avg
11534195 ± 10% -50.8% 5673455 ± 12% sched_debug.cfs_rq:/.spread0.max
-1406320 -61.2% -546147 sched_debug.cfs_rq:/.spread0.min
2795186 ± 7% -53.2% 1309202 ± 6% sched_debug.cfs_rq:/.spread0.stddev
6.57 ± 40% +6868.7% 457.74 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
275.73 ± 43% +333.0% 1193 ± 5% sched_debug.cfs_rq:/.util_est_enqueued.max
37.70 ± 41% +722.1% 309.90 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.stddev
794516 ± 5% -26.2% 586654 ± 17% sched_debug.cpu.avg_idle.min
224.87 ± 4% -42.6% 129.01 ± 10% sched_debug.cpu.clock.stddev
885581 ± 2% -42.6% 508722 ± 3% sched_debug.cpu.clock_task.min
19466 ± 2% -30.4% 13558 ± 4% sched_debug.cpu.curr->pid.avg
26029 ± 2% -33.9% 17216 ± 2% sched_debug.cpu.curr->pid.max
13168 ± 11% -33.3% 8788 ± 12% sched_debug.cpu.curr->pid.min
2761 ± 14% -37.3% 1730 ± 29% sched_debug.cpu.curr->pid.stddev
958735 -24.7% 721786 ± 11% sched_debug.cpu.max_idle_balance_cost.max
97977 ± 3% -52.2% 46860 ± 25% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 4% -41.9% 0.00 ± 10% sched_debug.cpu.next_balance.stddev
20557 ± 3% -37.3% 12889 ± 3% sched_debug.cpu.nr_switches.avg
81932 ± 7% -25.4% 61097 ± 15% sched_debug.cpu.nr_switches.max
6643 ± 6% -39.2% 4037 ± 15% sched_debug.cpu.nr_switches.min
13929 ± 6% -30.1% 9740 ± 4% sched_debug.cpu.nr_switches.stddev
20.30 ± 23% +73.3% 35.19 ± 28% sched_debug.cpu.nr_uninterruptible.max
-12.43 +145.8% -30.55 sched_debug.cpu.nr_uninterruptible.min
5.28 ± 12% +82.4% 9.63 ± 13% sched_debug.cpu.nr_uninterruptible.stddev
925729 ± 2% -42.4% 533314 ± 3% sched_debug.sched_clk
36.08 +51.2% 54.54 ± 2% perf-stat.i.MPKI
1.037e+08 +9.7% 1.137e+08 perf-stat.i.branch-instructions
1.36 +0.0 1.39 perf-stat.i.branch-miss-rate%
1602349 +19.8% 1918946 ± 3% perf-stat.i.branch-misses
11889864 +56.5% 18603954 ± 2% perf-stat.i.cache-misses
17973544 +54.5% 27773059 ± 2% perf-stat.i.cache-references
1771 +3.1% 1826 perf-stat.i.context-switches
2.147e+11 +3.2% 2.215e+11 perf-stat.i.cpu-cycles
112.60 +16.6% 131.26 perf-stat.i.cpu-migrations
18460 -33.1% 12347 perf-stat.i.cycles-between-cache-misses
0.03 ± 4% +0.0 0.04 ± 8% perf-stat.i.dTLB-load-miss-rate%
52266 ± 4% +28.9% 67377 ± 7% perf-stat.i.dTLB-load-misses
1.442e+08 +7.7% 1.553e+08 perf-stat.i.dTLB-loads
0.25 +0.0 0.27 perf-stat.i.dTLB-store-miss-rate%
189901 +13.6% 215670 perf-stat.i.dTLB-store-misses
80186719 +6.8% 85653052 perf-stat.i.dTLB-stores
400061 ± 4% +18.9% 475847 ± 4% perf-stat.i.iTLB-load-misses
361622 ± 2% -18.0% 296709 ± 12% perf-stat.i.iTLB-loads
5.358e+08 +9.1% 5.845e+08 perf-stat.i.instructions
1420 ± 2% -4.7% 1353 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.00 ± 5% +24.9% 0.01 ± 8% perf-stat.i.ipc
0.06 ± 8% -21.0% 0.04 ± 8% perf-stat.i.major-faults
2.44 +3.3% 2.52 perf-stat.i.metric.GHz
1592 +3.9% 1655 ± 2% perf-stat.i.metric.K/sec
2.46 +17.0% 2.88 perf-stat.i.metric.M/sec
3139 +21.5% 3815 perf-stat.i.minor-faults
52.88 +1.2 54.08 perf-stat.i.node-load-miss-rate%
238363 +53.4% 365608 perf-stat.i.node-load-misses
219288 ± 4% +35.5% 297188 perf-stat.i.node-loads
50.27 -6.3 44.01 ± 4% perf-stat.i.node-store-miss-rate%
5122757 +33.6% 6845784 ± 3% perf-stat.i.node-store-misses
5214111 +77.3% 9242761 ± 5% perf-stat.i.node-stores
3139 +21.5% 3815 perf-stat.i.page-faults
33.59 +41.4% 47.50 ± 2% perf-stat.overall.MPKI
1.54 +0.2 1.70 ± 2% perf-stat.overall.branch-miss-rate%
407.04 -6.3% 381.24 perf-stat.overall.cpi
18139 -34.2% 11935 ± 2% perf-stat.overall.cycles-between-cache-misses
0.03 ± 4% +0.0 0.04 ± 7% perf-stat.overall.dTLB-load-miss-rate%
0.24 +0.0 0.25 perf-stat.overall.dTLB-store-miss-rate%
54.20 ± 2% +8.4 62.57 ± 5% perf-stat.overall.iTLB-load-miss-rate%
1353 ± 3% -8.7% 1236 ± 5% perf-stat.overall.instructions-per-iTLB-miss
0.00 +6.8% 0.00 perf-stat.overall.ipc
51.26 ± 2% +2.9 54.20 perf-stat.overall.node-load-miss-rate%
49.80 -7.1 42.74 ± 4% perf-stat.overall.node-store-miss-rate%
1.031e+08 +10.1% 1.136e+08 perf-stat.ps.branch-instructions
1590263 +21.1% 1925935 ± 3% perf-stat.ps.branch-misses
11959851 +55.9% 18644201 ± 2% perf-stat.ps.cache-misses
17905275 +54.8% 27718539 ± 2% perf-stat.ps.cache-references
1778 +2.7% 1826 perf-stat.ps.context-switches
2.169e+11 +2.5% 2.224e+11 perf-stat.ps.cpu-cycles
112.21 +16.9% 131.18 perf-stat.ps.cpu-migrations
50162 ± 4% +30.8% 65621 ± 8% perf-stat.ps.dTLB-load-misses
1.434e+08 +8.0% 1.549e+08 perf-stat.ps.dTLB-loads
191061 +13.1% 216009 perf-stat.ps.dTLB-store-misses
79737750 +7.0% 85349028 perf-stat.ps.dTLB-stores
394455 ± 4% +20.0% 473256 ± 4% perf-stat.ps.iTLB-load-misses
5.33e+08 +9.5% 5.835e+08 perf-stat.ps.instructions
0.06 ± 9% -21.7% 0.04 ± 8% perf-stat.ps.major-faults
3088 +22.3% 3775 perf-stat.ps.minor-faults
236351 +54.5% 365167 perf-stat.ps.node-load-misses
225011 ± 4% +37.1% 308593 perf-stat.ps.node-loads
5183662 +32.7% 6879838 ± 3% perf-stat.ps.node-store-misses
5224274 +76.7% 9231733 ± 5% perf-stat.ps.node-stores
3088 +22.3% 3775 perf-stat.ps.page-faults
9.703e+11 -38.2% 5.999e+11 ± 2% perf-stat.total.instructions
14.84 ± 13% -12.7 2.15 ± 14% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
1.37 ± 11% -0.6 0.76 ± 31% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
0.53 ± 72% +0.5 1.03 ± 18% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
1.26 ± 18% +0.8 2.11 ± 12% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit._printk
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp.irq_work_run_list.irq_work_run.__sysvec_irq_work.sysvec_irq_work.asm_sysvec_irq_work
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp.irq_work_single.irq_work_run_list.irq_work_run.__sysvec_irq_work.sysvec_irq_work
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp._printk.irq_work_single.irq_work_run_list.irq_work_run.__sysvec_irq_work
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp.vprintk_emit._printk.irq_work_single.irq_work_run_list.irq_work_run
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit._printk.irq_work_single.irq_work_run_list
1.26 ± 18% +0.9 2.13 ± 12% perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit._printk.irq_work_single
2.10 ±112% +6.0 8.06 ± 77% perf-profile.calltrace.cycles-pp.__libc_start_main
2.10 ±112% +6.0 8.06 ± 77% perf-profile.calltrace.cycles-pp.main.__libc_start_main
2.10 ±112% +6.0 8.06 ± 77% perf-profile.calltrace.cycles-pp.run_builtin.main.__libc_start_main
1.37 ±105% +6.0 7.34 ± 77% perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
4.80 ± 11% +9.9 14.67 ± 51% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.42 ± 17% +10.6 13.98 ± 55% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.82 ± 17% +10.6 14.41 ± 54% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
4.01 ± 17% +10.6 14.60 ± 53% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
3.78 ± 17% +10.6 14.40 ± 54% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
17.34 ± 10% -13.4 3.91 ± 9% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
1.54 ± 94% -1.3 0.20 ± 52% perf-profile.children.cycles-pp.zero_user_segments
4.02 ± 15% -1.3 2.75 ± 16% perf-profile.children.cycles-pp.exit_to_user_mode_loop
3.52 ± 22% -1.1 2.45 ± 13% perf-profile.children.cycles-pp.get_perf_callchain
3.37 ± 6% -1.0 2.40 ± 12% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
1.16 ± 7% -1.0 0.20 ± 39% perf-profile.children.cycles-pp.rcu_gp_kthread
1.56 ± 15% -0.8 0.71 ± 23% perf-profile.children.cycles-pp.__irq_exit_rcu
2.62 ± 23% -0.8 1.82 ± 8% perf-profile.children.cycles-pp.perf_callchain_kernel
0.92 ± 10% -0.8 0.16 ± 42% perf-profile.children.cycles-pp.rcu_gp_fqs_loop
2.66 ± 17% -0.7 1.94 ± 18% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.93 ± 11% -0.7 0.24 ± 31% perf-profile.children.cycles-pp.schedule_timeout
1.53 ± 28% -0.7 0.85 ± 19% perf-profile.children.cycles-pp.perf_trace_sched_switch
2.26 ± 24% -0.6 1.62 ± 8% perf-profile.children.cycles-pp.unwind_next_frame
1.38 ± 11% -0.6 0.76 ± 31% perf-profile.children.cycles-pp.evsel__read_counter
1.89 ± 15% -0.6 1.29 ± 16% perf-profile.children.cycles-pp.__do_softirq
0.60 ± 23% -0.5 0.06 ± 74% perf-profile.children.cycles-pp.rebalance_domains
0.69 ± 20% -0.4 0.24 ± 65% perf-profile.children.cycles-pp.load_balance
1.30 ± 8% -0.4 0.88 ± 20% perf-profile.children.cycles-pp.readn
0.96 ± 16% -0.4 0.56 ± 14% perf-profile.children.cycles-pp.task_mm_cid_work
0.44 ± 22% -0.3 0.10 ± 9% perf-profile.children.cycles-pp.__evlist__disable
0.75 ± 8% -0.3 0.47 ± 24% perf-profile.children.cycles-pp.perf_read
0.72 ± 14% -0.2 0.47 ± 40% perf-profile.children.cycles-pp.pick_next_task_fair
0.63 ± 23% -0.2 0.38 ± 33% perf-profile.children.cycles-pp.put_prev_entity
0.52 ± 18% -0.2 0.28 ± 32% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.26 ± 16% -0.2 0.07 ± 48% perf-profile.children.cycles-pp.swake_up_one
0.26 ± 22% -0.2 0.08 ± 11% perf-profile.children.cycles-pp.rcu_report_qs_rdp
0.41 ± 26% -0.2 0.24 ± 30% perf-profile.children.cycles-pp.__fdget_pos
0.19 ± 16% -0.2 0.03 ±100% perf-profile.children.cycles-pp.detach_tasks
0.19 ± 52% -0.2 0.03 ±100% perf-profile.children.cycles-pp.ioctl
0.30 ± 20% -0.1 0.15 ± 49% perf-profile.children.cycles-pp.evlist__id2evsel
0.22 ± 16% -0.1 0.13 ± 57% perf-profile.children.cycles-pp.__folio_throttle_swaprate
0.22 ± 19% -0.1 0.13 ± 57% perf-profile.children.cycles-pp.blk_cgroup_congested
0.17 ± 27% -0.1 0.08 ± 22% perf-profile.children.cycles-pp.generic_exec_single
0.18 ± 24% -0.1 0.08 ± 22% perf-profile.children.cycles-pp.smp_call_function_single
0.17 ± 21% -0.1 0.10 ± 32% perf-profile.children.cycles-pp.__perf_read_group_add
0.13 ± 35% -0.1 0.08 ± 66% perf-profile.children.cycles-pp.__kmalloc
0.10 ± 31% -0.0 0.05 ± 74% perf-profile.children.cycles-pp.__perf_event_read
0.02 ±144% +0.1 0.10 ± 28% perf-profile.children.cycles-pp.mntput_no_expire
0.29 ± 19% +0.1 0.40 ± 19% perf-profile.children.cycles-pp.dput
0.05 ±101% +0.1 0.18 ± 40% perf-profile.children.cycles-pp.free_unref_page_prepare
0.04 ±152% +0.2 0.20 ± 22% perf-profile.children.cycles-pp.devkmsg_read
0.86 ± 8% +0.2 1.08 ± 19% perf-profile.children.cycles-pp.step_into
0.35 ± 27% +0.3 0.61 ± 15% perf-profile.children.cycles-pp.run_ksoftirqd
1.46 ± 18% +0.9 2.34 ± 22% perf-profile.children.cycles-pp.wait_for_lsr
1.57 ± 14% +1.0 2.61 ± 18% perf-profile.children.cycles-pp.serial8250_console_write
1.58 ± 14% +1.0 2.63 ± 19% perf-profile.children.cycles-pp.console_unlock
1.58 ± 14% +1.0 2.63 ± 19% perf-profile.children.cycles-pp.console_flush_all
1.58 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.asm_sysvec_irq_work
1.58 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.irq_work_run_list
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.sysvec_irq_work
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.__sysvec_irq_work
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.irq_work_run
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.irq_work_single
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp._printk
1.57 ± 14% +1.1 2.63 ± 19% perf-profile.children.cycles-pp.vprintk_emit
2.17 ± 21% +1.3 3.44 ± 28% perf-profile.children.cycles-pp.io_serial_in
2.24 ±100% +5.8 8.06 ± 77% perf-profile.children.cycles-pp.__libc_start_main
2.24 ±100% +5.8 8.06 ± 77% perf-profile.children.cycles-pp.main
2.24 ±100% +5.8 8.06 ± 77% perf-profile.children.cycles-pp.run_builtin
1.68 ± 86% +6.4 8.06 ± 77% perf-profile.children.cycles-pp.cmd_record
0.51 ± 59% +7.9 8.43 ± 94% perf-profile.children.cycles-pp.copy_page
0.33 ±109% +7.9 8.27 ± 97% perf-profile.children.cycles-pp.folio_copy
0.36 ±101% +7.9 8.30 ± 96% perf-profile.children.cycles-pp.move_to_new_folio
0.36 ±101% +7.9 8.30 ± 96% perf-profile.children.cycles-pp.migrate_folio_extra
0.42 ± 93% +8.8 9.21 ± 88% perf-profile.children.cycles-pp.migrate_pages_batch
0.44 ± 91% +8.8 9.24 ± 88% perf-profile.children.cycles-pp.migrate_misplaced_page
0.42 ± 93% +8.8 9.22 ± 88% perf-profile.children.cycles-pp.migrate_pages
10.36 ± 5% +9.5 19.84 ± 36% perf-profile.children.cycles-pp.asm_exc_page_fault
9.62 ± 5% +9.5 19.12 ± 37% perf-profile.children.cycles-pp.exc_page_fault
7.83 ± 2% +9.6 17.45 ± 42% perf-profile.children.cycles-pp.handle_mm_fault
9.20 ± 6% +9.6 18.84 ± 38% perf-profile.children.cycles-pp.do_user_addr_fault
6.79 ± 2% +9.7 16.49 ± 44% perf-profile.children.cycles-pp.__handle_mm_fault
0.30 ±117% +9.8 10.06 ± 79% perf-profile.children.cycles-pp.do_huge_pmd_numa_page
12.05 ± 12% -11.7 0.31 ± 36% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.93 ± 14% -0.4 0.56 ± 14% perf-profile.self.cycles-pp.task_mm_cid_work
0.78 ± 25% -0.2 0.55 ± 10% perf-profile.self.cycles-pp.unwind_next_frame
0.29 ± 19% -0.1 0.15 ± 50% perf-profile.self.cycles-pp.evlist__id2evsel
0.20 ± 24% -0.1 0.11 ± 49% perf-profile.self.cycles-pp.exc_page_fault
0.20 ± 20% -0.1 0.12 ± 53% perf-profile.self.cycles-pp.blk_cgroup_congested
0.17 ± 14% -0.1 0.12 ± 20% perf-profile.self.cycles-pp.perf_swevent_event
0.02 ±144% +0.1 0.10 ± 28% perf-profile.self.cycles-pp.mntput_no_expire
0.12 ± 29% +0.1 0.23 ± 35% perf-profile.self.cycles-pp.mod_objcg_state
0.04 ±104% +0.1 0.17 ± 36% perf-profile.self.cycles-pp.free_unref_page_prepare
1.40 ± 13% +0.9 2.29 ± 19% perf-profile.self.cycles-pp.io_serial_in
0.50 ± 59% +7.8 8.28 ± 94% perf-profile.self.cycles-pp.copy_page




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


