2020-12-10 08:11:21

by kernel test robot

Subject: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression


Greetings,

FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:


commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/migrate-disable


in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
with following parameters:

nr_task: 100%
mode: thread
test: sched_yield
cpufreq_governor: performance
ucode: 0x700001e

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
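
For reference, the sched_yield testcase's hot loop is essentially the snippet below -- a simplified sketch in the will-it-scale style, not the exact upstream source (see the test-url above for that):

#include <sched.h>

/* Simplified sketch: each worker (a process or a thread, depending on
 * "mode") spins on the syscall; the harness samples *iterations
 * periodically to derive per_thread_ops. */
void testcase(unsigned long long *iterations, unsigned long nr)
{
	(void)nr;
	for (;;) {
		sched_yield();		/* the measured operation */
		(*iterations)++;
	}
}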



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/sched_yield/will-it-scale/0x700001e

commit:
565790d28b ("sched: Fix balance_callback()")
2558aacff8 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")

565790d28b1e33ee 2558aacff8586699bcd248b406f
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 0% 1:4 perf-profile.children.cycles-pp.error_entry
0:4 0% 0:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
4.011e+08 -1.6% 3.945e+08 will-it-scale.144.threads
2785455 -1.6% 2739520 will-it-scale.per_thread_ops
4.011e+08 -1.6% 3.945e+08 will-it-scale.workload
12.05 +2.1 14.18 mpstat.cpu.all.usr%
1087711 ± 75% -79.0% 228885 ± 7% numa-numastat.node1.local_node
1126029 ± 74% -74.5% 286894 ± 6% numa-numastat.node1.numa_hit
33836 -2.3% 33042 proc-vmstat.nr_slab_reclaimable
74433 -1.5% 73345 proc-vmstat.nr_slab_unreclaimable
86.25 -2.3% 84.25 vmstat.cpu.sy
11.75 ± 3% +17.0% 13.75 ± 3% vmstat.cpu.us
333551 ± 17% -21.7% 261115 ± 5% vmstat.system.cs
329071 ± 3% -15.4% 278535 ± 4% sched_debug.cfs_rq:/.spread0.avg
472614 ± 2% -11.0% 420678 ± 2% sched_debug.cfs_rq:/.spread0.max
17597663 ± 17% -28.5% 12582107 ± 16% sched_debug.cpu.nr_switches.max
1897476 ± 17% -28.4% 1359264 ± 14% sched_debug.cpu.nr_switches.stddev
5628 ± 8% -10.9% 5012 ± 3% slabinfo.files_cache.active_objs
5628 ± 8% -10.9% 5012 ± 3% slabinfo.files_cache.num_objs
3613 ± 2% -10.9% 3219 slabinfo.kmalloc-rcl-512.active_objs
3644 ± 2% -10.9% 3248 slabinfo.kmalloc-rcl-512.num_objs
3967 ± 4% -8.3% 3638 ± 2% slabinfo.sock_inode_cache.active_objs
3967 ± 4% -8.3% 3638 ± 2% slabinfo.sock_inode_cache.num_objs
0.02 ± 9% -14.5% 0.02 ± 2% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
14.28 ± 38% +48.9% 21.26 ± 24% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
0.02 ± 24% +38.2% 0.03 ± 14% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
0.04 ± 13% -22.8% 0.03 ± 14% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
47.71 ± 30% +41.6% 67.54 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
3.20 ± 33% -81.5% 0.59 ± 91% perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
33.43 ± 27% +38.4% 46.27 ± 8% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
0.05 ± 43% -68.9% 0.02 ± 63% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.path_openat
8.23 ± 10% -57.9% 3.47 ± 97% perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
35211 ±169% -99.6% 129.50 ± 96% numa-vmstat.node1.nr_active_anon
8060 ± 17% -35.0% 5236 ± 31% numa-vmstat.node1.nr_slab_reclaimable
35211 ±169% -99.6% 129.50 ± 96% numa-vmstat.node1.nr_zone_active_anon
1053733 ± 53% -52.7% 498416 ± 7% numa-vmstat.node1.numa_hit
946160 ± 58% -62.7% 352475 ± 12% numa-vmstat.node1.numa_local
107572 ± 23% +35.7% 145940 ± 5% numa-vmstat.node1.numa_other
5914 ± 23% -28.5% 4226 ± 10% numa-vmstat.node2.nr_slab_reclaimable
18204 ± 3% -14.7% 15522 ± 12% numa-vmstat.node2.nr_slab_unreclaimable
629428 ± 9% -14.2% 540085 ± 7% numa-vmstat.node2.numa_hit
17302 ± 10% +26.0% 21807 ± 7% numa-vmstat.node3.nr_slab_unreclaimable
140785 ±169% -99.6% 520.75 ± 95% numa-meminfo.node1.Active
140785 ±169% -99.6% 520.75 ± 95% numa-meminfo.node1.Active(anon)
32241 ± 17% -35.0% 20948 ± 31% numa-meminfo.node1.KReclaimable
32241 ± 17% -35.0% 20948 ± 31% numa-meminfo.node1.SReclaimable
101007 ± 5% -15.7% 85162 ± 13% numa-meminfo.node1.Slab
23657 ± 23% -28.5% 16906 ± 10% numa-meminfo.node2.KReclaimable
23657 ± 23% -28.5% 16906 ± 10% numa-meminfo.node2.SReclaimable
72823 ± 3% -14.7% 62089 ± 12% numa-meminfo.node2.SUnreclaim
96481 ± 3% -18.1% 78996 ± 12% numa-meminfo.node2.Slab
69210 ± 10% +26.0% 87229 ± 7% numa-meminfo.node3.SUnreclaim
110579 ± 9% +26.7% 140158 ± 10% numa-meminfo.node3.Slab
388.75 ± 74% +1147.8% 4851 ±124% interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
1540 ± 69% -78.2% 335.75 ± 51% interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
388.75 ± 74% +1147.8% 4851 ±124% interrupts.CPU11.33:PCI-MSI.524291-edge.eth0-TxRx-2
307.50 +66.7% 512.50 ± 50% interrupts.CPU111.RES:Rescheduling_interrupts
1540 ± 69% -78.2% 335.75 ± 51% interrupts.CPU12.34:PCI-MSI.524292-edge.eth0-TxRx-3
350.50 ± 8% -10.3% 314.50 ± 2% interrupts.CPU122.RES:Rescheduling_interrupts
424.50 ± 24% -18.7% 345.00 ± 14% interrupts.CPU128.RES:Rescheduling_interrupts
8496 -50.1% 4241 interrupts.CPU29.NMI:Non-maskable_interrupts
8496 -50.1% 4241 interrupts.CPU29.PMI:Performance_monitoring_interrupts
314.25 +8.7% 341.50 ± 4% interrupts.CPU29.RES:Rescheduling_interrupts
8496 -50.1% 4242 interrupts.CPU30.NMI:Non-maskable_interrupts
8496 -50.1% 4242 interrupts.CPU30.PMI:Performance_monitoring_interrupts
311.50 +13.2% 352.75 ± 8% interrupts.CPU7.RES:Rescheduling_interrupts
21144 ± 15% -25.0% 15858 ± 24% interrupts.CPU72.CAL:Function_call_interrupts
317.75 +39.2% 442.25 ± 32% interrupts.CPU82.RES:Rescheduling_interrupts
8.557e+10 -1.8% 8.399e+10 perf-stat.i.branch-instructions
0.43 +0.4 0.87 perf-stat.i.branch-miss-rate%
3.479e+08 +106.5% 7.186e+08 perf-stat.i.branch-misses
333383 ± 17% -22.1% 259849 ± 6% perf-stat.i.context-switches
1.02 +2.4% 1.05 perf-stat.i.cpi
1.268e+11 -1.9% 1.243e+11 perf-stat.i.dTLB-loads
7.506e+10 -1.9% 7.363e+10 perf-stat.i.dTLB-stores
4.26e+08 ± 2% -31.3% 2.925e+08 perf-stat.i.iTLB-load-misses
538538 ± 36% -79.5% 110207 ± 16% perf-stat.i.iTLB-loads
3.983e+11 -1.9% 3.908e+11 perf-stat.i.instructions
946.16 ± 3% +43.4% 1356 perf-stat.i.instructions-per-iTLB-miss
0.99 -2.4% 0.97 perf-stat.i.ipc
1.22 ± 3% +30.8% 1.60 ± 3% perf-stat.i.metric.K/sec
1996 -1.9% 1958 perf-stat.i.metric.M/sec
0.41 +0.4 0.86 perf-stat.overall.branch-miss-rate%
1.01 +2.5% 1.03 perf-stat.overall.cpi
935.95 ± 2% +42.8% 1336 perf-stat.overall.instructions-per-iTLB-miss
0.99 -2.4% 0.97 perf-stat.overall.ipc
8.527e+10 -1.8% 8.37e+10 perf-stat.ps.branch-instructions
3.467e+08 +106.5% 7.161e+08 perf-stat.ps.branch-misses
334637 ± 17% -21.5% 262794 ± 5% perf-stat.ps.context-switches
1.264e+11 -1.9% 1.239e+11 perf-stat.ps.dTLB-loads
7.48e+10 -1.9% 7.338e+10 perf-stat.ps.dTLB-stores
4.244e+08 ± 2% -31.3% 2.915e+08 perf-stat.ps.iTLB-load-misses
539519 ± 36% -79.5% 110644 ± 16% perf-stat.ps.iTLB-loads
3.969e+11 -1.9% 3.895e+11 perf-stat.ps.instructions
1.2e+14 -2.0% 1.176e+14 perf-stat.total.instructions
0.68 ± 2% -0.1 0.59 ± 3% perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
0.93 -0.1 0.88 ± 3% perf-profile.calltrace.cycles-pp.__perf_event_header__init_id.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_swevent_overflow
1.22 +0.1 1.30 ± 2% perf-profile.calltrace.cycles-pp.__orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
1.11 +0.1 1.21 ± 2% perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.__unwind_start.perf_callchain_kernel.get_perf_callchain
1.51 -0.1 1.44 ± 3% perf-profile.children.cycles-pp.stack_access_ok
0.37 ± 3% -0.1 0.30 perf-profile.children.cycles-pp.__task_pid_nr_ns
0.47 ± 2% -0.1 0.41 ± 2% perf-profile.children.cycles-pp.perf_event_pid_type
0.30 ± 5% -0.0 0.25 ± 12% perf-profile.children.cycles-pp.__list_del_entry_valid
0.95 -0.0 0.90 ± 3% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.10 ± 14% -0.0 0.07 ± 31% perf-profile.children.cycles-pp.sched_yield@plt
0.42 -0.0 0.38 ± 5% perf-profile.children.cycles-pp.ftrace_graph_ret_addr
0.10 ± 4% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.is_module_text_address
0.11 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.is_ftrace_trampoline
0.10 ± 4% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.ftrace_ops_trampoline
0.06 ± 15% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.rcu_qs
0.23 ± 8% +0.1 0.31 ± 7% perf-profile.children.cycles-pp.rcu_note_context_switch
0.35 ± 4% -0.1 0.29 perf-profile.self.cycles-pp.__task_pid_nr_ns
1.24 -0.1 1.18 ± 2% perf-profile.self.cycles-pp.stack_access_ok
0.23 ± 5% -0.0 0.18 ± 12% perf-profile.self.cycles-pp.__list_del_entry_valid
0.34 ± 4% -0.0 0.30 ± 2% perf-profile.self.cycles-pp.perf_tp_event
0.12 ± 4% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.sched_clock_cpu
0.32 -0.0 0.30 ± 5% perf-profile.self.cycles-pp.ftrace_graph_ret_addr
0.08 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.ftrace_ops_trampoline
0.34 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.unwind_get_return_address
0.08 +0.0 0.10 ± 7% perf-profile.self.cycles-pp.rcu_is_watching
0.05 ± 8% +0.0 0.09 ± 7% perf-profile.self.cycles-pp.rcu_qs
0.51 +0.0 0.55 ± 3% perf-profile.self.cycles-pp.bsearch
0.16 ± 13% +0.0 0.20 ± 7% perf-profile.self.cycles-pp.rcu_note_context_switch
1.44 ± 4% +0.3 1.76 ± 12% perf-profile.self.cycles-pp.__sched_yield



will-it-scale.per_thread_ops

3e+06 +-----------------------------------------------------------------+
| : O +.+.+.+.+.++.+.+.+.+.+ O O O O O O O O O O OO O O O O O O O |
2.5e+06 |-: : |
| : : |
| : : |
2e+06 |-+: : |
| : : |
1.5e+06 |-+: : |
| : : |
1e+06 |-+: : |
| : : |
| : |
500000 |-+ : |
| : |
0 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


Attachments:
config-5.10.0-rc1-00033-g2558aacff858 (174.16 kB)
job-script (7.94 kB)
job.yaml (5.55 kB)
reproduce (353.00 B)

2020-12-10 16:18:28

by Peter Zijlstra

Subject: Re: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression

On Thu, Dec 10, 2020 at 04:18:59PM +0800, kernel test robot wrote:
> FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:
> commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")

Mooo, weird but whatever. Does the below help at all?

---
kernel/sched/core.c | 40 +++++++++++++++-------------------------
kernel/sched/sched.h | 13 +++++--------
2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7af80c3fce12..f80245c7f903 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3985,15 +3985,20 @@ static void do_balance_callbacks(struct rq *rq, struct callback_head *head)
}
}

+static void balance_push(struct rq *rq);
+
+struct callback_head balance_push_callback = {
+ .next = NULL,
+ .func = (void (*)(struct callback_head *))balance_push,
+};
+
static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
{
struct callback_head *head = rq->balance_callback;

lockdep_assert_held(&rq->lock);
- if (head) {
+ if (head)
rq->balance_callback = NULL;
- rq->balance_flags &= ~BALANCE_WORK;
- }

return head;
}
@@ -4014,21 +4019,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
}
}

-static void balance_push(struct rq *rq);
-
-static inline void balance_switch(struct rq *rq)
-{
- if (likely(!rq->balance_flags))
- return;
-
- if (rq->balance_flags & BALANCE_PUSH) {
- balance_push(rq);
- return;
- }
-
- __balance_callbacks(rq);
-}
-
#else

static inline void __balance_callbacks(struct rq *rq)
@@ -4044,10 +4034,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
{
}

-static inline void balance_switch(struct rq *rq)
-{
-}
-
#endif

static inline void
@@ -4075,7 +4061,7 @@ static inline void finish_lock_switch(struct rq *rq)
* prev into current:
*/
spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
- balance_switch(rq);
+ __balance_callbacks(rq);
raw_spin_unlock_irq(&rq->lock);
}

@@ -7256,6 +7242,10 @@ static void balance_push(struct rq *rq)

lockdep_assert_held(&rq->lock);
SCHED_WARN_ON(rq->cpu != smp_processor_id());
+ /*
+ * Ensure the thing is persistent until balance_push_set(, on = false);
+ */
+ rq->balance_callback = &balance_push_callback;

/*
* Both the cpu-hotplug and stop task are in this case and are
@@ -7305,9 +7295,9 @@ static void balance_push_set(int cpu, bool on)

rq_lock_irqsave(rq, &rf);
if (on)
- rq->balance_flags |= BALANCE_PUSH;
+ rq->balance_callback = &balance_push_callback;
else
- rq->balance_flags &= ~BALANCE_PUSH;
+ rq->balance_callback = NULL;
rq_unlock_irqrestore(rq, &rf);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f5acb6c5ce49..12ada79d40f3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -975,7 +975,6 @@ struct rq {
unsigned long cpu_capacity_orig;

struct callback_head *balance_callback;
- unsigned char balance_flags;

unsigned char nohz_idle_balance;
unsigned char idle_balance;
@@ -1226,6 +1225,8 @@ struct rq_flags {
#endif
};

+extern struct callback_head balance_push_callback;
+
/*
* Lockdep annotation that avoids accidental unlocks; it's like a
* sticky/continuous lockdep_assert_held().
@@ -1243,9 +1244,9 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
#ifdef CONFIG_SCHED_DEBUG
rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
rf->clock_update_flags = 0;
-#endif
#ifdef CONFIG_SMP
- SCHED_WARN_ON(rq->balance_callback);
+ SCHED_WARN_ON(rq->balance_callback && rq->balance_callback != &balance_push_callback);
+#endif
#endif
}

@@ -1408,9 +1409,6 @@ init_numa_balancing(unsigned long clone_flags, struct task_struct *p)

#ifdef CONFIG_SMP

-#define BALANCE_WORK 0x01
-#define BALANCE_PUSH 0x02
-
static inline void
queue_balance_callback(struct rq *rq,
struct callback_head *head,
@@ -1418,13 +1416,12 @@ queue_balance_callback(struct rq *rq,
{
lockdep_assert_held(&rq->lock);

- if (unlikely(head->next || (rq->balance_flags & BALANCE_PUSH)))
+ if (unlikely(head->next || rq->balance_callback == &balance_push_callback))
return;

head->func = (void (*)(struct callback_head *))func;
head->next = rq->balance_callback;
rq->balance_callback = head;
- rq->balance_flags |= BALANCE_WORK;
}

#define rcu_dereference_check_sched_domain(p) \
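
The effect of the patch, in one sentence: the BALANCE_PUSH flag is folded into rq->balance_callback itself, with a static sentinel callback (balance_push_callback) standing in for the flag, so the hot path keeps its single pointer test and the separate balance_flags word goes away. Below is a standalone userspace sketch of that sentinel-pointer pattern -- hypothetical names, not the kernel code:

#include <stdio.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static struct callback_head *pending;	/* stands in for rq->balance_callback */

static void push_work(struct callback_head *head)
{
	/* Like balance_push(): re-arm ourselves, so the sentinel persists
	 * until it is explicitly disarmed. */
	pending = head;
	puts("push mode: move everything that isn't a per-cpu kthread");
}

/* The sentinel: its address doubles as the "push enabled" flag. */
static struct callback_head push_sentinel = { .next = NULL, .func = push_work };

static void queue_callback(struct callback_head *head,
			   void (*func)(struct callback_head *))
{
	/* Like queue_balance_callback(): refuse while push mode is armed. */
	if (head->next || pending == &push_sentinel)
		return;
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Hot path, like __balance_callbacks(): one pointer test, no flags word. */
static void run_callbacks(void)
{
	struct callback_head *head = pending;

	pending = NULL;
	while (head) {
		struct callback_head *next = head->next;

		head->next = NULL;
		head->func(head);
		head = next;
	}
}

static void normal_work(struct callback_head *head)
{
	(void)head;
	puts("regular balance work");
}

int main(void)
{
	struct callback_head work = { NULL, NULL };

	pending = &push_sentinel;		/* balance_push_set(cpu, true)  */
	queue_callback(&work, normal_work);	/* refused: push mode is armed  */
	run_callbacks();			/* sentinel runs and re-arms    */

	pending = NULL;				/* balance_push_set(cpu, false) */
	queue_callback(&work, normal_work);	/* accepted now                 */
	run_callbacks();			/* prints "regular balance work" */
	return 0;
}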

2020-12-15 05:41:55

by Xing Zhengjun

Subject: Re: [LKP] Re: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression



On 12/11/2020 12:14 AM, Peter Zijlstra wrote:
> On Thu, Dec 10, 2020 at 04:18:59PM +0800, kernel test robot wrote:
>> FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:
>> commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
>
> Mooo, weird but whatever. Does the below help at all?

I tested the patch, and the regression is reduced to -0.6%.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:

lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/thread/sched_yield/performance/0x700001e

commit:
565790d28b1e33ee2f77bad5348b99f6dfc366fd
2558aacff8586699bcd248b406febb28b0a25de2
4b26139b8db627a55043183614a32b0aba799d27 (this test patch)

565790d28b1e33ee 2558aacff8586699bcd248b406f 4b26139b8db627a55043183614a
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
4.011e+08 -1.6% 3.945e+08 -0.6% 3.989e+08 will-it-scale.144.threads
2785455 -1.6% 2739520 -0.6% 2769967 will-it-scale.per_thread_ops
4.011e+08 -1.6% 3.945e+08 -0.6% 3.989e+08 will-it-scale.workload

>
> ---
> kernel/sched/core.c | 40 +++++++++++++++-------------------------
> kernel/sched/sched.h | 13 +++++--------
> 2 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7af80c3fce12..f80245c7f903 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3985,15 +3985,20 @@ static void do_balance_callbacks(struct rq *rq, struct callback_head *head)
> }
> }
>
> +static void balance_push(struct rq *rq);
> +
> +struct callback_head balance_push_callback = {
> + .next = NULL,
> + .func = (void (*)(struct callback_head *))balance_push,
> +};
> +
> static inline struct callback_head *splice_balance_callbacks(struct rq *rq)
> {
> struct callback_head *head = rq->balance_callback;
>
> lockdep_assert_held(&rq->lock);
> - if (head) {
> + if (head)
> rq->balance_callback = NULL;
> - rq->balance_flags &= ~BALANCE_WORK;
> - }
>
> return head;
> }
> @@ -4014,21 +4019,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
> }
> }
>
> -static void balance_push(struct rq *rq);
> -
> -static inline void balance_switch(struct rq *rq)
> -{
> - if (likely(!rq->balance_flags))
> - return;
> -
> - if (rq->balance_flags & BALANCE_PUSH) {
> - balance_push(rq);
> - return;
> - }
> -
> - __balance_callbacks(rq);
> -}
> -
> #else
>
> static inline void __balance_callbacks(struct rq *rq)
> @@ -4044,10 +4034,6 @@ static inline void balance_callbacks(struct rq *rq, struct callback_head *head)
> {
> }
>
> -static inline void balance_switch(struct rq *rq)
> -{
> -}
> -
> #endif
>
> static inline void
> @@ -4075,7 +4061,7 @@ static inline void finish_lock_switch(struct rq *rq)
> * prev into current:
> */
> spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
> - balance_switch(rq);
> + __balance_callbacks(rq);
> raw_spin_unlock_irq(&rq->lock);
> }
>
> @@ -7256,6 +7242,10 @@ static void balance_push(struct rq *rq)
>
> lockdep_assert_held(&rq->lock);
> SCHED_WARN_ON(rq->cpu != smp_processor_id());
> + /*
> + * Ensure the thing is persistent until balance_push_set(, on = false);
> + */
> + rq->balance_callback = &balance_push_callback;
>
> /*
> * Both the cpu-hotplug and stop task are in this case and are
> @@ -7305,9 +7295,9 @@ static void balance_push_set(int cpu, bool on)
>
> rq_lock_irqsave(rq, &rf);
> if (on)
> - rq->balance_flags |= BALANCE_PUSH;
> + rq->balance_callback = &balance_push_callback;
> else
> - rq->balance_flags &= ~BALANCE_PUSH;
> + rq->balance_callback = NULL;
> rq_unlock_irqrestore(rq, &rf);
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index f5acb6c5ce49..12ada79d40f3 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -975,7 +975,6 @@ struct rq {
> unsigned long cpu_capacity_orig;
>
> struct callback_head *balance_callback;
> - unsigned char balance_flags;
>
> unsigned char nohz_idle_balance;
> unsigned char idle_balance;
> @@ -1226,6 +1225,8 @@ struct rq_flags {
> #endif
> };
>
> +extern struct callback_head balance_push_callback;
> +
> /*
> * Lockdep annotation that avoids accidental unlocks; it's like a
> * sticky/continuous lockdep_assert_held().
> @@ -1243,9 +1244,9 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
> #ifdef CONFIG_SCHED_DEBUG
> rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
> rf->clock_update_flags = 0;
> -#endif
> #ifdef CONFIG_SMP
> - SCHED_WARN_ON(rq->balance_callback);
> + SCHED_WARN_ON(rq->balance_callback && rq->balance_callback != &balance_push_callback);
> +#endif
> #endif
> }
>
> @@ -1408,9 +1409,6 @@ init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
>
> #ifdef CONFIG_SMP
>
> -#define BALANCE_WORK 0x01
> -#define BALANCE_PUSH 0x02
> -
> static inline void
> queue_balance_callback(struct rq *rq,
> struct callback_head *head,
> @@ -1418,13 +1416,12 @@ queue_balance_callback(struct rq *rq,
> {
> lockdep_assert_held(&rq->lock);
>
> - if (unlikely(head->next || (rq->balance_flags & BALANCE_PUSH)))
> + if (unlikely(head->next || rq->balance_callback == &balance_push_callback))
> return;
>
> head->func = (void (*)(struct callback_head *))func;
> head->next = rq->balance_callback;
> rq->balance_callback = head;
> - rq->balance_flags |= BALANCE_WORK;
> }
>
> #define rcu_dereference_check_sched_domain(p) \

--
Zhengjun Xing

2020-12-15 08:41:44

by Peter Zijlstra

Subject: Re: [LKP] Re: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6% regression

On Tue, Dec 15, 2020 at 01:35:46PM +0800, Xing Zhengjun wrote:
> On 12/11/2020 12:14 AM, Peter Zijlstra wrote:
> > On Thu, Dec 10, 2020 at 04:18:59PM +0800, kernel test robot wrote:
> > > FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:
> > > commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
> >
> > Mooo, weird but whatever. Does the below help at all?
>
> I tested the patch

Thanks!

> , and the regression is reduced to -0.6%.
>
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:
>
> lkp-cpl-4sp1/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3/gcc-9/100%/thread/sched_yield/performance/0x700001e
>
> commit:
> 565790d28b1e33ee2f77bad5348b99f6dfc366fd
> 2558aacff8586699bcd248b406febb28b0a25de2
> 4b26139b8db627a55043183614a32b0aba799d27 (this test patch)
>
> 565790d28b1e33ee 2558aacff8586699bcd248b406f 4b26139b8db627a55043183614a
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 4.011e+08 -1.6% 3.945e+08 -0.6% 3.989e+08 will-it-scale.144.threads
> 2785455 -1.6% 2739520 -0.6% 2769967 will-it-scale.per_thread_ops
> 4.011e+08 -1.6% 3.945e+08 -0.6% 3.989e+08 will-it-scale.workload

Well, that's better. But I'm rather confused now, because with this new
patch, the actual hot paths are identical, so I've no idea what is
actually causing the regression :/

The above numbers don't seem to have any variance listed; how sure are we
that the results are stable? The thing is, when I tried reproducing this
locally, I was mostly looking at noise.
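
(As a back-of-the-envelope check: a -0.6% delta is only meaningful if the run-to-run relative stddev sits well below 0.6%. A small standalone helper to compute it from per-run throughput -- the numbers below are hypothetical, not taken from this report:)

#include <math.h>
#include <stdio.h>

/* Relative standard deviation in percent; lkp prints this as the
 * "± N%" column when it is non-negligible. */
static double rel_stddev(const double *runs, int n)
{
	double mean = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		mean += runs[i];
	mean /= n;
	for (i = 0; i < n; i++)
		var += (runs[i] - mean) * (runs[i] - mean);
	var /= n;
	return 100.0 * sqrt(var) / mean;
}

int main(void)
{
	/* Hypothetical per_thread_ops for 4 runs of one commit. */
	double runs[4] = { 2769100.0, 2770450.0, 2769800.0, 2770518.0 };

	printf("%%stddev = %.3f%%\n", rel_stddev(runs, 4));
	return 0;
}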

> > ---
> > kernel/sched/core.c | 40 +++++++++++++++-------------------------
> > kernel/sched/sched.h | 13 +++++--------
> > 2 files changed, 20 insertions(+), 33 deletions(-)

Anyway, let me queue this in sched/urgent; it's simpler code and has less
of a regression.