2023-06-15 20:57:57

by Bart Van Assche

[permalink] [raw]
Subject: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time

Some producers of Android devices want IRQ time accounting enabled while
others want IRQ time accounting disabled. Hence, make IRQ time accounting
configurable at boot time.

Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++++
kernel/sched/cputime.c | 13 +++++++++++++
2 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..67a2ad3af833 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5611,6 +5611,10 @@
non-zero "wait" parameter. See weight_single
and weight_many.

+ sched_clock_irqtime= [KNL]
+ Can be used to disable IRQ time accounting if
+ CONFIG_IRQ_TIME_ACCOUNTING=y.
+
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..d9c65017024d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -24,6 +24,19 @@ DEFINE_PER_CPU(struct irqtime, cpu_irqtime);

static int sched_clock_irqtime;

+static int __init sched_clock_irqtime_setup(char *arg)
+{
+ bool enabled;
+
+ if (kstrtobool(arg, &enabled) < 0)
+ pr_err("Invalid sched_clock_irqtime value\n");
+ else
+ sched_clock_irqtime = enabled;
+ return 1;
+}
+
+__setup("sched_clock_irqtime=", sched_clock_irqtime_setup);
+
void enable_sched_clock_irqtime(void)
{
sched_clock_irqtime = 1;


2023-06-16 07:54:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time

On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
> Some producers of Android devices want IRQ time accounting enabled while
> others want IRQ time accounting disabled. Hence, make IRQ time accounting
> configurable at boot time.

Why would they want this disabled? IRQ time accounting avoids a number
of issues under high irq/softirq pressure.

Disabling this makes no sense.

2023-06-16 14:48:26

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time

On 6/16/23 00:45, Peter Zijlstra wrote:
> On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
>> Some producers of Android devices want IRQ time accounting enabled while
>> others want IRQ time accounting disabled. Hence, make IRQ time accounting
>> configurable at boot time.
>
> Why would they want this disabled? IRQ time accounting avoids a number
> of issues under high irq/softirq pressure.
>
> Disabling this makes no sense.

This is why disabling IRQ time accounting makes a ton of sense to me:
* If disabling IRQ time accounting would not be useful, there wouldn't
be a kernel configuration option that controls whether it is enabled
or disabled - it would be enabled all the time.
* If enabling IRQ time accounting would be essential, all Linux
distributors would enable it. In the x86 kernels I checked, IRQ time
accounting is disabled (Debian and openSUSE).
* For x86 there is already a kernel parameter for disabling IRQ time
accounting (tsc=noirqtime).
* Modern hardware, e.g. UFSHCI 4.0 controllers, supports sending the
completion interrupt to the CPU core that submitted the I/O. With such
hardware IRQ overload (100% spent in IRQ handlers and 0% outside IRQ
handlers) is impossible because the submitter is slowed down by the
completion interrupts.
* The performance overhead of CONFIG_IRQ_TIME_ACCOUNTING is
unacceptable. A quick test in an x86 VM shows that enabling
CONFIG_IRQ_TIME_ACCOUNTING reduces IOPS by 10% (220K -> 200K). On an
Android setup I measured an IOPS reduction of 40% (100K -> 60K) due
to CONFIG_IRQ_TIME_ACCOUNTING. This is not acceptable.

Thanks,

Bart.

2023-06-30 15:08:09

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time

On 06/16/23 09:45, Peter Zijlstra wrote:
> On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
> > Some producers of Android devices want IRQ time accounting enabled while
> > others want IRQ time accounting disabled. Hence, make IRQ time accounting
> > configurable at boot time.
>
> Why would they want this disabled? IRQ time accounting avoids a number
> of issues under high irq/softirq pressure.
>
> Disabling this makes no sense.

I think it is assumed that IRQ time accounting is only used for stat
collection (which is what I thought too), but based on this response I can see
it is used to ensure we account for stolen time in update_rq_clock_task() so it
helps to make it account more accurately for the time a task actually spent
running and doing useful work.

Bart, could you profile the cause of the high overhead? The config message says
a small perf impact, but 40% mentioned in your v2 is high. Could you try to
break down the problem further as we might be overlooking the true source of
overhead or miss an opportunity to improve the accounting logic instead?


Thanks

--
Qais Yousef