Hi,
The first part of the series contains cpuidle callback fixes against timers,
some of which haven't been signed off by Peter yet.
Another part removes the overhead of useless TIF_NR_POLLING clearing.
The rest is comments.
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/idle
HEAD: b66dd12bb29cca558b9323f2b270a7dae8f56c48
Thanks,
Frederic
---
Frederic Weisbecker (8):
x86: Add a comment about the "magic" behind shadow sti before mwait
cpuidle: Report illegal tick stopped while polling
cpuidle: Comment about timers requirements VS idle handler
cpuidle: Remove unnecessary current_clr_polling_and_test() from haltpoll
cpuidle: Remove unnecessary current_clr_polling() on poll_idle()
x86: Remove __current_clr_polling() from mwait_idle()
x86: Remove the current_clr_polling() call upon mwait exit
sched/timers: Explain why idle task schedules out on remote timer enqueue
Peter Zijlstra (2):
cpuidle: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
loongson: Fix idle VS timer enqueue
arch/loongarch/kernel/genex.S | 32 ++++++++++++++++++--------------
arch/loongarch/kernel/idle.c | 1 -
arch/x86/include/asm/mwait.h | 21 ++++++++++++++++++---
arch/x86/kernel/process.c | 1 -
drivers/acpi/acpi_pad.c | 1 +
drivers/cpuidle/cpuidle-haltpoll.c | 5 +----
drivers/cpuidle/poll_state.c | 10 ++++++++--
drivers/idle/intel_idle.c | 19 +++++++------------
kernel/sched/core.c | 22 ++++++++++++++++++++++
kernel/sched/idle.c | 30 ++++++++++++++++++++++++++++++
10 files changed, 105 insertions(+), 37 deletions(-)
poll_idle() can't be called while the tick is stopped because it enables
interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell whether an
interrupt has queued a timer that would require a tick reprogramming.
There is no point anyway in polling with the tick stopped, so add a check
to make sure it never happens.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
drivers/cpuidle/poll_state.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..009f46f121ae 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -7,6 +7,7 @@
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/sched/idle.h>
+#include <linux/tick.h>
#define POLL_IDLE_RELAX_COUNT 200
@@ -19,6 +20,13 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
dev->poll_time_limit = false;
+ /*
+ * This re-enables IRQs and only polls on TIF_NEED_RESCHED.
+ * A timer queued by an interrupt here may go unnoticed if
+ * the tick is stopped.
+ */
+ WARN_ON_ONCE(tick_nohz_tick_stopped());
+
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
unsigned int loop_count = 0;
--
2.34.1
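[ For context, the polling loop at the heart of poll_idle() looks roughly
like this (condensed from the same file; details may differ across kernel
versions). Note that nothing in the loop re-evaluates pending timers, it
only watches TIF_NEED_RESCHED through need_resched():

	u64 time_start = local_clock();
	u64 limit = cpuidle_poll_time(drv, dev);
	unsigned int loop_count = 0;

	while (!need_resched()) {
		cpu_relax();
		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
			continue;

		loop_count = 0;
		/* Bail out once the poll time budget is exhausted */
		if (local_clock() - time_start > limit) {
			dev->poll_time_limit = true;
			break;
		}
	} ]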
mwait_idle_with_hints() is mostly called from cpuidle, which already sets
TIF_NR_POLLING back and folds TIF_NEED_RESCHED if necessary.
The only non-cpuidle caller is power_saving_thread(), which acts as an idle
loop and is the only reason why mwait_idle_with_hints() carries a costly
fully ordered atomic operation upon idle exit.
Move this overhead to power_saving_thread() instead, which already
duplicates quite some cpuidle code.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/include/asm/mwait.h | 1 -
drivers/acpi/acpi_pad.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 920426d691ce..c1be3775b94a 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
}
}
}
- current_clr_polling();
}
/*
diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
index 7a453c5ff303..5a44afafe185 100644
--- a/drivers/acpi/acpi_pad.c
+++ b/drivers/acpi/acpi_pad.c
@@ -174,6 +174,7 @@ static int power_saving_thread(void *data)
stop_critical_timings();
mwait_idle_with_hints(power_saving_mwait_eax, 1);
+ current_clr_polling();
start_critical_timings();
tick_broadcast_exit();
--
2.34.1
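[ For reference, here is roughly what current_clr_polling() expands to
(from <linux/sched/idle.h>, lightly condensed); the cost the changelog
refers to is the atomic flag clear combined with the full barrier ordering
it against the TIF_NEED_RESCHED check:

static inline void current_clr_polling(void)
{
	/* clear_thread_flag(TIF_POLLING_NRFLAG): an atomic RMW */
	__current_clr_polling();

	/*
	 * Ensure we check TIF_NEED_RESCHED after we clear the polling bit.
	 * Once the bit is cleared, we'll get IPIs with every new
	 * TIF_NEED_RESCHED and the IPI handler, scheduler_ipi(), will also
	 * fold.
	 */
	smp_mb__after_atomic();

	preempt_fold_need_resched();
} ]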
There is no point in clearing TIF_NR_POLLING and folding TIF_NEED_RESCHED
upon poll_idle() exit because cpuidle_idle_call() is going to set
TIF_NR_POLLING again anyway. Also, if TIF_NEED_RESCHED is set, it will be
folded and TIF_NR_POLLING will be cleared at the end of do_idle().
Signed-off-by: Frederic Weisbecker <[email protected]>
---
drivers/cpuidle/poll_state.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 009f46f121ae..44a656464d06 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -48,8 +48,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
}
raw_local_irq_disable();
- current_clr_polling();
-
return index;
}
--
2.34.1
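[ As a reminder of why the clearing is redundant here, a condensed sketch
of the relevant flow in kernel/sched/idle.c (details vary across versions):
cpuidle_idle_call() ends with __current_set_polling() anyway, and do_idle()
folds and clears exactly once, when the loop finally breaks:

static void do_idle(void)
{
	/* ... */
	__current_set_polling();
	tick_nohz_idle_enter();

	while (!need_resched()) {
		/* ... */
		cpuidle_idle_call();	/* ends with __current_set_polling() */
		/* ... */
	}

	/*
	 * We fell out of the loop, so TIF_NEED_RESCHED must be set:
	 * fold it and clear the polling bit once, here.
	 */
	preempt_set_need_resched();
	tick_nohz_idle_exit();
	__current_clr_polling();
	/* ... */
	schedule_idle();
} ]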
Add a note to make sure we never overlook and break the requirements behind
the sti;mwait sequence.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/include/asm/mwait.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..341ee4f1d91e 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
:: "a" (eax), "b" (ebx), "c" (ecx));
}
+/*
+ * Re-enable interrupts right upon calling mwait in such a way that
+ * no interrupt can fire _before_ the execution of mwait, ie: no
+ * instruction must be placed between "sti" and "mwait".
+ *
+ * This is necessary because if an interrupt queues a timer before
+ * executing mwait, it would otherwise go unnoticed and the next tick
+ * would not be reprogrammed accordingly before mwait ever wakes up.
+ */
static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
{
mds_idle_clear_cpu_buffers();
--
2.34.1
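[ For reference, this is the function the new comment sits on top of
(modulo version drift). The "sti" instruction shadow, whereby interrupts
stay masked until the instruction following "sti" has executed, is what
guarantees that no IRQ can slip in between the two instructions:

static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
{
	mds_idle_clear_cpu_buffers();

	/* "mwait %eax, %ecx;" with IRQs re-enabled in the sti shadow */
	asm volatile("sti; mwait;"
		     :: "a" (eax), "c" (ecx));
} ]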
Trying to avoid the forced idle loop exit and schedule() round trip on
remote timer enqueue didn't bring much value after testing, so add a
comment about this.
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/sched/core.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..e53b892167ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1135,6 +1135,28 @@ static void wake_up_idle_cpu(int cpu)
if (cpu == smp_processor_id())
return;
+ /*
+ * Set TIF_NEED_RESCHED and send an IPI if in the non-polling
+ * part of the idle loop. This forces an exit from the idle loop
+ * and a round trip to schedule(). Now this could be optimized
+ * because a simple new idle loop iteration is enough to
+ * re-evaluate the next tick, provided some re-ordering of the
+ * tick nohz functions so that they follow the TIF_NR_POLLING
+ * clearing:
+ *
+ * - On most archs, a simple fetch_or on ti::flags with a
+ * "0" value would be enough to know if an IPI needs to be sent.
+ *
+ * - x86 needs to perform a last need_resched() check between
+ * monitor and mwait which doesn't take timers into account.
+ * There a dedicated TIF_TIMER flag would be required to
+ * fetch_or here and be checked along with TIF_NEED_RESCHED
+ * before mwait().
+ *
+ * However, remote timer enqueue is not such a frequent event
+ * and testing of the above solutions didn't appear to bring
+ * much benefit.
+ */
if (set_nr_and_not_polling(rq->idle))
smp_send_reschedule(cpu);
else
--
2.34.1
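[ For reference, set_nr_and_not_polling() is the fetch_or user the comment
builds on, roughly as defined earlier in kernel/sched/core.c; the
alternatives sketched in the comment would extend this very pattern:

/*
 * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
 * this avoids any races wrt polling state changes and thereby avoids
 * spurious IPIs.
 */
static bool set_nr_and_not_polling(struct task_struct *p)
{
	struct thread_info *ti = task_thread_info(p);
	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
} ]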
From: Peter Zijlstra <[email protected]>
Loongson re-enables interrupts in its idle routine and performs a
TIF_NEED_RESCHED check afterwards, before putting the CPU to sleep.
The IRQs firing between the check and the idling instruction may set the
TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs
interrupting __arch_cpu_idle() roll back their return address to the
beginning of __arch_cpu_idle() so that TIF_NEED_RESCHED is checked
again before going back to sleep.
However, idle IRQs can also queue timers that may require a tick
reprogramming through a new generic idle loop iteration, but those timers
would go unnoticed here because __arch_cpu_idle() only checks
TIF_NEED_RESCHED. It doesn't check for pending timers.
Fix this by fast-forwarding the idle IRQs' return address to the end of the
idle routine instead of the beginning, so that the generic idle loop
handles both TIF_NEED_RESCHED and pending timers.
Fixes: 0603839b18f4 ("LoongArch: Add exception/interrupt handling")
Tested-by: Bibo, Mao <[email protected]>
Not-yet-signed-off-by: Peter Zijlstra <[email protected]>
Cc: WANG Xuerui <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/loongarch/kernel/genex.S | 32 ++++++++++++++++++--------------
arch/loongarch/kernel/idle.c | 1 -
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
index 78f066384657..359d693f112e 100644
--- a/arch/loongarch/kernel/genex.S
+++ b/arch/loongarch/kernel/genex.S
@@ -18,27 +18,31 @@
.align 5
SYM_FUNC_START(__arch_cpu_idle)
- /* start of rollback region */
- LONG_L t0, tp, TI_FLAGS
- nop
- andi t0, t0, _TIF_NEED_RESCHED
- bnez t0, 1f
- nop
- nop
- nop
+ /* start of idle interrupt region */
+ move t0, CSR_CRMD_IE
+ csrxchg t0, t0, LOONGARCH_CSR_CRMD
+ /*
+ * If an interrupt lands here; between enabling interrupts above and
+ * going idle on the next instruction, we must *NOT* go idle since the
+ * interrupt could have set TIF_NEED_RESCHED or caused a timer to need
+ * reprogramming. Fall through -- see handle_vint() below -- and have
+ * the idle loop take care of things.
+ */
idle 0
- /* end of rollback region */
-1: jr ra
+ nop
+ /* end of idle interrupt region */
+SYM_INNER_LABEL(__arch_cpu_idle_exit, SYM_L_LOCAL)
+ jr ra
SYM_FUNC_END(__arch_cpu_idle)
SYM_FUNC_START(handle_vint)
BACKUP_T0T1
SAVE_ALL
- la_abs t1, __arch_cpu_idle
+ la_abs t1, __arch_cpu_idle_exit
LONG_L t0, sp, PT_ERA
- /* 32 byte rollback region */
- ori t0, t0, 0x1f
- xori t0, t0, 0x1f
+ /* 16 byte idle interrupt region */
+ ori t0, t0, 0x0f
+ addi.d t0, t0, 1
bne t0, t1, 1f
LONG_S t0, sp, PT_ERA
1: move a0, sp
diff --git a/arch/loongarch/kernel/idle.c b/arch/loongarch/kernel/idle.c
index 0b5dd2faeb90..5ba72d229920 100644
--- a/arch/loongarch/kernel/idle.c
+++ b/arch/loongarch/kernel/idle.c
@@ -11,7 +11,6 @@
void __cpuidle arch_cpu_idle(void)
{
- raw_local_irq_enable();
__arch_cpu_idle(); /* idle instruction needs irq enabled */
raw_local_irq_disable();
}
--
2.34.1
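[ To make the fast-forward arithmetic concrete: ".align 5" keeps
__arch_cpu_idle 32-byte aligned, so with the function at address A, the
four 4-byte instructions of the idle interrupt region span A..A+15 and
__arch_cpu_idle_exit sits at A+16. An interrupt taken inside the region
leaves PT_ERA somewhere in [A, A+12]; "ori t0, t0, 0x0f" rounds it up to
A+15 and "addi.d t0, t0, 1" yields A+16, so the "bne" falls through and the
rewritten ERA makes the interrupt return to the final "jr ra" instead of
re-entering idle. ]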
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <[email protected]> wrote:
>
> There is no point in clearing TIF_NR_POLLING and folding TIF_NEED_RESCHED
> upon poll_idle() exit because cpuidle_idle_call() is going to set
> TIF_NR_POLLING again anyway. Also, if TIF_NEED_RESCHED is set, it will be
> folded and TIF_NR_POLLING will be cleared at the end of do_idle().
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/cpuidle/poll_state.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 009f46f121ae..44a656464d06 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -48,8 +48,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
> }
> raw_local_irq_disable();
>
> - current_clr_polling();
> -
> return index;
> }
>
> --
> 2.34.1
>
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <[email protected]> wrote:
>
> Trying to avoid the forced idle loop exit and schedule() round trip on
> remote timer enqueue didn't bring much value after testing, so add a
> comment about this.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
> ---
> kernel/sched/core.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c52c2eba7c73..e53b892167ad 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1135,6 +1135,28 @@ static void wake_up_idle_cpu(int cpu)
> if (cpu == smp_processor_id())
> return;
>
> + /*
> + * Set TIF_NEED_RESCHED and send an IPI if in the non-polling
> + * part of the idle loop. This forces an exit from the idle loop
> + * and a round trip to schedule(). Now this could be optimized
> + * because a simple new idle loop iteration is enough to
> + * re-evaluate the next tick, provided some re-ordering of the
> + * tick nohz functions so that they follow the TIF_NR_POLLING
> + * clearing:
> + *
> + * - On most archs, a simple fetch_or on ti::flags with a
> + * "0" value would be enough to know if an IPI needs to be sent.
> + *
> + * - x86 needs to perform a last need_resched() check between
> + * monitor and mwait which doesn't take timers into account.
> + * There a dedicated TIF_TIMER flag would be required to
> + * fetch_or here and be checked along with TIF_NEED_RESCHED
> + * before mwait().
> + *
> + * However, remote timer enqueue is not such a frequent event
> + * and testing of the above solutions didn't appear to bring
> + * much benefit.
> + */
> if (set_nr_and_not_polling(rq->idle))
> smp_send_reschedule(cpu);
> else
> --
> 2.34.1
>
On Fri, Aug 11, 2023 at 7:00 PM Frederic Weisbecker <[email protected]> wrote:
>
> Add a note to make sure we never overlook and break the requirements behind
> the sti;mwait sequence.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
> ---
> arch/x86/include/asm/mwait.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 778df05f8539..341ee4f1d91e 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -87,6 +87,15 @@ static __always_inline void __mwaitx(unsigned long eax, unsigned long ebx,
> :: "a" (eax), "b" (ebx), "c" (ecx));
> }
>
> +/*
> + * Re-enable interrupts right upon calling mwait in such a way that
> + * no interrupt can fire _before_ the execution of mwait, ie: no
> + * instruction must be placed between "sti" and "mwait".
> + *
> + * This is necessary because if an interrupt queues a timer before
> + * executing mwait, it would otherwise go unnoticed and the next tick
> + * would not be reprogrammed accordingly before mwait ever wakes up.
> + */
> static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> {
> mds_idle_clear_cpu_buffers();
> --
> 2.34.1
>
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <[email protected]> wrote:
>
> poll_idle() can't be called while the tick is stopped because it enables
> interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell whether an
> interrupt has queued a timer that would require a tick reprogramming.
>
> There is no point anyway in polling with the tick stopped, so add a check
> to make sure it never happens.
I'd rather update the governors so they never use polling states when the
tick has been stopped, and then add the WARN_ON().
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> drivers/cpuidle/poll_state.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..009f46f121ae 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -7,6 +7,7 @@
> #include <linux/sched.h>
> #include <linux/sched/clock.h>
> #include <linux/sched/idle.h>
> +#include <linux/tick.h>
>
> #define POLL_IDLE_RELAX_COUNT 200
>
> @@ -19,6 +20,13 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>
> dev->poll_time_limit = false;
>
> + /*
> + * This re-enables IRQs and only polls on TIF_NEED_RESCHED.
> + * A timer queued by an interrupt here may go unnoticed if
> + * the tick is stopped.
> + */
> + WARN_ON_ONCE(tick_nohz_tick_stopped());
> +
> raw_local_irq_enable();
> if (!current_set_polling_and_test()) {
> unsigned int loop_count = 0;
> --
> 2.34.1
>
On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <[email protected]> wrote:
>
> mwait_idle_with_hints() is mostly called from cpuidle, which already sets
> TIF_NR_POLLING back and folds TIF_NEED_RESCHED if necessary.
>
> The only non-cpuidle caller is power_saving_thread(), which acts as an idle
> loop and is the only reason why mwait_idle_with_hints() carries a costly
> fully ordered atomic operation upon idle exit.
>
> Move this overhead to power_saving_thread() instead, which already
> duplicates quite some cpuidle code.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
> ---
> arch/x86/include/asm/mwait.h | 1 -
> drivers/acpi/acpi_pad.c | 1 +
> 2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 920426d691ce..c1be3775b94a 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -134,7 +134,6 @@ static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned lo
> }
> }
> }
> - current_clr_polling();
> }
>
> /*
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index 7a453c5ff303..5a44afafe185 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -174,6 +174,7 @@ static int power_saving_thread(void *data)
> stop_critical_timings();
>
> mwait_idle_with_hints(power_saving_mwait_eax, 1);
> + current_clr_polling();
>
> start_critical_timings();
> tick_broadcast_exit();
> --
> 2.34.1
>
+Add Huacai
The asm instruction move should be replaced with li.w, since move is a
register-to-register copy while CSR_CRMD_IE is an immediate constant that
needs a load-immediate; the rest looks good to me.
diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
index 359d693f112e..8a98023ecafd 100644
--- a/arch/loongarch/kernel/genex.S
+++ b/arch/loongarch/kernel/genex.S
@@ -19,7 +19,7 @@
.align 5
SYM_FUNC_START(__arch_cpu_idle)
/* start of idle interrupt region */
- move t0, CSR_CRMD_IE
+ li.w t0, CSR_CRMD_IE
csrxchg t0, t0, LOONGARCH_CSR_CRMD
/*
* If an interrupt lands here; between enabling interrupts above and
By the way, __arch_cpu_idle() is called by machine_halt()/machine_power_off()
etc., so IRQs will be enabled with the new patch. Do we need to add something
like this?
diff --git a/arch/loongarch/kernel/reset.c b/arch/loongarch/kernel/reset.c
index 1ef8c6383535..9ecd42c0c804 100644
--- a/arch/loongarch/kernel/reset.c
+++ b/arch/loongarch/kernel/reset.c
@@ -20,6 +20,11 @@
void (*pm_power_off)(void);
EXPORT_SYMBOL(pm_power_off);
+static __always_inline void native_halt(void)
+{
+ asm volatile("idle 0": : :"memory");
+}
+
void machine_halt(void)
{
#ifdef CONFIG_SMP
@@ -32,9 +37,9 @@ void machine_halt(void)
pr_notice("\n\n** You can safely turn off the power now **\n\n");
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+	}
}
void machine_power_off(void)
@@ -52,9 +57,9 @@ void machine_power_off(void)
efi.reset_system(EFI_RESET_SHUTDOWN, EFI_SUCCESS, 0, NULL);
#endif
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+	}
}
void machine_restart(char *command)
@@ -73,7 +78,7 @@ void machine_restart(char *command)
if (!acpi_disabled)
acpi_reboot();
- while (true) {
- __arch_cpu_idle();
- }
+ while (1) {
+ native_halt();
+	}
}
Regards
Bibo Mao
On 2023/8/12 01:00, Frederic Weisbecker wrote:
> From: Peter Zijlstra <[email protected]>
>
> Loongson re-enables interrupts in its idle routine and performs a
> TIF_NEED_RESCHED check afterwards, before putting the CPU to sleep.
>
> The IRQs firing between the check and the idling instruction may set the
> TIF_NEED_RESCHED flag. In order to deal with such a race, IRQs
> interrupting __arch_cpu_idle() roll back their return address to the
> beginning of __arch_cpu_idle() so that TIF_NEED_RESCHED is checked
> again before going back to sleep.
>
> However, idle IRQs can also queue timers that may require a tick
> reprogramming through a new generic idle loop iteration, but those timers
> would go unnoticed here because __arch_cpu_idle() only checks
> TIF_NEED_RESCHED. It doesn't check for pending timers.
>
> Fix this by fast-forwarding the idle IRQs' return address to the end of the
> idle routine instead of the beginning, so that the generic idle loop
> handles both TIF_NEED_RESCHED and pending timers.
>
> Fixes: 0603839b18f4 ("LoongArch: Add exception/interrupt handling")
> Tested-by: Bibo, Mao <[email protected]>
> Not-yet-signed-off-by: Peter Zijlstra <[email protected]>
> Cc: WANG Xuerui <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> ---
> arch/loongarch/kernel/genex.S | 32 ++++++++++++++++++--------------
> arch/loongarch/kernel/idle.c | 1 -
> 2 files changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/arch/loongarch/kernel/genex.S b/arch/loongarch/kernel/genex.S
> index 78f066384657..359d693f112e 100644
> --- a/arch/loongarch/kernel/genex.S
> +++ b/arch/loongarch/kernel/genex.S
> @@ -18,27 +18,31 @@
>
> .align 5
> SYM_FUNC_START(__arch_cpu_idle)
> - /* start of rollback region */
> - LONG_L t0, tp, TI_FLAGS
> - nop
> - andi t0, t0, _TIF_NEED_RESCHED
> - bnez t0, 1f
> - nop
> - nop
> - nop
> + /* start of idle interrupt region */
> + move t0, CSR_CRMD_IE
> + csrxchg t0, t0, LOONGARCH_CSR_CRMD
> + /*
> + * If an interrupt lands here; between enabling interrupts above and
> + * going idle on the next instruction, we must *NOT* go idle since the
> + * interrupt could have set TIF_NEED_RESCHED or caused a timer to need
> + * reprogramming. Fall through -- see handle_vint() below -- and have
> + * the idle loop take care of things.
> + */
> idle 0
> - /* end of rollback region */
> -1: jr ra
> + nop
> + /* end of idle interrupt region */
> +SYM_INNER_LABEL(__arch_cpu_idle_exit, SYM_L_LOCAL)
> + jr ra
> SYM_FUNC_END(__arch_cpu_idle)
>
> SYM_FUNC_START(handle_vint)
> BACKUP_T0T1
> SAVE_ALL
> - la_abs t1, __arch_cpu_idle
> + la_abs t1, __arch_cpu_idle_exit
> LONG_L t0, sp, PT_ERA
> - /* 32 byte rollback region */
> - ori t0, t0, 0x1f
> - xori t0, t0, 0x1f
> + /* 16 byte idle interrupt region */
> + ori t0, t0, 0x0f
> + addi.d t0, t0, 1
> bne t0, t1, 1f
> LONG_S t0, sp, PT_ERA
> 1: move a0, sp
> diff --git a/arch/loongarch/kernel/idle.c b/arch/loongarch/kernel/idle.c
> index 0b5dd2faeb90..5ba72d229920 100644
> --- a/arch/loongarch/kernel/idle.c
> +++ b/arch/loongarch/kernel/idle.c
> @@ -11,7 +11,6 @@
>
> void __cpuidle arch_cpu_idle(void)
> {
> - raw_local_irq_enable();
> __arch_cpu_idle(); /* idle instruction needs irq enabled */
> raw_local_irq_disable();
> }
On Tue, Aug 15, 2023 at 06:10:43PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2023 at 07:00:39PM +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> > The first part of the series contains cpuidle callback fixes against timers,
> > some of which haven't been signed off by Peter yet.
> >
> > Another part removes the overhead of useless TIF_NR_POLLING clearing.
>
> So I've again forgotten why we don't simply set TIF_NEED_RESCHED if we
> need the timer re-programmed. That is by far the simplest fix.
>
> I'm sure we talked about it, but this was a long time ago and I can't
> remember :/
I don't think we did, but the rationale is that by force-setting
TIF_NEED_RESCHED, you force a needless timer restart (which is then going
to be cancelled shortly after) and a round trip to the scheduler, with the
rq lock overhead, etc...
Just for fun, I tried the following change:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..ec43d135cf65 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1132,8 +1132,10 @@ static void wake_up_idle_cpu(int cpu)
{
struct rq *rq = cpu_rq(cpu);
- if (cpu == smp_processor_id())
+ if (cpu == smp_processor_id()) {
+ set_tsk_need_resched(current);
return;
+ }
if (set_nr_and_not_polling(rq->idle))
smp_send_reschedule(cpu);
Then I computed the average of 100 runs of "make clean -C tools/perf; make -C
tools/perf/" before and after this patch.
The patched kernel spent on average 1.27% less time in C-states, so this
has a measurable impact.
>
> Anyway, the patches look good, except I think there's a whole bunch of
> architectures that still need fixing. In particular, since loongson
> 'borrowed' the whole lot from MIPS, they need an identical fix. But I'm
> sure there are more architectures affected.
MIPS at least, yes. I only did a quick check and it seems that most archs
use a "wfi"-like instruction. I'll check the others.
Thanks.
On Fri, Aug 11, 2023 at 07:37:55PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 11, 2023 at 7:01 PM Frederic Weisbecker <[email protected]> wrote:
> >
> > poll_idle() can't be called while the tick is stopped because it enables
> > interrupts and only polls on TIF_NEED_RESCHED, which doesn't tell if an
> > interrupt queues a timer that would require a tick re-programming.
> >
> > There is no point anyway to poll with the tick stopped so add a check
> > to make sure it never happens.
>
> I'd rather update governors so they never use polling states when the
> tick has been stopped and then add the WARN_ON().
Sounds good, let's put this one aside.
In the meantime feel free to pick up those you acked and see fit in your
queue.
Thanks!