From: "Gautham R. Shenoy" <[email protected]>
In the situations where snooze is the only cpuidle state due to
firmware not exposing any platform idle states, the idle CPUs will
remain in snooze for a long time with interrupts disabled causing the
Hard-lockup detector to complain.
watchdog: CPU 51 detected hard LOCKUP on other CPUs 59
watchdog: CPU 51 TB:535296107736, last SMP heartbeat TB:527472229239 (15281ms ago)
watchdog: CPU 59 Hard LOCKUP
watchdog: CPU 59 TB:535296252849, last heartbeat TB:526554725466 (17073ms ago)
Fix this by adding CPUIDLE_FLAG_POLLING flag to the state, so that the
cpuidle governor will do the right thing, such as not stopping the
tick if it is going to put the idle cpu to snooze.
Reported-by: Akshay Adiga <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Gautham R. Shenoy <[email protected]>
---
drivers/cpuidle/cpuidle-powernv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
index d29e4f0..b73041b 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -156,6 +156,7 @@ static int stop_loop(struct cpuidle_device *dev,
{ /* Snooze */
.name = "snooze",
.desc = "snooze",
+ .flags = CPUIDLE_FLAG_POLLING,
.exit_latency = 0,
.target_residency = 0,
.enter = snooze_loop },
--
1.9.4
* Gautham R Shenoy <[email protected]> [2018-07-03 10:54:16]:
> From: "Gautham R. Shenoy" <[email protected]>
>
> In the situations where snooze is the only cpuidle state due to
> firmware not exposing any platform idle states, the idle CPUs will
> remain in snooze for a long time with interrupts disabled causing the
> Hard-lockup detector to complain.
snooze_loop() spins at SMT low priority with interrupts enabled. We
call local_irq_enable() before we get into the snooze loop.
Since this is a polling state, we should wake up without an interrupt,
and hence we set TIF_POLLING_NRFLAG as well.
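To make the point concrete, here is a rough userspace model of a polling
idle loop of this kind. It is only a sketch, not the code in
cpuidle-powernv.c: struct cpu_model, snooze_model() and all of its fields
are hypothetical stand-ins for the thread flags, the interrupt state and
the timebase, and the timeout handling is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for kernel state; not the kernel API. */
struct cpu_model {
	bool irqs_enabled;	/* models the effect of local_irq_enable() */
	bool polling;		/* models TIF_POLLING_NRFLAG */
	bool need_resched;	/* models the need_resched() condition */
	uint64_t now;		/* models a timebase counter */
	uint64_t resched_at;	/* time at which another CPU queues work */
};

/*
 * Spin until work is queued for this CPU or the timeout expires.
 * Returns the number of time units spent polling.
 */
static uint64_t snooze_model(struct cpu_model *cpu, uint64_t timeout)
{
	uint64_t start;

	assert(cpu);
	start = cpu->now;

	cpu->polling = true;		/* wakers can skip sending an interrupt */
	cpu->irqs_enabled = true;	/* interrupts stay on while we spin */

	while (!cpu->need_resched) {
		if (cpu->now - start >= timeout)
			break;		/* assumed timeout: give up polling */
		cpu->now++;
		/* Model a remote CPU setting the flag; no interrupt involved. */
		if (cpu->now >= cpu->resched_at)
			cpu->need_resched = true;
	}

	cpu->polling = false;		/* modelling choice: clear on exit */
	return cpu->now - start;
}
```

In this model the loop exits as soon as need_resched becomes true, and
interrupts are enabled for the whole time the CPU spins.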
> watchdog: CPU 51 detected hard LOCKUP on other CPUs 59
> watchdog: CPU 51 TB:535296107736, last SMP heartbeat TB:527472229239 (15281ms ago)
> watchdog: CPU 59 Hard LOCKUP
> watchdog: CPU 59 TB:535296252849, last heartbeat TB:526554725466 (17073ms ago)
hmm.. not sure why watchdog will complain, maybe something more is
going on.
> Fix this by adding CPUIDLE_FLAG_POLLING flag to the state, so that the
> cpuidle governor will do the right thing, such as not stopping the
> tick if it is going to put the idle cpu to snooze.
>
> Reported-by: Akshay Adiga <[email protected]>
> Cc: Nicholas Piggin <[email protected]>
> Signed-off-by: Gautham R. Shenoy <[email protected]>
> ---
> drivers/cpuidle/cpuidle-powernv.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> index d29e4f0..b73041b 100644
> --- a/drivers/cpuidle/cpuidle-powernv.c
> +++ b/drivers/cpuidle/cpuidle-powernv.c
> @@ -156,6 +156,7 @@ static int stop_loop(struct cpuidle_device *dev,
> { /* Snooze */
> .name = "snooze",
> .desc = "snooze",
> + .flags = CPUIDLE_FLAG_POLLING,
> .exit_latency = 0,
> .target_residency = 0,
> .enter = snooze_loop },
Adding CPUIDLE_FLAG_POLLING is good and enables more optimization.
But the stated reason, that we spin with interrupts disabled, does not
seem right.
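As an illustration of the kind of decision the flag allows, below is a
simplified userspace model of a governor choosing whether to stop the
tick. It is a sketch under assumptions, not the governor's actual code:
MODEL_FLAG_POLLING, struct idle_state_model and should_stop_tick() are
hypothetical names, the flag value is a stand-in for
CPUIDLE_FLAG_POLLING, and the second check (predicted idle shorter than
one tick) is an assumed policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for CPUIDLE_FLAG_POLLING; the value here is illustrative. */
#define MODEL_FLAG_POLLING (1u << 0)

/* Hypothetical, reduced view of a cpuidle state. */
struct idle_state_model {
	const char *name;
	unsigned int flags;
	unsigned int target_residency_us;
};

/*
 * Decide whether the tick may be stopped before entering 'state'.
 * A polling state keeps the tick running, so the CPU is not left
 * spinning indefinitely without periodic timer activity.
 */
static bool should_stop_tick(const struct idle_state_model *state,
			     unsigned int predicted_idle_us,
			     unsigned int tick_period_us)
{
	assert(state);

	if (state->flags & MODEL_FLAG_POLLING)
		return false;	/* polling state: keep the tick */

	if (predicted_idle_us < tick_period_us)
		return false;	/* assumed policy: short idle, keep the tick */

	return true;
}
```

With the flag set on snooze, this model never stops the tick for it,
whatever the predicted idle duration is.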
--Vaidy
On Tue, Jul 03, 2018 at 07:36:16PM +0530, Vaidyanathan Srinivasan wrote:
> * Gautham R Shenoy <[email protected]> [2018-07-03 10:54:16]:
>
> > From: "Gautham R. Shenoy" <[email protected]>
> >
> > In the situations where snooze is the only cpuidle state due to
> > firmware not exposing any platform idle states, the idle CPUs will
> > remain in snooze for a long time with interrupts disabled causing the
> > Hard-lockup detector to complain.
>
> snooze_loop() will spin in SMT low priority with interrupt enabled. We
> have local_irq_enable() before we get into the snooze loop.
> Since this is a polling state, we should wakeup without an interrupt
> and hence we set TIF_POLLING_NRFLAG as well.
>
You are right. We have a local_irq_enable() inside the snooze_loop.
>
> > watchdog: CPU 51 detected hard LOCKUP on other CPUs 59
> > watchdog: CPU 51 TB:535296107736, last SMP heartbeat TB:527472229239 (15281ms ago)
> > watchdog: CPU 59 Hard LOCKUP
> > watchdog: CPU 59 TB:535296252849, last heartbeat TB:526554725466 (17073ms ago)
>
> hmm.. not sure why watchdog will complain, maybe something more is
> going on.
Will look into this, Vaidy.
>
> > Fix this by adding CPUIDLE_FLAG_POLLING flag to the state, so that the
> > cpuidle governor will do the right thing, such as not stopping the
> > tick if it is going to put the idle cpu to snooze.
> >
> > Reported-by: Akshay Adiga <[email protected]>
> > Cc: Nicholas Piggin <[email protected]>
> > Signed-off-by: Gautham R. Shenoy <[email protected]>
> > ---
> > drivers/cpuidle/cpuidle-powernv.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c
> > index d29e4f0..b73041b 100644
> > --- a/drivers/cpuidle/cpuidle-powernv.c
> > +++ b/drivers/cpuidle/cpuidle-powernv.c
> > @@ -156,6 +156,7 @@ static int stop_loop(struct cpuidle_device *dev,
> > { /* Snooze */
> > .name = "snooze",
> > .desc = "snooze",
> > + .flags = CPUIDLE_FLAG_POLLING,
> > .exit_latency = 0,
> > .target_residency = 0,
> > .enter = snooze_loop },
>
> Adding the CPUIDLE_FLAG_POLLING is good and enables more optimization.
> But the reason that we spin with interrupt disabled does not seem
> right.
Fair point.
>
> --Vaidy
>
Gautham R Shenoy <[email protected]> writes:
> On Tue, Jul 03, 2018 at 07:36:16PM +0530, Vaidyanathan Srinivasan wrote:
>> * Gautham R Shenoy <[email protected]> [2018-07-03 10:54:16]:
>>
>> > From: "Gautham R. Shenoy" <[email protected]>
>> >
>> > In the situations where snooze is the only cpuidle state due to
>> > firmware not exposing any platform idle states, the idle CPUs will
>> > remain in snooze for a long time with interrupts disabled causing the
>> > Hard-lockup detector to complain.
>>
>> snooze_loop() will spin in SMT low priority with interrupt enabled. We
>> have local_irq_enable() before we get into the snooze loop.
>> Since this is a polling state, we should wakeup without an interrupt
>> and hence we set TIF_POLLING_NRFLAG as well.
>>
>
> You are right. We have a local_irq_enable() inside the snooze_loop.
>>
>> > watchdog: CPU 51 detected hard LOCKUP on other CPUs 59
>> > watchdog: CPU 51 TB:535296107736, last SMP heartbeat TB:527472229239 (15281ms ago)
>> > watchdog: CPU 59 Hard LOCKUP
>> > watchdog: CPU 59 TB:535296252849, last heartbeat TB:526554725466 (17073ms ago)
>>
>> hmm.. not sure why watchdog will complain, maybe something more is
>> going on.
>
> Will look into this Vaidy.
I'll wait for a v2.
cheers