LinuxLists.cc - [PATCH] cpufreq: Stop BUGing the system

2014-12-17 15:52:40

Subject: [PATCH] cpufreq: Stop BUGing the system

CPUFRreq subsystem is not a system catastrophic failure point.
Failures in these cases DONOT need complete system shutdown with BUG.
just refuse to let cpufreq function should be good enough.

Signed-off-by: Nishanth Menon <[email protected]>
---
drivers/cpufreq/cpufreq.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a09a29c..a5aa2fa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, unsigned int state)
{
- BUG_ON(irqs_disabled());
+ if (irqs_disabled()) {
+ WARN(1, "IRQs disabled!\n");
+ return;
+ }

if (cpufreq_disabled())
return;
@@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
/*
* Reaching here after boot in a few seconds may not
* mean that system will remain stable at "unknown"
- * frequency for longer duration. Hence, a BUG_ON().
+ * frequency for longer duration. Hence, a WARN().
*/
- BUG_ON(ret);
+ if (ret) {
+ WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
+ goto err_out_unregister;
+ }
pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n",
__func__, policy->cpu, policy->cur);
}
@@ -2556,7 +2562,10 @@ static int __init cpufreq_core_init(void)
return -ENODEV;

cpufreq_global_kobject = kobject_create();
- BUG_ON(!cpufreq_global_kobject);
+ if (!cpufreq_global_kobject) {
+ WARN(1, "No memory for cpufreq_global_kobject\n");
+ return -ENOMEM;
+ }

return 0;
}
--
1.7.9.5

2014-12-17 19:16:41

by Kevin Hilman

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

Nishanth Menon <[email protected]> writes:

> CPUFRreq subsystem is not a system catastrophic failure point.
> Failures in these cases DONOT need complete system shutdown with BUG.
> just refuse to let cpufreq function should be good enough.
>
> Signed-off-by: Nishanth Menon <[email protected]>

Acked-by: Kevin Hilman <[email protected]>

2014-12-18 02:08:44

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On 17 December 2014 at 21:21, Nishanth Menon <[email protected]> wrote:
> CPUFRreq subsystem is not a system catastrophic failure point.
> Failures in these cases DONOT need complete system shutdown with BUG.
> just refuse to let cpufreq function should be good enough.
>
> Signed-off-by: Nishanth Menon <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index a09a29c..a5aa2fa 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
> static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
> struct cpufreq_freqs *freqs, unsigned int state)
> {
> - BUG_ON(irqs_disabled());
> + if (irqs_disabled()) {
> + WARN(1, "IRQs disabled!\n");
> + return;
> + }

What about:

> + if (WARN(irqs_disabled(), "IRQs disabled!\n")
> + return;

Same for the last change as well..

>
> if (cpufreq_disabled())
> return;
> @@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> /*
> * Reaching here after boot in a few seconds may not
> * mean that system will remain stable at "unknown"
> - * frequency for longer duration. Hence, a BUG_ON().
> + * frequency for longer duration. Hence, a WARN().
> */
> - BUG_ON(ret);
> + if (ret) {
> + WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
> + goto err_out_unregister;
> + }

And I still don't agree for this one. We shouldn't keep on working on a
potentially unstable frequency.

2014-12-18 14:49:22

by Nishanth Menon

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On 07:38-20141218, Viresh Kumar wrote:
> On 17 December 2014 at 21:21, Nishanth Menon <[email protected]> wrote:
> > CPUFRreq subsystem is not a system catastrophic failure point.
> > Failures in these cases DONOT need complete system shutdown with BUG.
> > just refuse to let cpufreq function should be good enough.
> >
> > Signed-off-by: Nishanth Menon <[email protected]>
> > ---
> > drivers/cpufreq/cpufreq.c | 17 +++++++++++++----
> > 1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index a09a29c..a5aa2fa 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -281,7 +281,10 @@ static inline void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
> > static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
> > struct cpufreq_freqs *freqs, unsigned int state)
> > {
> > - BUG_ON(irqs_disabled());
> > + if (irqs_disabled()) {
> > + WARN(1, "IRQs disabled!\n");
> > + return;
> > + }
>
> What about:
>
> > + if (WARN(irqs_disabled(), "IRQs disabled!\n")
> > + return;
>
> Same for the last change as well..

k.
>
> >
> > if (cpufreq_disabled())
> > return;
> > @@ -1253,9 +1256,12 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> > /*
> > * Reaching here after boot in a few seconds may not
> > * mean that system will remain stable at "unknown"
> > - * frequency for longer duration. Hence, a BUG_ON().
> > + * frequency for longer duration. Hence, a WARN().
> > */
> > - BUG_ON(ret);
> > + if (ret) {
> > + WARN(1, "SYSTEM operating at invalid freq %u", policy->cur);
> > + goto err_out_unregister;
> > + }
>
> And I still don't agree for this one. We shouldn't keep on working on a
> potentially unstable frequency.

I can add "could be unstable" -> the point being there can be psuedo
errors reported in the system - example - clock framework bugs. Dont
just stop the boot. example: what if cpufreq was a driver module - it
would not have rescued the system because cpufreq had'nt detected the
logic - if we are going to force this on the system, we should probably
not do this in cpufreq code, instead should be somewhere generic.

While I do empathise (and had infact advocated in the past) of not
favouring system attempting to continue at an invalid configuration and
our attempt to rescue has failed - given that we cannot provide a
consistent behavior (it is not a core system behavior) and potential of
a false-postive (example clk framework or underlying bug), it should be
good enough to "enhance" WARN to be "severe sounding enough" to
flag it for developer and continue while keeping the system alive as
much as possible.

--
Regards,
Nishanth Menon

2014-12-19 01:41:22

by Viresh Kumar

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On 18 December 2014 at 20:19, Nishanth Menon <[email protected]> wrote:
> I can add "could be unstable" -> the point being there can be psuedo
> errors reported in the system - example - clock framework bugs. Dont
> just stop the boot. example: what if cpufreq was a driver module - it
> would not have rescued the system because cpufreq had'nt detected the
> logic - if we are going to force this on the system, we should probably
> not do this in cpufreq code, instead should be somewhere generic.
>
> While I do empathise (and had infact advocated in the past) of not
> favouring system attempting to continue at an invalid configuration and
> our attempt to rescue has failed - given that we cannot provide a
> consistent behavior (it is not a core system behavior) and potential of
> a false-postive (example clk framework or underlying bug), it should be
> good enough to "enhance" WARN to be "severe sounding enough" to
> flag it for developer and continue while keeping the system alive as
> much as possible.

There is no way out for the kernel to know if its a false positive or a real
bug. And in the worst case, it can screw up a platform completely.

I am still not sure if changing it to a WARN would be good idea.

@Rafael: Thoughts ?

2014-12-19 01:46:46

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
> On 18 December 2014 at 20:19, Nishanth Menon <[email protected]> wrote:
> > I can add "could be unstable" -> the point being there can be psuedo
> > errors reported in the system - example - clock framework bugs. Dont
> > just stop the boot. example: what if cpufreq was a driver module - it
> > would not have rescued the system because cpufreq had'nt detected the
> > logic - if we are going to force this on the system, we should probably
> > not do this in cpufreq code, instead should be somewhere generic.
> >
> > While I do empathise (and had infact advocated in the past) of not
> > favouring system attempting to continue at an invalid configuration and
> > our attempt to rescue has failed - given that we cannot provide a
> > consistent behavior (it is not a core system behavior) and potential of
> > a false-postive (example clk framework or underlying bug), it should be
> > good enough to "enhance" WARN to be "severe sounding enough" to
> > flag it for developer and continue while keeping the system alive as
> > much as possible.
>
> There is no way out for the kernel to know if its a false positive or a real
> bug. And in the worst case, it can screw up a platform completely.
>
> I am still not sure if changing it to a WARN would be good idea.
>
> @Rafael: Thoughts ?

I'm a bit divided here. On the one hand I don't like BUG_ON() as a rule and it
is used in too many places where it doesn't have to be used.

On the other hand, in this particular case, I'm not sure if allowing the system
to run without cpufreq when it might rely on it for CPU cooling, for one example,
is a good idea.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-12-24 15:06:11

by Nishanth Menon

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On 12/18/2014 08:08 PM, Rafael J. Wysocki wrote:
> On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
>> On 18 December 2014 at 20:19, Nishanth Menon <[email protected]> wrote:
>>> I can add "could be unstable" -> the point being there can be psuedo
>>> errors reported in the system - example - clock framework bugs. Dont
>>> just stop the boot. example: what if cpufreq was a driver module - it
>>> would not have rescued the system because cpufreq had'nt detected the
>>> logic - if we are going to force this on the system, we should probably
>>> not do this in cpufreq code, instead should be somewhere generic.
>>>
>>> While I do empathise (and had infact advocated in the past) of not
>>> favouring system attempting to continue at an invalid configuration and
>>> our attempt to rescue has failed - given that we cannot provide a
>>> consistent behavior (it is not a core system behavior) and potential of
>>> a false-postive (example clk framework or underlying bug), it should be
>>> good enough to "enhance" WARN to be "severe sounding enough" to
>>> flag it for developer and continue while keeping the system alive as
>>> much as possible.
>>
>> There is no way out for the kernel to know if its a false positive or a real
>> bug. And in the worst case, it can screw up a platform completely.
>>
>> I am still not sure if changing it to a WARN would be good idea.
>>
>> @Rafael: Thoughts ?
>
> I'm a bit divided here. On the one hand I don't like BUG_ON() as a rule and it
> is used in too many places where it doesn't have to be used.
>
> On the other hand, in this particular case, I'm not sure if allowing the system
> to run without cpufreq when it might rely on it for CPU cooling, for one example,
> is a good idea.

but then, CPUFReq is not a mandatory feature - we could as well do the
same with CPU_FREQ disabled.

--
Regards,
Nishanth Menon

2014-12-27 20:15:21

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Stop BUGing the system

On Wednesday, December 24, 2014 09:06:05 AM Nishanth Menon wrote:
> On 12/18/2014 08:08 PM, Rafael J. Wysocki wrote:
> > On Friday, December 19, 2014 07:11:19 AM Viresh Kumar wrote:
> >> On 18 December 2014 at 20:19, Nishanth Menon <[email protected]> wrote:
> >>> I can add "could be unstable" -> the point being there can be psuedo
> >>> errors reported in the system - example - clock framework bugs. Dont
> >>> just stop the boot. example: what if cpufreq was a driver module - it
> >>> would not have rescued the system because cpufreq had'nt detected the
> >>> logic - if we are going to force this on the system, we should probably
> >>> not do this in cpufreq code, instead should be somewhere generic.
> >>>
> >>> While I do empathise (and had infact advocated in the past) of not
> >>> favouring system attempting to continue at an invalid configuration and
> >>> our attempt to rescue has failed - given that we cannot provide a
> >>> consistent behavior (it is not a core system behavior) and potential of
> >>> a false-postive (example clk framework or underlying bug), it should be
> >>> good enough to "enhance" WARN to be "severe sounding enough" to
> >>> flag it for developer and continue while keeping the system alive as
> >>> much as possible.
> >>
> >> There is no way out for the kernel to know if its a false positive or a real
> >> bug. And in the worst case, it can screw up a platform completely.
> >>
> >> I am still not sure if changing it to a WARN would be good idea.
> >>
> >> @Rafael: Thoughts ?
> >
> > I'm a bit divided here. On the one hand I don't like BUG_ON() as a rule and it
> > is used in too many places where it doesn't have to be used.
> >
> > On the other hand, in this particular case, I'm not sure if allowing the system
> > to run without cpufreq when it might rely on it for CPU cooling, for one example,
> > is a good idea.
>
> but then, CPUFReq is not a mandatory feature - we could as well do the
> same with CPU_FREQ disabled.

Some platforms pretty much require CPU_FREQ and will always have it set, but
with the $subject patch they may end up not using it.

So this isn't a valid argument to me, sorry.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.