2008-08-20 17:02:30

by Thomas Renninger

[permalink] [raw]
Subject: [RFC] Introduce interface to report BIOS bugs

The whole description is in the first patch.

Is this acceptable?
Comments, enhancement suggestions?


2008-08-20 17:02:51

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 1/3] Introduce interface to report BIOS bugs

From: Christian Kornacker <[email protected]>

This is mostly needed for ACPI systems.
ACPI introduces an endless amount of possible BIOS
bugs like wrong values, missing functions, etc.
The kernel has to sanity check all of them and should
report BIOS bugs as such to the user.

ACPI is the main target, of course others, who already declare BIOS bugs,
also benefit from this, e.g. PCI:
arch/x86/pci/pcbios.c:
printk(KERN_WARNING "bios32_service(0x%lx): returned 0x%x -- BIOS bug!\n",
printk (KERN_ERR "PCI: BIOS BUG #%x[%08x] found\n",
...
This one I stumbled over recently (when >4GB BIOS sets up IO mem for this
device wrongly on some Dell notebooks):
ohci_hcd 0000:00:02.0: USB HC takeover failed! (BIOS/SMM bug)
...


There are two kind of BIOS bug messages introduced:
- FW_PRINT_CRIT(..)
Is intended to replace
printk(KERN_ERR/KERN_CRIT/KERN_EMERG/KERN_WARN "BIOS bug...");
messages. The string will always be compiled into the kernel and thus
use some memory. Depending on the severity it may or may not pop up
in the syslogs as:
Jun 11 11:28:11 linux kernel: [BIOS] ...

- FW_PRINT_WARN(..)
Is intended to replace
printk (KERN_WARN/KERN_INFO "BIOS bug...");
messages which may result in minor malfunction of a device, less
performance or just any kind of more or less harmless BIOS bug which
vendors should still correct in the future.
The only difference to above FW_PRINT_CRIT(..) is, that these
messages could get compiled out on production kernels.

Advantage:

- Be able to detect BIOS bugs as such through userspace programs, e.g.
linuxfirmwarekit.

- Easier testing for HW vendors for Linux compatibility.

- Makes it easier for the ordinary user how to proceed when machine/device
is not working: When a BIOS bug is shown in dmesg, first step should
be to search for a BIOS update.

- Makes it easier for certification and QA people testing Linux.
Certification of BIOS/HW should always fail if BIOS bugs with a level
of e.g. FW_ERR or FW_WARN happen. It's hard for people not being
deeply involved in a subsystem to decide how critical a bug is. In general
they need to ask a kernel developer searching in the code who will finally
tell them that this is a BIOS bug and QA/Certification should poke the
vendor to fix this up. The step to ask the kernel developer should not be
needed anymore then.

Difference to printk:
- No newline needed
- Severity is an extra argument instead a string getting concatinated
to the message


Signed-off-by: Thomas Renninger <[email protected]>
---
include/linux/firmware_error.h | 36 ++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 10 ++++++++++
2 files changed, 46 insertions(+), 0 deletions(-)
create mode 100644 include/linux/firmware_error.h

diff --git a/include/linux/firmware_error.h b/include/linux/firmware_error.h
new file mode 100644
index 0000000..74d454e
--- /dev/null
+++ b/include/linux/firmware_error.h
@@ -0,0 +1,36 @@
+/*
+ * Firmware error reporting interface
+ *
+ */
+
+#include <linux/kernel.h>
+
+#define FW_EMERG KERN_EMERG /* System cannot boot */
+#define FW_ALERT KERN_ALERT /* Risk of HW or data damage,
+ e.g. overheating, dmraid */
+#define FW_CRIT KERN_CRIT /* A major device is not functional
+ e.g. hpet, lapic, network... */
+#define FW_ERR KERN_ERR /* A major device is not working
+ as expected, e.g. cpufreq stuck
+ to lowest freq, lowered
+ performance, increased power
+ consumption... */
+#define FW_WARN KERN_WARNING /* A minor device does not work
+ or is not fully functional,
+ e.g. backlight brightness,
+ Hotplug capabilities of a
+ device that should be
+ hot-plugable will not work */
+#define FW_INFO KERN_INFO /* Anything else related to BIOS
+ that is worth mentioning */
+
+
+#ifdef CONFIG_REPORT_FIRMWARE_BUGS
+ #define FW_PRINT_WARN(severity, fmt, args...) printk("%s[BIOS]: " fmt "\n", \
+ severity, ##args)
+#else
+ #define FW_PRINT_WARN(severity, fmt, args...) do { } while (0)
+#endif
+
+#define FW_PRINT_CRIT(severity, fmt, args...) printk("%s[BIOS]: " fmt "\n", \
+ severity, ##args)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 800ac84..6743d09 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -143,6 +143,16 @@ config DEBUG_SHIRQ
Drivers ought to be able to handle interrupts coming in at those
points; some don't and need to be caught.

+config REPORT_FIRMWARE_BUGS
+ bool "Report Firmware Bugs"
+ default y
+ help
+ This option will make the kernel print out all firmware bug messages
+ it finds. This especially is very useful on ACPI systems where
+ potentially a lot firmware bugs can happen and should be reported.
+
+ Always say yes here unless memory really matters.
+
config DETECT_SOFTLOCKUP
bool "Detect Soft Lockups"
depends on DEBUG_KERNEL && !S390
--
1.5.4.5

2008-08-20 17:03:46

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 3/3] acpi-cpufreq: Make use of firmware bug report interface

Enhance check for processor cpufreq support, but missing or wrong ACPI
cpufreq package information. This often happens if the BIOS is older
as the CPU. Users should be told to update the BIOS then.

Not sure whether the check will show false negatives, e.g. some old
Pentium M (Banias/Dothan) worked with static compiled-in cpufreq tables.
-> needs review.
Best probably is to check whether a _PPC function exists in the processor
object, but this must be done in drivers/acpi/processor_*.c.
This should always be an indicator for an too old BIOS.

Signed-off-by: Thomas Renninger <[email protected]>
---
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index dd097b8..1288142 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -26,6 +26,7 @@
*/

#include <linux/kernel.h>
+#include <linux/firmware_error.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/smp.h>
@@ -67,6 +68,8 @@ struct acpi_cpufreq_data {
unsigned int cpu_feature;
};

+#define PFX "acpi-cpufreq: "
+
static DEFINE_PER_CPU(struct acpi_cpufreq_data *, drv_data);

/* acpi_perf_data is a pointer to percpu data. */
@@ -354,7 +357,12 @@ static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
/*
* The dreaded BIOS frequency change behind our back.
* Force set the frequency on next target call.
+ * Dell tends to do this on their laptops and it is evil.
*/
+ FW_PRINT_WARN(FW_WARN, PFX " BIOS must not change frequency "
+ "on _PPC changes, let OS do this");
+
+
data->resume = 1;
}

@@ -587,8 +595,14 @@ static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
acpi_cpufreq_driver.flags |= CPUFREQ_CONST_LOOPS;

result = acpi_processor_register_performance(data->acpi_data, cpu);
- if (result)
+ if (result) {
+ if (cpu_has(c, X86_FEATURE_EST)) {
+ FW_PRINT_CRIT(FW_ERR, PFX "Speedstep supported, but "
+ "invalid ACPI CPU information, BIOS "
+ "update needed");
+ }
goto err_free;
+ }

perf = data->acpi_data;
policy->shared_type = perf->shared_type;
--
1.5.4.5

2008-08-20 17:03:21

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 2/3] Powernow-k8: Make use of firmware bug report interface

Signed-off-by: Thomas Renninger <[email protected]>
---
arch/x86/kernel/cpu/cpufreq/powernow-k8.c | 45 ++++++++++++++++-------------
1 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
index 4e72719..f05025f 100644
--- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -25,6 +25,7 @@
*/

#include <linux/kernel.h>
+#include <linux/firmware_error.h>
#include <linux/smp.h>
#include <linux/module.h>
#include <linux/init.h>
@@ -45,7 +46,6 @@
#endif

#define PFX "powernow-k8: "
-#define BFX PFX "BIOS error: "
#define VERSION "version 2.20.00"
#include "powernow-k8.h"

@@ -536,35 +536,39 @@ static int check_pst_table(struct powernow_k8_data *data, struct pst_s *pst, u8

for (j = 0; j < data->numps; j++) {
if (pst[j].vid > LEAST_VID) {
- printk(KERN_ERR PFX "vid %d invalid : 0x%x\n", j, pst[j].vid);
+ FW_PRINT_CRIT(FW_ERR, PFX "vid %d invalid : 0x%x", j, pst[j].vid);
return -EINVAL;
}
if (pst[j].vid < data->rvo) { /* vid + rvo >= 0 */
- printk(KERN_ERR BFX "0 vid exceeded with pstate %d\n", j);
+ FW_PRINT_CRIT(FW_ERR, PFX "0 vid exceeded with pstate"
+ " %d", j);
return -ENODEV;
}
if (pst[j].vid < maxvid + data->rvo) { /* vid + rvo >= maxvid */
- printk(KERN_ERR BFX "maxvid exceeded with pstate %d\n", j);
+ FW_PRINT_CRIT(FW_ERR, PFX "maxvid exceeded with "
+ "pstate %d", j);
return -ENODEV;
}
if (pst[j].fid > MAX_FID) {
- printk(KERN_ERR BFX "maxfid exceeded with pstate %d\n", j);
+ FW_PRINT_CRIT(FW_ERR, PFX "maxfid exceeded with pstate "
+ "%d", j);
return -ENODEV;
}
if (j && (pst[j].fid < HI_FID_TABLE_BOTTOM)) {
/* Only first fid is allowed to be in "low" range */
- printk(KERN_ERR BFX "two low fids - %d : 0x%x\n", j, pst[j].fid);
+ FW_PRINT_CRIT(FW_ERR, PFX "two low fids - %d : 0x%x",
+ j, pst[j].fid);
return -EINVAL;
}
if (pst[j].fid < lastfid)
lastfid = pst[j].fid;
}
if (lastfid & 1) {
- printk(KERN_ERR BFX "lastfid invalid\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "lastfid invalid");
return -EINVAL;
}
if (lastfid > LO_FID_TABLE_TOP)
- printk(KERN_INFO BFX "first fid not from lo freq table\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "first fid not from lo freq table");

return 0;
}
@@ -672,13 +676,13 @@ static int find_psb_table(struct powernow_k8_data *data)

dprintk("table vers: 0x%x\n", psb->tableversion);
if (psb->tableversion != PSB_VERSION_1_4) {
- printk(KERN_ERR BFX "PSB table is not v1.4\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "PSB table is not v1.4");
return -ENODEV;
}

dprintk("flags: 0x%x\n", psb->flags1);
if (psb->flags1) {
- printk(KERN_ERR BFX "unknown flags\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "unknown flags");
return -ENODEV;
}

@@ -705,7 +709,7 @@ static int find_psb_table(struct powernow_k8_data *data)
}
}
if (cpst != 1) {
- printk(KERN_ERR BFX "numpst must be 1\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "numpst must be 1");
return -ENODEV;
}

@@ -730,7 +734,7 @@ static int find_psb_table(struct powernow_k8_data *data)
* BIOS and Kernel Developer's Guide, which is available on
* http://www.amd.com
*/
- printk(KERN_ERR PFX "BIOS error - no PSB or ACPI _PSS objects\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "BIOS error - no PSB or ACPI _PSS objects");
return -ENODEV;
}

@@ -860,8 +864,7 @@ static int fill_powernow_table_pstate(struct powernow_k8_data *data, struct cpuf

index = data->acpi_data->states[i].control & HW_PSTATE_MASK;
if (index > data->max_hw_pstate) {
- printk(KERN_ERR PFX "invalid pstate %d - bad value %d.\n", i, index);
- printk(KERN_ERR PFX "Please report to BIOS manufacturer\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "invalid pstate %d - bad value %d", i, index);
powernow_table[i].frequency = CPUFREQ_ENTRY_INVALID;
continue;
}
@@ -1165,17 +1168,19 @@ static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
"ACPI Processor module before starting this "
"driver.\n");
#else
- printk(KERN_ERR PFX "Your BIOS does not provide ACPI "
- "_PSS objects in a way that Linux understands. "
- "Please report this to the Linux ACPI maintainers"
- " and complain to your BIOS vendor.\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "Your BIOS does not provide "
+ "ACPI _PSS objects in a way that Linux "
+ "understands. Please report this to the "
+ "Linux ACPI maintainers and complain to "
+ "your BIOS vendor");
#endif
kfree(data);
return -ENODEV;
}
if (pol->cpu != 0) {
- printk(KERN_ERR PFX "No ACPI _PSS objects for CPU other than "
- "CPU0. Complain to your BIOS vendor.\n");
+ FW_PRINT_CRIT(FW_ERR, PFX "No ACPI _PSS objects for CPU"
+ " other than CPU0. Complain to your BIOS"
+ " vendor");
kfree(data);
return -ENODEV;
}
--
1.5.4.5

2008-08-20 17:37:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/3] Introduce interface to report BIOS bugs

Thomas Renninger wrote:

I approve of the basic idea. Some nits in the implementation.

First shouted macros look ugly. Can you use something lower case
similar to dev_printk()?

> +
> +#define FW_EMERG KERN_EMERG /* System cannot boot */
> +#define FW_ALERT KERN_ALERT /* Risk of HW or data damage,
> + e.g. overheating, dmraid */
> +#define FW_CRIT KERN_CRIT /* A major device is not functional
> + e.g. hpet, lapic, network... */
> +#define FW_ERR KERN_ERR /* A major device is not working
> + as expected, e.g. cpufreq stuck
> + to lowest freq, lowered
> + performance, increased power
> + consumption... */

These should probably have another string after the KERN_* severities appended
because KERN_* doesn't make it into the syslog log files or serial
console logs and you would lose the severity then.


> +config REPORT_FIRMWARE_BUGS
> + bool "Report Firmware Bugs"
> + default y
> + help
> + This option will make the kernel print out all firmware bug messages
> + it finds. This especially is very useful on ACPI systems where
> + potentially a lot firmware bugs can happen and should be reported.
> +
> + Always say yes here unless memory really matters.

I don't think this should be a CONFIG. If someone really wants to save
that much memory they can disable printk.

-Andi

2008-08-20 18:38:04

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] Introduce interface to report BIOS bugs

On Wednesday 20 August 2008 11:02:04 am Thomas Renninger wrote:
> From: Christian Kornacker <[email protected]>
>
> This is mostly needed for ACPI systems.
> ACPI introduces an endless amount of possible BIOS
> bugs like wrong values, missing functions, etc.
> The kernel has to sanity check all of them and should
> report BIOS bugs as such to the user.

I can't quite decide whether the whole idea is over-engineering
or not. I guess my hesitation is mainly that things like this take
ongoing maintenance to keep them valuable, and that's often where
things fall apart.

> +#define FW_EMERG KERN_EMERG /* System cannot boot */
> +#define FW_ALERT KERN_ALERT /* Risk of HW or data damage,
> + e.g. overheating, dmraid */
> +#define FW_CRIT KERN_CRIT /* A major device is not functional
> + e.g. hpet, lapic, network... */
> +#define FW_ERR KERN_ERR /* A major device is not working
> + as expected, e.g. cpufreq stuck
> + to lowest freq, lowered
> + performance, increased power
> + consumption... */
> +#define FW_WARN KERN_WARNING /* A minor device does not work
> + or is not fully functional,
> + e.g. backlight brightness,
> + Hotplug capabilities of a
> + device that should be
> + hot-plugable will not work */
> +#define FW_INFO KERN_INFO /* Anything else related to BIOS
> + that is worth mentioning */
> +
> +
> +#ifdef CONFIG_REPORT_FIRMWARE_BUGS
> + #define FW_PRINT_WARN(severity, fmt, args...) printk("%s[BIOS]: " fmt "\n", \
> + severity, ##args)
> +#else
> + #define FW_PRINT_WARN(severity, fmt, args...) do { } while (0)
> +#endif
> +
> +#define FW_PRINT_CRIT(severity, fmt, args...) printk("%s[BIOS]: " fmt "\n", \
> + severity, ##args)

I think there are too many possibilities (FW_PRINT_WARN vs FW_PRINT_CRIT,
then one of FW_INFO, FW_WARN, FW_ERR, FW_CRIT, FW_ALERT, FW_EMERG).
A simpler interface with only one or two choices would give 90% of the
benefit.

My preference would be to *not* add a newline inside the interface.
Everybody knows printk needs a newline, and it's simpler if all the
printk variants follow that same rule.

The "BIOS" string is very x86-centric. I'd prefer something like
"firmware" or "FW" that's also applicable to non-x86 systems.

I'm on a real dev_printk() kick at the moment, so I'd like to see
a way to hook a message to *something*, whether it's a specific
device, an ACPI method, a table at a specific physical address, etc.
For example, this:

+ FW_PRINT_CRIT(FW_ERR, PFX "No ACPI _PSS objects for CPU"
+ " other than CPU0. Complain to your BIOS"
+ " vendor");

would be nicer if it could report the specific CPU device.
Admittedly, many of the places you touch don't currently have
an idea of a "device." But sometimes that's a deficiency in
the current Linux implementation, so I think your interface
should at least allow a device.

Maybe even something as simple as:

#define FW_BUG "[FW bug]: "

would be sufficient, with the idea that people could do this:

dev_err(&dev->dev, FW_BUG "interrupts left enabled\n");

I think the user-space value derives from having a consistent string
to grep for, so this gives you that. I'm not sure what value we get
from adding the new FW_PRINT_CRIT()/FW_PRINT_WARN() interfaces in the
kernel.

Bjorn

2008-08-21 13:52:21

by Thomas Renninger

[permalink] [raw]
Subject: Re: [PATCH 1/3] Introduce interface to report BIOS bugs

On Wednesday 20 August 2008 20:37:42 Bjorn Helgaas wrote:
> On Wednesday 20 August 2008 11:02:04 am Thomas Renninger wrote:
> > From: Christian Kornacker <[email protected]>
> >
> > This is mostly needed for ACPI systems.
> > ACPI introduces an endless amount of possible BIOS
> > bugs like wrong values, missing functions, etc.
> > The kernel has to sanity check all of them and should
> > report BIOS bugs as such to the user.
>
> I can't quite decide whether the whole idea is over-engineering
> or not. I guess my hesitation is mainly that things like this take
> ongoing maintenance to keep them valuable, and that's often where
> things fall apart.
It mainly depends on reviewing, telling people to use it if a BIOS
related bug happens.

> > +#define FW_EMERG KERN_EMERG /* System cannot boot */
> > +#define FW_ALERT KERN_ALERT /* Risk of HW or data damage,
> > + e.g. overheating, dmraid */
> > +#define FW_CRIT KERN_CRIT /* A major device is not functional
> > + e.g. hpet, lapic, network... */
> > +#define FW_ERR KERN_ERR /* A major device is not working
> > + as expected, e.g. cpufreq stuck
> > + to lowest freq, lowered
> > + performance, increased power
> > + consumption... */
> > +#define FW_WARN KERN_WARNING /* A minor device does not work
> > + or is not fully functional,
> > + e.g. backlight brightness,
> > + Hotplug capabilities of a
> > + device that should be
> > + hot-plugable will not work */
> > +#define FW_INFO KERN_INFO /* Anything else related to BIOS
> > + that is worth mentioning */
> > +
> > +
> > +#ifdef CONFIG_REPORT_FIRMWARE_BUGS
> > + #define FW_PRINT_WARN(severity, fmt, args...) printk("%s[BIOS]: " fmt
> > "\n", \ + severity, ##args)
> > +#else
> > + #define FW_PRINT_WARN(severity, fmt, args...) do { } while (0)
> > +#endif
> > +
> > +#define FW_PRINT_CRIT(severity, fmt, args...) printk("%s[BIOS]: " fmt
> > "\n", \ + severity, ##args)
>
> I think there are too many possibilities (FW_PRINT_WARN vs FW_PRINT_CRIT,
> then one of FW_INFO, FW_WARN, FW_ERR, FW_CRIT, FW_ALERT, FW_EMERG).
> A simpler interface with only one or two choices would give 90% of the
> benefit.
One idea is to use the printk severity feature as it is there anyway.
Another idea is that a lot BIOS violations do not necessarily
harm functionality, keyword "Windows compatibility" and other work
arounds.
You generally want to hide those from the customer, but there
may be use cases where you still want to know such things.

> My preference would be to *not* add a newline inside the interface.
> Everybody knows printk needs a newline, and it's simpler if all the
> printk variants follow that same rule.
>
> The "BIOS" string is very x86-centric. I'd prefer something like
> "firmware" or "FW" that's also applicable to non-x86 systems.
Firmware is a bit long?
For What is FW? FireWire?
Firmware could be used, if it is called BIOS everybody knows that it is
related to some firmware? Not sure, I am not a native English speaker.

> I'm on a real dev_printk() kick at the moment, so I'd like to see
> a way to hook a message to *something*, whether it's a specific
> device, an ACPI method, a table at a specific physical address, etc.
> For example, this:
>
> + FW_PRINT_CRIT(FW_ERR, PFX "No ACPI _PSS objects for
> CPU" + " other than CPU0. Complain to
> your BIOS" + " vendor");
>
> would be nicer if it could report the specific CPU device.
> Admittedly, many of the places you touch don't currently have
> an idea of a "device." But sometimes that's a deficiency in
> the current Linux implementation, so I think your interface
> should at least allow a device.
>
> Maybe even something as simple as:
>
> #define FW_BUG "[FW bug]: "
>
> would be sufficient, with the idea that people could do this:
>
> dev_err(&dev->dev, FW_BUG "interrupts left enabled\n");
>
> I think the user-space value derives from having a consistent string
> to grep for, so this gives you that. I'm not sure what value we get
> from adding the new FW_PRINT_CRIT()/FW_PRINT_WARN() interfaces in the
> kernel.

What about this one:
pr_fw_err() or dev_fw_err() on harmful firmware bugs
and
pr_fw_info() on non-harmful but ugly BIOS constructs
which should not exist, violate the spec or could make
trouble in the future (maybe Firmware is really better...):

diff --git a/include/linux/device.h b/include/linux/device.h
index d24a47f..3204cea 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -538,6 +538,9 @@ extern const char *dev_driver_string(struct device *dev);
#define dev_info(dev, format, arg...) \
dev_printk(KERN_INFO , dev , format , ## arg)

+#define dev_fw_err(dev, format, arg...) \
+ dev_printk(KERN_ERR "[BIOS Bug] ", dev , format , ## arg)
+
#ifdef DEBUG
#define dev_dbg(dev, format, arg...) \
dev_printk(KERN_DEBUG , dev , format , ## arg)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2651f80..b20f618 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -303,6 +303,12 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
#define pr_info(fmt, arg...) \
printk(KERN_INFO fmt, ##arg)

+#define pr_fw_err(fmt, arg...) \
+ printk(KERN_ERR "[BIOS Bug]" fmt, ##arg)
+
+#define pr_fw_info(fmt, arg...) \
+ printk(KERN_INFO "[BIOS]" fmt, ##arg)
+
#ifdef DEBUG
/* If you are writing a driver, please use dev_dbg instead */
#define pr_debug(fmt, arg...) \

2008-08-21 15:19:59

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] Introduce interface to report BIOS bugs

On Thursday 21 August 2008 07:52:04 am Thomas Renninger wrote:
> On Wednesday 20 August 2008 20:37:42 Bjorn Helgaas wrote:

> > Maybe even something as simple as:
> >
> > #define FW_BUG "[FW bug]: "
> >
> > would be sufficient, with the idea that people could do this:
> >
> > dev_err(&dev->dev, FW_BUG "interrupts left enabled\n");
> >
> > I think the user-space value derives from having a consistent string
> > to grep for, so this gives you that. I'm not sure what value we get
> > from adding the new FW_PRINT_CRIT()/FW_PRINT_WARN() interfaces in the
> > kernel.
>
> What about this one:
> pr_fw_err() or dev_fw_err() on harmful firmware bugs
> and
> pr_fw_info() on non-harmful but ugly BIOS constructs
> which should not exist, violate the spec or could make
> trouble in the future (maybe Firmware is really better...):
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d24a47f..3204cea 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -538,6 +538,9 @@ extern const char *dev_driver_string(struct device *dev);
> #define dev_info(dev, format, arg...) \
> dev_printk(KERN_INFO , dev , format , ## arg)
>
> +#define dev_fw_err(dev, format, arg...) \
> + dev_printk(KERN_ERR "[BIOS Bug] ", dev , format , ## arg)
> +
> #ifdef DEBUG
> #define dev_dbg(dev, format, arg...) \
> dev_printk(KERN_DEBUG , dev , format , ## arg)
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 2651f80..b20f618 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -303,6 +303,12 @@ static inline char *pack_hex_byte(char *buf, u8 byte)
> #define pr_info(fmt, arg...) \
> printk(KERN_INFO fmt, ##arg)
>
> +#define pr_fw_err(fmt, arg...) \
> + printk(KERN_ERR "[BIOS Bug]" fmt, ##arg)
> +
> +#define pr_fw_info(fmt, arg...) \
> + printk(KERN_INFO "[BIOS]" fmt, ##arg)
> +
> #ifdef DEBUG
> /* If you are writing a driver, please use dev_dbg instead */
> #define pr_debug(fmt, arg...) \

What value do we gain by defining dev_fw_err(), pr_fw_err(), and
pr_fw_info(), versus just defining FW_BUG? If they're useful
for something, fine; it's just that I don't know yet what they add.

Bjorn

2008-08-22 10:19:23

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 1/3] Introduce interface to report BIOS bugs

Hi!

>
> This is mostly needed for ACPI systems.
> ACPI introduces an endless amount of possible BIOS
> bugs like wrong values, missing functions, etc.
> The kernel has to sanity check all of them and should
> report BIOS bugs as such to the user.
>
> ACPI is the main target, of course others, who already declare BIOS bugs,
> also benefit from this, e.g. PCI:
> arch/x86/pci/pcbios.c:
> printk(KERN_WARNING "bios32_service(0x%lx): returned 0x%x -- BIOS bug!\n",
> printk (KERN_ERR "PCI: BIOS BUG #%x[%08x] found\n",
> ...
> This one I stumbled over recently (when >4GB BIOS sets up IO mem for this
> device wrongly on some Dell notebooks):
> ohci_hcd 0000:00:02.0: USB HC takeover failed! (BIOS/SMM bug)

I like the idea... Plus, it would be cool to have message clearly
state if Linux could work around this problem.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-27 13:27:26

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 1/3] Introduce FW_BUG and FW_INFO to consistenly tell users about BIOS bugs

The idea is to add this to printk after the severity:
printk(KERN_ERR FW_BUG "This is not our fault\n");

If a Firmware issue should be hidden, because it is
work-arounded, but you still want to see something popping up e.g.
for info only:
printk(KERN_INFO FW_INFO "This is done stupid, we can handle it,
but it should better be avoided in future\n");

or on the Linuxfirmwarekit to tell vendors that they did something
stupid or wrong without bothering the user:
printk(KERN_INFO FW_BUG "This is done stupid, we can handle it,
but it should better be avoided in future\n");

Signed-off-by: Thomas Renninger <[email protected]>
---
include/linux/kernel.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 2651f80..68ebfc2 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -190,6 +190,9 @@ extern int kernel_text_address(unsigned long addr);
struct pid;
extern struct pid *session_of_pgrp(struct pid *pgrp);

+#define FW_BUG "[FW Bug]: "
+#define FW_INFO "[FW Info]: "
+
#ifdef CONFIG_PRINTK
asmlinkage int vprintk(const char *fmt, va_list args)
__attribute__ ((format (printf, 1, 0)));
--
1.5.4.5

2008-08-27 13:27:40

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 3/3] CPUFREQ: processor.ko: Try to detect old BIOS, not supporting CPU freq on a recent CPU.

On Intel CPUs it is rather common and a good hint that BIOSes which do provide
_PPC func, but not the frequencies itself in _PSS function, are old and need
to be updated for CPU freq support.
Tell the user he has a BIOS/firmware problem.

Signed-off-by: Thomas Renninger <[email protected]>
---
drivers/acpi/processor_perflib.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 0133af4..3a2886f 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -38,6 +38,7 @@

#include <asm/uaccess.h>
#endif
+#include <asm/cpufeature.h>

#include <acpi/acpi_bus.h>
#include <acpi/processor.h>
@@ -334,7 +335,6 @@ static int acpi_processor_get_performance_info(struct acpi_processor *pr)
acpi_status status = AE_OK;
acpi_handle handle = NULL;

-
if (!pr || !pr->performance || !pr->handle)
return -EINVAL;

@@ -347,13 +347,25 @@ static int acpi_processor_get_performance_info(struct acpi_processor *pr)

result = acpi_processor_get_performance_control(pr);
if (result)
- return result;
+ goto update_bios;

result = acpi_processor_get_performance_states(pr);
if (result)
- return result;
+ goto update_bios;

return 0;
+
+ /*
+ * Having _PPC but missing frequencies (_PSS, _PCT) is a very good hint that
+ * the BIOS is older than the CPU and does not know its frequencies
+ */
+ update_bios:
+ if (ACPI_SUCCESS(acpi_get_handle(pr->handle, "_PPC", &handle))){
+ if(boot_cpu_has(X86_FEATURE_EST))
+ printk(KERN_WARNING FW_BUG "BIOS needs update for CPU "
+ "frequency support\n");
+ }
+ return result;
}

int acpi_processor_notify_smm(struct module *calling_module)
--
1.5.4.5

2008-08-27 13:27:53

by Thomas Renninger

[permalink] [raw]
Subject: Introduce interface to report BIOS bugs (reworked, FW_BUG simple solution)

Even simplier, using FW_BUG define. I agree that the first implementation
was over-designed, I somehow liked the printk_fw_err() and printk_fw_info()
more, looked somehow more consistent with:

#define pr_info(fmt, arg...) \
printk(KERN_INFO fmt, ##arg)
(and others).

Hmm, it shouldn't really matter.
I wonder whether it works out that messages are hidden to the ordinary
user with printk(KERN_INFO FW_BUG...).
Something that vendors get an idea that they should not do this, but
hiding it on a normally booted system is needed (but e.g. show it on a
linuxfirmwarekit booted kernel).

printk(KERN_DEBUG FW_BUG...)
Won't work because there you get too much output?
Are these patches sufficient to achieve above?

Thomas

PS: Forgot to add Pavel to CC who also commented on previous patches,
it's work to do it right now, I expect you read this anyway.


2008-08-27 13:28:16

by Thomas Renninger

[permalink] [raw]
Subject: [PATCH 2/3] CPUFREQ: powernow-k8: Try to detect old BIOS, not supporting CPU freq on a recent AMD CPUs.

Signed-off-by: Thomas Renninger <[email protected]>
---
arch/x86/kernel/cpu/cpufreq/powernow-k8.c | 42 ++++++++++++++++------------
1 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
index 4e72719..e3fa464 100644
--- a/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
+++ b/arch/x86/kernel/cpu/cpufreq/powernow-k8.c
@@ -45,7 +45,6 @@
#endif

#define PFX "powernow-k8: "
-#define BFX PFX "BIOS error: "
#define VERSION "version 2.20.00"
#include "powernow-k8.h"

@@ -536,35 +535,40 @@ static int check_pst_table(struct powernow_k8_data *data, struct pst_s *pst, u8

for (j = 0; j < data->numps; j++) {
if (pst[j].vid > LEAST_VID) {
- printk(KERN_ERR PFX "vid %d invalid : 0x%x\n", j, pst[j].vid);
+ printk(KERN_ERR FW_BUG PFX "vid %d invalid : 0x%x\n",
+ j, pst[j].vid);
return -EINVAL;
}
if (pst[j].vid < data->rvo) { /* vid + rvo >= 0 */
- printk(KERN_ERR BFX "0 vid exceeded with pstate %d\n", j);
+ printk(KERN_ERR FW_BUG PFX "0 vid exceeded with pstate"
+ " %d\n", j);
return -ENODEV;
}
if (pst[j].vid < maxvid + data->rvo) { /* vid + rvo >= maxvid */
- printk(KERN_ERR BFX "maxvid exceeded with pstate %d\n", j);
+ printk(KERN_ERR FW_BUG PFX "maxvid exceeded with pstate"
+ " %d\n", j);
return -ENODEV;
}
if (pst[j].fid > MAX_FID) {
- printk(KERN_ERR BFX "maxfid exceeded with pstate %d\n", j);
+ printk(KERN_ERR FW_BUG PFX "maxfid exceeded with pstate"
+ " %d\n", j);
return -ENODEV;
}
if (j && (pst[j].fid < HI_FID_TABLE_BOTTOM)) {
/* Only first fid is allowed to be in "low" range */
- printk(KERN_ERR BFX "two low fids - %d : 0x%x\n", j, pst[j].fid);
+ printk(KERN_ERR FW_BUG PFX "two low fids - %d : "
+ "0x%x\n", j, pst[j].fid);
return -EINVAL;
}
if (pst[j].fid < lastfid)
lastfid = pst[j].fid;
}
if (lastfid & 1) {
- printk(KERN_ERR BFX "lastfid invalid\n");
+ printk(KERN_ERR FW_BUG PFX "lastfid invalid\n");
return -EINVAL;
}
if (lastfid > LO_FID_TABLE_TOP)
- printk(KERN_INFO BFX "first fid not from lo freq table\n");
+ printk(KERN_INFO FW_BUG PFX "first fid not from lo freq table\n");

return 0;
}
@@ -672,13 +676,13 @@ static int find_psb_table(struct powernow_k8_data *data)

dprintk("table vers: 0x%x\n", psb->tableversion);
if (psb->tableversion != PSB_VERSION_1_4) {
- printk(KERN_ERR BFX "PSB table is not v1.4\n");
+ printk(KERN_ERR FW_BUG PFX "PSB table is not v1.4\n");
return -ENODEV;
}

dprintk("flags: 0x%x\n", psb->flags1);
if (psb->flags1) {
- printk(KERN_ERR BFX "unknown flags\n");
+ printk(KERN_ERR FW_BUG PFX "unknown flags\n");
return -ENODEV;
}

@@ -705,7 +709,7 @@ static int find_psb_table(struct powernow_k8_data *data)
}
}
if (cpst != 1) {
- printk(KERN_ERR BFX "numpst must be 1\n");
+ printk(KERN_ERR FW_BUG PFX "numpst must be 1\n");
return -ENODEV;
}

@@ -1165,17 +1169,19 @@ static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)
"ACPI Processor module before starting this "
"driver.\n");
#else
- printk(KERN_ERR PFX "Your BIOS does not provide ACPI "
- "_PSS objects in a way that Linux understands. "
- "Please report this to the Linux ACPI maintainers"
- " and complain to your BIOS vendor.\n");
+ printk(KERN_ERR FW_BUG PFX "Your BIOS does not provide"
+ " ACPI _PSS objects in a way that Linux "
+ "understands. Please report this to the Linux "
+ "ACPI maintainers and complain to your BIOS "
+ "vendor.\n");
#endif
kfree(data);
return -ENODEV;
}
if (pol->cpu != 0) {
- printk(KERN_ERR PFX "No ACPI _PSS objects for CPU other than "
- "CPU0. Complain to your BIOS vendor.\n");
+ printk(KERN_ERR FW_BUG PFX "No ACPI _PSS objects for "
+ "CPU other than CPU0. Complain to your BIOS "
+ "vendor.\n");
kfree(data);
return -ENODEV;
}
@@ -1225,7 +1231,7 @@ static int __cpuinit powernowk8_cpu_init(struct cpufreq_policy *pol)

/* min/max the cpu is capable of */
if (cpufreq_frequency_table_cpuinfo(pol, data->powernow_table)) {
- printk(KERN_ERR PFX "invalid powernow_table\n");
+ printk(KERN_ERR FW_BUG PFX "invalid powernow_table\n");
powernow_k8_cpu_exit_acpi(data);
kfree(data->powernow_table);
kfree(data);
--
1.5.4.5

2008-08-27 13:34:30

by Thomas Renninger

[permalink] [raw]
Subject: Re: Introduce interface to report BIOS bugs (reworked, FW_BUG simple solution)

On Wednesday 27 August 2008 15:27:08 Thomas Renninger wrote:
> Even simplier, using FW_BUG define. I agree that the first implementation
...
Ehh, this is not against ACPI test but latest Linus .27-rcX tree.
I hope/think it should still patch in both.

While this is fairly simple code, be aware that it's only compile tested.

Thomas

2008-08-27 15:19:37

by Andi Kleen

[permalink] [raw]
Subject: Re: Introduce interface to report BIOS bugs (reworked, FW_BUG simple solution)

Thomas Renninger wrote:
> Even simplier, using FW_BUG define. I agree that the first implementation
> was over-designed, I somehow liked the printk_fw_err() and printk_fw_info()
> more, looked somehow more consistent with:

Looks good to me now. The only issue is that the patchkit is cross subsystem
(cpufreq and ACPI). Either the patches go in through Andrew or we need
to figure out how to merge this.
-Andi

2008-08-30 21:51:46

by Pavel Machek

[permalink] [raw]
Subject: Re: Introduce interface to report BIOS bugs (reworked, FW_BUG simple solution)

On Wed 2008-08-27 15:27:08, Thomas Renninger wrote:
> Even simplier, using FW_BUG define. I agree that the first implementation
> was over-designed, I somehow liked the printk_fw_err() and printk_fw_info()
> more, looked somehow more consistent with:
>
> #define pr_info(fmt, arg...) \
> printk(KERN_INFO fmt, ##arg)
> (and others).
>
> Hmm, it shouldn't really matter.
> I wonder whether it works out that messages are hidden to the ordinary
> user with printk(KERN_INFO FW_BUG...).
> Something that vendors get an idea that they should not do this, but
> hiding it on a normally booted system is needed (but e.g. show it on a
> linuxfirmwarekit booted kernel).
>
> printk(KERN_DEBUG FW_BUG...)
> Won't work because there you get too much output?
> Are these patches sufficient to achieve above?
>
> Thomas
>
> PS: Forgot to add Pavel to CC who also commented on previous patches,
> it's work to do it right now, I expect you read this anyway.

Pavel listens, but sees no patch on either this or followup mail.

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html