In collecting this report, oopses and warnings from versions prior to 2.6.27 were ignored.
This week, a total of 5450 oopses and warnings have been reported against version 2.6.27+,
compared to 2198 reports in the previous week.
This report is a bit different from the previous weeks' reports; all 2.6.26 and earlier issues are no
longer counted, which means the top 12 has shuffled quite a bit, with some new star appearances.
Also, I've reworked the "are these two backtraces the same" algorithm; the website should now
present a more compact view, because the backtraces are consolidated in a way that is much
more logical (for a human).
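The report does not say how the new consolidation works. As a purely hypothetical sketch of one common approach: treat two backtraces as the same when the function names in their top few frames match, ignoring the +0xoff/0xlen offsets that differ between builds. The helper names and the depth parameter are assumptions made for illustration, not kerneloops.org code.

```c
#include <stddef.h>
#include <string.h>

/* Length of the function-name part of a frame such as
 * "pci_create_slot+0x1c2/0x200": everything before the first '+'.
 * The frame format is an assumption made for this sketch. */
static size_t frame_name_len(const char *frame)
{
	const char *plus = strchr(frame, '+');

	return plus ? (size_t)(plus - frame) : strlen(frame);
}

/* Hypothetical matcher: returns 1 if the first 'depth' frames of
 * traces a and b name the same functions (offsets ignored), else 0.
 * Traces shorter than 'depth' are compared over their common length;
 * an empty trace never matches. */
int backtraces_match(const char *const *a, size_t na,
		     const char *const *b, size_t nb, size_t depth)
{
	size_t n = depth, i;

	if (na < n)
		n = na;
	if (nb < n)
		n = nb;
	if (n == 0)
		return 0;

	for (i = 0; i < n; i++) {
		size_t la = frame_name_len(a[i]);
		size_t lb = frame_name_len(b[i]);

		if (la != lb || memcmp(a[i], b[i], la) != 0)
			return 0;
	}
	return 1;
}
```

In a scheme like this the depth is the main tuning knob: too shallow and unrelated bugs that share a common warning path get merged, too deep and one bug is split up by its different callers.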
Per-file statistics
936 external/virtualbox/module
602 drivers/pci/slot.c
455 drivers/net/wireless/iwlwifi/iwl-tx.c
364 kernel/power/main.c
274 drivers/net/r8169.c
231 drivers/net/wireless/iwlwifi/iwl-3945-rs.c
231 fs/jbd/journal.c
227 arch/x86/include/asm/mtrr.h
147 drivers/ata/libata-sff.c
137 drivers/net/sis900.c
71 net/ipv4/tcp.c
62 drivers/gpu/drm/radeon/radeon_cp.c
Rank 1: VBoxDrvLinuxIOCtl (warning)
Reported 934 times (1635 total reports)
[external] bug in the VirtualBox drivers
This warning was last seen in version 2.6.28-rc3, and first seen in 2.6.25.11.
More info: http://www.kerneloops.org/searchweek.php?search=VBoxDrvLinuxIOCtl
Rank 2: pci_create_slot (warning)
Reported 603 times (639 total reports)
The BIOS provides duplicate slot names, which the PCI layer blindly passes to sysfs.
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27-rc7-git1.
More info: http://www.kerneloops.org/searchweek.php?search=pci_create_slot
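sysfs will not create two entries with the same name, so registering a second slot under a duplicate BIOS name fails, which is presumably what produces the backtraces counted here. A minimal userspace sketch of one way to avoid the collision, appending a numeric suffix to repeated names, follows. It assumes a simple array of names already in use; make_unique_slot_name() is a hypothetical helper and not necessarily how the post-2.6.27 fix works.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a name into 'out' that does not collide
 * with any of the 'n' names in 'existing': the BIOS name itself if it
 * is unused, otherwise "name-1", "name-2", ...
 * Returns 0 on success, -1 if 'out' is too small or no free suffix
 * was found. */
int make_unique_slot_name(const char *name, const char *const *existing,
			  size_t n, char *out, size_t outlen)
{
	int suffix;

	for (suffix = 0; suffix < 1000; suffix++) {
		size_t i;
		int len, taken = 0;

		if (suffix == 0)
			len = snprintf(out, outlen, "%s", name);
		else
			len = snprintf(out, outlen, "%s-%d", name, suffix);
		if (len < 0 || (size_t)len >= outlen)
			return -1;	/* truncated: refuse a partial name */

		for (i = 0; i < n; i++) {
			if (strcmp(out, existing[i]) == 0) {
				taken = 1;
				break;
			}
		}
		if (!taken)
			return 0;
	}
	return -1;
}
```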
Rank 3: iwl_tx_cmd_complete (warning)
Reported 455 times (693 total reports)
Bug in the IWL wireless driver; partial fix available
This warning was last seen in version 2.6.28-rc4, and first seen in 2.6.27-rc9.
More info: http://www.kerneloops.org/searchweek.php?search=iwl_tx_cmd_complete
Rank 4: suspend_test_finish (warning)
Reported 362 times (1202 total reports)
Fedora is shipping with the suspend test on, and it's failing everywhere.
The patch to report what fails is in 2.6.28-rc6 and later.
This warning was last seen in version 2.6.28-rc1, and first seen in 2.6.27-rc0-git14.
More info: http://www.kerneloops.org/searchweek.php?search=suspend_test_finish
Rank 5: dev_watchdog(r8169) (oops)
Reported 274 times (1414 total reports)
Network driver not handling timeouts itself.
This oops was last seen in version 2.6.28-rc4, and first seen in 2.6.26.6.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(r8169)
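Roughly speaking, the generic network watchdog complains when a driver's transmit queue has stayed stopped for longer than the device's watchdog timeout, and then calls the driver's timeout handler. The wraparound-safe tick comparison behind that check can be modelled in userspace as below. The time_after() macro mirrors the kernel idiom; tx_timed_out() is a simplified, hypothetical model, not dev_watchdog() itself.

```c
/* Userspace model of the kernel's wraparound-safe tick comparison:
 * true if tick 'a' is after tick 'b', even if the counter wrapped. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical, simplified model of the watchdog condition: the
 * transmit queue is stopped and nothing has been sent for more than
 * 'timeo' ticks since 'trans_start'. */
int tx_timed_out(unsigned long now, unsigned long trans_start,
		 unsigned long timeo, int queue_stopped)
{
	return queue_stopped && time_after(now, trans_start + timeo);
}
```

A driver that handles its own timeouts never lets the stopped-queue condition persist long enough for this check to fire.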
Rank 6: rs_get_rate (oops)
Reported 232 times (1152 total reports)
Bug in the Intel IWL wireless drivers
This oops was last seen in version 2.6.27.5, and first seen in 2.6.25-rc2-git5.
More info: http://www.kerneloops.org/searchweek.php?search=rs_get_rate
Rank 7: journal_update_superblock (warning)
Reported 231 times (6506 total reports)
Likely caused by the user removing a USB stick while it is mounted.
This warning was last seen in version 2.6.27.7, and first seen in 2.6.24-rc6-git1.
More info: http://www.kerneloops.org/searchweek.php?search=journal_update_superblock
Rank 8: mtrr_trim_uncached_memory (warning)
Reported 227 times (619 total reports)
Our MTRR checks trigger on a high number of machines. I suspect we are too
picky in accepting the MTRR configuration.
This warning was last seen in version 2.6.27.5, and first seen in 2.6.24.
More info: http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory
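For readers unfamiliar with the check: mtrr_trim_uncached_memory() complains when RAM extends beyond what the MTRRs map as write-back, and then trims that RAM away. A much-simplified userspace sketch of the core calculation follows. The struct var_mtrr layout and both helper names are hypothetical; the real function also checks the default memory type, handles holes below the top of coverage, and treats the "no usable MTRRs at all" case separately.

```c
#include <stddef.h>
#include <stdint.h>

#define MTRR_TYPE_WRBACK 6	/* x86 encoding of the write-back type */

/* Hypothetical decoded variable-range MTRR, in page frames. */
struct var_mtrr {
	uint64_t base_pfn;
	uint64_t size_pfn;
	unsigned int type;
};

/* Highest page frame number (exclusive) covered by a write-back
 * range, or 0 if there is no write-back range at all. */
uint64_t highest_wb_pfn(const struct var_mtrr *r, size_t n)
{
	uint64_t highest = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end = r[i].base_pfn + r[i].size_pfn;

		if (r[i].type == MTRR_TYPE_WRBACK && r[i].size_pfn &&
		    end > highest)
			highest = end;
	}
	return highest;
}

/* Pages of RAM above the write-back coverage that would be trimmed.
 * Returns 0 when RAM is fully covered, and also when no write-back
 * range exists (the "blank MTRRs" case is handled elsewhere). */
uint64_t pages_to_trim(const struct var_mtrr *r, size_t n, uint64_t end_pfn)
{
	uint64_t highest = highest_wb_pfn(r, n);

	if (highest == 0 || highest >= end_pfn)
		return 0;
	return end_pfn - highest;
}
```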
Rank 9: __atapi_pio_bytes (warning)
Reported 146 times (224 total reports)
Alan said this was due to some other layer giving the libata drivers a weird
scatter-gather list. It just happens a lot, and somehow it mostly happens in
virtualized environments.
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.4.
More info: http://www.kerneloops.org/searchweek.php?search=__atapi_pio_bytes
Rank 10: dev_watchdog(sis900) (oops)
Reported 137 times (1538 total reports)
This oops was last seen in version 2.6.27.6, and first seen in 2.6.26-rc4-git2.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(sis900)
Rank 11: tcp_recvmsg (warning)
Reported 71 times (167 total reports)
This warning was last seen in version 2.6.27.5, and first seen in 2.6.25.
More info: http://www.kerneloops.org/searchweek.php?search=tcp_recvmsg
Rank 12: dev_watchdog(atl1) (oops)
Reported 56 times (109 total reports)
This oops was last seen in version 2.6.27.5, and first seen in 2.6.26.6.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(atl1)
Rank 13: nv_set_page_attrib_cached (warning)
Reported 56 times (65 total reports)
[external] bug in the binary nvidia driver
The warning only shows up in tainted kernels.
This warning was last seen in version 2.6.27.5, and first seen in 2.6.27.5.
More info: http://www.kerneloops.org/searchweek.php?search=nv_set_page_attrib_cached
On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> Rank 2: pci_create_slot (warning)
> Reported 603 times (639 total reports)
> BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> This warning was last seen in version 2.6.27.5, and first seen in
> 2.6.27-rc7-git1. More info:
> http://www.kerneloops.org/searchweek.php?search=pci_create_slot
IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
because they were a bit big, but if someone were sufficiently motivated I'm
sure the backport wouldn't be that hard...
Jesse
* Jesse Barnes wrote:
> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back
> to -stable because they were a bit big, but if someone were
> sufficiently motivated I'm sure the backport wouldn't be that
> hard...
having the commit IDs mentioned here would be nice, should anyone feel
motivated.
Ingo
* Arjan van de Ven wrote:
> Rank 8: mtrr_trim_uncached_memory (warning)
> Reported 227 times (619 total reports)
> There is a high number of machines where our MTRR checks
> trigger. I suspect we are too picky in accepting the MTRR
> configuration.
the warning here means: "the BIOS messed up but we fixed it up for
you just fine".
Should we print a DMI descriptor so that it can be tracked back to the
bad BIOSen in question? Or should we (partially) silence the warning
itself? Those BIOS bugs need fixing really: older kernels will boot up
with bad MTRR settings - resulting in a super-slow system or other
weirdnesses. We can tone down the message so that it doesn't show up in
kerneloops.org. It's up to you.
Ingo
On Thursday, November 27, 2008 3:52 am Ingo Molnar wrote:
> * Arjan van de Ven wrote:
> > Rank 8: mtrr_trim_uncached_memory (warning)
> > Reported 227 times (619 total reports)
> > There is a high number of machines where our MTRR checks
> > trigger. I suspect we are too picky in accepting the MTRR
> > configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".
>
> Should we print a DMI descriptor so that it can be tracked back to the
> bad BIOSen in question? Or should we (partially) silence the warning
> itself? Those BIOS bugs need fixing really: older kernels will boot up
> with bad MTRR settings - resulting in a super-slow system or other
> weirdnesses. We can tone down the message so that it doesn't show up in
> kerneloops.org. It's up to you.
I actually think we're doing something wrong here, since so many platforms
have this behavior. It's likely that there's an undocumented, additional
check needed to determine whether a slot is hot pluggable. Matthew Garrett
recently posted a patch to check for ACPI _RMV methods, which should be an
improvement. I'll be putting that into linux-next soon for testing.
--
Jesse Barnes, Intel Open Source Technology Center
Ingo Molnar wrote:
> * Arjan van de Ven wrote:
>
>> Rank 8: mtrr_trim_uncached_memory (warning)
>> Reported 227 times (619 total reports)
>> There is a high number of machines where our MTRR checks
>> trigger. I suspect we are too picky in accepting the MTRR
>> configuration.
>
> the warning here means: "the BIOS messed up but we fixed it up for
> you just fine".
I don't believe that right now.
we see so many of these, including many "there's no MTRRs at all",
that I am seriously suspecting that our code is just incorrect somehow
and triggering too much.
* Jesse Barnes:
> On Wednesday, November 26, 2008 3:11 pm Arjan van de Ven wrote:
> > Rank 2: pci_create_slot (warning)
> > Reported 603 times (639 total reports)
> > BIOS provided duplicated slot names, the PCI layer blindly passes to sysfs
> > This warning was last seen in version 2.6.27.5, and first seen in
> > 2.6.27-rc7-git1. More info:
> > http://www.kerneloops.org/searchweek.php?search=pci_create_slot
>
> IIRC we fixed this one post-2.6.27. I didn't send the patches back to -stable
> because they were a bit big, but if someone were sufficiently motivated I'm
> sure the backport wouldn't be that hard...
I can do this backport. A few questions though...
We're seeing a proliferation of this one presumably because
Fedora 10 uses 2.6.27.5 as a starting point? If I just backport
the fixes against Greg's latest tree, do I have to do anything
special to make sure they get into the Fedora kernel?
Also, does kerneloops capture any of the machine information,
like DMI output, etc. or does it just get the oops? It would be
nice to see which machines out there have the broken BIOS that
causes this oops.
Thanks.
/ac
On Thu, 27 Nov 2008 12:42:10 -0700
Alex Chiang wrote:
> I can do this backport. A few questions though...
>
> We're seeing a proliferation of this one presumably because
> Fedora10 uses 2.6.27.5 as a starting point? If I just backport
> the fixes against Greg's latest tree, do I have to do anything
> special to make sure they get into the Fedora kernel?
Fedora tends to follow -stable quite closely, so that ought to be enough.
>
> Also, does kerneloops capture any of the machine information,
> like DMI output, etc. or does it just get the oops? It would be
> nice to see which machines out there have the broken BIOS that
> causes this oops.
right now we do this for oopses, but not for warnings ;(
I'll make a patch to add this; it's generally useful.
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
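On the question of identifying the affected machines: kernels that export DMI data to sysfs provide the relevant strings as files under /sys/class/dmi/id/ (for example sys_vendor, product_name and bios_version). A small userspace sketch for reading one field follows. read_dmi_field() is a hypothetical helper, and this is not how the kerneloops client actually collects its data.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one DMI field (e.g. "sys_vendor",
 * "product_name", "bios_version") from directory 'dir' into buf and
 * strip the trailing newline. Returns 0 on success, -1 if the file is
 * missing or unreadable (some fields are readable by root only). */
int read_dmi_field(const char *dir, const char *field, char *buf, size_t len)
{
	char path[256];
	FILE *f;

	if (len == 0)
		return -1;
	snprintf(path, sizeof(path), "%s/%s", dir, field);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Calling read_dmi_field("/sys/class/dmi/id", "bios_version", buf, sizeof(buf)) would return the BIOS version string on such a system; the directory is a parameter so the function can also be exercised against ordinary files.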
* Arjan van de Ven wrote:
> Ingo Molnar wrote:
>> * Arjan van de Ven wrote:
>>
>>> Rank 8: mtrr_trim_uncached_memory (warning)
>>> Reported 227 times (619 total reports)
>>> There is a high number of machines where our MTRR checks trigger. I
>>> suspect we are too picky in accepting the MTRR configuration.
>>
>> the warning here means: "the BIOS messed up but we fixed it up for you
>> just fine".
>
> I don't believe that right now. we see so many of these, including
> many "there's no MTRRs at all", that I am seriously suspecting that
> our code is just incorrect somehow and triggering too much.
well, we looked at existing reports and Linux was right to fix them up.
Show us one that is incorrect, and then we can fix it.
the "no MTRRs" cases are vmware (and also qemu?) guests not implementing
full CPU emulation.
Ingo
On Thu, 27 Nov 2008 21:18:36 +0100
Ingo Molnar wrote:
> well we looked at existing reports and Linux was right to fix them
> up. Show us one that is incorrect, then we can fix it up.
>
> the "no MTRR's" are vmware/(also qemu?) guests not implementing a
> full CPU emulation.
... and it's still our fault in part, since we don't even check to see
if a cpu claims to support MTRR before complaining about it...
easy to fix though:
>From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <[email protected]>
Date: Thu, 27 Nov 2008 12:25:47 -0800
Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereof...
We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
exposes MTRR support in the CPUID flags first. While this might not fix all of the
broken virtualization systems out there, it will at least fix those that properly don't
advertize things they don't support.
Signed-off-by: Arjan van de Ven <[email protected]>
---
arch/x86/kernel/cpu/mtrr/main.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 1159e26..0044e61 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
* Make sure we only trim uncachable memory on machines that
* support the Intel MTRR architecture:
*/
+ if (!cpu_has_mtrr)
+ return 0;
if (!is_cpu(INTEL) || disable_mtrr_trim)
return 0;
rdmsr(MTRRdefType_MSR, def, dummy);
--
1.6.0.4
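The patch relies on the CPUID feature flag that cpu_has_mtrr reflects: MTRR support is advertised in CPUID leaf 1, EDX bit 12. A userspace sketch of the same test, using the <cpuid.h> header shipped with GCC and Clang, is below; edx_has_mtrr() and cpu_advertises_mtrr() are illustrative names, not kernel functions.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

#define CPUID1_EDX_MTRR (1u << 12)	/* CPUID.01H:EDX bit 12 = MTRR */

/* Pure decoding step: does this leaf-1 EDX value advertise MTRRs? */
int edx_has_mtrr(uint32_t edx)
{
	return (edx & CPUID1_EDX_MTRR) != 0;
}

/* Ask the CPU (or the hypervisor emulating it) directly.
 * Returns 1 or 0, or -1 where CPUID is unavailable. */
int cpu_advertises_mtrr(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	return edx_has_mtrr(edx);
#else
	return -1;
#endif
}
```

As Ingo points out in his reply, a hypervisor may set this bit without exposing any usable MTRRs, so a check like this only helps guests that advertise honestly.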
* Arjan van de Ven wrote:
> ... and it's still our fault in part, since we don't even check to
> see if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:
IIRC the problem is that vmware _does_ claim that it supports MTRRs.
Ingo
On Thu, 27 Nov 2008 21:47:14 +0100
Ingo Molnar wrote:
> IIRC the problem is that vmware _does_ claim that it supports MTRRs.
it might.
but even if they would fix that, we would still WARN (
at least we should do our side correctly...
Arjan van de Ven wrote:
> + if (!cpu_has_mtrr)
> + return 0;
> if (!is_cpu(INTEL) || disable_mtrr_trim)
> return 0;
> rdmsr(MTRRdefType_MSR, def, dummy);
cpu_has_mtrr there should presumably replace is_cpu(INTEL). I'm not
sure if this can be replaced by use_intel(); in particular use_intel()
relies on mtrr_if having been initialized.
Looking...
-hpa (out of town for Thanksgiving)
Arjan van de Ven wrote:
> ... and it's still our fault in part, since we don't even check to see
> if a cpu claims to support MTRR before complaining about it...
>
> easy to fix though:
>
> From 7e987ae541c41ce908b414fee9d8e2fd2099a083 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <[email protected]>
> Date: Thu, 27 Nov 2008 12:25:47 -0800
> Subject: [PATCH] x86: make sure the CPU advertizes MTRR support before complaining about the lack thereof...
>
> We complain loudly if a CPU does not have MTRR support... but we don't check if the CPU
> exposes MTRR support in the CPUID flags first. While this might not fix all of the
> broken virtualization systems out there, it will at least fix those that properly don't
> advertize things they don't support.
>
> Signed-off-by: Arjan van de Ven <[email protected]>
> ---
> arch/x86/kernel/cpu/mtrr/main.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
> * Make sure we only trim uncachable memory on machines that
> * support the Intel MTRR architecture:
> */
> + if (!cpu_has_mtrr)
> + return 0;
that is not needed: we already check that in mtrr_bp_init() before this function is called, and it assigns mtrr_if.
And
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
will make sure MTRRs are there.
ps: here INTEL means any CPU that has the same interface as Intel CPUs.
YH
> if (!is_cpu(INTEL) || disable_mtrr_trim)
> return 0;
> rdmsr(MTRRdefType_MSR, def, dummy);
Arjan van de Ven wrote:
>
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index 1159e26..0044e61 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -1567,6 +1567,8 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
> * Make sure we only trim uncachable memory on machines that
> * support the Intel MTRR architecture:
> */
> + if (!cpu_has_mtrr)
> + return 0;
> if (!is_cpu(INTEL) || disable_mtrr_trim)
> return 0;
> rdmsr(MTRRdefType_MSR, def, dummy);
Okay... is_cpu() here is defined as:
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
... so an MTRR interface has been identified. Therefore testing
cpu_has_mtrr is redundant.
As far as use_intel() versus is_cpu(INTEL), it looks to me as though the
two are identical in the current code -- mtrr_if->vendor is never set in
the generic code, and so defaults to 0 - meaning X86_VENDOR_INTEL.
All in all, it looks like the vendor ID stuff is a bad case of "works by
accident" in the MTRR code; however, *given the current code* I conclude
that is_cpu(INTEL) == use_intel() and that neither can be true without
MTRRs enabled.
-hpa
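hpa's argument can be checked with a small standalone model. The struct and variables below are simplified stand-ins for the kernel's mtrr_ops and mtrr_if, and the use_intel() definition is an assumption modelled on the kernel's; only the is_cpu() definition is quoted from the thread. X86_VENDOR_INTEL is 0, as he describes. The model shows that is_cpu(INTEL) is false whenever no MTRR interface was selected, and true for an interface whose vendor field was never set.

```c
#include <stddef.h>

#define X86_VENDOR_INTEL 0

/* Simplified stand-in for the kernel's struct mtrr_ops. */
struct mtrr_ops {
	int vendor;
	int use_intel_if;
};

/* NULL until an MTRR interface is selected at boot (in the kernel,
 * per the thread, this happens in mtrr_bp_init()). */
static struct mtrr_ops *mtrr_if;

/* Quoted from the thread. */
#define is_cpu(vnd) (mtrr_if && mtrr_if->vendor == X86_VENDOR_##vnd)
/* Assumed to mirror the kernel's use_intel(). */
#define use_intel() (mtrr_if && mtrr_if->use_intel_if == 1)

/* Generic interface whose 'vendor' is never set: it is zero-initialised
 * and therefore compares equal to X86_VENDOR_INTEL ("works by accident"). */
static struct mtrr_ops generic_ops = { .use_intel_if = 1 };

int model_is_cpu_intel(int have_interface)
{
	mtrr_if = have_interface ? &generic_ops : NULL;
	return is_cpu(INTEL);
}

int model_use_intel(int have_interface)
{
	mtrr_if = have_interface ? &generic_ops : NULL;
	return use_intel();
}
```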
* Arjan van de Ven wrote:
> On Thu, 27 Nov 2008 21:47:14 +0100
> Ingo Molnar wrote:
>
> > IIRC the problem is that vmware _does_ claim that it supports MTRRs.
>
> it might.
> but even if they would fix that, we would still WARN (
> at least we should do our side correctly...
As pointed out in other parts of the thread, that is not the case.
Anyway, as I said at the outset, if you think we should remove the
warning altogether, or tweak it, we can do that: it is important that
the warnings showing up in kerneloops.org are relevant ones.
To sum it up: the only remaining MTRR warnings we know of are either:
1) apparently genuine BIOS bugs that do cause problems if the (new)
kernel does not fix them up.
The MTRR warning is relevant and correct in those cases.
or:
2) sucky virtualization solutions that cheat the guest OS by faking
"MTRR support" in the CPUID info, but not actually showing any
MTRRs. These virtualization solutions do not even properly
identify themselves to the kernel.
The MTRR warning is unnecessary in this case.
So what we did in the x86 tree to remove the warning in the second
case was to properly identify vmware (and, in general, virtualized)
guests.
It was not a simple one-liner:
earth4:~/tip> gll linus..x86/detect-hyper
4e42ebd: x86: hypervisor - fix sparse warnings
c450d78: x86: vmware - fix sparse warnings
fd8cd7e: x86: vmware: look for DMI string in the product serial key
6bdbfe9: x86: VMware: Fix vmware_get_tsc code
395628e: x86: Skip verification by the watchdog for TSC clocksource.
eca0cd0: x86: Add a synthetic TSC_RELIABLE feature bit.
88b094f: x86: Hypervisor detection and get tsc_freq from hypervisor
49ab56a: x86: add X86_FEATURE_HYPERVISOR feature bit
b2bcc7b: x86: add a synthetic TSC_RELIABLE feature bit
and it will benefit vmware guests in many more areas than just a
sharper MTRR warning message. That code is queued up for v2.6.29.
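Ingo's second case (a guest that advertises MTRRs but has none set up) is what the hypervisor-detection work targets. The X86_FEATURE_HYPERVISOR bit in that series corresponds to CPUID leaf 1, ECX bit 31, which hypervisors set for their guests. The sketch below combines that bit with the MTRR bit (CPUID leaf 1, EDX bit 12) to mirror the two-case classification; classify_mtrr_warning() and its verdict names are hypothetical, and the coverage check for case 1 is left out.

```c
#include <stdint.h>

#define CPUID1_ECX_HYPERVISOR (1u << 31)	/* CPUID.01H:ECX bit 31 */
#define CPUID1_EDX_MTRR       (1u << 12)	/* CPUID.01H:EDX bit 12 */

enum mtrr_verdict {
	MTRR_SKIP_NO_MTRR,	/* CPU does not claim MTRRs: nothing to check */
	MTRR_SKIP_GUEST,	/* guest claims MTRRs but none are set up */
	MTRR_WARN_BLANK,	/* bare metal, MTRRs claimed but all blank */
	MTRR_CHECK_COVERAGE	/* MTRRs set up: go on to the coverage check */
};

/* Hypothetical decision helper. 'any_var_mtrr_enabled' says whether
 * any variable-range MTRR was found enabled. */
enum mtrr_verdict classify_mtrr_warning(uint32_t ecx, uint32_t edx,
					int any_var_mtrr_enabled)
{
	if (!(edx & CPUID1_EDX_MTRR))
		return MTRR_SKIP_NO_MTRR;
	if (any_var_mtrr_enabled)
		return MTRR_CHECK_COVERAGE;
	if (ecx & CPUID1_ECX_HYPERVISOR)
		return MTRR_SKIP_GUEST;
	return MTRR_WARN_BLANK;
}
```

This only helps guests that set the hypervisor bit; as Ingo notes, the virtualization solutions in question must identify themselves properly for the warning to be suppressed.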
Ingo