This week, a total of 3340 oopses and warnings have been reported,
compared to 3022 reports in the previous week.
As with last week, this report only covers kernels 2.6.26 and later.

Per file statistics
722 net/sched/sch_generic.c
206 fs/jbd/journal.c
178 external/utrace
131 drivers/base/power/main.c
104 drivers/ata/libata-sff.c
104 include/linux/pagemap.h
96 kernel/timer.c
62 fs/ext3/super.c
61 external/fireglx/binary (P)
61 drivers/usb/serial/usb-serial.c
61 net/core/dev.c
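
To put the counts above in proportion, here is a minimal Python sketch. It is not part of the kerneloops.org tooling; it only takes the two weekly totals and the first few per-file counts quoted in this report and computes the week-over-week change and each file's share of this week's reports. The function names `week_over_week` and `file_shares` are made up for the illustration.

```python
# Illustrative only: the numbers are copied from the report above,
# the function names are hypothetical.

PER_FILE = """\
722 net/sched/sch_generic.c
206 fs/jbd/journal.c
178 external/utrace
131 drivers/base/power/main.c
104 drivers/ata/libata-sff.c
"""


def week_over_week(current, previous):
    """Relative change between two weekly totals, as a fraction."""
    return (current - previous) / previous


def file_shares(text, total):
    """Map each file path to its fraction of the weekly total."""
    shares = {}
    for line in text.splitlines():
        # Split on the first run of whitespace only, so paths that
        # carry a trailing tag such as "(P)" stay intact.
        count, path = line.split(maxsplit=1)
        shares[path] = int(count) / total
    return shares


if __name__ == "__main__":
    print(f"change: {week_over_week(3340, 3022):+.1%}")
    for path, share in file_shares(PER_FILE, 3340).items():
        print(f"{share:6.1%}  {path}")
```

The list in the report covers only the most frequently hit files, so the shares are not expected to add up to 100%.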
Unsolved issues
===============

Rank 1: dev_watchdog (warning)
Reported 431 times (1363 total reports)
Network TX watchdog timeouts for unidentified NICs. 2.6.27-rc7+ will identify the NIC.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog
The general top 10 distribution over the identified cases is like this:

 count | guilty
-------+---------------------------
   390 | dev_watchdog(sis900)
   114 | dev_watchdog(usbnet)
   102 | dev_watchdog(via-rhine)
    99 | dev_watchdog(8390)
    47 | dev_watchdog(8139too)
    44 | dev_watchdog(orinoco)
    40 | dev_watchdog(r8169)
    26 | dev_watchdog(3c59x)
    17 | dev_watchdog(forcedeth)
    15 | dev_watchdog(fealnx)
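
The distribution above is easy to post-process. The following Python sketch is an illustration under assumptions, not the way kerneloops.org produces its numbers: it parses rows of the form `count | dev_watchdog(driver)` and reports each driver's share of the ten listed cases. The names `parse_watchdog_table` and `driver_shares` are hypothetical.

```python
import re

# Rows copied from the table above.
TABLE = """\
count | guilty
-------+---------------------------
390 | dev_watchdog(sis900)
114 | dev_watchdog(usbnet)
102 | dev_watchdog(via-rhine)
99 | dev_watchdog(8390)
47 | dev_watchdog(8139too)
44 | dev_watchdog(orinoco)
40 | dev_watchdog(r8169)
26 | dev_watchdog(3c59x)
17 | dev_watchdog(forcedeth)
15 | dev_watchdog(fealnx)
"""

# One data row: a count, a pipe, then dev_watchdog(<driver>).
ROW = re.compile(r"^\s*(\d+)\s*\|\s*dev_watchdog\(([^)]+)\)\s*$")


def parse_watchdog_table(text):
    """Return (driver, count) pairs; header and ruler lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group(2), int(match.group(1))))
    return rows


def driver_shares(rows):
    """Fraction of the listed reports attributed to each driver."""
    total = sum(count for _, count in rows)
    return {driver: count / total for driver, count in rows}


if __name__ == "__main__":
    rows = parse_watchdog_table(TABLE)
    for driver, share in driver_shares(rows).items():
        print(f"{share:6.1%}  {driver}")
```

The shares are relative to the ten listed drivers only; reports from NICs that could not be identified are counted separately under Rank 1.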
Rank 3: journal_update_superblock (warning)
Reported 202 times (4509 total reports)
Likely caused by the user removing a USB stick while mounted
Ted has a patch to fix this; queued for 2.6.28
This warning was last seen in version 2.6.27-rc8-git4, and first seen in 2.6.24-rc6-git1.
More info: http://www.kerneloops.org/searchweek.php?search=journal_update_superblock

Rank 4: utrace_control (oops)
Reported 166 times (1001 total reports)
[fedora] Fedora merged a broken utrace patch
This oops was last seen in version 2.6.26.5, and first seen in 2.6.26.1.
More info: http://www.kerneloops.org/searchweek.php?search=utrace_control

Rank 5: dev_watchdog(sis900) (warning)
Reported 135 times (390 total reports)
This warning was last seen in version 2.6.27-rc8-git4, and first seen in 2.6.26-rc4-git2.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(sis900)

Rank 7: lock_page (warning)
Reported 104 times (150 total reports)
The hwclock program disables interrupts from userspace, and then hits a pagefault.
The new diagnostics do a WARN_ON for faults-with-interrupts-off, and trap this.
This warning was last seen in version 2.6.27-rc8-git7, and first seen in 2.6.27-rc1-git2.
More info: http://www.kerneloops.org/searchweek.php?search=lock_page

Rank 8: run_timer_softirq (oops)
Reported 83 times (301 total reports)
softlockup; likely fixed by the timer cleanups done by Thomas
This oops was last seen in version 2.6.27-rc1-git4, and first seen in 2.6.25.
More info: http://www.kerneloops.org/searchweek.php?search=run_timer_softirq

Rank 9: device_pm_add (warning)
Reported 81 times (323 total reports)
Drivers with suspect suspend/resume logic; a patch is queued for 2.6.28 to identify
which drivers are involved.
This warning was last seen in version 2.6.27-rc4, and first seen in 2.6.26-rc5.
More info: http://www.kerneloops.org/searchweek.php?search=device_pm_add

Rank 10: ext3_commit_super (warning)
Reported 58 times (1164 total reports)
Likely caused by the user removing a USB stick while mounted
Ted has a fix for this.
This warning was last seen in version 2.6.27, and first seen in 2.6.24.
More info: http://www.kerneloops.org/searchweek.php?search=ext3_commit_super

Rank 11: device_suspend (warning)
Reported 50 times (139 total reports)
Drivers with suspect suspend/resume logic; a patch is queued for 2.6.28 to identify
which drivers are involved.
This warning was last seen in version 2.6.26.5, and first seen in 2.6.26.
More info: http://www.kerneloops.org/searchweek.php?search=device_suspend

Rank 12: suspend_test_finish (warning)
Reported 49 times (103 total reports)
This warning was last seen in version 2.6.27, and first seen in 2.6.27-rc0-git14.
More info: http://www.kerneloops.org/searchweek.php?search=suspend_test_finish

Rank 13: rs_get_rate (warning)
Reported 43 times (423 total reports)
Bug in the Intel IWL wireless drivers
This warning was last seen in version 2.6.27-rc8-git1, and first seen in 2.6.25-rc2-git5.
More info: http://www.kerneloops.org/searchweek.php?search=rs_get_rate

Rank 14: dev_watchdog(usbnet) (warning)
Reported 41 times (114 total reports)
This warning was last seen in version 2.6.27-rc5-git9, and first seen in 2.6.26.
More info: http://www.kerneloops.org/searchweek.php?search=dev_watchdog(usbnet)


Fixed issues
============

Rank 6: ata_sff_hsm_move (oops)
Reported 104 times (1156 total reports)
[fixed] redundant WARN_ON; fixed in 9c2676b61a5a4b6d99e65fb2f438fb3914302eda
This oops was last seen in version 2.6.27-rc2-git1, and first seen in 2.6.25.4.
More info: http://www.kerneloops.org/searchweek.php?search=ata_sff_hsm_move
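
Each entry in the report follows the same two-line pattern ("Rank N: name (type)" followed by "Reported X times (Y total reports)"), so it can be turned into records mechanically. The Python sketch below is a hypothetical helper, not something kerneloops.org ships. It parses a few entries copied from this report and computes what fraction of an issue's total reports arrived this week. One possible reading, offered only as an assumption, is that a high fraction points at an issue that started showing up recently.

```python
import re

# A few entries copied from the report above.
SAMPLE = """\
Rank 3: journal_update_superblock (warning)
Reported 202 times (4509 total reports)
Likely caused by the user removing a USB stick while mounted
Rank 5: dev_watchdog(sis900) (warning)
Reported 135 times (390 total reports)
Rank 7: lock_page (warning)
Reported 104 times (150 total reports)
Rank 8: run_timer_softirq (oops)
Reported 83 times (301 total reports)
"""

# Header line plus the "Reported ..." line that follows it.
ENTRY = re.compile(
    r"Rank (\d+): (\S+) \((warning|oops)\)\n"
    r"Reported (\d+) times \((\d+) total reports\)"
)


def parse_entries(text):
    """Return one dict per ranked issue found in the report text."""
    entries = []
    for rank, name, kind, week, total in ENTRY.findall(text):
        entries.append({
            "rank": int(rank),
            "name": name,
            "kind": kind,
            "week": int(week),
            "total": int(total),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        fraction = entry["week"] / entry["total"]
        print(f"{entry['rank']:>2}  {fraction:6.1%}  {entry['name']} ({entry['kind']})")
```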
On Tuesday, 7 of October 2008, Arjan van de Ven wrote:
> Rank 9: device_pm_add (warning)
> Reported 81 times (323 total reports)
> Drivers with suspect suspend/resume logic; a patch is queued for 2.6.28 to identify
> which drivers are involved.
> This warning was last seen in version 2.6.27-rc4, and first seen in 2.6.26-rc5.
> More info: http://www.kerneloops.org/searchweek.php?search=device_pm_add
This should have been fixed by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
Thanks,
Rafael
* Rafael J. Wysocki <[email protected]> wrote:
> > Rank 9: device_pm_add (warning)
> > Reported 81 times (323 total reports)
> > Drivers with suspect suspend/resume logic; a patch is queued for 2.6.28 to identify
> > which drivers are involved.
> > This warning was last seen in version 2.6.27-rc4, and first seen in 2.6.26-rc5.
> > More info: http://www.kerneloops.org/searchweek.php?search=device_pm_add
>
> This should have been fixed by:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
hm, that is:
| From f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26 Mon Sep 17 00:00:00 2001
| From: Rafael J. Wysocki <[email protected]>
| Date: Sat, 9 Aug 2008 01:05:13 +0200
| Subject: [PATCH] PM: Remove WARN_ON from device_pm_add
|
| PM: Remove WARN_ON from device_pm_add
|
| Fix message in device_pm_add() saying that the device will not be
| added to dpm_list, although in fact the device is going to be added
| to the list regardless of the ordering violation.
|
| Remove the WARN_ON(true) triggered in that situation, because it is
| hit by USB very often and spams the users' logs.
|
| This patch fixes bug #11263
+		if (dev->parent->power.status >= DPM_SUSPENDING)
+			dev_warn(dev, "parent %s should not be sleeping\n",
 				dev->parent->bus_id);
-			WARN_ON(true);
-		}
i.e. no bug was fixed in reality - we still emit a kernel log entry, but
the WARN_ON() was removed, so that it does not fall under the scope of
kerneloops.org, right?
Ingo
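
The point about scope can be illustrated with a small Python sketch. It rests on assumptions that are not stated in this thread: that a WARN_ON() leaves a log line beginning with "WARNING: at", that an automated collector keys on that marker, and that dev_warn() leaves only an ordinary message line. The two log excerpts are hypothetical placeholders, not real kernel output, and `collectable_warnings` is a made-up name rather than the kerneloops client's actual logic.

```python
# Assumed marker for WARN_ON() output; a real collector may look for
# more signatures and would have to cope with timestamps and prefixes.
WARN_MARKER = "WARNING: at "

# Hypothetical log excerpt before the patch: message plus WARN_ON().
BEFORE_PATCH = """\
example-dev: parent example-bus is sleeping, will not add
------------[ cut here ]------------
WARNING: at <file>:<line> device_pm_add()
"""

# Hypothetical log excerpt after the patch: dev_warn() message only.
AFTER_PATCH = """\
example-dev: parent example-bus should not be sleeping
"""


def collectable_warnings(log_text):
    """Return the lines a marker-based collector would pick up."""
    return [
        line for line in log_text.splitlines()
        if line.startswith(WARN_MARKER)
    ]


if __name__ == "__main__":
    print(len(collectable_warnings(BEFORE_PATCH)), "collected before the patch")
    print(len(collectable_warnings(AFTER_PATCH)), "collected after the patch")
```

Under these assumptions the message is still in the log after the patch, but nothing in it matches the marker, so it would no longer be counted.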
On Tuesday, 7 of October 2008, Ingo Molnar wrote:
> i.e. no bug was fixed in reality - we still emit a kernel log entry, but
> the WARN_ON() was removed, so that it does not fall under the scope of
> kerneloops.org, right?
Sort of. In fact, the WARN_ON() was added prematurely and caused lots of
unnecessary reports to be generated.
Thanks,
Rafael
> This should have been fixed by:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
You mean - hidden by - that change should IMHO be reverted
From: Alan Cox <[email protected]>
Date: Tue, 7 Oct 2008 23:58:33 +0100
> > This should have been fixed by:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
>
> You mean - hidden by - that change should IMHO be reverted
Grrr... I totally agree.
On Wednesday, 8 of October 2008, David Miller wrote:
> From: Alan Cox <[email protected]>
> Date: Tue, 7 Oct 2008 23:58:33 +0100
>
> > > This should have been fixed by:
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
> >
> > You mean - hidden by - that change should IMHO be reverted
>
> Grrr... I totally agree.
Even though they are false positives in many cases?
(Yes, they are).
* Rafael J. Wysocki <[email protected]> wrote:
> On Wednesday, 8 of October 2008, David Miller wrote:
> > From: Alan Cox <[email protected]>
> > Date: Tue, 7 Oct 2008 23:58:33 +0100
> >
> > > > This should have been fixed by:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f5a6d958b5d0a10e7e7a9dee1862fb31d08c6d26
> > >
> > > You mean - hidden by - that change should IMHO be reverted
> >
> > Grrr... I totally agree.
>
> Even though they are false positives in many cases?
>
> (Yes, they are).
it's your judgement call i guess - i just wanted to challenge the
terminology of calling it a fix ;-)
Can you think of any good way of reintroducing the good aspect of the
WARN(), without the false positives? [ If not, and if the WARN()s do
more harm than good then it's the right change. ]
But generally the trend is the opposite direction: we try to add as many
WARN()s as possible, and we need even more good WARN()s in the kernel.
When we meet false positives we try to improve the quality/yield of the
warning, not eliminate it.
Granted, they are a bit embarrassing when they show up in the top 10 but
they are also very helpful and are tracked very nicely and lead to
actual bugfixes that _matter_.
Kerneloops.org is a wonderful tool: it enables a broad spectrum of users
to help us out with their feedback, without them being forced into any
manual work, and without them having to understand the kernel bug
reporting workflow. Its scale is already enormous: 3000+ bugs reported
per week. (many kudos Arjan!)
Once its growth stabilizes (all the large distros have it either enabled
today or have the client in their pipeline) we can even observe
long-term trends and estimate release-to-release suckiness. That was
impossible before and maintainers were left largely to imprecise
intuitive guesses about where we stand wrt. kernel quality.
Ingo
On Wednesday, 8 of October 2008, Ingo Molnar wrote:
> it's your judgement call i guess - i just wanted to challenge the
> terminology of calling it a fix ;-)
Well, since I added the WARN_ON() and then realized it did more harm than
good, I consider it a fix. :-)
> Can you think of any good way of reintroducing the good aspect of the
> WARN(), without the false positives? [ If not, and if the WARN()s do
> more harm than good then it's the right change. ]
Unfortunately not. In particular, the MMC subsystem is known to trigger this
WARN_ON() which in turn is not related to any functional problem (it's a design
issue in this case) and it's not going to be fixed any time soon IMO.
USB used to trigger it too, but that has been fixed already.
> But generally the trend is the opposite direction: we try to add as many
> WARN()s as possible, and we need even more good WARN()s in the kernel.
> When we meet false positives we try to improve the quality/yield of the
> warning, not eliminate it.
>
> Granted, they are a bit embarrassing when they show up in the top 10 but
> they are also very helpful and are tracked very nicely and lead to
> actual bugfixes that _matter_.
>
> Kerneloops.org is a wonderful tool: it enables a broad spectrum of users
> to help us out with their feedback, without them being forced into any
> manual work, and without them having to understand the kernel bug
> reporting workflow. Its scale is already enormous: 3000+ bugs reported
> per week. (many kudos Arjan!)
>
> Once its growth stabilizes (all the large distros have it either enabled
> today or have the client in their pipeline) we can even observe
> long-term trends and estimate release-to-release suckiness. That was
> impossible before and maintainers were left largely to imprecise
> intuitive guesses about where we stand wrt. kernel quality.
While I agree with all that, I also think that it's not really reasonable to
use WARNs that cause people to waste their time reporting issues that actually
don't need to be reported.
IMO it will make sense to add this WARN_ON() back again in the future, but not
at the moment.
Thanks,
Rafael