2008-01-12 18:51:20

by Arjan van de Ven

Subject: Top 10 kernel oopses for the week ending January 12th, 2008

The http://www.kerneloops.org website collects kernel oops and
warning reports from various mailing lists and bugzillas, as well as
through a client that users can install to auto-submit oopses.
Below is a top 10 list of the oopses collected in the last 7 days.
(Reports prior to 2.6.23 have been omitted in collecting the top 10)
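
The selection rule described here (drop reports from kernels older than 2.6.23, then rank by report count) might be sketched roughly as follows. This is only an illustration, not the code kerneloops.org runs: the `(function, version)` report tuples and the names `parse_version` and `top_oopses` are assumptions made up for the example.

```python
from collections import Counter

# Assumption: the cutoff is expressed as a (major, minor, patch) tuple.
MIN_VERSION = (2, 6, 23)

def parse_version(version):
    """Turn '2.6.24-rc7' or '2.6.23.9' into a comparable (2, 6, 24) style tuple."""
    base = version.split("-", 1)[0]           # strip '-rc7' and similar suffixes
    return tuple(int(part) for part in base.split(".")[:3])

def top_oopses(reports, limit=10):
    """Count reports per function, ignoring kernels older than MIN_VERSION."""
    counts = Counter(
        func for func, version in reports
        if parse_version(version) >= MIN_VERSION
    )
    return counts.most_common(limit)

# Hypothetical sample data, not real submissions.
sample_reports = [
    ("implement", "2.6.24-rc7"),
    ("implement", "2.6.23.9"),
    ("evdev_disconnect", "2.6.23"),
    ("old_function", "2.6.18"),
]
ranking = top_oopses(sample_reports)
```

Comparing on the first three version components means stable point releases such as 2.6.23.9 are treated like their base release.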

This week, a total of 136 oopses and warnings have been reported,
compared to 46 reports in the previous 7 days.

kerneloops.org news:
* Based on feedback from last week's report, the website now tries
to also present a disassembled Code: line
* the kerneloops collection client is now part of Fedora (rawhide)
(yum install kerneloops)
* the kerneloops collection client is now included in Debian testing
(apt-get install kerneloops)
* Gentoo has received an updated version of the client


Rank 1: implement (hid code)
WARN_ON at drivers/hid/hid-core.c:784
Reported 23 times (39 total reports)
This appears to be the kernel doing a WARN_ON based on unexpected ioctl() arguments
More info: http://www.kerneloops.org/search.php?search=implement

Rank 2: __ieee80211_rx
WARN_ON at net/mac80211/rx.c:1663
Reported 14 times (25 total reports)
This is the recurring problem from the last few weeks
The iwl3945 driver is heavily implicated for this one
More info: http://www.kerneloops.org/search.php?search=__ieee80211_rx

Rank 3: uart_flush_buffer (caused by bluetooth)
WARN_ON at drivers/serial/serial_core.c:544
Same issue as a few weeks ago; a fix was posted on LKML already
Caused by the bluetooth tty layer double closing/freeing the tty
Reported 8 times (24 total reports)
More info: http://www.kerneloops.org/search.php?search=uart_flush_buffer

Rank 4: i2c_transfer (by the cx8802/cx22702 driver)
kernel NULL pointer
Reported 6 times
Only reported on 2.6.23.9 so far
DVB related
More info: http://www.kerneloops.org/search.php?search=i2c_transfer

Rank 5: __ieee80211_rx_handle_packet (iwl3945)
WARNING at net/mac80211/rx.c:1693
Reported 5 times
Only seen with the iwl3945 driver
Not the same warning as #2, but the cause may be related
More info: http://www.kerneloops.org/search.php?search=__ieee80211_rx_handle_packet

Rank 6: NdisDispatchPnp
Kernel page fault
Reported 5 times
Tainted and in the external ndiswrapper binary driver loader
More info: http://www.kerneloops.org/search.php?search=NdisDispatchPnp

Rank 7: __lock_acquire
Kernel page fault
Reported 4 times (14 total reports)
Reported in 2.6.24-rc5 and -rc7, and previously in the 2.6.18 timeframe
Appears to be EXT3 related this time around
More info: http://bugzilla.kernel.org/show_bug.cgi?id=9718
More info: http://www.kerneloops.org/search.php?search=__lock_acquire

Rank 8: evdev_disconnect
Same issue as last week
Reported 4 times (14 total reports)
Only seen up to 2.6.23
Al Viro diagnosed that this got fixed in 2.6.24-rc but the patch wasn't put in 2.6.23-stable
More info: http://www.kerneloops.org/search.php?search=evdev_disconnect

Rank 9: hfsplus_releasepage
Kernel null pointer deref
Reported 3 times
Only reported for 2.6.24-rc7
More info: http://www.kerneloops.org/search.php?search=hfsplus_releasepage

Rank 10: cfq_remove_request
Kernel NULL pointer
Reported 3 times
Only reported for 2.6.23
Reported both from interrupt and process context
More info: http://www.kerneloops.org/search.php?search=cfq_remove_request


2008-01-12 22:24:25

by Adrian Bunk

Subject: Re: Top 10 kernel oopses for the week ending January 12th, 2008

On Sat, Jan 12, 2008 at 10:48:05AM -0800, Arjan van de Ven wrote:
> The http://www.kerneloops.org website collects kernel oops and
> warning reports from various mailing lists and bugzillas, as well as
> through a client that users can install to auto-submit oopses.
> Below is a top 10 list of the oopses collected in the last 7 days.
> (Reports prior to 2.6.23 have been omitted in collecting the top 10)
>
> This week, a total of 136 oopses and warnings have been reported,
> compared to 46 reports in the previous 7 days.
>
> kerneloops.org news:
> * Based on feedback from last week's report, the website now tries
> to also present a disassembled Code: line
> * the kerneloops collection client is now part of Fedora (rawhide)
> (yum install kerneloops)
> * the kerneloops collection client is now included in Debian testing
> (apt-get install kerneloops)
> * Gentoo has received an updated version of the client
>
>
> Rank 1: implement (hid code)
> WARN_ON at drivers/hid/hid-core.c:784
> Reported 23 times (39 total reports)
> This appears to be the kernel doing a WARN_ON based on unexpected ioctl() arguments
> More info: http://www.kerneloops.org/search.php?search=implement
>...

The only complete bug reports seem to be from one user who loaded a
module whose distribution might be considered a criminal act in some
countries.

All the other reports only contain the plain trace. Is there any way to
get more information whether the former is a pattern or not, and to
get this information somehow displayed on the webpage?

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-01-12 23:15:44

by Arjan van de Ven

Subject: Re: Top 10 kernel oopses for the week ending January 12th, 2008

Adrian Bunk wrote:
>
> All the other reports only contain the plain trace. Is there any way to
> get more information whether the former is a pattern or not, and to
> get this information somehow displayed on the webpage?

IF the kernel prints that it's tainted or whatever, it'll be shown, as well
as the exact versions etc etc if they are there.
Sadly none of this information is there prior to 2.6.24-rc4.
(I wonder if the patch to print this should be put in -stable ;-)

2008-01-12 23:34:56

by Adrian Bunk

Subject: Re: Top 10 kernel oopses for the week ending January 12th, 2008

On Sat, Jan 12, 2008 at 03:13:29PM -0800, Arjan van de Ven wrote:
> Adrian Bunk wrote:
>>
>> All the other reports only contain the plain trace. Is there any way to
>> get more information whether the former is a pattern or not, and to
>> get this information somehow displayed on the webpage?
>
> IF the kernel prints that it's tainted or whatever, it'll be shown, as well
> as the exact versions etc etc if they are there.
> Sadly none of this information is there prior to 2.6.24-rc4.
>...

OK, the problem might actually not be the omission of displaying the
tainted information but the omission of considering any relevant
context.

Looking deeper:

Number #2424 is WARN_ON-after-tainted-oops.

Is your rank 1 just a symptom that the system is in a bad state after
running into what is your rank 8?

In this case the information shown when following e.g. #2827 is quite
useless, since wherever you got this trace from, none of the related
context information (e.g. whether, like #2424, it is just the symptom
of a previous Oops) is displayed.

In the worst case, an entry might contain only WARN_ON traces, without
any information about where the traces came from and whether they are
worth looking at, or whether the system was always already in a known-bad
state when they occurred.

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2008-01-12 23:46:22

by Arjan van de Ven

Subject: Re: Top 10 kernel oopses for the week ending January 12th, 2008

Adrian Bunk wrote:
> On Sat, Jan 12, 2008 at 03:13:29PM -0800, Arjan van de Ven wrote:
>> Adrian Bunk wrote:
>>> All the other reports only contain the plain trace. Is there any way to
>>> get more information whether the former is a pattern or not, and to
>>> get this information somehow displayed on the webpage?
>> IF the kernel prints that it's tainted or whatever, it'll be shown, as well
>> as the exact versions etc etc if they are there.
>> Sadly none of this information is there prior to 2.6.24-rc4.
>> ...
>
> OK, the problem might actually not be the omission of displaying the
> tainted information but the omission of considering any relevant
> context.
>
> Looking deeper:
>
> Number #2424 is WARN_ON-after-tainted-oops.
>
> Is your rank 1 just a symptom that the system is in a bad state after
> running into what is your rank 8?
>
> In this case the information shown when following e.g. #2827 is quite
> useless, since wherever you got this trace from, none of the related
> context information (e.g. whether, like #2424, it is just the symptom
> of a previous Oops) is displayed.

The taint flags include a flag for "there was a previous oops", and if that's set,
the kerneloops.org website ignores the report. Simple as that.
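
A minimal sketch of such a filter, purely for illustration. It assumes the report text carries a "Tainted: <flags>" field and that the letter 'D' in that field is the "previous oops" flag; the function names and the sample lines are hypothetical, and this is not the actual kerneloops.org code.

```python
import re

# Assumption: taint flags appear as "Tainted: <uppercase letters and spaces>",
# and an untainted kernel prints "Not tainted" instead.
TAINT_RE = re.compile(r"Tainted: ([A-Z ]+)")

def had_previous_oops(report_text):
    """Return True if the taint field contains the assumed 'D' flag."""
    match = TAINT_RE.search(report_text)
    if match is None:
        return False  # "Not tainted", or no taint information at all
    return "D" in match.group(1)

def usable_reports(reports):
    """Keep only reports that are not follow-ups of an earlier oops."""
    return [text for text in reports if not had_previous_oops(text)]

# Hypothetical sample lines, not taken from real submissions.
samples = [
    "Pid: 1, comm: a Not tainted 2.6.24-rc7 #1",
    "Pid: 2, comm: b Tainted: P      D 2.6.24-rc7 #1",
    "Pid: 3, comm: c Tainted: P        2.6.24-rc7 #1",
]
kept = usable_reports(samples)
```

Reports without any taint information pass through unfiltered, which matches the limitation with older kernels that print only a plain backtrace.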

> In the worst case, an entry might contain only WARN_ON traces, without
> any information about where the traces came from and whether they are
> worth looking at, or whether the system was always already in a known-bad
> state when they occurred.

Again, as of 2.6.24-rc4 or so, this is simply no longer the case. The problem is with
older kernels, whose WARN_ON() didn't print ANY information other than
a plain backtrace.

2008-01-15 10:36:43

by Jiri Kosina

Subject: Re: Top 10 kernel oopses for the week ending January 12th, 2008

On Sun, 13 Jan 2008, Adrian Bunk wrote:

> > Rank 1: implement (hid code)
> > WARN_ON at drivers/hid/hid-core.c:784
> > Reported 23 times (39 total reports)
> > This appears to be the kernel doing a WARN_ON based on unexpected ioctl() arguments
> > More info: http://www.kerneloops.org/search.php?search=implement
> >...
> The only complete bug reports seem to be from one user who loaded a
> module whose distribution might be considered a criminal act in some
> countries.

The most common case is the hid2hci utility, which tries to switch modes on
Bluetooth HID peripherals (via hiddev). When these "switching" packets
are not compliant with the HID report descriptor of the device, this
WARN_ON() triggers.

Unfortunately there is no standard specifying what these packets should
look like, so what Marcel has put into hid2hci for individual vendors is
a mix of guesswork and reverse engineering, and sometimes it just happens
not to work.

I have had the commit below queued in my tree for 2.6.25 for quite some
time:

commit dbacd67dc33f7b0d5fe64323668cf266d18f4b3f
Author: Jiri Kosina
Date: Fri Nov 30 11:12:58 2007 +0100

HID: remove redundant WARN_ON()s in order not to scare users

The WARN_ON() in implement() and extract() spit out stacktraces and
a lot of other information that might make users think that there is
something seriously wrong with the system. WARN_ON() should not be
deliberately triggerable by userspace application, which these can be.
Usually this WARN_ON() triggers when hid2hci utility is sending the
data that don't correspond to the device's report descriptor.

Convert these messages to more friendly printk().

--
Jiri Kosina