2013-09-17 13:51:54

by Fengguang Wu

[permalink] [raw]
Subject: [pps_ktimer_event] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:33 __list_add()

Greetings,

I got the below dmesg and the first bad commit is

commit c817a67ecba7c3c2aaa104796d78f160af60920d
Author: Russell King <[email protected]>
Date: Thu Jun 27 15:06:14 2013 +0100

kobject: delayed kobject release: help find buggy drivers

Implement debugging for kobject release functions. kobjects are
reference counted, so the drop of the last reference to them is not
predictable. However, the common case is for the last reference to be
the kobject's removal from a subsystem, which results in the release
function being immediately called.

This can hide subtle bugs, which can occur when another thread holds a
reference to the kobject at the same time that a kobject is removed.
This results in the release method being delayed.

In order to make these kinds of problems more visible, the following
patch implements a delayed release; this has the effect that the
release function will be out of order with respect to the removal of
the kobject in the same manner that it would be if a reference was
being held.

This provides us with an easy way to allow driver writers to debug
their drivers and fix otherwise hidden problems.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ 85.364667] libceph: loaded (mon/osd proto 15/24)
[ 85.381535] ------------[ cut here ]------------
[ 85.383996] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:33 __list_add+0x130/0x150()
[ 85.383996] list_add corruption. prev->next should be next (c304ac60), but was (null). (prev=cf835894).
[ 85.383996] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-09992-g5a7d8a2 #8
[ 85.383996] c2a1fb20 c0035c78 c246c8a3 c0035ca8 c10cb4e1 c2a1fb4c c0035cd4 00000001
[ 85.383996] c2a1fb20 00000021 c13c9310 c13c9310 00000000 ffff047f ffff05ab c0035cc0
[ 85.383996] c10cb5da 00000009 c0035cb8 c2a1fb4c c0035cd4 c0035cf0 c13c9310 c2a1fb20
[ 85.383996] Call Trace:
[ 85.383996] [<c246c8a3>] dump_stack+0x40/0x5e
[ 85.383996] [<c10cb4e1>] warn_slowpath_common+0xb1/0x100
[ 85.383996] [<c13c9310>] ? __list_add+0x130/0x150
[ 85.383996] [<c13c9310>] ? __list_add+0x130/0x150
[ 85.383996] [<c10cb5da>] warn_slowpath_fmt+0x4a/0x60
[ 85.383996] [<c13c9310>] __list_add+0x130/0x150
[ 85.383996] [<c10d9b5d>] __internal_add_timer+0x7d/0x110
[ 85.383996] [<c10d9ccc>] internal_add_timer+0x1c/0x70
[ 85.383996] [<c10da937>] mod_timer+0x127/0x220
[ 85.383996] [<c1c435f0>] ? pps_kc_event+0xb0/0xb0
[ 85.383996] [<c1c4365a>] pps_ktimer_event+0x6a/0x80
[ 85.383996] [<c10d9df5>] call_timer_fn+0xa5/0x170
[ 85.383996] [<c10d9d50>] ? process_timeout+0x30/0x30
[ 85.383996] [<c10da060>] run_timer_softirq+0x1a0/0x2f0
[ 85.383996] [<c1c435f0>] ? pps_kc_event+0xb0/0xb0
[ 85.383996] [<c10d272f>] __do_softirq+0x11f/0x2b0
[ 85.383996] [<c110fe03>] ? sched_clock_cpu+0x123/0x200
[ 85.383996] [<c10d2bb3>] irq_exit+0xe3/0x100
[ 85.383996] [<c106dc57>] do_IRQ+0x67/0x120
[ 85.383996] [<c113b437>] ? trace_hardirqs_off+0x27/0x40
[ 85.383996] [<c24a37b1>] common_interrupt+0x31/0x36
[ 85.383996] [<c113f690>] ? lock_acquire+0xd0/0xf0
[ 85.383996] [<c2034bc2>] ? sock_register+0x32/0x100
[ 85.383996] [<c24a0a4c>] _raw_spin_lock+0x6c/0x110
[ 85.383996] [<c2034bc2>] ? sock_register+0x32/0x100
[ 85.383996] [<c2034bc2>] sock_register+0x32/0x100
[ 85.383996] [<c2454030>] vsock_core_init+0x160/0x280
[ 85.383996] [<c2fbb9cd>] ? nfc_hci_init+0x26/0x26
[ 85.383996] [<c2fbbb0c>] vmci_transport_init+0x13f/0x1a0
[ 85.383996] [<c2f3b5c1>] do_one_initcall+0xf2/0x21e


git bisect start 844aa7974395df1f0c7a866007e900e5f979fc7b v3.11 --
git bisect bad a09e9a7a4b907f2dfa9bdb2b98a1828ab4b340b2 # 17:13 2- Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad 40031da445fb4d269af9c7c445b2adf674f171e7 # 17:50 3- Merge tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 91121c103ae93ef117e58712786864270d7f488e # 18:33 46+ staging: dwc2: make dwc2_core_params documentation more complete
git bisect good 1d1fdd95df681f0c065d90ffaafa215a0e8825e2 # 19:15 46+ Merge tag 'char-misc-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad d472d9d98b463dd7a04f2bcdeafe4261686ce6ab # 19:24 1- lockref: Relax in cmpxchg loop
git bisect good e438ac9ade074932383bb00d51b8ded8a0ee164b # 20:02 46+ OMAP: serial: Remove incorrect disabling of IER interrupt
git bisect bad 36c27a655a531f16b5567066600f0ea7fcb973b4 # 20:12 2- w1: add attribute groups to struct w1_family_ops
git bisect bad 13e2237f36874a7abef44b98969f399127b1917d # 20:21 2- v4l2: convert class code to use dev_groups
git bisect bad 3499116b915e260b275c167a471e0e05bbe3f1d6 # 20:31 2- PTP: convert class code to use dev_groups
git bisect bad 58b267d3e3f3ce87c3e559e4c330c8c03e905f5e # 20:36 0- dma: convert dma_devclass to use dev_groups
git bisect good 899826f16a034f1ab6d23cd5b60cd113eaefad19 # 21:36 46+ enclosure: convert class code to use dev_groups
git bisect good ac3785d5c15922c8a4c6b9e050a5bc59895a6b9b # 22:18 46+ c2port: convert class code to use bin_attrs in groups
git bisect bad c817a67ecba7c3c2aaa104796d78f160af60920d # 22:33 2- kobject: delayed kobject release: help find buggy drivers
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 23:47 46+ char: tile-srom: fix build error
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 00:23 138+ char: tile-srom: fix build error
git bisect bad 5a7d8a28080caed7fd4cb1b81d092adac4445e8e # 00:23 0- Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good 249c01e3c56c17b2905f0a33429d9466c4d4eb59 # 01:14 138+ Revert "kobject: delayed kobject release: help find buggy drivers"
git bisect bad d8efd82eece89f8a5790b0febf17522affe9e1f1 # 01:26 3- Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Thanks,
Fengguang


Attachments:
(No filename) (6.08 kB)
dmesg-quantal-ant-8:20130914124036:3.11.0-09992-g5a7d8a2:8 (54.54 kB)
bisect-5a7d8a28080caed7fd4cb1b81d092adac4445e8e-i386-randconfig-c4-0912-WARNING:-CPU:-59419.log (37.18 kB)
config-3.11.0-09992-g5a7d8a2 (79.25 kB)
Download all attachments

2013-09-17 14:07:34

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [pps_ktimer_event] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:33 __list_add()

On Tue, Sep 17, 2013 at 09:51:22PM +0800, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is

I'm afraid this report is virtually useless in terms of debugging what's
going on, because the timer list has been corrupted by some bad driver
kfree()'ing memory associated with a kobject before its release function
has been called.

This contains no hint as to whom is responsible for that.

I've tried to improve the debugging by decoupling the delayed work queue
from its associated kobject, but that has proven to be a fruitless task
due to the antics that x86 gets up to in its early initialisation.

I'm stumped with this; I can't see how we can get sane debugging out of
the kernel.

What we should have done is have explicitly delayed kobject releasing
from the very outset of sysfs to prevent this kind of madness from day
one - but we didn't and now we have something of a nasty headache with
it.