2013-08-25 00:04:38

by Fengguang Wu

[permalink] [raw]
Subject: [tps_init] BUG: unable to handle kernel paging request at 484970c9

Greetings,

It seems the below patch discloses a bug in tps_init().

commit c817a67ecba7c3c2aaa104796d78f160af60920d
Author: Russell King <[email protected]>
Date: Thu Jun 27 15:06:14 2013 +0100

kobject: delayed kobject release: help find buggy drivers

Implement debugging for kobject release functions. kobjects are
reference counted, so the drop of the last reference to them is not
predictable. However, the common case is for the last reference to be
the kobject's removal from a subsystem, which results in the release
function being immediately called.

This can hide subtle bugs, which can occur when another thread holds a
reference to the kobject at the same time that a kobject is removed.
This results in the release method being delayed.

In order to make these kinds of problems more visible, the following
patch implements a delayed release; this has the effect that the
release function will be out of order with respect to the removal of
the kobject in the same manner that it would be if a reference was
being held.

This provides us with an easy way to allow driver writers to debug
their drivers and fix otherwise hidden problems.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


Parent commit not clean. Look out for wrong bisect!

BUG: kernel boot crashed

/kernel/i386-randconfig-j1-08220808/7c42721fe0c58a848849b43ff558cf2fb86aa35a/dmesg-kvm-nhm4-6710-20130824125721-3.11.0-rc2-00007-ge56341a-282

[ 0.304827] WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:530 sysfs_add_one+0x89/0x9b()
[ 0.306316] sysfs: cannot create duplicate filename '/bus/i2c/drivers/tps65010'
[ 0.307436] Modules linked in:
[ 0.307941] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130821-06842-g2d55880 #1
[ 0.308005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.308887] cc459da0 cc459da0 cc459d60 c12ae633 cc459d90 c102a725 c141580a cc459dbc
[ 0.310261] 00000001 c14157a1 00000212 c110e2cc c110e2cc cc41a000 cc657630 cc459ddc
[ 0.312898] cc459da8 c102a76f 00000009 cc459da0 c141580a cc459dbc cc459dd0 c110e2cc
[ 0.314269] Call Trace:
[ 0.314680] [<c12ae633>] dump_stack+0x16/0x18
[ 0.315386] [<c102a725>] warn_slowpath_common+0x7a/0x91
[ 0.316007] [<c110e2cc>] ? sysfs_add_one+0x89/0x9b
[ 0.316760] [<c110e2cc>] ? sysfs_add_one+0x89/0x9b
[ 0.317513] [<c102a76f>] warn_slowpath_fmt+0x33/0x35
[ 0.318290] [<c110e2cc>] sysfs_add_one+0x89/0x9b
[ 0.319010] [<c110e3e3>] create_dir+0x57/0x8d
[ 0.319698] [<c110e769>] sysfs_create_dir+0xe9/0x102
[ 0.320008] [<c113da3b>] kobject_add_internal+0x113/0x220
[ 0.320856] [<c10da635>] ? kfree+0xa9/0x21a
[ 0.321521] [<c1147d30>] ? kvasprintf+0x3e/0x47
[ 0.322235] [<c113dd5c>] kobject_init_and_add+0x43/0x48
[ 0.323061] [<c11ec11f>] bus_add_driver+0xcd/0x26b
[ 0.324007] [<c11ed623>] driver_register+0x96/0xca
[ 0.324774] [<c122e3a8>] i2c_register_driver+0x64/0x123
[ 0.325604] [<c1658167>] tps_init+0xab/0xbf
[ 0.326271] [<c16580bc>] ? wm8994_i2c_driver_init+0x11/0x11
[ 0.327143] [<c1000452>] do_one_initcall+0xf4/0x1af
[ 0.328016] [<c105515b>] ? local_clock+0x36/0x4e
[ 0.328745] [<c163542a>] ? repair_env_string+0x12/0x51
[ 0.329553] [<c104690e>] ? parse_args+0x284/0x38f
[ 0.330296] [<c1635b03>] kernel_init_freeable+0x116/0x1b0
[ 0.331134] [<c1635418>] ? do_early_param+0x7a/0x7a
[ 0.332006] [<c12a8940>] kernel_init+0x10/0xc8
[ 0.332704] [<c12b50d7>] ret_from_kernel_thread+0x1b/0x28
[ 0.333542] [<c12a8930>] ? rest_init+0x131/0x131
[ 0.334317] ---[ end trace c6073849988f8b3e ]---

git bisect start dcb396261fba04df952bfef5d6fc0d5d12438e3a v3.10 --
git bisect good 938d0a840d0f97b627111fd038a735f3924fd987 # 18:41 160+ watchdog: Add Broadcom BCM2835 watchdog timer driver
git bisect good 945a8ce78f4d18e61b1a70c228b51f26c9ca3df2 # 22:27 160+ Merge remote-tracking branch 'wireless-next/master'
git bisect bad 701f688ca117e94ee9c35bdbf065c2966ee1df91 # 22:27 0- Merge remote-tracking branch 'driver-core/driver-core-next'
git bisect good 956c3bfbbedd7d549b342d320db0eb9bc702428f # 05:55 160+ Merge remote-tracking branch 'modules/modules-next'
git bisect good d76759ab20e40abafbfab722feeb21851635a4b5 # 06:12 160+ Merge remote-tracking branch 'spi/for-next'
git bisect good cb1d0c0432c07090815ae9ee5d050d8290018377 # 09:34 160+ Merge branch 'x86/asm'
git bisect good 08fea51b96c8d0c9db06f98a5953d21db9950419 # 09:50 160+ Merge remote-tracking branch 'rcu/rcu/next'
git bisect good ede0e16dd25b2274270e3cfa30d9c9a88dbac0e8 # 13:13 160+ Merge remote-tracking branch 'regmap/for-next'
git bisect bad a93d6b0a879718a1a08553935744b0d5c58354c0 # 13:29 0- devfreq: convert devfreq_class to use dev_groups
git bisect bad 5baa7503a7a0b8d65308ff3b2ab887bf343c65d3 # 13:38 0- leds: convert class code to use dev_groups
git bisect bad c817a67ecba7c3c2aaa104796d78f160af60920d # 13:50 0- kobject: delayed kobject release: help find buggy drivers
git bisect good 899826f16a034f1ab6d23cd5b60cd113eaefad19 # 16:44 160+ enclosure: convert class code to use dev_groups
git bisect good e56341ad380114454119ac043e6d710ddbbb7710 # 20:12 160+ staging: comedi: convert class code to use dev_groups
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 20:55 160+ char: tile-srom: fix build error
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 23:43 480+ char: tile-srom: fix build error
git bisect bad 2d55880c40a3ac145ed9175d3bbdf11b80eaf861 # 23:44 0- Add linux-next specific files for 20130821
git bisect good 8daa20e710a4e9e3c893a043f0d455f8ef882859 # 00:56 480+ Revert "kobject: delayed kobject release: help find buggy drivers"
git bisect good 89b53e50bd2b75c22944d594ada36beee1b23e0f # 02:35 480+ Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
git bisect bad 245f08699d5320f021070ed8698ac89ebf2f0670 # 02:41 0- Add linux-next specific files for 20130822

Thanks,
Fengguang


Attachments:
(No filename) (6.06 kB)
dmesg-kvm-bens-3971-20130822120507-3.11.0-rc6-next-20130821-06842-g2d55880-1 (245.86 kB)
bisect-2d55880c40a3ac145ed9175d3bbdf11b80eaf861-i386-randconfig-j1-08220808-BUG:-unable-to-handle-kernel-paging-request-at-118691.log (56.91 kB)
config-3.11.0-rc6-next-20130821-06842-g2d55880 (60.95 kB)
Download all attachments

2013-08-25 04:07:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [tps_init] BUG: unable to handle kernel paging request at 484970c9

On Sun, Aug 25, 2013 at 08:04:21AM +0800, Fengguang Wu wrote:
> Greetings,
>
> It seems the below patch discloses a bug in tps_init().
>
> commit c817a67ecba7c3c2aaa104796d78f160af60920d
> Author: Russell King <[email protected]>
> Date: Thu Jun 27 15:06:14 2013 +0100
>
> kobject: delayed kobject release: help find buggy drivers
>
> Implement debugging for kobject release functions. kobjects are
> reference counted, so the drop of the last reference to them is not
> predictable. However, the common case is for the last reference to be
> the kobject's removal from a subsystem, which results in the release
> function being immediately called.
>
> This can hide subtle bugs, which can occur when another thread holds a
> reference to the kobject at the same time that a kobject is removed.
> This results in the release method being delayed.
>
> In order to make these kinds of problems more visible, the following
> patch implements a delayed release; this has the effect that the
> release function will be out of order with respect to the removal of
> the kobject in the same manner that it would be if a reference was
> being held.
>
> This provides us with an easy way to allow driver writers to debug
> their drivers and fix otherwise hidden problems.
>
> Signed-off-by: Russell King <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
>
> Parent commit not clean. Look out for wrong bisect!
>
> BUG: kernel boot crashed
>
> /kernel/i386-randconfig-j1-08220808/7c42721fe0c58a848849b43ff558cf2fb86aa35a/dmesg-kvm-nhm4-6710-20130824125721-3.11.0-rc2-00007-ge56341a-282
>
> [ 0.304827] WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:530 sysfs_add_one+0x89/0x9b()
> [ 0.306316] sysfs: cannot create duplicate filename '/bus/i2c/drivers/tps65010'

Same as before, are you unloading and loading modules? We have a fix
for modules that cause problems when unloading with the config option
above enabled. But that shouldn't be this issue.

thanks,

greg k-h

2013-08-25 04:39:31

by Fengguang Wu

[permalink] [raw]
Subject: Re: [tps_init] BUG: unable to handle kernel paging request at 484970c9

On Sat, Aug 24, 2013 at 09:09:39PM -0700, Greg KH wrote:
> On Sun, Aug 25, 2013 at 08:04:21AM +0800, Fengguang Wu wrote:
> > Greetings,
> >
> > It seems the below patch discloses a bug in tps_init().
> >
> > commit c817a67ecba7c3c2aaa104796d78f160af60920d
> > Author: Russell King <[email protected]>
> > Date: Thu Jun 27 15:06:14 2013 +0100
> >
> > kobject: delayed kobject release: help find buggy drivers
> >
> > Implement debugging for kobject release functions. kobjects are
> > reference counted, so the drop of the last reference to them is not
> > predictable. However, the common case is for the last reference to be
> > the kobject's removal from a subsystem, which results in the release
> > function being immediately called.
> >
> > This can hide subtle bugs, which can occur when another thread holds a
> > reference to the kobject at the same time that a kobject is removed.
> > This results in the release method being delayed.
> >
> > In order to make these kinds of problems more visible, the following
> > patch implements a delayed release; this has the effect that the
> > release function will be out of order with respect to the removal of
> > the kobject in the same manner that it would be if a reference was
> > being held.
> >
> > This provides us with an easy way to allow driver writers to debug
> > their drivers and fix otherwise hidden problems.
> >
> > Signed-off-by: Russell King <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >

> > Parent commit not clean. Look out for wrong bisect!
> >
> > BUG: kernel boot crashed
> >
> > /kernel/i386-randconfig-j1-08220808/7c42721fe0c58a848849b43ff558cf2fb86aa35a/dmesg-kvm-nhm4-6710-20130824125721-3.11.0-rc2-00007-ge56341a-282

Sorry please ignore the above several lines.

> > [ 0.304827] WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:530 sysfs_add_one+0x89/0x9b()
> > [ 0.306316] sysfs: cannot create duplicate filename '/bus/i2c/drivers/tps65010'
>
> Same as before, are you unloading and loading modules? We have a fix
> for modules that cause problems when unloading with the config option
> above enabled. But that shouldn't be this issue.

At least I didn't explicitly unloading modules. Judging from the
timestamp [ 0.304827], the user space has not started yet.

Thanks,
Fengguang

2013-08-25 09:17:37

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [tps_init] BUG: unable to handle kernel paging request at 484970c9

On Sat, Aug 24, 2013 at 09:09:39PM -0700, Greg Kroah-Hartman wrote:
> Same as before, are you unloading and loading modules? We have a fix
> for modules that cause problems when unloading with the config option
> above enabled. But that shouldn't be this issue.

What seems to be is going on here is that something creates a kobject,
which gets exported into sysfs, then removed and then immediately
recreated.

It appears that sysfs entries are deleted in the cleanup function:

static void kobject_cleanup(struct kobject *kobj)
{
...
/* remove from sysfs if the caller did not do it */
if (kobj->state_in_sysfs) {
pr_debug("kobject: '%s' (%p): auto cleanup kobject_del\n",
kobject_name(kobj), kobj);
kobject_del(kobj);
}

which is now delayed. I don't think this is specifically a problem of
my patch - I think my patch is revealing a problem here.

Let's take it from the point of view of a struct device. Lets say that
a device with name "foo" gets created. This creates a sysfs directory
entry called "foo". Various things take a reference on the device.

You then unregister that struct device because it's been removed. As
all the references haven't yet been released, it means that the sysfs
directory persists. Now, if the device is recreated with the same
name, sysfs complains that the directory exists.

In this case, it's not a struct device, but a device driver itself:

/* some boards have startup glitches */
while (tries--) {
status = i2c_add_driver(&tps65010_driver);
if (the_tps)
break;
i2c_del_driver(&tps65010_driver);
if (!tries) {
printk(KERN_ERR "%s: no chip?\n", DRIVER_NAME);
return -ENODEV;
}
pr_debug("%s: re-probe ...\n", DRIVER_NAME);
msleep(10);
}

Notice how the (i2c) driver is registered and removed multiple times.
If something can independently grab a reference to the kobject associated
with that driver, then it can cause the sysfs directory to persist longer
than the re-probe, which will cause sysfs to complain.

I don't think moving the sysfs directory cleanup (via kobject_del()) before
the delayed release fixes the fundamental problem here, because that just
covers up what happens if a reference is held.

Commit 0f4dafc05 (Kobject: auto-cleanup on final unref) introduced the
kobject_del() into the cleanup function.

Any thoughts?