2013-10-06 08:25:09

by Fengguang Wu

[permalink] [raw]
Subject: BUG: sleeping function called from invalid context at mm/slab.c:3060

Greetings,

I got the below dmesg and the first bad commit is

commit c817a67ecba7c3c2aaa104796d78f160af60920d
Author: Russell King <[email protected]>
Date: Thu Jun 27 15:06:14 2013 +0100

kobject: delayed kobject release: help find buggy drivers

Implement debugging for kobject release functions. kobjects are
reference counted, so the drop of the last reference to them is not
predictable. However, the common case is for the last reference to be
the kobject's removal from a subsystem, which results in the release
function being immediately called.

This can hide subtle bugs, which can occur when another thread holds a
reference to the kobject at the same time that a kobject is removed.
This results in the release method being delayed.

In order to make these kinds of problems more visible, the following
patch implements a delayed release; this has the effect that the
release function will be out of order with respect to the removal of
the kobject in the same manner that it would be if a reference was
being held.

This provides us with an easy way to allow driver writers to debug
their drivers and fix otherwise hidden problems.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

mount: mounting proc on /proc failed: No such device
grep: /proc/filesystems: No such file or directory
[ 4.188118] BUG: sleeping function called from invalid context at mm/slab.c:3060
[ 4.190236] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 4.191696] 1 lock held by swapper/0/0:
Starting Bootlog daemon:
[ 4.192991] #0: (H

git bisect start 45c4ea0c4c1b8abffb575b865243fd1c69bbecd9 v3.11 --
git bisect bad 39eda2aba6be642b71f2e0ad623dcb09fd9d79cf # 03:04 0- Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
git bisect bad aa7054f5a5a9ff728ce291cb103afa19f4f849eb # 03:16 160- Merge tag 'pinctrl-v3.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad 751144271f4b63d5de9005ea4e5e6e5c7c6fd629 # 03:40 110- Merge tag 'staging-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good c211148fb6355f347ec819a05c3034acd56e75c1 # 09:25 2140+ staging: dgnc: adds TODO
git bisect good b9a1048137f4ae43ee90f61a3f34f0efe863cfeb # 14:30 2140+ usbcore: fix incorrect type in assignment in descriptors_changed()
git bisect good ec1882a9391c55332ebf3d1654f40b76e4a6c010 # 21:05 2140+ Merge tag 'regulator-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect good 1d1fdd95df681f0c065d90ffaafa215a0e8825e2 # 03:49 2140+ Merge tag 'char-misc-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad f9ae443b5a1aa3d34d57df966a220a51a854d4c8 # 03:59 37- sysfs: group.c: add kerneldoc for sysfs_remove_group
git bisect bad 910f5c7cdb1d4541997f01c17b731bdfe2e92eda # 04:23 190- Correct unfaithful translation on HOWTO in ko_KR
git bisect bad 86df268705696ce2286410f1493e294e889bdce6 # 04:35 83- drivers:base:core: Moved sym export macros to respective functions
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 13:41 2140+ char: tile-srom: fix build error
git bisect bad 92a3e661bf55457091817c5c346c4313be2508af # 13:49 310- SCSI: OSD: convert class code to use dev_groups
git bisect bad 58b267d3e3f3ce87c3e559e4c330c8c03e905f5e # 13:53 249- dma: convert dma_devclass to use dev_groups
git bisect bad c817a67ecba7c3c2aaa104796d78f160af60920d # 14:00 445- kobject: delayed kobject release: help find buggy drivers
git bisect good 7c42721fe0c58a848849b43ff558cf2fb86aa35a # 21:46 10000+ char: tile-srom: fix build error
git bisect bad a0cf1abc25ac197dd97b857c0f6341066a8cb1cf # 21:47 55- Add linux-next specific files for 20130927
git bisect good 918cf69e1b11370ca6da77c96ff3383b06a72bf3 # 02:15 10000+ Revert "kobject: delayed kobject release: help find buggy drivers"
git bisect bad 7dee8dff47734ee52da2cd8b8ea9736e42c5062f # 02:20 25- Merge tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad a0cf1abc25ac197dd97b857c0f6341066a8cb1cf # 02:20 0- Add linux-next specific files for 20130927

Thanks,
Fengguang


Attachments:
(No filename) (4.33 kB)
dmesg-yocto-lkp-tt02-10:20130928023627:3.12.0-rc2-02842-g45c4ea0:25 (61.50 kB)
bisect-45c4ea0c4c1b8abffb575b865243fd1c69bbecd9-x86_64-randconfig-x1-0928-BUG:-sleeping-function-called-from-invalid-context-at-105490.log (97.28 kB)
config-3.12.0-rc2-02842-g45c4ea0 (79.11 kB)
Download all attachments

2013-10-06 08:41:43

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: BUG: sleeping function called from invalid context at mm/slab.c:3060

On Sun, Oct 06, 2013 at 03:58:11PM +0800, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit c817a67ecba7c3c2aaa104796d78f160af60920d
> Author: Russell King <[email protected]>
> Date: Thu Jun 27 15:06:14 2013 +0100
>
> kobject: delayed kobject release: help find buggy drivers
>
> Implement debugging for kobject release functions. kobjects are
> reference counted, so the drop of the last reference to them is not
> predictable. However, the common case is for the last reference to be
> the kobject's removal from a subsystem, which results in the release
> function being immediately called.
>
> This can hide subtle bugs, which can occur when another thread holds a
> reference to the kobject at the same time that a kobject is removed.
> This results in the release method being delayed.
>
> In order to make these kinds of problems more visible, the following
> patch implements a delayed release; this has the effect that the
> release function will be out of order with respect to the removal of
> the kobject in the same manner that it would be if a reference was
> being held.
>
> This provides us with an easy way to allow driver writers to debug
> their drivers and fix otherwise hidden problems.
>
> Signed-off-by: Russell King <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> mount: mounting proc on /proc failed: No such device
> grep: /proc/filesystems: No such file or directory
> [ 4.188118] BUG: sleeping function called from invalid context at mm/slab.c:3060
> [ 4.190236] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
> [ 4.191696] 1 lock held by swapper/0/0:
> Starting Bootlog daemon:
> [ 4.192991] #0: (H

Sorry, I don't believe this one. This patch adds no new allocation.
How does device_not_available() end up being called, or
math_state_restore() ?

2013-10-07 05:27:44

by Fengguang Wu

[permalink] [raw]
Subject: Re: BUG: sleeping function called from invalid context at mm/slab.c:3060

On Sun, Oct 06, 2013 at 09:41:21AM +0100, Russell King - ARM Linux wrote:
> On Sun, Oct 06, 2013 at 03:58:11PM +0800, Fengguang Wu wrote:
> > Greetings,
> >
> > I got the below dmesg and the first bad commit is
> >
> > commit c817a67ecba7c3c2aaa104796d78f160af60920d
> > Author: Russell King <[email protected]>
> > Date: Thu Jun 27 15:06:14 2013 +0100
> >
> > kobject: delayed kobject release: help find buggy drivers
> >
> > Implement debugging for kobject release functions. kobjects are
> > reference counted, so the drop of the last reference to them is not
> > predictable. However, the common case is for the last reference to be
> > the kobject's removal from a subsystem, which results in the release
> > function being immediately called.
> >
> > This can hide subtle bugs, which can occur when another thread holds a
> > reference to the kobject at the same time that a kobject is removed.
> > This results in the release method being delayed.
> >
> > In order to make these kinds of problems more visible, the following
> > patch implements a delayed release; this has the effect that the
> > release function will be out of order with respect to the removal of
> > the kobject in the same manner that it would be if a reference was
> > being held.
> >
> > This provides us with an easy way to allow driver writers to debug
> > their drivers and fix otherwise hidden problems.
> >
> > Signed-off-by: Russell King <[email protected]>
> > Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >
> > mount: mounting proc on /proc failed: No such device
> > grep: /proc/filesystems: No such file or directory
> > [ 4.188118] BUG: sleeping function called from invalid context at mm/slab.c:3060
> > [ 4.190236] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
> > [ 4.191696] 1 lock held by swapper/0/0:
> > Starting Bootlog daemon:
> > [ 4.192991] #0: (H
>
> Sorry, I don't believe this one. This patch adds no new allocation.
> How does device_not_available() end up being called, or
> math_state_restore() ?

Russell, I confirmed that it's a good bisect. With the patch, 2055
out of 2140 kernel boots have some kind of error messages. After
reverting the commit, it boots 10000 times w/o a single error.

However you do have good reasons to doubt: I don't see this particular
error message "BUG: sleeping function called from invalid context at
mm/slab.c" in commit c817a67ec's bad dmesgs, which contain these error
messages instead:

$ grep_crash_head -h dmesg-* | sed 's/^[^a-zA-Z]*//' | sort | uniq -c | sort -nr

1929 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
1921 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
1897 Kernel panic - not syncing: Fatal exception in interrupt
56 WARNING: CPU: 0 PID: 1 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
28 kernel BUG at /c/wfg/mm/mm/slab.c:3011!
27 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
19 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
16 INFO: lockdep is turned off.
13 general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
13 BUG: unable to handle kernel
8 BUG: sleeping function called from invalid context at /c/wfg/mm/kernel/rwsem.c:20
6 WARNING: CPU: 0 PID: 0 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
6 WARNING: CPU: 0 PID: 0 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
6 WARNING: CPU: 0 PID: 0 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
5 WARNING: CPU: 1 PID: 0 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
5 WARNING: CPU: 1 PID: 0 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
5 WARNING: CPU: 1 PID: 0 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
5 Oops: 0002 [#1]
4 WARNING: CPU: 1 PID: 0 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
4 Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
4 BUG: unable to handle kernel NULL pointer dereference
3 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
2 WARNING: CPU: 1 PID: 1 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
2 BUG: unable to handle kernel paging request at ffffffffffffffa8
2 BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
2 BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
2 BUG: scheduling while atomic: rc.local/135/0x10000002
1 invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
1 general protection fault: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
1 general protection fault: 0000 [#1] PREEMPT
1 general protection fault: 0000 [#1] P/RbEinE/sMh:P /Tpr oc/sSMP DEBUG_PAGEALLOC
1 WARNING: CPU: 1 PID: 222 at /c/wfg/mm/include/linux/kref.h:47 kobject_get+0x37/0x44()
1 WARNING: CPU: 1 PID: 139 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 1 PID: 133 at /c/wfg/mm/kernel/lockdep.c:708 look_up_lock_class+0xa0/0xb7()
1 WARNING: CPU: 1 PID: 130 at /c/wfg/mm/kernel/lockdep.c:708 look_up_lock_class+0xa0/0xb7()
1 WARNING: CPU: 1 PID: 129 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
1 WARNING: CPU: 1 PID: 129 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
1 WARNING: CPU: 1 PID: 129 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
1 WARNING: CPU: 1 PID: 119 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 22 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
1 WARNING: CPU: 0 PID: 22 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
1 WARNING: CPU: 0 PID: 22 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
1 WARNING: CPU: 0 PID: 209 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 20 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
1 WARNING: CPU: 0 PID: 20 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
1 WARNING: CPU: 0 PID: 20 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
1 WARNING: CPU: 0 PID: 135 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 133 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 132 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 126 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 122 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 116 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 WARNING: CPU: 0 PID: 1 at /c/wfg/mm/lib/debugobjects.c:260 debug_print_object+0x7c/0x8b()
1 WARNING: CPU: 0 PID: 1 at /c/wfg/mm/kernel/workqueue.c:457 work_fixup_activate+0x6a/0x6f()
1 WARNING: CPU: 0 PID: 1 at /c/wfg/mm/kernel/workqueue.c:1378 __queue_work+0x1a1/0x1ee()
1 WARNING: CPU: 0 PID: 0 at /c/wfg/mm/kernel/workqueue.c:590 set_work_data+0x33/0x50()
1 Oops: 0t0rinity: not fo02 [#1] PREEMPT SMP DEBUG_PAGEALLOC
1 Oops: 0002 [#1] PREEMStParting BootlogT daemon: SMP DEBUG_PAGEALLOC
1 Oops: 0002 [#1] PREEMPT grep: /procS/fileMP DEBUG_PAGEALLOC
1 Oops: 0002 [#1] PREEMPT bootlogd: cannot allocate pseudo tty: No such file or directory
1 Oops: 0002 [#1] CoPnfRigEurEinMg PneTtw SMP orDk EinBteUrfGac_esP..AGEALLOC
1 Oops: 0002 [#1] /PREEMPT SMP DEBUG_PAGEALLOC
1 Oops: 0000 [#3] PREEMPT SMP DEBUG_PAGEALLOC
1 Kernel panic - not syncing: Attempted to kill the idle task!
1 K er ne1l .t892024] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
1 BUG: unable to handle kernel paging request at 00000000fffedf1c
1 BUG: unable to handle kernel paging request at 0000000000010012
1 BUG: unable to handle kernel done.paging request at ffffffffffffffa8
1 BUG: unable to handle kernel PleaNse waitU: boLLo pointer dereference at 0000000000000008
1 BUG: unable to handle kernel NULmount: mLou ntingp oinproc otn /proce rfa dereilefde: Nro sencuch deevice at 000000
1 BUG: unable to handle kernel NULLKe rnpelo itesntst: Beoot OK!
1 BUG: unable to handle kernel NULL pointer dererence at 0000000000000008
1 BUG: unable to handle kernel NULL pointer dereferenceg reap:t / pr0oc0/c0md0l000000000008
1 BUG: unable to handle kernel NULL pointer dereference at mo0un0t:0 n0o 0/p0ro0c/000mo0un0ts0
1 BUG: unable to handle kernel NULL pointer dereference at 00000000000008d8
1 BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
1 BUG: unable to handle kernel NULL pointer dereference at (null)
1 BUG: unable to handle kernel /bin/sh: /proc/self/fd/9: No such file or directory
1 BUG: unable to handle kernel /bin/sh: /proc/sNULL pointer dereference at 0000000000000008
1 BUG: scheduling while atomic: sh/131/0x00000003
1 BUG: scheduling while atomic: reboot/222/0x10000002
1 BUG: scheduling while atomic: reboot/211/0x10000002
1 BUG: scheduling while atomic: reboot/135/0x10000002
1 BUG: scheduling while atomic: ifconfig/231/0x10000002
1 BUG: scheduling while atomic: S99-rc.local/211/0x10000002
1 BUG: Bad rss-counter state mm:ffff88000ca47340 idx:1 val:2
1 BUG: Bad rss-counter state mm:ffff88000ca0ed00 idx:1 val:2

Thanks,
Fengguang