Hi Alessandro,
I got the below dmesg and the first bad commit is
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit 4debfe409b6e550032bfef9733e9f6f7c5613617
Author: Alessandro Rubini <[email protected]>
AuthorDate: Tue Jun 18 23:48:07 2013 +0200
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Tue Jun 18 15:42:15 2013 -0700
FMC: add a char-device mezzanine driver
This driver exports the memory area associated with the mezzanine card
as a misc device, so users can access registers.
Signed-off-by: Alessandro Rubini <[email protected]>
Acked-by: Juan David Gonzalez Cobas <[email protected]>
Acked-by: Emilio G. Cota <[email protected]>
Acked-by: Samuel Iglesias Gonsalvez <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
+---------------------------------------------------------------------+------------+------------+------------+
| | 6007b1bd0f | 4debfe409b | 77b24197e8 |
+---------------------------------------------------------------------+------------+------------+------------+
| boot_successes | 27 | 0 | 0 |
| boot_failures | 318 | 345 | 13 |
| BUG:kernel_boot_hang | 118 | 119 | |
| BUG:kernel_early_hang_without_any_printk_output | 89 | 10 | |
| Out_of_memory:Kill_process | 24 | 1 | |
| Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 10 | | |
| backtrace:sock_alloc_file | 1 | | |
| backtrace:SyS_socket | 3 | | |
| backtrace:SyS_socketcall | 3 | | |
| backtrace:user_path_at | 3 | | |
| backtrace:SyS_faccessat | 3 | | |
| backtrace:SyS_access | 3 | | |
| backtrace:sock_create | 2 | | |
| backtrace:do_fork | 1 | 0 | 2 |
| backtrace:SyS_clone | 1 | 0 | 2 |
| backtrace:vfs_fstatat | 1 | | |
| backtrace:SyS_fstatat64 | 1 | | |
| BUG:unable_to_handle_kernel_paging_request | 87 | 38 | |
| Oops | 87 | 38 | |
| EIP_is_at_list_del_init | 86 | 38 | |
| BUG:sleeping_function_called_from_invalid_context_at_kernel/rwsem.c | 87 | 38 | |
| INFO:lockdep_is_turned_off | 87 | 38 | 12 |
| EIP_is_at_kthread_data | 70 | 28 | |
| backtrace:list_del_init | 86 | 38 | |
| backtrace:error_code | 87 | 38 | |
| EIP_is_at_worker_thread | 1 | | |
| BUG:scheduling_while_atomic | 0 | 216 | |
| backtrace:fmc_driver_register | 0 | 216 | |
| backtrace:fc_init | 0 | 216 | |
| backtrace:kernel_init_freeable | 0 | 216 | 13 |
| backtrace:kzalloc | 0 | 2 | |
| backtrace:device_private_init | 0 | 2 | |
| BUG:soft_lockup-CPU_stuck_for_s | 0 | 0 | 1 |
| EIP_is_at_augment_callbacks_propagate | 0 | 0 | 1 |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 0 | 1 |
| backtrace:erase_augmented | 0 | 0 | 1 |
| backtrace:rbtree_test_init | 0 | 0 | 1 |
| inconsistent_IN-HARDIRQ-W-HARDIRQ-ON-W_usage | 0 | 0 | 12 |
| INFO:task_blocked_for_more_than_seconds | 0 | 0 | 12 |
| EIP_is_at_native_apic_mem_write | 0 | 0 | 10 |
| Kernel_panic-not_syncing:hung_task:blocked_tasks | 0 | 0 | 12 |
| backtrace:driver_register | 0 | 0 | 12 |
| backtrace:ide_cdrom_init | 0 | 0 | 12 |
| backtrace:vfs_read | 0 | 0 | 7 |
| backtrace:SyS_read | 0 | 0 | 7 |
| backtrace:vfs_write | 0 | 0 | 7 |
| backtrace:SyS_write | 0 | 0 | 7 |
| backtrace:watchdog | 0 | 0 | 12 |
| EIP_is_at__default_send_IPI_dest_field | 0 | 0 | 2 |
| backtrace:do_mount | 0 | 0 | 5 |
| backtrace:SyS_mount | 0 | 0 | 5 |
| backtrace:do_sys_open | 0 | 0 | 3 |
| backtrace:SyS_open | 0 | 0 | 3 |
+---------------------------------------------------------------------+------------+------------+------------+
[ 10.047684] fmc fake-design-for-testing-f001: Driver has no ID: matches all
[ 10.052183] fmc_write_eeprom fake-design-for-testing-f001: fmc_write_eeprom: no busid passed, refusing all cards
[ 10.059440] fmc fake-design-for-testing-f001: Driver has no ID: matches all
[ 10.064447] BUG: scheduling while atomic: swapper/1/0x10000002
[ 10.068677] 4 locks held by swapper/1:
[ 10.071897] #0: (&__lockdep_no_validate__){......}, at: [<c1429c9e>] __driver_attach+0x24/0x6a
[ 10.080751] #1: (&__lockdep_no_validate__){......}, at: [<c1429cb6>] __driver_attach+0x3c/0x6a
[ 10.089569] #2: (fc_lock){+.+.+.}, at: [<c1a6ce23>] fc_probe+0x6d/0xd3
[ 10.095948] #3: (misc_mtx){+.+.+.}, at: [<c140d4e9>] misc_register+0x1f/0xf7
[ 10.105338] CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.0-rc6-00087-g4debfe4 #5
[ 10.111618] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 10.115783] cdccc020 cdccc020 cdccfc74 c1b4c870 cdccfc90 c1b48ae7 c1e7a875 cdccc2d0
[ 10.125492] 00000001 10000002 cdccfd6c cdccfd08 c1b5c989 00004082 cdccfcc0 c1044eb7
[ 10.135215] cdccc020 c1056266 cdccfcc4 cdcce000 cdccc020 cdccc51c 00000003 cdccc51c
[ 10.147601] Call Trace:
[ 10.150906] [<c1b4c870>] dump_stack+0x16/0x18
[ 10.154945] [<c1b48ae7>] __schedule_bug+0x5d/0x6f
[ 10.159221] [<c1b5c989>] __schedule+0x4c/0x524
[ 10.163385] [<c1044eb7>] ? sched_clock_cpu+0xc7/0xd5
[ 10.169343] [<c1056266>] ? mark_lock+0x1e/0x20c
[ 10.173700] [<c1057472>] ? __lock_is_held+0x2d/0x43
[ 10.177952] [<c10574cf>] ? lock_is_held+0x47/0x55
[ 10.181680] [<c10442be>] ? __might_sleep+0x6b/0x17a
[ 10.185293] [<c1042d91>] __cond_resched+0x16/0x26
[ 10.189317] [<c1b5cf31>] _cond_resched+0x11/0x18
[ 10.193023] [<c1b5d0ba>] __wait_for_common+0x26/0xdb
[ 10.196718] [<c1b5bb24>] ? console_conditional_schedule+0x24/0x24
[ 10.200837] [<c1042e95>] ? try_to_wake_up+0xf4/0xfe
[ 10.204580] [<c1b5d189>] wait_for_completion+0x1a/0x1c
[ 10.208455] [<c142ca64>] devtmpfs_create_node+0xd0/0xec
[ 10.212343] [<c14275a4>] device_add+0x18c/0x504
[ 10.215984] [<c142792e>] device_register+0x12/0x15
[ 10.219730] [<c1427e59>] device_create_vargs+0x67/0x8e
[ 10.223561] [<c1427ea8>] device_create+0x28/0x2a
[ 10.227198] [<c140d572>] misc_register+0xa8/0xf7
[ 10.230888] [<c1a6ce2b>] fc_probe+0x75/0xd3
[ 10.234439] [<c10dd7f1>] ? sysfs_create_link+0x24/0x2d
[ 10.238250] [<c1a6b4a5>] fmc_probe+0xf/0x11
[ 10.241735] [<c1429b50>] driver_probe_device+0x82/0x180
[ 10.245609] [<c1b5c160>] ? mutex_lock_nested+0x3e/0x44
[ 10.249341] [<c1429cc8>] __driver_attach+0x4e/0x6a
[ 10.253102] [<c142889f>] bus_for_each_dev+0x57/0x61
[ 10.256868] [<c1429a8f>] driver_attach+0x17/0x19
[ 10.260535] [<c1429c7a>] ? __device_attach+0x2c/0x2c
[ 10.264303] [<c1428e9d>] bus_add_driver+0xd2/0x1d5
[ 10.268292] [<c142a2a9>] driver_register+0x7a/0xd3
[ 10.271984] [<c14291b5>] ? bus_register+0x13f/0x20c
[ 10.275685] [<c1a6b60a>] fmc_driver_register+0x23/0x2d
[ 10.279520] [<c1a6cd98>] ? fc_read+0x68/0x68
[ 10.283026] [<c1a6cda5>] fc_init+0xd/0xf
[ 10.286517] [<c227ca7c>] do_one_initcall+0x7e/0x114
[ 10.290213] [<c227cc07>] kernel_init_freeable+0xf5/0x187
[ 10.294110] [<c227c442>] ? do_early_param+0x78/0x78
[ 10.297876] [<c1b45392>] kernel_init+0xb/0xbe
[ 10.301427] [<c1b5eafb>] ret_from_kernel_thread+0x1b/0x30
[ 10.305329] [<c1b45387>] ? rest_init+0x12c/0x12c
git bisect start v3.11 v3.10 --
git bisect bad 89d0abe3d695103505c025dde6e07b9c3dd772f4 # 02:16 2- 11 Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
git bisect bad d2b4a646717153a1a180b64d4a8464054dbd700e # 02:26 0- 15 Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma
git bisect bad 1873e50028ce87dd9014049c86d71a898fa02166 # 02:52 0- 1 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
git bisect bad a4883ef6af5e513a1e8c2ab9aab721604aa3a4f5 # 03:06 4- 14 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad fe3c22bd5cadd8e36977b218b27fbea821381ec8 # 03:10 0- 1 Merge tag 'char-misc-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect good 3f490f7f99053288bd85563f8d9b5032b810e177 # 04:04 345+ 313 Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
git bisect good a84270189e2afc7028b1123415a66d444f460977 # 05:30 345+ 312 Merge tag 'usb-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good 0de10f9ea67d1046b1e16c91fa26accf23939aab # 07:10 345+ 279 Merge tag 'tty-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good ce49b6289fa3878b190f15192e54bb23dca552b6 # 07:41 345+ 294 Merge tag 'staging-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good fd5c25614012ccad75ef505982af4d6c08d0df83 # 08:10 345+ 312 vme: Free DMA resource allocation structure
git bisect good 36dd68319c7b56a76ed450b0e470067b5d74b9b2 # 08:35 345+ 331 memory: tegra30-mc: Fix IRQ handler.
git bisect bad 4640a3f2bff64b808bdedadcddf882aa4606f374 # 08:38 1- 3 FMC: fix error handling in probe() function
git bisect good 056d83f3c30c398f14eb879f1d1707e3a7808f4a # 09:12 345+ 337 FMC: add a software mezzanine driver
git bisect bad 05c3e0bb5629b897b0459e4bfb1b93d729033b99 # 09:17 0- 2 UIO: allow binding uio_pdrv_genirq.c to devices using command line option
git bisect bad 4debfe409b6e550032bfef9733e9f6f7c5613617 # 09:26 0- 3 FMC: add a char-device mezzanine driver
git bisect good 6007b1bd0f752a5c022f7944c65fb96c39d6db3d # 10:03 345+ 318 FMC: add a driver to write mezzanine EEPROM
# first bad commit: [1bc18086231c130895b87ec049be8ddcdab552b8
4debfe409b6e550032bfef9733e9f6f7c5613617] ata: delete non-required instances of include <linux/init.h>
git bisect bad 38dbfb59d1175ef458d006556061adeaa8751b72 # 10:03 0- 345 Linus 3.14-rc1
This script may reproduce the error.
-----------------------------------------------------------------------------
#!/bin/bash
kernel=$1
initrd=quantal-core-i386.cgz
wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/$initrd
kvm=(
qemu-system-x86_64 -cpu kvm64 -enable-kvm
-kernel $kernel
-initrd $initrd
-smp 2
-m 256M
-net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio
-net user,vlan=0
-net nic,vlan=1,model=e1000
-net user,vlan=1
-boot order=nc
-no-reboot
-watchdog i6300esb
-serial stdio
-display none
-monitor null
)
append=(
debug
sched_debug
apic=debug
ignore_loglevel
sysrq_always_enabled
panic=10
prompt_ramdisk=0
earlyprintk=ttyS0,115200
console=ttyS0,115200
console=tty0
vga=normal
root=/dev/ram0
rw
)
"${kvm[@]}" --append "${append[*]}"
-----------------------------------------------------------------------------
Thanks,
Fengguang
Hello.
Thank you for the report.
I'm at a conference and I fear I won't be able to test myself in the
next days, but I think this is already fixed (it is part of
the "misc_register" call path, so it's the same problem).
The fix is commit v3.11-rc2-11-g783c2fb
783c2fb FMC: fix locking in sample chardev driver
This commit, however, is not part of v3.11 and I think this is why you
are finding the problem in the v3.10..v3.11 interval.
thank you again
/alessandro
On 04/09/2014 01:08 PM, Alessandro Rubini wrote:
> Hello.
> Thank you for the report.
>
> I'm at a conference and I fear I won't be able to test myself in the
> next days, but I think this is already fixed (it is part of
> the "misc_register" call path, so it's the same problem).
>
> The fix is commit v3.11-rc2-11-g783c2fb
>
> 783c2fb FMC: fix locking in sample chardev driver
>
> This commit, however, is not part of v3.11 and I think this is why you
> are finding the problem in the v3.10..v3.11 interval.
>
> thank you again
> /alessandro
>
I find commit 783c2fb FMC: fix locking in sample chardev driver. I will help to test it.
On Wed, Apr 09, 2014 at 07:08:43AM +0200, Alessandro Rubini wrote:
> Hello.
> Thank you for the report.
>
> I'm at a conference and I fear I won't be able to test myself in the
> next days, but I think this is already fixed (it is part of
> the "misc_register" call path, so it's the same problem).
>
> The fix is commit v3.11-rc2-11-g783c2fb
>
> 783c2fb FMC: fix locking in sample chardev driver
>
> This commit, however, is not part of v3.11 and I think this is why you
> are finding the problem in the v3.10..v3.11 interval.
Alessandro, you are right. There are no more "scheduling while
atomic" bugs in v3.12 and v3.13.
Our bisect log shows
git bisect bad 38dbfb59d1175ef458d006556061adeaa8751b72 # 10:03 0- 345 Linus 3.14-rc1
However that happen to be caused by an independent "scheduling while
atomic" bug:
[ 20.038125] Fixing recursive fault but reboot is needed!
[ 20.038125] BUG: scheduling while atomic: kworker/0:1H/77/0x00000005
[ 20.038125] INFO: lockdep is turned off.
[ 20.038125] irq event stamp: 758
[ 20.038125] hardirqs last enabled at (757): [<c1c31683>] _raw_spin_unlock_irq+0x22/0x30
[ 20.038125] hardirqs last disabled at (758): [<c1c31523>] _raw_spin_lock_irq+0x14/0x73
[ 20.038125] softirqs last enabled at (302): [<c1032d4d>] __do_softirq+0x186/0x1d2
[ 20.038125] softirqs last disabled at (295): [<c1002f99>] do_softirq_own_stack+0x2f/0x35
[ 20.038125] CPU: 0 PID: 77 Comm: kworker/0:1H Tainted: G D W 3.14.0-rc1 #1
[ 20.038125] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 20.038125] c0420610 c0420610 c0449a38 c1c1f562 c0449a54 c1c1b59c c1f91661 c0420938
[ 20.038125] 0000004d 00000005 c0420610 c0449acc c1c2e4e2 c105fff8 01449a7c 000004af
[ 20.038125] c0420610 0000002c 00000001 c0449a7c c0420610 c0449ab4 c106001c 00000000
[ 20.038125] Call Trace:
[ 20.038125] [<c1c1f562>] dump_stack+0x16/0x18
[ 20.038125] [<c1c1b59c>] __schedule_bug+0x5d/0x6f
[ 20.038125] [<c1c2e4e2>] __schedule+0x45/0x55f
[ 20.038125] [<c105fff8>] ? vprintk_emit+0x367/0x3a4
[ 20.038125] [<c106001c>] ? vprintk_emit+0x38b/0x3a4
[ 20.038125] [<c105876b>] ? trace_hardirqs_off+0xb/0xd
[ 20.038125] [<c1c1c185>] ? printk+0x38/0x3a
[ 20.038125] [<c1c2ea59>] schedule+0x5d/0x5f
[ 20.038125] [<c10314b8>] do_exit+0xcc/0x75d
[ 20.038125] [<c1060e7b>] ? kmsg_dump+0x184/0x191
[ 20.038125] [<c1060d13>] ? kmsg_dump+0x1c/0x191
[ 20.038125] [<c1003d54>] oops_end+0x7e/0x83
[ 20.038125] [<c1c1ae82>] no_context+0x1ba/0x1c2
[ 20.038125] [<c1c1afc1>] __bad_area_nosemaphore+0x137/0x13f
[ 20.038125] [<c1c1a82d>] ? pte_offset_kernel+0x13/0x2a
[ 20.038125] [<c1c1aa5f>] ? spurious_fault+0x75/0xd5
[ 20.038125] [<c1c1afdb>] bad_area_nosemaphore+0x12/0x14
Thanks,
Fengguang
On 04/09/2014 01:28 PM, Fengguang Wu wrote:
> On Wed, Apr 09, 2014 at 07:08:43AM +0200, Alessandro Rubini wrote:
>> Hello.
>> Thank you for the report.
>>
>> I'm at a conference and I fear I won't be able to test myself in the
>> next days, but I think this is already fixed (it is part of
>> the "misc_register" call path, so it's the same problem).
>>
>> The fix is commit v3.11-rc2-11-g783c2fb
>>
>> 783c2fb FMC: fix locking in sample chardev driver
>>
>> This commit, however, is not part of v3.11 and I think this is why you
>> are finding the problem in the v3.10..v3.11 interval.
>
> Alessandro, you are right. There are no more "scheduling while
> atomic" bugs in v3.12 and v3.13.
>
> Our bisect log shows
>
> git bisect bad 38dbfb59d1175ef458d006556061adeaa8751b72 # 10:03 0- 345 Linus 3.14-rc1
>
> However that happen to be caused by an independent "scheduling while
> atomic" bug:
Alessandro, Fengguang & I confirmed that this below dmesg is also caused by over locking in fc_probe(). Not a new introduced bug.
>
> [ 20.038125] Fixing recursive fault but reboot is needed!
> [ 20.038125] BUG: scheduling while atomic: kworker/0:1H/77/0x00000005
> [ 20.038125] INFO: lockdep is turned off.
> [ 20.038125] irq event stamp: 758
> [ 20.038125] hardirqs last enabled at (757): [<c1c31683>] _raw_spin_unlock_irq+0x22/0x30
> [ 20.038125] hardirqs last disabled at (758): [<c1c31523>] _raw_spin_lock_irq+0x14/0x73
> [ 20.038125] softirqs last enabled at (302): [<c1032d4d>] __do_softirq+0x186/0x1d2
> [ 20.038125] softirqs last disabled at (295): [<c1002f99>] do_softirq_own_stack+0x2f/0x35
> [ 20.038125] CPU: 0 PID: 77 Comm: kworker/0:1H Tainted: G D W 3.14.0-rc1 #1
> [ 20.038125] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 20.038125] c0420610 c0420610 c0449a38 c1c1f562 c0449a54 c1c1b59c c1f91661 c0420938
> [ 20.038125] 0000004d 00000005 c0420610 c0449acc c1c2e4e2 c105fff8 01449a7c 000004af
> [ 20.038125] c0420610 0000002c 00000001 c0449a7c c0420610 c0449ab4 c106001c 00000000
> [ 20.038125] Call Trace:
> [ 20.038125] [<c1c1f562>] dump_stack+0x16/0x18
> [ 20.038125] [<c1c1b59c>] __schedule_bug+0x5d/0x6f
> [ 20.038125] [<c1c2e4e2>] __schedule+0x45/0x55f
> [ 20.038125] [<c105fff8>] ? vprintk_emit+0x367/0x3a4
> [ 20.038125] [<c106001c>] ? vprintk_emit+0x38b/0x3a4
> [ 20.038125] [<c105876b>] ? trace_hardirqs_off+0xb/0xd
> [ 20.038125] [<c1c1c185>] ? printk+0x38/0x3a
> [ 20.038125] [<c1c2ea59>] schedule+0x5d/0x5f
> [ 20.038125] [<c10314b8>] do_exit+0xcc/0x75d
> [ 20.038125] [<c1060e7b>] ? kmsg_dump+0x184/0x191
> [ 20.038125] [<c1060d13>] ? kmsg_dump+0x1c/0x191
> [ 20.038125] [<c1003d54>] oops_end+0x7e/0x83
> [ 20.038125] [<c1c1ae82>] no_context+0x1ba/0x1c2
> [ 20.038125] [<c1c1afc1>] __bad_area_nosemaphore+0x137/0x13f
> [ 20.038125] [<c1c1a82d>] ? pte_offset_kernel+0x13/0x2a
> [ 20.038125] [<c1c1aa5f>] ? spurious_fault+0x75/0xd5
> [ 20.038125] [<c1c1afdb>] bad_area_nosemaphore+0x12/0x14
>
> Thanks,
> Fengguang
>
On 04/09/2014 01:08 PM, Alessandro Rubini wrote:
> Hello.
> Thank you for the report.
>
> I'm at a conference and I fear I won't be able to test myself in the
> next days, but I think this is already fixed (it is part of
> the "misc_register" call path, so it's the same problem).
>
> The fix is commit v3.11-rc2-11-g783c2fb
>
> 783c2fb FMC: fix locking in sample chardev driver
>
> This commit, however, is not part of v3.11 and I think this is why you
> are finding the problem in the v3.10..v3.11 interval.
>
> thank you again
> /alessandro
Alessandro, your commit 83c2fb FMC: fix locking in sample chardev driver fixes the issue.
Tested-by: Jet Chen <[email protected]>
Thanks,
Jet