2010-08-29 20:00:18

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] New: Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361

Summary: Watchdog detected hard LOCKUP in
jbd2_journal_get_write_access
Product: File System
Version: 2.5
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext3
AssignedTo: [email protected]
ReportedBy: [email protected]
Regression: Yes


Created an attachment (id=28361)
--> (https://bugzilla.kernel.org/attachment.cgi?id=28361)
full dmesg output

Kernel configuration: 2.6.36-rc3
Athlon 64 X2 3GHz in 64bits mode
Slackware64 13.1

maybe an ext4 problem

At boot, with a kernel compiled with support for locks, kmemcheck and so, I got
this from time to time (amoung several other problems reported from kmemcheck
and sometimes a total freeze at boot):


kjournald starting. Commit interval 5 seconds
EXT3-fs (sda1): using internal journal
EXT3-fs (sda1): mounted filesystem with ordered data mode
ADDRCONF(NETDEV_UP): eth0: link is not ready
uli526x: eth0 NIC Link is Up 100 Mbps Full duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:233 watchdog_overflow_callback+0xe3/0x110()
Hardware name: K8 Combo-Z
Watchdog detected hard LOCKUP on cpu 0
Modules linked in: snd_intel8x0 snd_cmipci snd_opl3_lib snd_hwdep
snd_mpu401_uart snd_ac97_codec snd_ra
wmidi ac97_bus
Pid: 1923, comm: update-mime-dat Not tainted 2.6.36-rc3 #6
Call Trace:
<NMI> [<ffffffff81042cda>] warn_slowpath_common+0x7a/0xb0
[<ffffffff81042db1>] warn_slowpath_fmt+0x41/0x50
[<ffffffff81086d37>] ? watchdog_overflow_callback+0x7/0x110
[<ffffffff81086e13>] watchdog_overflow_callback+0xe3/0x110
[<ffffffff81096281>] ? __perf_event_overflow+0x141/0x210
[<ffffffff8109133a>] ? perf_event_update_userpage+0x10a/0x1b0
[<ffffffff81091230>] ? perf_event_update_userpage+0x0/0x1b0
[<ffffffff8101247d>] ? x86_perf_event_set_period+0xdd/0x160
[<ffffffff810964f4>] ? perf_event_overflow+0x14/0x20
[<ffffffff8101414f>] ? x86_pmu_handle_irq+0x10f/0x140
[<ffffffff81010d8b>] ? perf_event_nmi_handler+0x4b/0x60
[<ffffffff810660d0>] ? notifier_call_chain+0x50/0xa0
[<ffffffff810666a5>] ? __atomic_notifier_call_chain+0x95/0xd0
[<ffffffff81066610>] ? __atomic_notifier_call_chain+0x0/0xd0
[<ffffffff8108b255>] ? rcu_nmi_exit+0x15/0x60
[<ffffffff8100424b>] ? do_nmi+0x9b/0x2b0
[<ffffffff815981ca>] ? nmi+0x1a/0x2c
[<ffffffff8118fc46>] ? jbd2_journal_get_write_access+0x36/0x50
[<ffffffff81597de0>] ? debug+0x0/0x40
<<EOE>> <#DB> [<ffffffff81086d42>] ? watchdog_overflow_callback+0x12/0x110
<<EOE>>
---[ end trace 1408e87e8e109015 ]---
BUG: sleeping function called from invalid context at fs/ext4/inode.c:5833
in_atomic(): 1, irqs_disabled(): 0, pid: 1923, name: update-mime-dat
2 locks held by update-mime-dat/1923:
#0: (&sb->s_type->i_mutex_key#3){+.+.+.}, at: [<ffffffff810db511>]
do_last+0x111/0x690
#1: (jbd2_handle){+.+...}, at: [<ffffffff811904f0>]
start_this_handle+0x470/0x510
Pid: 1923, comm: update-mime-dat Tainted: G W 2.6.36-rc3 #6
Call Trace:
[<ffffffff810721f3>] ? __debug_show_held_locks+0x13/0x30
[<ffffffff81036b85>] __might_sleep+0x105/0x130
[<ffffffff8115c137>] ext4_mark_inode_dirty+0x47/0x1d0
[<ffffffff8116643b>] ext4_add_nondir+0x6b/0x80
[<ffffffff81166bf6>] ext4_create+0xd6/0x130
[<ffffffff810da459>] vfs_create+0x89/0xc0
[<ffffffff810db9fa>] ? do_last+0x5fa/0x690
[<ffffffff810db935>] do_last+0x535/0x690
[<ffffffff810dd97d>] do_filp_open+0x21d/0x660
[<ffffffff81596c98>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff810393c5>] ? sub_preempt_count+0x95/0xd0
[<ffffffff815977d0>] ? _raw_spin_unlock+0x30/0x60
[<ffffffff810e9132>] ? alloc_fd+0x122/0x1d0
[<ffffffff810cd7e0>] do_sys_open+0x60/0x120
[<ffffffff81596c59>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff810cd8cb>] sys_open+0x1b/0x20
[<ffffffff810023ab>] system_call_fastpath+0x16/0x1b
note: update-mime-dat[1923] exited with preempt_count 1
BUG: scheduling while atomic: update-mime-dat/1923/0x10000002
no locks held by update-mime-dat/1923.
Modules linked in: snd_intel8x0 snd_cmipci snd_opl3_lib snd_hwdep
snd_mpu401_uart snd_ac97_codec snd_rawmidi ac97_bus
Pid: 1923, comm: update-mime-dat Tainted: G W 2.6.36-rc3 #6
Call Trace:
[<ffffffff810721f3>] ? __debug_show_held_locks+0x13/0x30
[<ffffffff81039257>] __schedule_bug+0x77/0x80
[<ffffffff815943f9>] schedule+0x6e9/0xa30
[<ffffffff8103d153>] __cond_resched+0x13/0x30
[<ffffffff8159482b>] _cond_resched+0x2b/0x40
[<ffffffff810b1a59>] unmap_vmas+0x7c9/0x940
[<ffffffff810b6c2d>] exit_mmap+0xdd/0x1a0
[<ffffffff81040677>] mmput+0x57/0xe0
[<ffffffff81045250>] exit_mm+0x100/0x130
[<ffffffff810477ad>] do_exit+0x64d/0x720
[<ffffffff810478cf>] do_group_exit+0x4f/0xc0
[<ffffffff8105448a>] get_signal_to_deliver+0x31a/0x510
[<ffffffff81040090>] ? kick_process+0x50/0xa0
[<ffffffff8159778d>] ? _raw_spin_unlock_irqrestore+0x5d/0x70
[<ffffffff810017c0>] do_signal+0x70/0x790
[<ffffffff8159776d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[<ffffffff8100dd71>] ? syscall_trace_leave+0xd1/0xe0
[<ffffffff81001f2f>] do_notify_resume+0x4f/0x60
[<ffffffff81596c59>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8100264b>] int_signal+0x12/0x17
BUG: scheduling while atomic: update-mime-dat/1923/0x10000002
no locks held by update-mime-dat/1923.
Modules linked in: snd_intel8x0 snd_cmipci snd_opl3_lib snd_hwdep
snd_mpu401_uart snd_ac97_codec snd_rawmidi ac97_bus
Pid: 1923, comm: update-mime-dat Tainted: G W 2.6.36-rc3 #6
Call Trace:
...

etc
full dmesg appended as attachement.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


2010-09-07 22:05:23

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Andrew Morton <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]
Component|ext3 |ext4
AssignedTo|[email protected] |[email protected]
|g |g




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-07 22:27:44

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Rafael J. Wysocki <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected],
| |[email protected]
Kernel Version| |2.6.36-rc3
Blocks| |16444




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-14 05:48:12

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #1 from Christian Casteyde <[email protected]> 2010-09-14 05:48:04 ---
Update :
still present in 2.6.36-rc4.
However, the call stack is a little different now. As the computer crashes very
soon, I've taken a photo, see attachment.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-14 05:51:02

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #2 from Christian Casteyde <[email protected]> 2010-09-14 05:50:55 ---
Created an attachment (id=29872)
--> (https://bugzilla.kernel.org/attachment.cgi?id=29872)
console output image

screenshot of warning (can't get it as the computer crashes after that).

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-14 18:08:44

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #3 from Rafael J. Wysocki <[email protected]> 2010-09-14 18:08:31 ---
On Tuesday, September 14, 2010, Christian Casteyde wrote:
> Yes, still present, as well as my other bug that says that the system freezes
> short after the boot.
>
> Le dimanche 12 septembre 2010 20:14:29, vous avez écrit :
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.35. Please verify if it still should be listed and let the
> > tracking team know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=17361
> > Subject : Watchdog detected hard LOCKUP in
> jbd2_journal_get_write_access
> > Submitter : Christian Casteyde <[email protected]>
> > Date : 2010-08-29 19:59 (15 days old)

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.-

2010-09-21 17:19:34

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #4 from Christian Casteyde <[email protected]> 2010-09-21 17:19:32 ---
Update : Still present in 2.6.36-rc5
With the same log as shown in comment #2
Some more info are given after that ("recursive fault fixed, but rebbot is
necessary").
If I wait a little more, I get another pile of messages.

More precisely, the problem occurs indeed after dhcp gets an IP address, and
before (or when) sshd is started. Between the two, I guess iptables rules are
defined in my network scripts.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 17:48:06

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #5 from Christian Casteyde <[email protected]> 2010-09-22 17:47:58 ---
Well, I managed to boot in single mode, and tried to execute commands one by
one to see what triggers the problems.
Indeed, it's really random, sometime it's while ldconfig is called by the init
scripts, sometime it's when starting udev...
The callstacks are quite different also, sometime it's kmemcheck, sometime
preempt count, sometime NULL dereference...

So I'm posting the dmesg output I saved each time I got an error. Now at least
they are complete, so the root cause may be easier to find.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 17:51:45

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #6 from Christian Casteyde <[email protected]> 2010-09-22 17:51:41 ---
Created an attachment (id=30982)
--> (https://bugzilla.kernel.org/attachment.cgi?id=30982)
warning got while booting

kmemcheck fatal error while in sys_readlink

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 17:53:34

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #7 from Christian Casteyde <[email protected]> 2010-09-22 17:53:31 ---
Created an attachment (id=30992)
--> (https://bugzilla.kernel.org/attachment.cgi?id=30992)
warning got while starting udev

now using smp_processor_id() in preemptible while in watchdog_overflow_callback

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 17:56:03

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #8 from Christian Casteyde <[email protected]> 2010-09-22 17:55:59 ---
Created an attachment (id=31002)
--> (https://bugzilla.kernel.org/attachment.cgi?id=31002)
quite the same, plus another warning after

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 17:58:52

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #9 from Christian Casteyde <[email protected]> 2010-09-22 17:58:50 ---
Created an attachment (id=31012)
--> (https://bugzilla.kernel.org/attachment.cgi?id=31012)
finally, the NULL dereference

I'm not sure the second and next warnings are not due to the first one, and a
bug in kmemcheck itself.
I'm totally unable to tell what cause it, and I gave up bisecting because it
crashes here and there, and I cannot say what crash is due to which bug, if
ever there is only one bug.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 18:01:51

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #10 from Christian Casteyde <[email protected]> 2010-09-22 18:01:49 ---
Created an attachment (id=31032)
--> (https://bugzilla.kernel.org/attachment.cgi?id=31032)
.config file

I append the .config file, just to tell what are the kernel debug options I
use.
I'm running in 64bit and build from vanilla source, with gcc 4.4.4.

I don't think this is hardware related, since I get the same on my laptop. More
some racy problems in udev/ldconfig/fc-cache

I'm still thinking it's a vfs problem or kmemcheck itself bug.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 18:11:48

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #11 from Christian Casteyde <[email protected]> 2010-09-22 18:11:39 ---
*** Bug 17371 has been marked as a duplicate of this bug. ***

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-22 18:13:15

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #12 from Christian Casteyde <[email protected]> 2010-09-22 18:13:08 ---
*** Bug 16616 has been marked as a duplicate of this bug. ***

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-09-27 19:26:50

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #13 from Rafael J. Wysocki <[email protected]> 2010-09-27 19:26:48 ---
On Monday, September 27, 2010, Christian Casteyde wrote:
> Still present in rc5-git7.
>
>
> Le dimanche 26 septembre 2010 22:04:13, vous avez écrit :
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.35. Please verify if it still should be listed and let the
> > tracking team know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=17361
> > Subject : Watchdog detected hard LOCKUP in
> jbd2_journal_get_write_access
> > Submitter : Christian Casteyde <[email protected]>
> > Date : 2010-08-29 19:59 (29 days old)

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.-

2010-10-02 16:42:51

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Theodore Tso <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #14 from Theodore Tso <[email protected]> 2010-10-02 16:42:45 ---
People may not be paying attention to this due to the subject line.

Except for the initial bug report, none of the other stack traces have anything
to do with ext4/jbd2. And in the initial ext4 trace, we see the complaint
that we're calling might_sleep() in ext4_mark_inode_dirty(), in a code path
where we are manifestly not taking any spinlocks. And in fact we don't see
any spinlocks being taken at the point where the complaint is mode in
ext4_mark_inode_dirty(). Yet preempt_count > 1.

It looks to me like some unrelated piece of code is bumping preempt_count, and
not decrementing it. Maybe in some code which is called from an interrupt
handler, in some device driver? That might explain why you're getting
failures all over the kernel.

It may be worth closing this report, and opening several new ones, one for each
failure, and make it clear this is not an ext4-related problem, since the
subject line and component assigned for this bug is highly misleading.

Including your kernel config would also be useful when you do that.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-04 18:48:33

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #15 from Christian Casteyde <[email protected]> 2010-10-04 18:48:29 ---
I've explored some of the ideas you mentionned.

First, I disabled kmemcheck, and indeed I've got no warning at all. So this is
really a kmemcheck + something else (buggy driver or perf counters as
mentionned by Vegard in his previous mail).

I could check the reverse: kmemcheck but without perf counters, however I
cannot unselect the perf option on my config (it is forced by something else
I've not found). kmemcheck used to work (I don't know if perf counters were
available in previous kernels however).

Second, I have most of my drivers built in, however I use sound cards as
modules in order to control the load order.

I've booted after erasing the /lib/module driver, but with kmemcheck enabled,
and it was far better. I got only one kmemecheck error:

ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
ERROR: kmemcheck: Fatal error

Pid: 1914, comm: gtk-update-icon Not tainted 2.6.36-rc6 #13 K8 Combo-Z/K8
Combo-Z
RIP: 0010:[<ffffffff81010b90>] [<ffffffff81010b90>]
x86_perf_event_update+0x0/0x80
RSP: 0018:ffff880002607d50 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: ffff88000260b5c0 RSI: 0000000000000400 RDI: ffff88003f808800
RBP: ffff880002607df8 R08: 0000000000000000 R09: 0000000000000004
R10: 0000000000000001 R11: 0000000000000038 R12: ffff88000260b7c8
R13: 0000000000000000 R14: ffff88000260b7c0 R15: ffff88003f808800
FS: 00007fdb70d5e700(0000) GS:ffff880002600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88003f836420 CR3: 000000003da52000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[<ffffffff8102bc34>] kmemcheck_error_save_bug+0xb4/0xe0
[<ffffffff8102c658>] kmemcheck_access+0xe8/0x4a0
[<ffffffff8102ca82>] kmemcheck_fault+0x72/0x80
[<ffffffff81026f89>] do_page_fault+0x359/0x440
[<ffffffff8159711f>] page_fault+0x1f/0x30
[<ffffffff81011528>] perf_event_nmi_handler+0x78/0x140
[<ffffffff810665d0>] notifier_call_chain+0x50/0xa0
[<ffffffff81066b81>] __atomic_notifier_call_chain+0x71/0xd0
[<ffffffff81066bf1>] atomic_notifier_call_chain+0x11/0x20
[<ffffffff81066c2e>] notify_die+0x2e/0x30
[<ffffffff81004343>] do_nmi+0x193/0x2b0
[<ffffffff8159740a>] nmi+0x1a/0x2c
[<ffffffff8102c5c7>] kmemcheck_access+0x57/0x4a0
[<ffffffff8102ca82>] kmemcheck_fault+0x72/0x80
[<ffffffff81026f89>] do_page_fault+0x359/0x440
[<ffffffff8159711f>] page_fault+0x1f/0x30
[<ffffffff810dab55>] do_lookup+0x55/0x170
[<ffffffff810dcb10>] link_path_walk+0x130/0xbd0
[<ffffffff810dd720>] path_walk+0x60/0xe0
[<ffffffff810dd7f3>] do_path_lookup+0x53/0x60
[<ffffffff810de433>] user_path_at+0x53/0xa0
[<ffffffff810d4c17>] vfs_fstatat+0x37/0x70
[<ffffffff810d4c86>] vfs_stat+0x16/0x20
[<ffffffff810d4e5f>] sys_newstat+0x1f/0x50
[<ffffffff810023ab>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

The good news is all other errors are gone. Here is the lsmod output without
kmemcheck, just to tell the modules that could cause more grief:
Module Size Used by
snd_intel8x0 29098 1
snd_cmipci 31770 4
snd_opl3_lib 9332 1 snd_cmipci
snd_hwdep 6314 1 snd_opl3_lib
snd_mpu401_uart 6360 1 snd_cmipci
snd_rawmidi 20342 1 snd_mpu401_uart
snd_ac97_codec 117688 1 snd_intel8x0
ac97_bus 1290 1 snd_ac97_codec

The .config file is already attached to this bug report (please note I've also
some other unused drivers configured, for hardware I do not use for now).

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-04 18:51:33

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Christian Casteyde <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|Watchdog detected hard |Random kmemcheck errors and
|LOCKUP in |kernel freeze on 2.6.36-rc*
|jbd2_journal_get_write_acce |
|ss |




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-10 19:19:33

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #16 from Christian Casteyde <[email protected]> 2010-10-10 19:19:29 ---
I also append .config file for my laptop, since it also has the problem as my
main computer.
I've tried to git bisect the problem on this laptop:

0f477dd0851bdcee82923da66a7fc4a44cb1bc3d is the first bad commit

However, I don't know if this is the same bug as described here, as I said I
cannot be sure.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-10 19:20:47

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #17 from Christian Casteyde <[email protected]> 2010-10-10 19:20:45 ---
Created an attachment (id=33052)
--> (https://bugzilla.kernel.org/attachment.cgi?id=33052)
git bisect log result

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-10 19:22:31

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361





--- Comment #18 from Christian Casteyde <[email protected]> 2010-10-10 19:22:28 ---
Created an attachment (id=33062)
--> (https://bugzilla.kernel.org/attachment.cgi?id=33062)
.config file for the laptop

Common hardware I'm aware is:
Athlon 64 in 64bits,
soundcard intel8x0

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-10 19:38:43

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Rafael J. Wysocki <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #19 from Rafael J. Wysocki <[email protected]> 2010-10-10 19:38:36 ---
0f477dd0851bdcee82923da66a7fc4a44cb1bc3d is a merge commit, so it surely didn't
introduce the problem by itself, although it's quite probable that one of the
commits in this merge did that.

Adding Peter to the CC list.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-11 20:47:43

by Andreas Dilger

[permalink] [raw]
Subject: Re: [Bug 17361] Watchdog detected hard LOCKUP in jbd2_journal_get_write_access


On 2010-10-02, at 09:42, Theodore Tso <[email protected]> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=17361
>
> Except for the initial bug report, none of the other stack traces have anything to do with ext4/jbd2. And in the initial ext4 trace, we see the complaint that we're calling might_sleep() in ext4_mark_inode_dirty(), in a code path where we are manifestly not taking any spinlocks. And in fact we don't see any spinlocks being taken at the point where the complaint is mode in ext4_mark_inode_dirty(). Yet preempt_count > 1.

In my experience, this is typically caused by stack overflow smashing the task struct and randomly setting preempt_count to a non-zero value.

Cheers, Andreas






2010-10-20 20:10:40

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Christian Casteyde <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |INVALID




--- Comment #20 from Christian Casteyde <[email protected]> 2010-10-20 20:10:37 ---
Hello
My bisection was obviously false, sorry.

It took me some time, but I've finally understood why my bisection failed, and
what was happening.

The problem is that with kmemcheck, the kernel boots so slowly (except if there
is a warning at boot in swapper, as it occured in several revision pre rc1),
that the hard and soft lockup detection mechanism was triggering (often in udev
or hal or ldconfig or other slow commands). Moreover, it triggered sometimes at
a place the kernel cannot cope with it, and crashes.

Before bisecting, I took a fresh 2.6.35 .config file and reactivated the
debugging options, but I fogot the lockup detector. So I couldn't see the
errors anymore. Nevertheless, I checked other warnings that used to occur
before -rc1, and that were fixed later. As I was looking for a range between
2.6.35 and rc1, I got false results on already fixed bugs.

Finally, I do not have any problem anymore with 2.6.36-rc8. Except that the
lockup timeout is either too low for my computers, or crashes the kernel
sometimes whent it triggers. But that's not critical.

I'm closing the bug.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-10-20 20:38:14

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 17361] Random kmemcheck errors and kernel freeze on 2.6.36-rc*

https://bugzilla.kernel.org/show_bug.cgi?id=17361


Christian Casteyde <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |CLOSED




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.