2007-06-26 10:30:29

by Thomas Sattler

Subject: BUG: held lock freed!

Hi there ...

I removed xfs from my system. The first reboot after replacing xfs with
ext3 brought me:

Jun 26 08:43:17 pearl =========================
Jun 26 08:43:17 pearl [ BUG: held lock freed! ]
Jun 26 08:43:17 pearl -------------------------
Jun 26 08:43:17 pearl udevd/3064 is freeing memory c16fbe40-c16fbe7f,
with a lock still held there!
Jun 26 08:43:17 pearl (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl 1 lock held by udevd/3064:
Jun 26 08:43:17 pearl #0: (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl
Jun 26 08:43:17 pearl stack backtrace:
Jun 26 08:43:17 pearl [<c0135331>] debug_check_no_locks_freed+0xe7/0x11a
Jun 26 08:43:17 pearl [<c0158649>] kfree+0x45/0x7f
Jun 26 08:43:17 pearl [<c016c3d2>] free_fdtable_rcu+0x3a/0x70
Jun 26 08:43:17 pearl [<c012a95f>] __rcu_process_callbacks+0xfd/0x165
Jun 26 08:43:17 pearl [<c012a9d6>] rcu_process_callbacks+0xf/0x1e
Jun 26 08:43:17 pearl [<c0120c69>] tasklet_action+0x3d/0x68
Jun 26 08:43:17 pearl [<c0120b9e>] __do_softirq+0x41/0x92
Jun 26 08:43:17 pearl [<c0120c16>] do_softirq+0x27/0x3d
Jun 26 08:43:17 pearl [<c0120fae>] irq_exit+0x35/0x64
Jun 26 08:43:17 pearl [<c0106108>] do_IRQ+0x7e/0x92
Jun 26 08:43:17 pearl [<c0104634>] common_interrupt+0x24/0x34
Jun 26 08:43:17 pearl [<c010463e>] common_interrupt+0x2e/0x34
Jun 26 08:43:17 pearl [<c0136513>] lock_acquire+0x68/0x6e
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c02e4daa>] _spin_lock+0x29/0x34
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
Jun 26 08:43:17 pearl [<c0157df0>] shmem_delete_inode+0x0/0xda
Jun 26 08:43:17 pearl [<c016b2a8>] generic_delete_inode+0x8c/0xf4
Jun 26 08:43:17 pearl [<c016aa33>] iput+0x60/0x62
Jun 26 08:43:17 pearl [<c01636d7>] do_unlinkat+0xbe/0x132
Jun 26 08:43:17 pearl [<c0103bfa>] sysenter_past_esp+0x8f/0x99
Jun 26 08:43:17 pearl [<c0135227>] trace_hardirqs_on+0x11e/0x141
Jun 26 08:43:17 pearl [<c0103bca>] sysenter_past_esp+0x5f/0x99
Jun 26 08:43:17 pearl =======================

But it only appeared once; several reboots after that were ok. I changed my
kernel config today: e1000 is now "=y" (was "=m"), I removed PCMCIA as I
do not use it and some other modules complained about it, and I added
CONFIG_HIGHMEM4G=y (was CONFIG_NOHIGHMEM=y).

The running kernel is 2.6.22.5 +cfs +squashfs. My distribution is gentoo
(x86), quite up to date, udev is 104-r12.

Please CC me as I'm not subscribed to the list.

Thomas

--
keep mailinglists in english, feel free to send PM in german


2007-06-26 15:19:20

by Hugh Dickins

Subject: Re: BUG: held lock freed!

On Tue, 26 Jun 2007, Thomas Sattler wrote:

> Hi there ...
>
> I removed xfs from my system. The first reboot after replacing xfs with
> ext3 brought me:

Perhaps this is a curse that falls on those who desert XFS ;)

>
> Jun 26 08:43:17 pearl =========================
> Jun 26 08:43:17 pearl [ BUG: held lock freed! ]
> Jun 26 08:43:17 pearl -------------------------
> Jun 26 08:43:17 pearl udevd/3064 is freeing memory c16fbe40-c16fbe7f,
> with a lock still held there!
> Jun 26 08:43:17 pearl (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
> shmem_delete_inode+0xc1/0xda
> Jun 26 08:43:17 pearl 1 lock held by udevd/3064:
> Jun 26 08:43:17 pearl #0: (&sbinfo->stat_lock){--..}, at: [<c0157eb1>]
> shmem_delete_inode+0xc1/0xda
> Jun 26 08:43:17 pearl
> Jun 26 08:43:17 pearl stack backtrace:
> Jun 26 08:43:17 pearl [<c0135331>] debug_check_no_locks_freed+0xe7/0x11a
> Jun 26 08:43:17 pearl [<c0158649>] kfree+0x45/0x7f
> Jun 26 08:43:17 pearl [<c016c3d2>] free_fdtable_rcu+0x3a/0x70
> Jun 26 08:43:17 pearl [<c012a95f>] __rcu_process_callbacks+0xfd/0x165
> Jun 26 08:43:17 pearl [<c012a9d6>] rcu_process_callbacks+0xf/0x1e
> Jun 26 08:43:17 pearl [<c0120c69>] tasklet_action+0x3d/0x68
> Jun 26 08:43:17 pearl [<c0120b9e>] __do_softirq+0x41/0x92
> Jun 26 08:43:17 pearl [<c0120c16>] do_softirq+0x27/0x3d
> Jun 26 08:43:17 pearl [<c0120fae>] irq_exit+0x35/0x64
> Jun 26 08:43:17 pearl [<c0106108>] do_IRQ+0x7e/0x92
> Jun 26 08:43:17 pearl [<c0104634>] common_interrupt+0x24/0x34
> Jun 26 08:43:17 pearl [<c010463e>] common_interrupt+0x2e/0x34
> Jun 26 08:43:17 pearl [<c0136513>] lock_acquire+0x68/0x6e
> Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
> Jun 26 08:43:17 pearl [<c02e4daa>] _spin_lock+0x29/0x34
> Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
> Jun 26 08:43:17 pearl [<c0157eb1>] shmem_delete_inode+0xc1/0xda
> Jun 26 08:43:17 pearl [<c0157df0>] shmem_delete_inode+0x0/0xda
> Jun 26 08:43:17 pearl [<c016b2a8>] generic_delete_inode+0x8c/0xf4
> Jun 26 08:43:17 pearl [<c016aa33>] iput+0x60/0x62
> Jun 26 08:43:17 pearl [<c01636d7>] do_unlinkat+0xbe/0x132
> Jun 26 08:43:17 pearl [<c0103bfa>] sysenter_past_esp+0x8f/0x99
> Jun 26 08:43:17 pearl [<c0135227>] trace_hardirqs_on+0x11e/0x141
> Jun 26 08:43:17 pearl [<c0103bca>] sysenter_past_esp+0x5f/0x99
> Jun 26 08:43:17 pearl =======================
>
> But it only appeared once; several reboots after that were ok. I changed my
> kernel config today: e1000 is now "=y" (was "=m"), I removed PCMCIA as I
> do not use it and some other modules complained about it, and I added
> CONFIG_HIGHMEM4G=y (was CONFIG_NOHIGHMEM=y).
>
> The running kernel is 2.6.22.5 +cfs +squashfs. My distribution is gentoo
> (x86), quite up to date, udev is 104-r12.
>
> Please CC me as I'm not subscribed to the list.

Odd. I can't see any error at the shmem_delete_inode end nor at the
free_fdtable_rcu end. It seems to be some kind of corruption, whereby
free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds),
but the address kfreed is that of the shmem_sb_info in which it has
just acquired a spinlock at the top of the stack.
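
(Aside: the warning comes from lockdep's debug_check_no_locks_freed(),
which kfree() runs on every free. Below is a minimal standalone model of
the idea, with simplified made-up types rather than the real lockdep
internals:

#include <stdio.h>

/* simplified stand-in for lockdep's per-task held-lock record */
struct held_lock {
	const void *instance;	/* address of the lock object itself */
	const char *name;	/* e.g. "&sbinfo->stat_lock" */
};

static struct held_lock held_locks[16];	/* per-task in the real thing */
static int nr_held;

/* complain if any held lock lives inside the range being freed */
static void debug_check_no_locks_freed(const void *mem_from, size_t mem_len)
{
	const char *from = mem_from;
	const char *to = from + mem_len;

	for (int i = 0; i < nr_held; i++) {
		const char *lock = held_locks[i].instance;
		if (lock >= from && lock < to)
			printf("BUG: freeing %p-%p with %s still held!\n",
			       (const void *)from, (const void *)to,
			       held_locks[i].name);
	}
}

int main(void)
{
	/* hypothetical object standing in for the shmem_sb_info */
	struct { int stat_lock; char rest[60]; } sbinfo;

	held_locks[nr_held++] = (struct held_lock){
		&sbinfo.stat_lock, "&sbinfo->stat_lock"
	};
	/* freeing memory that contains a held lock triggers the report */
	debug_check_no_locks_freed(&sbinfo, sizeof(sbinfo));
	return 0;
}

So the report means the fdtable free was handed an address range that
overlaps the shmem superblock info whose stat_lock udevd was holding,
which is exactly the corruption described above.)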

I've not found any kfreeing of uninitialized pointer in fs/file.c.

It could come about through a single-bit error, and I was going to
suggest that you give memtest86+ a good run overnight. And still do
suggest that, though we seem to have rather too much of a coincidence
for it to be a likely explanation. But I've no other ideas, sorry.

Hugh

2007-06-26 17:52:29

by Thomas Sattler

Subject: Re: BUG: held lock freed!

>> I removed xfs from my system. The first reboot after replacing xfs with
>> ext3 brought me:
>
> Perhaps this is a curse that falls on those who desert XFS ;)

My laptop sometimes 'kicks' its keyboard. That means no key works any
more; the mouse is ok and I can copy and paste single characters in
xterms to enter some commands. Pasting 'reboot' into a root xterm does
not work.

AFAIR it always happens when a DVB-T recording ends, but even with daily
recordings only about once a month. Ingo Molnar told me to activate
CONFIG_PROVE_LOCKING. Since then I have had

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
xauth/6510 is trying to acquire lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<e16e0715>] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by xauth/6510:
#0: (&inode->i_mutex){--..}, at: [<c016477c>] open_namei+0xe2/0x555
#1: (&(&ip->i_lock)->mr_lock){----}, at: [<e16e0715>]
xfs_ilock+0x47/0x67 [xfs]

stack backtrace:
[<c0136026>] __lock_acquire+0x11e/0xb23
[<c0136dd9>] lock_acquire+0x56/0x6e
[<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
[<c012fe10>] down_write+0x2e/0x46
[<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
[<e16e0715>] xfs_ilock+0x47/0x67 [xfs]
[<e16e0fa3>] xfs_iget_core+0x291/0x579 [xfs]
[<e16e1312>] xfs_iget+0x87/0xfd [xfs]
[<e16f7665>] xfs_trans_iget+0xe6/0x151 [xfs]
[<e16e4a53>] xfs_ialloc+0xb2/0x479 [xfs]
[<e16f7fda>] xfs_dir_ialloc+0x7b/0x29d [xfs]
[<c012fe10>] down_write+0x2e/0x46
[<e16fda9b>] xfs_create+0x31c/0x5d6 [xfs]
[<e1707676>] xfs_vn_mknod+0x19b/0x2ce [xfs]
[<c01621f6>] vfs_create+0xa5/0xeb
[<c0164811>] open_namei+0x177/0x555
[<c015a8ab>] get_unused_fd+0x1f/0xb4
[<c015ab5c>] do_filp_open+0x25/0x39
[<c02e57da>] _spin_unlock+0x14/0x1c
[<c015a936>] get_unused_fd+0xaa/0xb4
[<c015abb2>] do_sys_open+0x42/0xc3
[<c015ac6c>] sys_open+0x1c/0x1e
[<c0103bca>] sysenter_past_esp+0x5f/0x99
=======================

or

=============================================
[ INFO: possible recursive locking detected ]
2.6.21.5-cfs-v17 #5
---------------------------------------------
dotlockfile/6467 is trying to acquire lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]

but task is already holding lock:
(&(&ip->i_lock)->mr_lock){----}, at: [<e15a6715>] xfs_ilock+0x47/0x67 [xfs]

other info that might help us debug this:
2 locks held by dotlockfile/6467:
#0: (&inode->i_mutex){--..}, at: [<c016477c>] open_namei+0xe2/0x555
#1: (&(&ip->i_lock)->mr_lock){----}, at: [<e15a6715>] xfs_ilock+0x47/0x67

stack backtrace:
[<c0136026>] __lock_acquire+0x11e/0xb23
[<c0136dd9>] lock_acquire+0x56/0x6e
[<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
[<c012fe10>] down_write+0x2e/0x46
[<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
[<e15a6715>] xfs_ilock+0x47/0x67 [xfs]
[<e15a6fa3>] xfs_iget_core+0x291/0x579 [xfs]
[<e15a7312>] xfs_iget+0x87/0xfd [xfs]
[<e15bd665>] xfs_trans_iget+0xe6/0x151 [xfs]
[<e15aaa53>] xfs_ialloc+0xb2/0x479 [xfs]
[<e15bdfda>] xfs_dir_ialloc+0x7b/0x29d [xfs]
[<c012fe10>] down_write+0x2e/0x46
[<e15c3a9b>] xfs_create+0x31c/0x5d6 [xfs]
[<e15cd676>] xfs_vn_mknod+0x19b/0x2ce [xfs]
[<c01621f6>] vfs_create+0xa5/0xeb
[<c0164811>] open_namei+0x177/0x555
[<c015a8ab>] get_unused_fd+0x1f/0xb4
[<c015ab5c>] do_filp_open+0x25/0x39
[<c02e57da>] _spin_unlock+0x14/0x1c
[<c015a936>] get_unused_fd+0xaa/0xb4
[<c015abb2>] do_sys_open+0x42/0xc3
[<c015ac6c>] sys_open+0x1c/0x1e
[<c0103bca>] sysenter_past_esp+0x5f/0x99
=======================

in dmesg right after bootup. Ingo said that xfs used to have problems
with lockdep, that this doesn't mean there's anything wrong with XFS,
and that lockdep turns itself off after it finds the first locking
problem. So I reformatted the data partition as ext3, which gave me the
report I posted at the start of this thread.
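
(Aside: lockdep tracks lock *classes*, not individual lock instances,
which is why two different inodes' mr_locks can look "recursive" here;
the usual cure for this pattern is a nesting annotation, e.g.
down_write_nested() with a distinct subclass. A minimal standalone model
of the class-based check, again with made-up types rather than the real
lockdep internals:

#include <stdio.h>

struct lock_class { const char *name; };
struct lock	  { const struct lock_class *class; };

static const struct lock_class *held[16];	/* per-task, really */
static int nr_held;

static void lock_acquire(const struct lock *l)
{
	/* compare by class: any already-held lock of the same class
	 * looks like a potential self-deadlock to lockdep */
	for (int i = 0; i < nr_held; i++)
		if (held[i] == l->class)
			printf("INFO: possible recursive locking (%s)\n",
			       l->class->name);
	held[nr_held++] = l->class;
}

int main(void)
{
	static const struct lock_class mr_lock =
		{ "&(&ip->i_lock)->mr_lock" };
	struct lock dir_ilock = { &mr_lock };	/* held across xfs_create() */
	struct lock new_ilock = { &mr_lock };	/* taken in xfs_iget_core() */

	lock_acquire(&dir_ilock);
	lock_acquire(&new_ilock);	/* triggers the (false) report */
	return 0;
}

Two distinct rwsems, one class, one false alarm.)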

> Odd. I can't see any error at the shmem_delete_inode end nor at the
> free_fdtable_rcu end. It seems to be some kind of corruption, whereby
> free_fdtable_rcu is kfree'ing some memory (perhaps fdt->open_fds),
> but the address kfreed is that of the shmem_sb_info in which it has
> just acquired a spinlock at the top of the stack.

There was a typo in my mail: I'm running 2.6.21.5-cfs-v18, not 2.6.22.
Sorry for that.

> It could come about through a single-bit error, and I was going to
> suggest that you give memtest86+ a good run overnight. And still do
> suggest that, though we seem to have rather too much of a coincidence
> for it to be a likely explanation. But I've no other ideas, sorry.

I'll run memtest86+ overnight and post the results tomorrow.

Please remember to CC me as I'm not subscribed to the list.

Thomas

--
keep mailinglists in english, feel free to send PM in german

2007-06-27 09:12:40

by Thomas Sattler

Subject: Re: BUG: held lock freed!

> I'll run memtest86+ overnight and post the results tomorrow.

Memtest86+ did not show any problems:

time 8h
pass 24
errors 0

Please remember to CC me as I'm not subscribed to the list.

Thomas

--
keep mailinglists in english, feel free to send PM in german