LinuxLists.cc - Re: next-20090220: XFS: inconsistent lock state

2009-03-03 16:00:29

Subject: Re: next-20090220: XFS: inconsistent lock state

Christoph Hellwig wrote:
> On Fri, Feb 20, 2009 at 08:52:59PM +0300, Alexander Beregalov wrote:
>> Hi
>>
>> [ INFO: inconsistent lock state ]
>> 2.6.29-rc5-next-20090220 #2
>> ---------------------------------
>> inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
>> kswapd0/324 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> (&(&ip->i_lock)->mr_lock){+++++?}, at: [<ffffffff803ca60a>]
>> xfs_ilock+0xaa/0x120
>> {RECLAIM_FS-ON-W} state was registered at:
>
> That's a false positive. While the ilock can be taken in reclaim the
> allocation here is done before the inode is added to the inode cache.
>
> The patch below should help avoiding the warning:

Seems ok to me. I hate to see the BUG() added but I guess in this case
something truly bizarre would have to happen for the ilock to fail on
this inode.

on irc you sugggested ASSERT(0); instead of BUG(); I might prefer that
but either way:

Reviewed-by: Eric Sandeen <[email protected]>

>
> Index: xfs/fs/xfs/xfs_iget.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_iget.c 2009-02-24 20:56:00.716027739 +0100
> +++ xfs/fs/xfs/xfs_iget.c 2009-02-24 20:56:46.089031360 +0100
> @@ -246,9 +246,6 @@ xfs_iget_cache_miss(
> goto out_destroy;
> }
>
> - if (lock_flags)
> - xfs_ilock(ip, lock_flags);
> -
> /*
> * Preload the radix tree so we can insert safely under the
> * write spinlock. Note that we cannot sleep inside the preload
> @@ -259,6 +256,15 @@ xfs_iget_cache_miss(
> goto out_unlock;
> }
>
> + /*
> + * Because the inode hasn't been added to the radix-tree yet it can't
> + * be found by another thread, so we can do the non-sleeping lock here.
> + */
> + if (lock_flags) {
> + if (!xfs_ilock_nowait(ip, lock_flags))
> + BUG();
> + }
> +
> mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
> first_index = agino & mask;
> write_lock(&pag->pag_ici_lock);
>
> _______________________________________________
> xfs mailing list
> [email protected]
> http://oss.sgi.com/mailman/listinfo/xfs
>

2009-03-03 16:57:20

by Felix Blyakher

[permalink] [raw]

Subject: Re: next-20090220: XFS: inconsistent lock state

On Mar 3, 2009, at 10:00 AM, Eric Sandeen wrote:

> Christoph Hellwig wrote:
>> On Fri, Feb 20, 2009 at 08:52:59PM +0300, Alexander Beregalov wrote:
>>> Hi
>>>
>>> [ INFO: inconsistent lock state ]
>>> 2.6.29-rc5-next-20090220 #2
>>> ---------------------------------
>>> inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
>>> kswapd0/324 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>> (&(&ip->i_lock)->mr_lock){+++++?}, at: [<ffffffff803ca60a>]
>>> xfs_ilock+0xaa/0x120
>>> {RECLAIM_FS-ON-W} state was registered at:
>>
>> That's a false positive. While the ilock can be taken in reclaim the
>> allocation here is done before the inode is added to the inode cache.
>>
>> The patch below should help avoiding the warning:
>
> Seems ok to me. I hate to see the BUG() added but I guess in this
> case
> something truly bizarre would have to happen for the ilock to fail on
> this inode.
>
> on irc you sugggested ASSERT(0); instead of BUG();

That would mean that instead of bombing out here, we do it
in xfs debug kernels only, which is a good thing. However,
do we just silently ignore it in non debug kernels, and
later try to unlock without locking first?
Maybe the following be better:

if (lock_flags) {
if (!xfs_ilock_nowait(ip, lock_flags)) {
ASSERT(0);
error = EAGAIN;
goto out_destroy;
}
}

Or just keep the BUG(); , as it shouldn't happen (we hope).

Reviewed-by: Felix Blyakher <[email protected]>

> I might prefer that
> but either way:
>
> Reviewed-by: Eric Sandeen <[email protected]>
>
>>
>> Index: xfs/fs/xfs/xfs_iget.c
>> ===================================================================
>> --- xfs.orig/fs/xfs/xfs_iget.c 2009-02-24 20:56:00.716027739 +0100
>> +++ xfs/fs/xfs/xfs_iget.c 2009-02-24 20:56:46.089031360 +0100
>> @@ -246,9 +246,6 @@ xfs_iget_cache_miss(
>> goto out_destroy;
>> }
>>
>> - if (lock_flags)
>> - xfs_ilock(ip, lock_flags);
>> -
>> /*
>> * Preload the radix tree so we can insert safely under the
>> * write spinlock. Note that we cannot sleep inside the preload
>> @@ -259,6 +256,15 @@ xfs_iget_cache_miss(
>> goto out_unlock;
>> }
>>
>> + /*
>> + * Because the inode hasn't been added to the radix-tree yet it
>> can't
>> + * be found by another thread, so we can do the non-sleeping lock
>> here.
>> + */
>> + if (lock_flags) {
>> + if (!xfs_ilock_nowait(ip, lock_flags))
>> + BUG();
>> + }
>> +
>> mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) -
>> 1);
>> first_index = agino & mask;
>> write_lock(&pag->pag_ici_lock);
>>
>> _______________________________________________
>> xfs mailing list
>> [email protected]
>> http://oss.sgi.com/mailman/listinfo/xfs
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-
> kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-03-03 17:02:59

by Christoph Hellwig

[permalink] [raw]

Subject: Re: next-20090220: XFS: inconsistent lock state

On Tue, Mar 03, 2009 at 10:57:07AM -0600, Felix Blyakher wrote:
> if (lock_flags) {
> if (!xfs_ilock_nowait(ip, lock_flags)) {
> ASSERT(0);
> error = EAGAIN;
> goto out_destroy;
> }
> }
>
> Or just keep the BUG(); , as it shouldn't happen (we hope).

Ok, let's keep the BUG for now and I'll throw in your error undwinding
fix. Will resend the series for 2.6.29 patches after QAing them.

2009-03-06 09:31:17

by Alexander Beregalov

[permalink] [raw]

Subject: Re: next-20090220: XFS: inconsistent lock state

Hi

Is it the same issue?

[ INFO: inconsistent lock state ]
2.6.29-rc7-next-20090305 #8
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
kswapd0/318 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&ip->i_lock)->mr_lock){+++++?}, at: [<ffffffff803cbc4a>]
xfs_ilock+0xaa/0x120
{RECLAIM_FS-ON-W} state was registered at:
[<ffffffff8026db49>] mark_held_locks+0x69/0x90
[<ffffffff8026dbb1>] lockdep_trace_alloc+0x41/0xb0
[<ffffffff802c6cbd>] kmem_cache_alloc+0x2d/0x100
[<ffffffff803f37e7>] kmem_zone_alloc+0x97/0xe0
[<ffffffff803f3849>] kmem_zone_zalloc+0x19/0x50
[<ffffffff803b4955>] xfs_da_state_alloc+0x15/0x20
[<ffffffff803c0387>] xfs_dir2_node_lookup+0x17/0x110
[<ffffffff803b9688>] xfs_dir_lookup+0x1c8/0x1e0
[<ffffffff803f08af>] xfs_lookup+0x4f/0xe0
[<ffffffff803fcb99>] xfs_vn_lookup+0x49/0x90
[<ffffffff802d5556>] do_lookup+0x1b6/0x250
[<ffffffff802d5885>] __link_path_walk+0x295/0xec0
[<ffffffff802d66ee>] path_walk+0x6e/0xe0
[<ffffffff802d6866>] do_path_lookup+0xa6/0x1d0
[<ffffffff802d6a65>] path_lookup_open+0x65/0xd0
[<ffffffff802d796a>] do_filp_open+0xaa/0x8f0
[<ffffffff802c8a48>] do_sys_open+0x78/0x110
[<ffffffff802c8b0b>] sys_open+0x1b/0x20
[<ffffffff8020bbdb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 531011
hardirqs last enabled at (531011): [<ffffffff80284194>]
__rcu_read_unlock+0xa4/0xc0
hardirqs last disabled at (531010): [<ffffffff80284149>]
__rcu_read_unlock+0x59/0xc0
softirqs last enabled at (524334): [<ffffffff80249719>]
__do_softirq+0x139/0x150
softirqs last disabled at (524329): [<ffffffff8020cd5c>] call_softirq+0x1c/0x30

other info that might help us debug this:
2 locks held by kswapd0/318:
#0: (shrinker_rwsem){++++..}, at: [<ffffffff802a4c12>] shrink_slab+0x32/0x1c0
#1: (iprune_mutex){+.+.-.}, at: [<ffffffff802dfbc4>]
shrink_icache_memory+0x84/0x2a0

stack backtrace:
Pid: 318, comm: kswapd0 Not tainted 2.6.29-rc7-next-20090305 #8
Call Trace:
[<ffffffff8026ca5d>] print_usage_bug+0x17d/0x190
[<ffffffff8026d9ad>] mark_lock+0x31d/0x450
[<ffffffff8026cbd0>] ? check_usage_forwards+0x0/0xc0
[<ffffffff8026ebcd>] __lock_acquire+0x40d/0x12a0
[<ffffffff8026faf1>] lock_acquire+0x91/0xc0
[<ffffffff803cbc4a>] ? xfs_ilock+0xaa/0x120
[<ffffffff8025f7d0>] down_read_nested+0x50/0x90
[<ffffffff803cbc4a>] ? xfs_ilock+0xaa/0x120
[<ffffffff803cbc4a>] xfs_ilock+0xaa/0x120
[<ffffffff803f09c4>] xfs_free_eofblocks+0x84/0x280
[<ffffffff8026ea8c>] ? __lock_acquire+0x2cc/0x12a0
[<ffffffff803f144e>] xfs_inactive+0xee/0x540
[<ffffffff803fea57>] xfs_fs_clear_inode+0x67/0x70
[<ffffffff802df7fa>] clear_inode+0x9a/0x120
[<ffffffff802dfa60>] dispose_list+0x30/0x110
[<ffffffff802dfd88>] shrink_icache_memory+0x248/0x2a0
[<ffffffff802a4d3c>] shrink_slab+0x15c/0x1c0
[<ffffffff802a6cba>] kswapd+0x56a/0x6b0
[<ffffffff802374b6>] ? finish_task_switch+0x46/0x110
[<ffffffff802a4120>] ? isolate_pages_global+0x0/0x270
[<ffffffff8025b450>] ? autoremove_wake_function+0x0/0x40
[<ffffffff802a6750>] ? kswapd+0x0/0x6b0
[<ffffffff8025af66>] kthread+0x56/0x90
[<ffffffff8020cc5a>] child_rip+0xa/0x20
[<ffffffff802374f9>] ? finish_task_switch+0x89/0x110
[<ffffffff8063da36>] ? _spin_unlock_irq+0x36/0x60
[<ffffffff8020c640>] ? restore_args+0x0/0x30
[<ffffffff8025af10>] ? kthread+0x0/0x90
[<ffffffff8020cc50>] ? child_rip+0x0/0x20