2006-05-30 10:22:31

by Janos Haar

[permalink] [raw]
Subject: Re: How to send a break? - dump from frozen 64bit linux

[cut]

>
> >
> > 2) You should try the latest stable kernel. Currently that's 2.6.16.18
> > (http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.16.18.tar.bz2).
> > There have been lots of fixes added since 2.6.15.x and perhaps you are
> > lucky that whatever is giving you trouble has already been fixed in
> > that kernel.
>

This time i try the 2.6.16.18 kernel, but the issue is the same!

Here is the logs:
http://download.netcenter.hu/bughunt/20060530/dump.txt (The frozen system,
540KB)
http://download.netcenter.hu/bughunt/20060530/261618-good.txt (After
reboot, the working system, 300KB, uptime 54 min)
http://download.netcenter.hu/bughunt/20060530/dmesg.txt (The boot dmesg
file)

Can somebody tell me, whats wrong?

It seems like some part of the fs died.
(The "top", "watch df" hangs on the ssh window, in the "mc" the line is
moving, but if i try to step in/out from/to dir, it hangs too, ping reply is
working. )

I use only 3 fs:
- the root FS on NFS.
- one XFS mount point from sata drive (200GB)
- one huge XFS mount point from NBD. (14TB)

Cheers,
Janos


2006-05-30 19:04:04

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: How to send a break? - dump from frozen 64bit linux

On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:

> http://download.netcenter.hu/bughunt/20060530/dump.txt (The frozen system,
> 540KB)

> Can somebody tell me, whats wrong?

kblockd/1 D ffff81011f641778 0 25 19 26 24 (L-TLB)
ffff81011f641778 0000000000000000 0000000000000009 ffff81011f735358
ffff81011f735140 ffff81011fc79100 000014a00f9a0ef2 00000000000410dd
0000000102866d40 ffff810003900280
Call Trace: <ffffffff8026d72a>{xfs_qm_shake+135} <ffffffff804e6046>{__mutex_lock_slowpath+424}
<ffffffff804e62e4>{mutex_lock+41} <ffffffff8026d72a>{xfs_qm_shake+135}
<ffffffff80157cfd>{shrink_slab+100} <ffffffff801584d9>{try_to_free_pages+372}
<ffffffff80153c3f>{__alloc_pages+432} <ffffffff8046aef3>{tcp_sendmsg+1373}
<ffffffff804848ad>{inet_sendmsg+70} <ffffffff8043f619>{sock_sendmsg+270}
<ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff80440db3>{kernel_sendmsg+61}
<ffffffff8802c111>{:nbd:sock_xmit+273} <ffffffff8015195d>{mempool_alloc_slab+17}
<ffffffff80169b1b>{poison_obj+39} <ffffffff8015195d>{mempool_alloc_slab+17}
<ffffffff80169c11>{cache_alloc_debugcheck_after+235}
<ffffffff8015195d>{mempool_alloc_slab+17} <ffffffff802da471>{as_remove_queued_request+267}
<ffffffff8802c472>{:nbd:nbd_send_req+517} <ffffffff8802c712>{:nbd:do_nbd_request+329}
<ffffffff802d9b45>{as_work_handler+46} <ffffffff80139d30>{run_workqueue+168}
<ffffffff802d9b17>{as_work_handler+0} <ffffffff8013a27f>{worker_thread+0}
<ffffffff8013a383>{worker_thread+260} <ffffffff80123fa4>{default_wake_function+0}
<ffffffff8013a27f>{worker_thread+0} <ffffffff8013d29f>{kthread+219}
<ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
<ffffffff8013d1c4>{kthread+0} <ffffffff8010bb9e>{child_rip+0}

Half the processes on the box seem wedged at that same mutex_lock. I can't
seem to find an xfs_qm_shake in my source tree though.


Attachments:
(No filename) (226.00 B)

2006-05-31 01:20:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: How to send a break? - dump from frozen 64bit linux

Added all those listed in the MAINTAINERS file for XFS.

On Tue, 2006-05-30 at 15:03 -0400, [email protected] wrote:
> On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
>
> > http://download.netcenter.hu/bughunt/20060530/dump.txt (The frozen system,
> > 540KB)
>
> > Can somebody tell me, whats wrong?
>
> kblockd/1 D ffff81011f641778 0 25 19 26 24 (L-TLB)
> ffff81011f641778 0000000000000000 0000000000000009 ffff81011f735358
> ffff81011f735140 ffff81011fc79100 000014a00f9a0ef2 00000000000410dd
> 0000000102866d40 ffff810003900280
> Call Trace: <ffffffff8026d72a>{xfs_qm_shake+135} <ffffffff804e6046>{__mutex_lock_slowpath+424}
> <ffffffff804e62e4>{mutex_lock+41} <ffffffff8026d72a>{xfs_qm_shake+135}
> <ffffffff80157cfd>{shrink_slab+100} <ffffffff801584d9>{try_to_free_pages+372}
> <ffffffff80153c3f>{__alloc_pages+432} <ffffffff8046aef3>{tcp_sendmsg+1373}
> <ffffffff804848ad>{inet_sendmsg+70} <ffffffff8043f619>{sock_sendmsg+270}
> <ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff80440db3>{kernel_sendmsg+61}
> <ffffffff8802c111>{:nbd:sock_xmit+273} <ffffffff8015195d>{mempool_alloc_slab+17}
> <ffffffff80169b1b>{poison_obj+39} <ffffffff8015195d>{mempool_alloc_slab+17}
> <ffffffff80169c11>{cache_alloc_debugcheck_after+235}
> <ffffffff8015195d>{mempool_alloc_slab+17} <ffffffff802da471>{as_remove_queued_request+267}
> <ffffffff8802c472>{:nbd:nbd_send_req+517} <ffffffff8802c712>{:nbd:do_nbd_request+329}
> <ffffffff802d9b45>{as_work_handler+46} <ffffffff80139d30>{run_workqueue+168}
> <ffffffff802d9b17>{as_work_handler+0} <ffffffff8013a27f>{worker_thread+0}
> <ffffffff8013a383>{worker_thread+260} <ffffffff80123fa4>{default_wake_function+0}
> <ffffffff8013a27f>{worker_thread+0} <ffffffff8013d29f>{kthread+219}
> <ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
> <ffffffff8013d1c4>{kthread+0} <ffffffff8010bb9e>{child_rip+0}
>
> Half the processes on the box seem wedged at that same mutex_lock. I can't
> seem to find an xfs_qm_shake in my source tree though.

What everyone is waiting for is being blocked here:

kswapd0 D ffff81011fe03c38 0 297 1 1287 19 (L-TLB)
ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
000000002f2c46e0 ffff81008bc37180
Call Trace: <ffffffff804e5522>{schedule_timeout+34}
<ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}
<ffffffff80123fa4>{default_wake_function+0} <ffffffff80268ca5>{xfs_qm_dqflush+70}
<ffffffff8026d7a7>{xfs_qm_shake+260} <ffffffff80157cfd>{shrink_slab+100}
<ffffffff8015801e>{balance_pgdat+559} <ffffffff801582e8>{kswapd+283}
<ffffffff8013d3e0>{autoremove_wake_function+0} <ffffffff804e6a80>{_spin_unlock_irq+9}
<ffffffff8012590d>{schedule_tail+70} <ffffffff8010bba6>{child_rip+8}
<ffffffff801581cd>{kswapd+0} <ffffffff8010bb9e>{child_rip+0}


Seems that the kswapd0 has the lock in questing and has put itself to
sleep waiting to be woken up. I don't know the xfs code very well, but
the kswapd0 seems to be in this function:

xfs_qm_dqunpin_wait(
xfs_dquot_t *dqp)
{
SPLDECL(s);

ASSERT(XFS_DQ_IS_LOCKED(dqp));
if (dqp->q_pincount == 0) {
return;
}

/*
* Give the log a push so we don't wait here too long.
*/
xfs_log_force(dqp->q_mount, (xfs_lsn_t)0, XFS_LOG_FORCE);
s = XFS_DQ_PINLOCK(dqp);
if (dqp->q_pincount == 0) {
XFS_DQ_PINUNLOCK(dqp, s);
return;
}
sv_wait(&(dqp->q_pinwait), PINOD,
&(XFS_DQ_TO_QINF(dqp)->qi_pinlock), s);
}


Where sv_wait is:

#define sv_wait(sv, pri, lock, s) \
_sv_wait(sv, lock, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)

And our macro hell goes further ...

static inline void _sv_wait(sv_t *sv, spinlock_t *lock, int state,
unsigned long timeout)
{
DECLARE_WAITQUEUE(wait, current);

add_wait_queue_exclusive(&sv->waiters, &wait);
__set_current_state(state);
spin_unlock(lock);

schedule_timeout(timeout);

remove_wait_queue(&sv->waiters, &wait);
}


So it is now waiting to be woken up by something that calls:

xfs_qm_dquot_logitem_unpin which seems to be the function to wake it
up.

And decyphering all the macro crap it seems that the function that wakes
it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.


The above xfs_qm_dqunpin_wait still looks awfully racy, and the
xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
kswapd0, doesn't have a return code check. So if it failed to do
whatever the hell it's doing (that code gives me a headache), it looks
like this guy might sleep forever holding a lock that will prevent
others from freeing kernel memory.

Well that's about all I can figure out.

Good luck,

-- Steve


2006-05-31 04:39:10

by Nathan Scott

[permalink] [raw]
Subject: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)

On Tue, May 30, 2006 at 09:20:31PM -0400, Steven Rostedt wrote:
> Added all those listed in the MAINTAINERS file for XFS.

Thanks Steve.

> On Tue, 2006-05-30 at 15:03 -0400, [email protected] wrote:
> > On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
> > Half the processes on the box seem wedged at that same mutex_lock. I can't
> > seem to find an xfs_qm_shake in my source tree though.

Its in fs/xfs/quota/xfs_qm.c.

> kswapd0 D ffff81011fe03c38 0 297 1 1287 19 (L-TLB)
> ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
> ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
> 000000002f2c46e0 ffff81008bc37180
> Call Trace: <ffffffff804e5522>{schedule_timeout+34}
> <ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}

So, we're waiting here on a synchronisation variable that'll
be released once the dquot metadata buffer write completes.

> So it is now waiting to be woken up by something that calls:
>
> xfs_qm_dquot_logitem_unpin which seems to be the function to wake it
> up.

Mhmm, that'd be called by the I/O completion handler on the buffer
containing that dquot.

> And decyphering all the macro crap it seems that the function that wakes
> it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.

Right (the former, at this point in the code).

> The above xfs_qm_dqunpin_wait still looks awfully racy, and the
> xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
> kswapd0, doesn't have a return code check. So if it failed to do

The logforce isn't race-critical here - its ensuring writeout
of previously logged buffers is started before we go to sleep
waiting for the driver to wake us up when its done.

An earlier I/O error on the journal is the only thing the log
force can return as an error there, which isnt useful at that
point anyway (we're in a kernel thread trying to free mem).

> whatever the hell it's doing (that code gives me a headache), it looks

Heh, likewise. I have voodoo dolls of one or two of the early
XFS folks that I like to poke with needles occasionally.. :)

> like this guy might sleep forever holding a lock that will prevent
> others from freeing kernel memory.

It will sleep until the previously initiated buffer write is done.
AFAICT, we aren't seeing the I/O completion here for some reason...
which points more to a possible device driver or h/ware issue (that
is the usual root cause of this sort of hang, anyway).

cheers.

--
Nathan

2006-05-31 08:01:06

by Janos Haar

[permalink] [raw]
Subject: Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)


----- Original Message -----
From: "Nathan Scott" <[email protected]>
To: "Janos Haar" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Wednesday, May 31, 2006 6:38 AM
Subject: XFS related hang (was Re: How to send a break? - dump from frozen
64bit linux)


> On Tue, May 30, 2006 at 09:20:31PM -0400, Steven Rostedt wrote:
> > Added all those listed in the MAINTAINERS file for XFS.
>
> Thanks Steve.
>
> > On Tue, 2006-05-30 at 15:03 -0400, [email protected] wrote:
> > > On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
> > > Half the processes on the box seem wedged at that same mutex_lock. I
can't
> > > seem to find an xfs_qm_shake in my source tree though.
>
> Its in fs/xfs/quota/xfs_qm.c.
>
> > kswapd0 D ffff81011fe03c38 0 297 1 1287 19
(L-TLB)
> > ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
> > ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c
000000000003bcde
> > 000000002f2c46e0 ffff81008bc37180
> > Call Trace: <ffffffff804e5522>{schedule_timeout+34}
> > <ffffffff80269f87>{xfs_qm_dqunpin_wait+220}
<ffffffff80140e74>{debug_mutex_free_waiter+141}
>
> So, we're waiting here on a synchronisation variable that'll
> be released once the dquot metadata buffer write completes.
>
> > So it is now waiting to be woken up by something that calls:
> >
> > xfs_qm_dquot_logitem_unpin which seems to be the function to wake it
> > up.
>
> Mhmm, that'd be called by the I/O completion handler on the buffer
> containing that dquot.
>
> > And decyphering all the macro crap it seems that the function that wakes
> > it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.
>
> Right (the former, at this point in the code).
>
> > The above xfs_qm_dqunpin_wait still looks awfully racy, and the
> > xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
> > kswapd0, doesn't have a return code check. So if it failed to do
>
> The logforce isn't race-critical here - its ensuring writeout
> of previously logged buffers is started before we go to sleep
> waiting for the driver to wake us up when its done.
>
> An earlier I/O error on the journal is the only thing the log
> force can return as an error there, which isnt useful at that
> point anyway (we're in a kernel thread trying to free mem).
>
> > whatever the hell it's doing (that code gives me a headache), it looks
>
> Heh, likewise. I have voodoo dolls of one or two of the early
> XFS folks that I like to poke with needles occasionally.. :)
>
> > like this guy might sleep forever holding a lock that will prevent
> > others from freeing kernel memory.
>
> It will sleep until the previously initiated buffer write is done.
> AFAICT, we aren't seeing the I/O completion here for some reason...
> which points more to a possible device driver or h/ware issue (that
> is the usual root cause of this sort of hang, anyway).
>
> cheers.

Hey, i think i found something.
My quota on my huge device is broken.
(inferno -- 18014398504855404 0 0 18446744073709551519
0 0)
I cant found a way to re-initialize it.
But anyway, at this point i dont need it, trying to disable the quota usage.
We will see....

Thanks a lot!

Janos

>
> --
> Nathan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-05-31 21:55:14

by Jan Engelhardt

[permalink] [raw]
Subject: Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)

>
>Hey, i think i found something.
>My quota on my huge device is broken.

That should not be a problem. I ran into that "problem" too but had no
lockups back then (2.6.16-rc1).

>(inferno -- 18014398504855404 0 0 18446744073709551519
>0 0)
>I cant found a way to re-initialize it.

Reinit:

quotaoff /mntpt
umount /mntpt
mount /mntpt

>But anyway, at this point i dont need it, trying to disable the quota usage.
>We will see....


Jan Engelhardt
--