2008-01-07 16:18:40

by Brice Figureau

Subject: Strange freeze on 2.6.22 (deadlock?)

Hi,

I'm seeing a strange complete server freeze/lock-up on a bi-Xeon HT
amd64 server running a standard Debian 2.6.22 kernel (and before that
vanilla 2.6.19.x and 2.6.20.x, which exhibited the same issue).

I'm only reporting it now, since I could get a full sysrq-t only this
morning.

The symptoms are that every 5 to 7 days, the server (which acts as an MX
and hosts a few low-traffic websites) locks up. The IPMI watchdog is
unable to reboot the server (it doesn't even trigger, since there is no
evidence in the esmlog), and the machine is still pingable. I can't ssh
to it; I can enter my login and password on a serial console, but no
shell is started.

Pressing sysrq-t produced the trace hosted here:
http://www.daysofwonder.com/bug/crash-server1.txt.gz

It happened once while I was connected to the server through ssh, and
I could see the load climb well above 100. It was then impossible to
launch new processes from the command line (and I had to reboot
manually).
It also happened last week, when the server was stuck for about 6 hours.
When I started investigating what was wrong, it slowly came back to life
(with a 1-minute load average of more than 1500, and tons of cron
processes running in parallel).

I'm not really familiar with kernel development, so I can't really find
the issue in the trace output myself.
My guess is that for some reason there is a race/deadlock that
eventually prevents new processes from really starting (which in turn
produces the high load).

What seems suspect in the trace is:
*) lots of process stack traces end in __mod_timer+0xc3/0xd3,
which seems to correspond to this line from kernel/timer.c:

415 timer->expires = expires;
416 internal_add_timer(base, timer);
--> spin_unlock_irqrestore(&base->lock, flags);

419 return ret;
420 }

*) lots of process stack traces end in __mutex_lock_slowpath and/or zone_statistics

Anyway, I will soon reboot to 2.6.23.x to see whether the symptom
persists.
More information (config, server specs) is available on request.

I'm not subscribed to the list, so please CC: me on any answer.
Many thanks,
--
Brice Figureau <[email protected]>


2008-01-07 17:22:41

by Randy Dunlap

Subject: Re: Strange freeze on 2.6.22 (deadlock?)

On Mon, 07 Jan 2008 16:48:54 +0100 Brice Figureau wrote:

> Hi,
>
> I'm seeing a strange complete server freeze/lock-up on a bi-Xeon HT
> amd64 server running a standard Debian 2.6.22 kernel (and before that
> vanilla 2.6.19.x and 2.6.20.x, which exhibited the same issue).
>
> I'm only reporting it now, since I could get a full sysrq-t only this
> morning.
>
> The symptoms are that every 5 to 7 days, the server (which acts as an MX
> and hosts a few low-traffic websites) locks up. The IPMI watchdog is
> unable to reboot the server (it doesn't even trigger, since there is no
> evidence in the esmlog), and the machine is still pingable. I can't ssh
> to it; I can enter my login and password on a serial console, but no
> shell is started.
>
> Pressing sysrq-t produced the trace hosted here:
> http://www.daysofwonder.com/bug/crash-server1.txt.gz
>
> It happened once while I was connected to the server through ssh, and
> I could see the load climb well above 100. It was then impossible to
> launch new processes from the command line (and I had to reboot
> manually).
> It also happened last week, when the server was stuck for about 6 hours.
> When I started investigating what was wrong, it slowly came back to life
> (with a 1-minute load average of more than 1500, and tons of cron
> processes running in parallel).
>
> I'm not really familiar with kernel development, so I can't really find
> the issue in the trace output myself.
> My guess is that for some reason there is a race/deadlock that
> eventually prevents new processes from really starting (which in turn
> produces the high load).
>
> What seems suspect in the trace is:
> *) lots of process stack traces end in __mod_timer+0xc3/0xd3,
> which seems to correspond to this line from kernel/timer.c:
>
> 415 timer->expires = expires;
> 416 internal_add_timer(base, timer);
> --> spin_unlock_irqrestore(&base->lock, flags);
>
> 419 return ret;
> 420 }
>
> *) lots of process stack traces end in __mutex_lock_slowpath and/or zone_statistics

There are also lots of processes in D state (usually waiting
for I/O to complete). And jbd is in their stack traces.
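
(If the box is still responsive enough to run commands, a quick way to
see which tasks are stuck and where they are sleeping is something like
the following; the column widths are arbitrary:)

    # list D-state (uninterruptible sleep) tasks and the kernel function they wait in
    ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'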

How are the ext3 filesystems mounted, i.e. with which data=xyz
mode? data=journal (the heaviest-duty mode) has at least one
known deadlock. If you are using data=journal, you could try
switching to data=ordered...
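
For example, something like this shows the data mode in use and how to
force ordered mode (the device and mount point below are made up):

    # ext3 logs the data mode when the filesystem is mounted:
    dmesg | grep "EXT3-fs: mounted"
    #   EXT3-fs: mounted filesystem with ordered data mode.

    # any non-default data= option also shows up here:
    grep ext3 /proc/mounts

    # to force ordered mode, set it in /etc/fstab and remount (or reboot):
    # /dev/sda1  /var  ext3  defaults,noatime,data=ordered  0  2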


> Anyway, I will soon reboot to 2.6.23.x to see whether the symptom
> persists.
> More information (config, server specs) is available on request.
>
> I'm not subscribed to the list, so please CC: me on any answer.
> Many thanks,


---
~Randy

2008-01-07 19:06:21

by Brice Figureau

Subject: Re: Strange freeze on 2.6.22 (deadlock?)


On Mon, January 7, 2008 18:20, Randy Dunlap wrote:
> On Mon, 07 Jan 2008 16:48:54 +0100 Brice Figureau wrote:
>
>> Hi,
>>
>> I'm seeing a strange complete server freeze/lock-up on a bi-Xeon HT
>> amd64 server running a standard Debian 2.6.22 kernel (and before that
>> vanilla 2.6.19.x and 2.6.20.x, which exhibited the same issue).
>>
>> I'm only reporting it now, since I could get a full sysrq-t only this
>> morning.
>>
>> The symptoms are that every 5 to 7 days, the server (which acts as an MX
>> and hosts a few low-traffic websites) locks up. The IPMI watchdog is
>> unable to reboot the server (it doesn't even trigger, since there is no
>> evidence in the esmlog), and the machine is still pingable. I can't ssh
>> to it; I can enter my login and password on a serial console, but no
>> shell is started.
>>
>> Pressing sysrq-t produced the trace hosted here:
>> http://www.daysofwonder.com/bug/crash-server1.txt.gz
>>
>> It happened once while I was connected to the server through ssh, and
>> I could see the load climb well above 100. It was then impossible to
>> launch new processes from the command line (and I had to reboot
>> manually).
>> It also happened last week, when the server was stuck for about 6 hours.
>> When I started investigating what was wrong, it slowly came back to life
>> (with a 1-minute load average of more than 1500, and tons of cron
>> processes running in parallel).
>>
>> I'm not really familiar with kernel development, so I can't really find
>> the issue in the trace output myself.
>> My guess is that for some reason there is a race/deadlock that
>> eventually prevents new processes from really starting (which in turn
>> produces the high load).
>>
>> What seems suspect in the trace is:
>> *) lots of process stack traces end in __mod_timer+0xc3/0xd3,
>> which seems to correspond to this line from kernel/timer.c:
>>
>> 415 timer->expires = expires;
>> 416 internal_add_timer(base, timer);
>> --> spin_unlock_irqrestore(&base->lock, flags);
>>
>> 419 return ret;
>> 420 }
>>
>> *) lots of process stack traces end in __mutex_lock_slowpath and/or
>> zone_statistics
>
> There are also lots of processes in D state (usually waiting
> for I/O to complete). And jbd is in their stack traces.
>
> How are the ext3 filesystems mounted, i.e. with which data=xyz
> mode? data=journal (the heaviest-duty mode) has at least one
> known deadlock. If you are using data=journal, you could try
> switching to data=ordered...

Thanks for the answer.
I'm using whatever is the default mount option (which I think is
data=ordered). The only other mount options I use are nodiratime,noatime.

Note that a large part of the processes in D state are "waiting" in
__mutex_lock from generic_file_aio_write.
Another large part is coming from balance_dirty_pages_ratelimited_nr.

It seems that there was some writeback congestion to the block device.
All the /proc/sys/vm/* files are at their defaults.
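
(For the record, the writeback-related knobs can be dumped with
something like this; mine all show the stock values:)

    # current dirty/writeback thresholds
    grep . /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio \
        /proc/sys/vm/dirty_expire_centisecs /proc/sys/vm/dirty_writeback_centisecs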

It looks as if it wasn't possible to write to the block device anymore.
Could a block device write error (i.e. a hardware failure) be the root cause?

Any other idea?
What should I try the next time it freezes?
--
Brice Figureau

2008-01-07 21:14:38

by Brice Figureau

Subject: Re: Strange freeze on 2.6.22 (deadlock?)


On Mon, January 7, 2008 20:06, Brice Figureau wrote:
>
> On Mon, January 7, 2008 18:20, Randy Dunlap wrote:
>> On Mon, 07 Jan 2008 16:48:54 +0100 Brice Figureau wrote:
>>
>>> Hi,
>>>
>>> I'm seeing a strange complete server freeze/lock-up on a bi-Xeon HT
>>> amd64 server running a standard Debian 2.6.22 kernel (and before that
>>> vanilla 2.6.19.x and 2.6.20.x, which exhibited the same issue).
>>>
>>> I'm only reporting it now, since I could get a full sysrq-t only this
>>> morning.
>>>
>>> The symptoms are that every 5 to 7 days, the server (which acts as an MX
>>> and hosts a few low-traffic websites) locks up. The IPMI watchdog is
>>> unable to reboot the server (it doesn't even trigger, since there is no
>>> evidence in the esmlog), and the machine is still pingable. I can't ssh
>>> to it; I can enter my login and password on a serial console, but no
>>> shell is started.
>>>
>>> Pressing sysrq-t produced the trace hosted here:
>>> http://www.daysofwonder.com/bug/crash-server1.txt.gz
>>>
>>> It happened once while I was connected to the server through ssh, and
>>> I could see the load climb well above 100. It was then impossible to
>>> launch new processes from the command line (and I had to reboot
>>> manually).
>>> It also happened last week, when the server was stuck for about 6 hours.
>>> When I started investigating what was wrong, it slowly came back to life
>>> (with a 1-minute load average of more than 1500, and tons of cron
>>> processes running in parallel).
>>>
>>> I'm not really familiar with kernel development, so I can't really find
>>> the issue in the trace output myself.
>>> My guess is that for some reason there is a race/deadlock that
>>> eventually prevents new processes from really starting (which in turn
>>> produces the high load).
>>>
>>> What seems suspect in the trace is:
>>> *) lots of process stack traces end in __mod_timer+0xc3/0xd3,
>>> which seems to correspond to this line from kernel/timer.c:
>>>
>>> 415 timer->expires = expires;
>>> 416 internal_add_timer(base, timer);
>>> --> spin_unlock_irqrestore(&base->lock, flags);
>>>
>>> 419 return ret;
>>> 420 }
>>>
>>> *) lots of process stack traces end in __mutex_lock_slowpath and/or
>>> zone_statistics
>>
>> There are also lots of processes in D state (usually waiting
>> for I/O to complete). And jbd is in their stack traces.
>>
>> How are the ext3 filesystems mounted, i.e. with which data=xyz
>> mode? data=journal (the heaviest-duty mode) has at least one
>> known deadlock. If you are using data=journal, you could try
>> switching to data=ordered...
>
> Thanks for the answer.
> I'm using whatever is the default mount option (which I think is
> data=ordered). The only other mount options I use are nodiratime,noatime.

I'm sorry, I stand corrected: there is a data=journal option, but not on
a mount point; it is set as an attribute on a bunch of directories (the
postfix queues):
# lsattr /var/spool/postfix/
-------A-----j---- /var/spool/postfix/active
-------A-----j---- /var/spool/postfix/bounce
-------A-----j---- /var/spool/postfix/corrupt
-------A-----j---- /var/spool/postfix/defer
-------A-----j---- /var/spool/postfix/deferred
-------A-----j---- /var/spool/postfix/flush
-------A-----j---- /var/spool/postfix/hold
-------A-----j---- /var/spool/postfix/incoming
-------A-----j---- /var/spool/postfix/private
-------A-----j---- /var/spool/postfix/maildrop
-------A-----j---- /var/spool/postfix/public
-------A-----j---- /var/spool/postfix/pid
-------A-----j---- /var/spool/postfix/saved
-------A-----j---- /var/spool/postfix/trace

The idea was that queue files are short-lived, so they could be stored
only in the journal and never hit the regular data area.

So I might have hit the deadlock you were talking about.
I just disabled this setup; we'll see if that changes anything.
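
(In case someone else wants to do the same, clearing the attribute is
just a matter of something like:

    # remove the per-file data-journaling flag from the spool tree
    chattr -R -j /var/spool/postfix
)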

Note that I had been using this setup since the 2.4 days; the problem
appeared only a few kernel revisions ago (around 2.6.18, I think).

Thanks
--
Brice Figureau

2008-01-08 23:18:17

by Chuck Ebbert

Subject: Re: Strange freeze on 2.6.22 (deadlock?)

On 01/07/2008 02:06 PM, Brice Figureau wrote:

> Thanks for the answer.
> I'm using whatever is the default mount option (which I think is
> data=ordered). The only other mount options I use are nodiratime,noatime.
>
> Note that a large part of the processes in D state are "waiting" in
> __mutex_lock from generic_file_aio_write.
> Another large part is coming from balance_dirty_pages_ratelimited_nr.
>
> It seems that there was some writeback congestion to the block device.
> All the /proc/sys/vm/* files are at their defaults.
>
> It looks as if it wasn't possible to write to the block device anymore.
> Could a block device write error (i.e. a hardware failure) be the root cause?
>
> Any other idea?
> What should I try the next time it freezes?

Same bug as http://lkml.org/lkml/2007/8/1/469 ??

2008-01-09 16:02:28

by Brice Figureau

Subject: Re: Strange freeze on 2.6.22 (deadlock?)

Hi,

On Tue, 2008-01-08 at 18:16 -0500, Chuck Ebbert wrote:
> On 01/07/2008 02:06 PM, Brice Figureau wrote:
>
> > Thanks for the answer.
> > I'm using whatever is the default mount option (which I think is
> > data=ordered). The only other mount options I use are nodiratime,noatime.
> >
> > Note that a large part of the processes in D state are "waiting" in
> > __mutex_lock from generic_file_aio_write.
> > Another large part is coming from balance_dirty_pages_ratelimited_nr.
> >
> > It seems that there was some writeback congestion to the block device.
> > All the /proc/sys/vm/* files are at their defaults.
> >
> > It looks as if it wasn't possible to write to the block device anymore.
> > Could a block device write error (i.e. a hardware failure) be the root cause?
> >
> > Any other idea?
> > What should I try the next time it freezes?
>
> Same bug as http://lkml.org/lkml/2007/8/1/469 ??

Some of the OP's stack traces look like the ones I posted.
The timeframe also matches, since I noticed the first problems in 2.6.19,
like the OP did.

I remounted the fs without noatime and we'll see if it still freezes
next week.

Unfortunately, there doesn't seem to be a fix for this problem yet
(apart from the upcoming per-device write throttling patch).

Thanks for pointing me in this direction,
--
Brice Figureau <[email protected]>