2010-11-09 05:07:21

by Luke Hutchison

Subject: "BUG: soft lockup - CPU#0 stuck for 61s! [kswapd0:184]"

Hi,

I just wanted to report a bug upstream that is affecting the latest
versions of at least both Fedora and Ubuntu. CPUs somehow lock up
under load, producing errors of the form "BUG: soft lockup - CPU#0
stuck for 61s! [kswapd0:184]"

The Fedora Bug report is here:
https://bugzilla.redhat.com/show_bug.cgi?id=649694 -- however you can
find lots of references to the error message on other distributions
(including Ubuntu) by googling "bug soft lockup cpu stuck".

Lockups seem to happen on server-class hardware under heavy loads when
the machine is swapping. This can lead to the entire machine locking
up in some reported cases (although so far only individual CPUs seem
to have locked up in my case, not the entire machine). The point at
which the CPU hangs varies -- see the dmesg output I attached to the
Fedora bug report above.
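
For anyone checking their own logs for the same symptom, the message
format quoted above can be matched mechanically. The following is only a
minimal sketch: the function name parse_soft_lockups and the dictionary
keys are hypothetical, and the pattern is based solely on the single
message format shown in this report, so other kernel versions may word
the line differently.

```python
import re

# Pattern for the watchdog message quoted in this report, e.g.
#   BUG: soft lockup - CPU#0 stuck for 61s! [kswapd0:184]
# Assumption: the bracketed part is "task name:pid".
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]*):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft-lockup message found in log_text."""
    events = []
    for match in SOFT_LOCKUP_RE.finditer(log_text):
        events.append({
            "cpu": int(match.group("cpu")),
            "seconds": int(match.group("secs")),
            "task": match.group("task"),
            "pid": int(match.group("pid")),
        })
    return events
```

Feeding it saved dmesg output (for example, the text of a file written
by "dmesg > dmesg.txt") gives a list that can be grouped by CPU or task
to see whether the lockups always hit kswapd or vary.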

My machine is a 12-way Xeon X5680 system with ext3, AFS and XFS
filesystems (XFS is running on hardware RAID). Please let me know if
you need other info that would be helpful in diagnosing the problem.

Thank you,
Luke Hutchison


2010-11-09 05:33:17

by KOSAKI Motohiro

Subject: Re: "BUG: soft lockup - CPU#0 stuck for 61s! [kswapd0:184]"

> Hi,
>
> I just wanted to report a bug upstream that is affecting the latest
> versions of at least both Fedora and Ubuntu. CPUs somehow lock up
> under load, producing errors of the form "BUG: soft lockup - CPU#0
> stuck for 61s! [kswapd0:184]"
>
> The Fedora Bug report is here:
> https://bugzilla.redhat.com/show_bug.cgi?id=649694 -- however you can
> find lots of references to the error message on other distributions
> (including Ubuntu) by googling "bug soft lockup cpu stuck".
>
> Lockups seem to happen on server-class hardware under heavy loads when
> the machine is swapping. This can lead to the entire machine locking
> up in some reported cases (although so far only individual CPUs seem
> to have locked up in my case, not the entire machine). The point at
> which the CPU hangs varies -- see the dmesg output I attached to the
> Fedora bug report above.
>
> My machine is a 12-way Xeon X5680 system with ext3, AFS and XFS
> filesystems (XFS is running on hardware RAID). Please let me know if
> you need other info that would be helpful in diagnosing the problem.

AFAIK, this issue was already fixed by Mel.

http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977


2010-11-09 05:51:23

by Luke Hutchison

Subject: Re: "BUG: soft lockup - CPU#0 stuck for 61s! [kswapd0:184]"

On Tue, Nov 9, 2010 at 12:33 AM, KOSAKI Motohiro wrote:
> AFAIK, this issue was already fixed by Mel.
>
> http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

Yes, based on where the CPU lockups were occurring
(zone_nr_free_pages, zone_watermark_ok), this fix does seem to address
the problem I described. I assume the other lockup points
(_raw_spin_unlock_irqrestore, find_next_bit, sleeping_prematurely,
test_tsk_thread_flag) are also caused by the NR_FREE_PAGES problem?

Thank you for the link, I'll put it into the Fedora bug report and
hopefully a fix will be pushed out sometime soon.

Luke