2004-04-04 12:17:51

by Oleg Drokin

Subject: [2.4] NMI WD detected lockup during page alloc

Hello!

One of my servers started to experience mysterious hangs after an upgrade to
dual P4 Xeon (before that it was running a UP kernel; HT is enabled now).
So I enabled the NMI watchdog, and it finally triggered recently.
The kernel is 2.4.25+ (pulled from the 2.4 BitKeeper tree on XX/XX, but
the related files in mm/ do not seem to have changed since at least January
2004 anyway).
The HW is a dual P4 Xeon in some Intel-branded server (E7501-based or
something) with 2G of ECC RAM (highmem enabled).
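
(For reference, the NMI watchdog on this kind of x86 box is enabled with a boot
parameter; the two usual values are shown below, and which one applies depends
on the APIC setup, so treat this as a generic example rather than my exact
append line:)

    nmi_watchdog=1    # IO-APIC based NMI watchdog
    nmi_watchdog=2    # local-APIC performance-counter based NMI watchdog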

That's what I got on the serial console:
NMI Watchdog detected LOCKUP on CPU2, eip c013b527, registers:
CPU: 2
EIP: 0010:[<c013b527>] Not tainted
EFLAGS: 00000086
eax: 00000000 ebx: c02dca38 ecx: 000048ce edx: c02dca38
esi: c02dca74 edi: 00000000 ebp: d34b1e5c esp: d34b1e30
ds: 0018 es: 0018 ss: 0018
Process mrtg (pid: 14663, stackpage=d34b1000)
Stack: 00038000 00000282 00000000 00015006 00015006 00000286 00000000 c02dca38
c02dca38 c02dcb38 00000002 d34b1ea0 c013adfa c0139395 d34b1ea0 00000202
c02dcaec 32353530 d34b1e7c c02dca38 c02dca38 c02dcb34 00000000 000001d2
Call Trace: [<c013adfa>] [<c0139395>] [<c012dc0d>] [<c012e6d7>] [<c0119330>]
[<c014bca5>] [<c0159301>] [<c014ee56>] [<c014bd1b>] [<c0153b3b>] [<c0118f70>]
[<c01076b0>]
Code: f3 90 7e f9 e9 11 f4 ff ff 80 3f 00 f3 90 7e f9 e9 8e fd ff
>>EIP; c013b527 <.text.lock.page_alloc+f/28> <=====
Trace; c013adfa <__alloc_pages+6a/270>
Trace; c0139395 <lru_cache_del+15/20>
Trace; c012dc0d <do_wp_page+6d/2e0>
Trace; c012e6d7 <handle_mm_fault+f7/110>
Trace; c0119330 <do_page_fault+3c0/586>
Trace; c014bca5 <cp_new_stat64+e5/110>
Trace; c0159301 <dput+31/190>
Trace; c014ee56 <path_release+16/40>
Trace; c014bd1b <sys_stat64+4b/80>
Trace; c0153b3b <sys_fcntl64+5b/c0>
Trace; c0118f70 <do_page_fault+0/586>
Trace; c01076b0 <error_code+34/3c>


So it seems it was blocked trying to take zone->lock in
mm/page_alloc.c::rmqueue().
The actual call trace seems to be (there appear to be lots of stale entries
on the actual stack):

rmqueue
__alloc_pages+6a
do_wp_page+6d
handle_mm_fault+f7 (this is in fact handle_pte_fault())
do_page_fault+3c0
error_code+34
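
For context, the locking pattern in question looks roughly like this
(paraphrased from 2.4's mm/page_alloc.c rather than quoted exactly):

    /* paraphrased sketch, not the exact 2.4 code */
    static struct page *rmqueue(zone_t *zone, unsigned int order)
    {
            struct page *page = NULL;
            unsigned long flags;

            spin_lock_irqsave(&zone->lock, flags);  /* <- the CPU was spinning here */
            /* ... take a free block from zone->free_area[order..], split it
             *     down to 'order', update zone->free_pages ... */
            spin_unlock_irqrestore(&zone->lock, flags);
            return page;
    }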

I fail to see a path where we can take the lock on the same zone twice on the
same CPU, so maybe the zone structure was somehow corrupted (I do not have
spinlock debugging enabled yet). I also do not think there are memory problems
in that box that might explain this.
The probability of hangs varies over time. I got the first one the day after
the upgrade (I am not even sure it was the same as this one, since I had no
traces from it), but this second one happened after 2-3 weeks of uptime.

Maybe it will help someone figure out what is happening.

Bye,
Oleg


2004-04-05 21:07:05

by Marcelo Tosatti

Subject: Re: [2.4] NMI WD detected lockup during page alloc

On Sun, Apr 04, 2004 at 03:17:56PM +0300, Oleg Drokin wrote:
> Hello!
>
> One of my servers started to experience mysterious hangs after an upgrade to
> dual P4 Xeon (before that it was running a UP kernel; HT is enabled now).
> So I enabled the NMI watchdog, and it finally triggered recently.
> The kernel is 2.4.25+ (pulled from the 2.4 BitKeeper tree on XX/XX, but
> the related files in mm/ do not seem to have changed since at least January
> 2004 anyway).
> The HW is a dual P4 Xeon in some Intel-branded server (E7501-based or
> something) with 2G of ECC RAM (highmem enabled).
>
> That's what I got on the serial console:
> NMI Watchdog detected LOCKUP on CPU2, eip c013b527, registers:
> CPU: 2
> EIP: 0010:[<c013b527>] Not tainted
> EFLAGS: 00000086
> eax: 00000000 ebx: c02dca38 ecx: 000048ce edx: c02dca38
> esi: c02dca74 edi: 00000000 ebp: d34b1e5c esp: d34b1e30
> ds: 0018 es: 0018 ss: 0018
> Process mrtg (pid: 14663, stackpage=d34b1000)
> Stack: 00038000 00000282 00000000 00015006 00015006 00000286 00000000 c02dca38
> c02dca38 c02dcb38 00000002 d34b1ea0 c013adfa c0139395 d34b1ea0 00000202
> c02dcaec 32353530 d34b1e7c c02dca38 c02dca38 c02dcb34 00000000 000001d2
> Call Trace: [<c013adfa>] [<c0139395>] [<c012dc0d>] [<c012e6d7>] [<c0119330>]
> [<c014bca5>] [<c0159301>] [<c014ee56>] [<c014bd1b>] [<c0153b3b>] [<c0118f70>]
> [<c01076b0>]
> Code: f3 90 7e f9 e9 11 f4 ff ff 80 3f 00 f3 90 7e f9 e9 8e fd ff
> >>EIP; c013b527 <.text.lock.page_alloc+f/28> <=====
> Trace; c013adfa <__alloc_pages+6a/270>
> Trace; c0139395 <lru_cache_del+15/20>
> Trace; c012dc0d <do_wp_page+6d/2e0>
> Trace; c012e6d7 <handle_mm_fault+f7/110>
> Trace; c0119330 <do_page_fault+3c0/586>
> Trace; c014bca5 <cp_new_stat64+e5/110>
> Trace; c0159301 <dput+31/190>
> Trace; c014ee56 <path_release+16/40>
> Trace; c014bd1b <sys_stat64+4b/80>
> Trace; c0153b3b <sys_fcntl64+5b/c0>
> Trace; c0118f70 <do_page_fault+0/586>
> Trace; c01076b0 <error_code+34/3c>
>
>
> So it seems it was blocked trying to take zone->lock in
> mm/page_alloc.c::rmqueue().
> The actual call trace seems to be (there appear to be lots of stale entries
> on the actual stack):
>
> rmqueue
> __alloc_pages+6a
> do_wp_page+6d
> handle_mm_fault+f7 (this is in fact handle_pte_fault())
> do_page_fault+3c0
> error_code+34
>
> I fail to see a path where we can take the lock on the same zone twice on the
> same CPU, so maybe the zone structure was somehow corrupted (I do not have
> spinlock debugging enabled yet). I also do not think there are memory problems
> in that box that might explain this.

I also fail to see how zone->lock could be left locked. The only users of it
are rmqueue() and __free_pages_ok(), and the code paths that take it are not
prone to problems.
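
Both sides use the irq-safe lock variants, which is also why the free path
running from interrupt context cannot deadlock against an allocation already
holding the lock on the same CPU. Roughly (paraphrasing the 2.4 code, with the
zone lookup and buddy merging elided):

    spin_lock_irqsave(&zone->lock, flags);
    /* ... merge the freed block with its buddies and put the result back
     *     on zone->free_area[order].free_list ... */
    spin_unlock_irqrestore(&zone->lock, flags);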

> The probability of hangs varies over time. I got the first one the day after
> the upgrade (I am not even sure it was the same as this one, since I had no
> traces from it), but this second one happened after 2-3 weeks of uptime.
>
> Maybe it will help someone figure out what is happening.

Can you send me your config file and a description of the workload? I have a
similar E7501 box around (with MPT Fusion).

What drivers are you using, btw?

2004-04-05 21:31:19

by Oleg Drokin

Subject: Re: [2.4] NMI WD detected lockup during page alloc

Hello!

On Mon, Apr 05, 2004 at 05:43:17PM -0300, Marcelo Tosatti wrote:
> > The probability of hangs varies over time. I got the first one the day after
> > the upgrade (I am not even sure it was the same as this one, since I had no
> > traces from it), but this second one happened after 2-3 weeks of uptime.
> > Maybe it will help someone figure out what is happening.
> Can you send me your config file and a description of the workload? I have a
> similar E7501 box around (with MPT Fusion).

I have some Adaptec stuff, not MPT Fusion.

Config is attached.
The workload is a boring HTTP proxy (squid) for some 200+ clients on a not
very fat channel, plus some IP accounting using the firewall (btw, that reminds
me that ipt_state cannot be unloaded on this box; it just spins doing something
in the kernel if I try to unload it, and ipt_state is then shown as being in a
deleted state. I always reboot in this case, so it has no influence on the main
problem). There is also some home-grown accounting software (which leaks a bit,
so over time more and more swap is used), a mail server (sendmail) with maybe
100-200 incoming mails per minute, DNS, a modem pool (ppp), and samba3.

> What drivers are you using, btw?

In addition to what I have compiled in:
# lsmod
Module                 Size  Used by    Not tainted
ppp_deflate            4568   1  (autoclean)
zlib_inflate          20868   0  (autoclean) [ppp_deflate]
zlib_deflate          21272   0  (autoclean) [ppp_deflate]
ppp_async             10208   2  (autoclean)
ppp_generic           27404   6  (autoclean) [ppp_deflate ppp_async]
slhc                   6384   0  (autoclean) [ppp_generic]
tulip                 45760   0  (unused)
e1000                 75844   1
ipt_MARK               1368  81  (autoclean)
iptable_mangle         2776   1  (autoclean)
ipt_MASQUERADE         2360   1  (autoclean)
iptable_nat           22278   1  (autoclean) [ipt_MASQUERADE]
ipt_state              1016   4  (autoclean)
ip_conntrack          30432   0  (autoclean) [ipt_MASQUERADE iptable_nat ipt_state]
ipt_unclean            7672   2  (autoclean)
iptable_filter         2412   1  (autoclean)
ip_tables             15872   9  [ipt_MARK iptable_mangle ipt_MASQUERADE iptable_nat ipt_state ipt_unclean iptable_filter]
microcode              7936   0  (autoclean)
raid1                 18704   1  (autoclean)
md                    66048   2  [raid1]
loop                  13080   0  (autoclean)
lvm-mod               63040  13


Bye,
Oleg


Attachments:
mordor-config (5.39 kB)

2004-04-05 22:19:25

by Andrea Arcangeli

Subject: Re: [2.4] NMI WD detected lockup during page alloc

On Tue, Apr 06, 2004 at 12:27:34AM +0300, Oleg Drokin wrote:
> In addition to what I have compiled in:
> # lsmod
> Module Size Used by Not tainted
> ppp_deflate 4568 1 (autoclean)

You may want to disable compression; this sounds like mm corruption, and
compression isn't trivial to handle in kernel skbs (though I doubt this is the
problem, it's easy to disable).
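
(With a stock pppd setup, something along these lines in the options file
should keep the deflate/BSD compressors out of the picture entirely; exact
option names depend on the pppd version, so take this as a generic example:)

    noccp        # don't negotiate the Compression Control Protocol at all
    # or, more selectively:
    nodeflate
    nobsdcomp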

> ipt_state 1016 4 (autoclean)

The hang while unloading this module may also be a sign of a bug in the
module, so it would be nice if you could reproduce this also without the above
ipt_state.

If this still doesn't help, you can try to go back to UP (SMP stresses the
memory bus harder) and see if it stabilizes. Another thing you can do is remove
half of the RAM and see if it stabilizes, to try to identify buggy RAM slots.

Overall it's unlikely the oops is useful, unfortunately, since that piece of
the kernel is about the most stressed there is, and it just signals random mm
corruption. I assume this is the first time you've gotten the NMI watchdog
oops; if you can get it again it will be more interesting, though I'd expect
you to get it in another place next time.

hope this helps.

2004-04-06 07:03:20

by Oleg Drokin

Subject: Re: [2.4] NMI WD detected lockup during page alloc

Hello!

On Tue, Apr 06, 2004 at 12:12:55AM +0200, Andrea Arcangeli wrote:
> > In addition to what I have compiled in:
> > # lsmod
> > Module Size Used by Not tainted
> > ppp_deflate 4568 1 (autoclean)
> You may want to disable compression; this sounds like mm corruption, and
> compression isn't trivial to handle in kernel skbs (though I doubt this is the
> problem, it's easy to disable).

Ok.

> > ipt_state 1016 4 (autoclean)
> The hang while unloading this module may also be a sign of a bug in the
> module, so it would be nice if you could reproduce this also without the above
> ipt_state.

Unfortunately this is not as easy to do, though I believe it is just some sort
of race on unload that is not hit until the module is unloaded, and is
therefore completely unrelated.

> If this still doesn't help, you can try to go back to UP (SMP stresses the
> memory bus harder) and see if it stabilizes. Another thing you can do is
> remove half of the RAM and see if it stabilizes, to try to identify buggy RAM
> slots.

The box has ECC RAM and passed 14 days of memtest (yes, I know memtest uses
only 1 CPU), so I do not think I have memory problems, though of course that is
no absolute guarantee. Also, running in UP mode for weeks is not all that much
fun, and it still proves nothing, as I do not have a clear way to reproduce the
problem within a given time.

> Overall it's unlikely the oops is useful, unfortunately, since that piece of
> the kernel is about the most stressed there is, and it just signals random mm
> corruption. I assume this is the first time you've gotten the NMI watchdog
> oops; if you can get it again it will be more interesting, though I'd expect
> you to get it in another place next time.

Well, I had a hang before this oops, and that was the main reason I enabled
the NMI watchdog. At that first hang nothing got to the serial console, so I
guessed it was a similar spinlock deadlock.
We'll see what I get when the next NMI watchdog event occurs. I am running
with spinlock debugging this time, so hopefully, if the spinlock really is just
corrupted, its magic will be corrupted as well and I will get a clear warning
about that.
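
(As I understand it, the i386 debug spinlock embeds a magic word that is
checked before spinning; roughly paraphrased, not the exact 2.4 code:)

    #define SPINLOCK_MAGIC  0xdead4ead

    typedef struct {
            volatile unsigned int lock;
            unsigned magic;             /* only present with spinlock debugging */
    } spinlock_t;

    static inline void spin_lock(spinlock_t *lock)
    {
            if (lock->magic != SPINLOCK_MAGIC)
                    BUG();              /* lock structure was overwritten */
            /* ... the normal "lock; decb" spin loop follows ... */
    }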

Thank you.

Bye,
Oleg