LinuxLists.cc - Re: Crazy load average & unkillable processes

2003-08-28 09:00:07

Subject: Re: Crazy load average & unkillable processes

Very interesting..
with the test4 I experience the same/similar problems on my laptop..
all of a sudden yesterday several programs died -> Out of Memory.
I ran
Xfree
dhcpcd
opera
several xterms (about 6)
qmail
named

first opera was Out of Memory, then the whole X system died, with all
xterms and X being Out of Memory.

MemTotal: 385600 kB

which should be more than enough!

Nico

Ross Clarke [Thu, Aug 28, 2003 at 12:41:30AM +0200]:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Brandon wrote:
>
> |Hi Everyone,
> |
> | I'm having some bothersome problems with a couple of servers of
> |mine. I'm hoping some of you have some advice on how to troubleshoot
> |this, because my little brain is running out of ideas.
> |
> |All the servers are running Redhat 7.3, 2.4.20-19smp kernels,
> |apache-1.3.27, and Soft Raid-1.
> |
> |Here is what is happening: all of a sudden the server load average
> |climbs very high. It climbs to 100+ within a few minutes, then
> |keeps growing after that. The last server that had this happen was
> |at a load average of 375 when I rebooted it, which always needs to be
> |a hard reboot, because the 'shutdown -r now' command doesn't do anything.
> |
> |While this is happening, I cannot run commands like 'ps fax', 'pstree',
> |'top', 'killall' etc. without them hanging. Most other commands work. I
> |can SSH to the server no problem. If I do a 'ps ax' I can see a list of
> |processes, but it always hangs before displaying them all. I narrowed it
> |down to this: anything that needs a full process list hangs.
> |
> |I wrote a script that runs 'ls -la /proc/$P', and 'cat /proc/$P/cmdline'
> |on each process in /proc.
> |
> |What I found is that the processes that hang ps and the like are all
> |owned by apache. The script hangs on the ls -la /proc/$P whenever it hits
> |an apache process. The processes it hangs on cannot be killed with kill
> |-9. The number of apache-owned processes was at 250, while on a regular
> |server it is only 20 or so.
> |
> |Running sar -v shows the dentunusd grow huge at about the time of the
> |issues:
> |
> |04:30:00 PM   dentunusd  file-sz  %file-sz  inode-sz  super-sz  %super-sz  dquot-sz  %dquot-sz  rtsig-sz  %rtsig-sz
> |05:30:00 PM       38823    25900     12.35     24755         0       0.00         0       0.00         7       0.68
> |05:40:00 PM       39757    25854     12.33     25054         0       0.00         0       0.00         7       0.68
> |05:50:00 PM  4294967057    23526     11.22      4303         0       0.00         0       0.00        18       1.76
> |
> |Also, the number of sockets grows by about 3X:
> |
> |04:30:00 PM  totsck  tcpsck  udpsck  rawsck  ip-frag
> |04:40:00 PM     136      60       5       0        0
> |04:50:00 PM     112      35       5       0        0
> |05:00:00 PM     121      40       7       0        0
> |05:10:00 PM     126      44       5       0        0
> |05:20:00 PM     115      38       5       0        0
> |05:30:00 PM     119      36       8       0        0
> |05:40:00 PM     120      42       6       0        0
> |05:50:00 PM     526     236       5       0        1
> |06:00:00 PM     531     224       5       0        0
> |06:10:00 PM     535     224       5       0        0
> |
> |
> |That is just about all I have come up with so far. If anyone has seen this,
> |or can recommend what steps I should take next, I could certainly use
> |the advice.
> |
> |Thank you all
> |
> |Brandon Belshaw
> |
> |
> |
> |
> |
> |-
> |To unsubscribe from this list: send the line "unsubscribe linux-admin" in
> |the body of a message to [email protected]
> |More majordomo info at http://vger.kernel.org/majordomo-info.html
> |
>
> I just had the same or a similar problem twice with 2.6.0-test4, and I
> also used to experience it on 2.4.18. I managed to get ps to list, though,
> before all commands stopped working, and I noticed many of the processes
> went into D and Z states. I believe they were getting stuck in the I/O
> subsystem; my other filesystems were still responding, since my XMMS
> didn't die till it hit an mp3 on my main filesystem, which was about 30
> minutes after the problem started. Any currently open application was
> still working until I tried to do anything that required I/O, then it
> died as well. That last happened to me about 12 hours ago, and I had to
> recover my entire /home directory. I couldn't find out what caused it;
> the first time it was MozillaFirebird that died first, the second time it
> was vim. Also, both times I tried hitting the power button to see if I
> could get any form of shutdown where the data would sync, and both times
> the kernel OOPS'ed on the apmd event.
>
> Anybody got any ideas?
>
> Regards,
> Ross Clarke
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQE/TTOa1+7fkD/L8TgRAkmdAJ9ciSYT6tAQGT0Uk+RD7Y8gkbmEIwCffLIT
> z2SGntQl8+1sI1QRVFZtxho=
> =utNU
> -----END PGP SIGNATURE-----
>
>
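
A side note on the sar -v output quoted above: the dentunusd value
4294967057 sits just below 2^32 (4294967296), which is what a small
negative number looks like when it is printed as an unsigned 32-bit
integer. The Python sketch below shows that reading. That the counter
actually went negative on Brandon's server is an assumption; nothing in
the thread confirms it. as_signed32 is a hypothetical helper name.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negatives in two's complement.
    return value - (1 << 32) if value >= (1 << 31) else value


print(as_signed32(4294967057))  # -239
print(as_signed32(39757))       # 39757 (ordinary values pass through unchanged)
```

Read this way, the 05:50 sample would be a counter that dropped slightly
below zero rather than a real jump to four billion unused dentries.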

--
quote: there are two time a day you should do nothing: before 12 and after 12
(Nico Schottelius after writin' a very senseless email)
cmd: echo God bless America | sed 's/.*$A.*$$/Why \1?/'
pgp: new id: 0x8D0E27A4 | ftp.schottelius.org/pub/familiy/nico/pgp-key.new
url: http://nerd-hosting.net - domains for nerds (from a nerd)


2003-08-28 10:08:54

by Nick Piggin

Subject: Re: Crazy load average & unkillable processes

Nico Schottelius wrote:

>Very interesting..
>with the test4 I experience the same/similar problems on my laptop..
>all of a sudden yesterday several programs died -> Out of Memory.
>I ran
> Xfree
> dhcpcd
> opera
> several xterms (about 6)
> qmail
> named
>
>first opera was Out of Memory, then the whole X system died, with all
>xterms and X being Out of Memory.
>
>MemTotal: 385600 kB
>
>which should be more than enough!
>

You might have a process with a memory leak. How much free memory do
you have before everything dies? How much swapping activity is going
on? What do /proc/meminfo and /proc/slabinfo say?
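
For anyone collecting the numbers asked for here, a minimal Python sketch
that turns /proc/meminfo text (2.6-style "Key: value kB" lines) into a
dict, so free memory and swap can be logged over time. parse_meminfo is a
hypothetical helper name. The MemTotal figure in the sample comes from
this thread; the MemFree and SwapTotal figures are made up for
illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value kB' lines into {key: value} (values as integers)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that is not in 'Key: number ...' form.
        if not sep or not fields or not fields[0].isdigit():
            continue
        info[key.strip()] = int(fields[0])
    return info


# MemTotal is the value from the thread; the other two lines are illustrative.
sample = (
    "MemTotal:       385600 kB\n"
    "MemFree:         12000 kB\n"
    "SwapTotal:           0 kB\n"
)
mem = parse_meminfo(sample)
print(mem["MemTotal"])  # 385600

# On a live system:
# with open("/proc/meminfo") as f:
#     mem = parse_meminfo(f.read())
```

Sampling this every few seconds until the machine falls over would show
whether MemFree really drains before the Out of Memory kills start.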

2003-08-29 09:36:57

by Nico Schottelius

Subject: Re: Crazy load average & unkillable processes

I am attaching /proc/meminfo, /proc/slabinfo and uptime from now.
The system is f*** slow..
And I am currently just able to write this; moving windows
in X is more than painful!

Nico

Nick Piggin [Thu, Aug 28, 2003 at 07:33:25PM +1000]:
> Nico Schottelius wrote:
>
> >Very interesting..
> >with the test4 I experience the same/similar problems on my laptop..
> >all of a sudden yesterday several programs died -> Out of Memory.
> >I ran
> > Xfree
> > dhcpcd
> > opera
> > several xterms (about 6)
> > qmail
> > named
> >
> >first opera was Out of Memory, then the whole X system died, with all
> >xterms and X being Out of Memory.
> >
> >MemTotal: 385600 kB
> >
> >which should be more than enough!
> >
>
> You might have a process with a memory leak. How much free memory do
> you have before everything dies? How much swapping activity is going
> on? What do /proc/meminfo and /proc/slabinfo say?
>

--
quote: there are two time a day you should do nothing: before 12 and after 12
(Nico Schottelius after writin' a very senseless email)
cmd: echo God bless America | sed 's/.*$A.*$$/Why \1?/'
pgp: new id: 0x8D0E27A4 | ftp.schottelius.org/pub/familiy/nico/pgp-key.new
url: http://nerd-hosting.net - domains for nerds (from a nerd)


2003-08-29 11:01:55

by Nico Schottelius

Subject: Re: Crazy load average & unkillable processes

Btw, the only thing
nico@flapp:~/archiv $ dmesg
spurious 8259A interrupt: IRQ7.

not more..
and now I have to reboot, as typing this mail is very hard as the system is
very slow. with no load!
(I am even using the preempt patch..)
Nico

Nico Schottelius [Fri, Aug 29, 2003 at 11:01:29AM +0200]:
> I am attaching /proc/meminfo, /proc/slabinfo and uptime from now.
> The system is f*** slow..
> And I am currently just able to write this; moving windows
> in X is more than painful!
>
> Nico
>
> Nick Piggin [Thu, Aug 28, 2003 at 07:33:25PM +1000]:
> > Nico Schottelius wrote:
> >
> > >Very interesting..
> > >with the test4 I experience the same/similar problems on my laptop..
> > >all of a sudden yesterday several programs died -> Out of Memory.
> > >I ran
> > > Xfree
> > > dhcpcd
> > > opera
> > > several xterms (about 6)
> > > qmail
> > > named
> > >
> > >first opera was Out of Memory, then the whole X system died, with all
> > >xterms and X being Out of Memory.
> > >
> > >MemTotal: 385600 kB
> > >
> > >which should be more than enough!
> > >
> >
> > You might have a process with a memory leak. How much free memory do
> > you have before everything dies? How much swapping activity is going
> > on? What do /proc/meminfo and /proc/slabinfo say?
> >
>
> --
> quote: there are two time a day you should do nothing: before 12 and after 12
> (Nico Schottelius after writin' a very senseless email)
> cmd: echo God bless America | sed 's/.*$A.*$$/Why \1?/'
> pgp: new id: 0x8D0E27A4 | ftp.schottelius.org/pub/familiy/nico/pgp-key.new
> url: http://nerd-hosting.net - domains for nerds (from a nerd)

<active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
5 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
5 17 224 17 1 : tunables 120 60 0 : slabdata 1 1 0
1 24 160 24 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 544 7 1 : tunables 54 27 0 : slabdata 0 0 0
1 7 544 7 1 : tunables 54 27 0 : slabdata 1 1 0
11 12 960 4 1 : tunables 54 27 0 : slabdata 3 3 0
9 202 16 202 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
28 44 352 11 1 : tunables 54 27 0 : slabdata 4 4 0
0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
1 30 128 30 1 : tunables 120 60 0 : slabdata 1 1 0
18 202 16 202 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
1 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 288 13 1 : tunables 54 27 0 : slabdata 0 0 0
98 104 288 13 1 : tunables 54 27 0 : slabdata 8 8 0
4 30 128 30 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
3 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
22 36 832 9 2 : tunables 54 27 0 : slabdata 4 4 0
0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 304 13 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 140 28 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 56 67 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 260 15 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 260 15 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 148 26 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 16 202 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 336 11 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 592 13 2 : tunables 54 27 0 : slabdata 0 0 0
0 0 368 10 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 132 29 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 12 253 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 224 17 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0
0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
8 126 28 126 1 : tunables 120 60 0 : slabdata 1 1 0
45 156 48 78 1 : tunables 120 60 0 : slabdata 2 2 0
2 253 12 253 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 16 202 1 : tunables 120 60 0 : slabdata 0 0 0
5656 5656 480 8 1 : tunables 54 27 0 : slabdata 707 707 0
0 0 48 78 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 36 101 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 160 24 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 160 24 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 20 169 1 : tunables 120 60 0 : slabdata 0 0 0
18 42 92 42 1 : tunables 120 60 0 : slabdata 1 1 0
1 202 16 202 1 : tunables 120 60 0 : slabdata 1 1 0
4 9 416 9 1 : tunables 54 27 0 : slabdata 1 1 0
0 0 136 28 1 : tunables 120 60 0 : slabdata 0 0 0
0 0 80 48 1 : tunables 120 60 0 : slabdata 0 0 0
5 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
0 0 52 72 1 : tunables 120 60 0 : slabdata 0 0 0
59 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
48 48 160 24 1 : tunables 120 60 0 : slabdata 2 2 0
256 260 3072 5 4 : tunables 24 12 0 : slabdata 52 52 0
256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
260 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
256 295 64 59 1 : tunables 120 60 0 : slabdata 5 5 0
308 404 16 202 1 : tunables 120 60 0 : slabdata 2 2 0
317 354 64 59 1 : tunables 120 60 0 : slabdata 6 6 0
65 80 384 10 1 : tunables 54 27 0 : slabdata 8 8 0
180 180 192 20 1 : tunables 120 60 0 : slabdata 9 9 0
3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
63 77 352 11 1 : tunables 54 27 0 : slabdata 7 7 0
27 27 144 27 1 : tunables 120 60 0 : slabdata 1 1 0
3045 3045 260 15 1 : tunables 54 27 0 : slabdata 203 203 0
2 40 96 40 1 : tunables 120 60 0 : slabdata 1 1 0
15 59 64 59 1 : tunables 120 60 0 : slabdata 1 1 0
823 836 352 11 1 : tunables 54 27 0 : slabdata 76 76 0
7988 7992 160 24 1 : tunables 120 60 0 : slabdata 333 333 0
666 750 128 30 1 : tunables 120 60 0 : slabdata 25 25 0
1 1 4096 1 1 : tunables 24 12 0 : slabdata 1 1 0
4870 4896 52 72 1 : tunables 120 60 0 : slabdata 68 68 0
50 66 352 11 1 : tunables 54 27 0 : slabdata 6 6 0
1236 1534 64 59 1 : tunables 120 60 0 : slabdata 26 26 0
51 113 32 113 1 : tunables 120 60 0 : slabdata 1 1 0
49 63 416 9 1 : tunables 54 27 0 : slabdata 7 7 0
72 118 64 59 1 : tunables 120 60 0 : slabdata 2 2 0
56 66 1312 3 1 : tunables 24 12 0 : slabdata 22 22 0
80 80 1536 5 2 : tunables 24 12 0 : slabdata 16 16 0
7773 8814 32 113 1 : tunables 120 60 0 : slabdata 78 78 0
50 50 4096 1 1 : tunables 24 12 0 : slabdata 50 50 0
0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
71 71 8192 1 2 : tunables 8 4 0 : slabdata 71 71 0
0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
98 98 4096 1 1 : tunables 24 12 0 : slabdata 98 98 0
0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
134 146 2048 2 1 : tunables 24 12 0 : slabdata 73 73 0
0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
72 80 1024 4 1 : tunables 54 27 0 : slabdata 20 20 0
0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
169 184 512 8 1 : tunables 54 27 0 : slabdata 23 23 0
0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
90 90 256 15 1 : tunables 120 60 0 : slabdata 6 6 0
0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0
120 120 192 20 1 : tunables 120 60 0 : slabdata 6 6 0
0 0 128 30 1 : tunables 120 60 0 : slabdata 0 0 0
172 180 128 30 1 : tunables 120 60 0 : slabdata 6 6 0
0 0 96 40 1 : tunables 120 60 0 : slabdata 0 0 0
1538 1560 96 40 1 : tunables 120 60 0 : slabdata 39 39 0
0 0 64 59 1 : tunables 120 60 0 : slabdata 0 0 0
7375 7375 64 59 1 : tunables 120 60 0 : slabdata 125 125 0
0 0 32 113 1 : tunables 120 60 0 : slabdata 0 0 0
1967 2147 32 113 1 : tunables 120 60 0 : slabdata 19 19 0
132 132 116 33 1 : tunables 120 60 0 : slabdata 4 4 0
15 users, load average: 0.71, 0.42, 0.33

2003-08-29 11:17:43

by Nick Piggin

Subject: Re: Crazy load average & unkillable processes

Looks like you still have quite a lot of free memory left, so it's
not that. Maybe you have runaway processes? Look in top. Although
if it's only happening with test4, I guess it's probably kernel
related. Maybe ACPI? Maybe your video card driver? Try booting with
acpi=off. Post a dmesg too. Thanks.
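
If top or ps hang, as described at the start of this thread, process
states can often still be read one file at a time. A minimal Python
sketch, assuming the /proc/<pid>/stat layout from proc(5): pid, command
name in parentheses, then a one-letter state. parse_stat and stuck_pids
are hypothetical names, and whether reading stat avoids the hang on an
affected kernel is an assumption that has not been tested here.

```python
import os


def parse_stat(line: str):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm may itself contain spaces or parentheses, so split on the LAST ')'.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition("(")
    return int(pid), comm, tail.split()[0]


def stuck_pids(proc="/proc", states=("D", "Z")):
    """List (pid, comm, state) for tasks in uninterruptible sleep or zombie state."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state in states:
            found.append((pid, comm, state))
    return found


print(parse_stat("1234 (httpd) D 1 1234 1234 0 -1 64 0 0"))  # (1234, 'httpd', 'D')
```

Linux counts tasks in uninterruptible sleep (state D) toward the load
average, so a few hundred stuck processes are enough to produce a load in
the hundreds even while the CPU is mostly idle.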

Nico Schottelius wrote:

>I am attaching /proc/meminfo, /proc/slabinfo and uptime from now.
>The system is f*** slow..
>And I am currently just able to write this, moving windows
>in X is more than painful!
>
>Nico
>
>Nick Piggin [Thu, Aug 28, 2003 at 07:33:25PM +1000]:
>
>>Nico Schottelius wrote:
>>
>>
>>>Very interesting..
>>>with the test4 I experience the same/similar problems on my laptop..
>>>all of a sudden yesterday several programs died -> Out of Memory.
>>>I ran
>>> Xfree
>>> dhcpcd
>>> opera
>>> several xterms (about 6)
>>> qmail
>>> named
>>>
>>>first opera was Out of Memory, then the whole X system died, with all
>>>xterms and X being Out of Memory.
>>>
>>>MemTotal: 385600 kB
>>>
>>>which should be more than enough!
>>>
>>>
>>You might have a process with a memory leak. How much free memory do
>>you have before everything dies? How much swapping activity is going
>>on? What do /proc/meminfo and /proc/slabinfo say?
>>
>>
>>
>

2003-09-01 13:58:43

by Bill Davidsen

Subject: Re: Crazy load average & unkillable processes

I have never tried running 2.6 without swap; are there tuning values you
need to avoid performance issues? You have adequate memory; have you
played with swappiness?

I'll try no swap on a machine when I get back from the weekend.
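
For reference, 2.6 kernels expose swappiness as the file
/proc/sys/vm/swappiness. A minimal Python sketch for reading it, for
example to record the setting alongside a bug report. read_swappiness is
a hypothetical helper name, and the path parameter exists only so the
function can be pointed at a test file.

```python
def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Return the integer stored in the swappiness sysctl file."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        print("vm.swappiness =", read_swappiness())
    except (OSError, ValueError) as exc:
        # No /proc, an older kernel, or an unexpected file format.
        print("could not read swappiness:", exc)
```

Changing the value needs root and is done by writing a new number to the
same file.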

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-09-15 23:37:39

by Nico Schottelius

Subject: Re: Crazy load average & unkillable processes

Hello!

Once again my system died.
First it gets slow.
Then the CPU cooler starts running permanently.
Then some processes die.
Then the system becomes unusable.
I hit SAK, then sysrq+i, then reboot.
Attached is all available information.

Does anyone have any idea?

Nico

Nick Piggin [Thu, Aug 28, 2003 at 07:33:25PM +1000]:
>
>
> Nico Schottelius wrote:
>
> >Very interesting..
> >with the test4 I experience the same/similar problems on my laptop..
> >all of a sudden yesterday several programs died -> Out of Memory.
> >I ran
> > Xfree
> > dhcpcd
> > opera
> > several xterms (about 6)
> > qmail
> > named
> >
> >first opera was Out of Memory, then the whole X system died, with all
> >xterms and X being Out of Memory.
> >
> >MemTotal: 385600 kB
> >
> >which should be more than enough!
> >
>
> You might have a process with a memory leak. How much free memory do
> you have before everything dies? How much swapping activity is going
> on? What do /proc/meminfo and /proc/slabinfo say?
>
>
>

--
quote: there are two time a day you should do nothing: before 12 and after 12
(Nico Schottelius after writin' a very senseless email)
cmd: echo God bless America | sed 's/.*$A.*$$/Why \1?/'
pgp: new id: 0x8D0E27A4 | ftp.schottelius.org/pub/family/nico/pgp-key.new
url: http://nerd-hosting.net - domains for nerds (from a nerd)
