2001-11-13 08:05:41

by Mika Yrjola

Subject: /proc/<pidnumber>/stat hangs reading process

Hello,

basically this posting is about the same problem as one I posted in
September:

http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.0/0764.html

It's essentially the same situation: I was running mozilla and it stopped
responding to any input. I tried to kill it with control-c, kill and
finally kill -9, but none of them helped. When I tried to look at the output
of top and ps, exactly the same symptoms appeared: those processes didn't
finish and couldn't be killed either. When I run strace ps, the output ends at:

stat64("/proc/16515", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/16515/stat", O_RDONLY) = 7
read(7,
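
To find out which PID has the stat file that never returns, without losing yet another shell to a stuck ps, the read can be done in a disposable helper process with a timeout. The following is only a sketch in Python: the names probe and scan and the 2-second timeout are assumptions made for illustration, not part of ps or the kernel, and a helper that gets stuck inside the kernel will itself stay around until the underlying problem clears.

```python
import os
import subprocess
import time


def probe(path, timeout=2.0):
    """Classify a read of `path` as 'ok', 'gone' or 'hung'."""
    # Do the read in a helper process (cat): if read() never returns,
    # only the helper gets stuck and this script keeps running.
    child = subprocess.Popen(["cat", path],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout
    while child.poll() is None:
        if time.monotonic() >= deadline:
            # kill() may have no effect on a helper blocked inside the
            # kernel, so deliberately do not wait() for it here.
            child.kill()
            return "hung"
        time.sleep(0.05)
    # A non-zero exit usually means the process vanished in the meantime.
    return "ok" if child.returncode == 0 else "gone"


def scan(proc_root="/proc", timeout=2.0):
    """Return the PIDs whose stat file could not be read within the timeout."""
    hung = []
    for name in os.listdir(proc_root):
        if name.isdigit():
            stat_path = os.path.join(proc_root, name, "stat")
            if probe(stat_path, timeout) == "hung":
                hung.append(int(name))
    return sorted(hung)
```

Calling scan() and printing the result lists the suspect PIDs. Starting one helper per PID is slow, but it keeps the diagnosing shell usable.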

Kernel information at this time:

Linux renttu.lnet.lut.fi 2.4.12 #1 Wed Oct 17 20:09:21 EEST 2001 i686 unknown

(in September I was running 2.4.8)

Additional differences compared to the previous post:

- the machine now has 512 MB of DDR memory
- the /var/log/messages entry didn't appear this time, so I guess it was
probably something unrelated last time.

I'll keep the machine running until evening, in case someone has
suggestions for getting more information about the problem, and then
upgrade to a newer kernel. Does anyone know whether this problem is fixed in
newer kernels? (Any suggestions for which kernel to go with: 2.4.13-ac,
2.4.14, 2.4.15-pre4?)

I'm not subscribed to the list, because my mail traffic is already quite
overwhelming, so I would appreciate it if replies were CC'ed to me, although
I do follow the list now and then through the web archive.

--
/-------------------------------------------------------------------------\
I Fantasy, Sci-fi, Linux, Amiga, Telecommunications, Oldfield, Vangelis I
I Seti@Home, Steady relationship, more at http://www.lut.fi/%7emyrjola/ I
\-------------------------------------------------------------------------/


2001-11-13 15:37:32

by Marcelo Roberto Jimenez

Subject: Re: /proc/<pidnumber>/stat hangs reading process

Mika,

> Hello,
> basically this posting is about the same problem as one I posted in
> September:
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.0/0764.html
>
> It's essentially the same situation: I was running mozilla and it stopped
> responding to any input. I tried to kill it with control-c, kill and
> finally kill -9, but none of them helped. When I tried to look at the output
> of top and ps, exactly the same symptoms appeared: those processes didn't
> finish and couldn't be killed either. When I run strace ps, the output ends
> at:
>
> stat64("/proc/16515", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> open("/proc/16515/stat", O_RDONLY) = 7
> read(7,

I've been having this problem too, for a long time. It's usually associated with big loads (big for my machine, of course, a PII-233). It has happened while opening lots of pages with opera, but has also happened while compiling 3 kernels at the same time and playing a video with xine or aviplay.

The behavior is the same: ps blocks, gtop blocks, killall blocks, anything that tries to get the process information blocks too.
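
Tools such as ps and top get their per-process information from /proc/<pid>/stat, whose third field is the process state. A process in state D (uninterruptible sleep) does not act on signals, kill -9 included, until it wakes up. A minimal sketch of pulling that field out of a stat line, assuming the layout described in proc(5); the function name and the sample lines are made up for illustration:

```python
def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate it via the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    # The state letter is the first field after the closing parenthesis.
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


# Hypothetical sample line, truncated after the first few fields.
sample = "16515 (mozilla-bin) D 1 16515 16515 0"
pid, comm, state = parse_stat_line(sample)
print(pid, comm, state)
```

This only helps for lines that can still be read, for example the entries of the ps and top processes that got stuck afterwards, if reading those does not block as well.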

The machine can be used as long as a program does not try to call the problematic function, which I wasn't able to track down.

I haven't had this problem for a while, basically because I try not to stress these ``hanging'' applications anymore, so that I can work, but I'll try to see if I can reproduce the bug with the new VM.

The problem is: what can we do to investigate the problem once ps starts to block?

Regards,

Marcelo.


2001-11-18 20:58:05

by Eric W. Biederman

Subject: Re: /proc/<pidnumber>/stat hangs reading process

"Marcelo Roberto Jimenez" writes:

> Mika,
>
> > Hello,
> > basically this posting is about the same problem as one I posted in
> > September:
> >
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.0/0764.html
> >
> > It's essentially the same situation: I was running mozilla and it stopped
> > responding to any input. I tried to kill it with control-c, kill and
> > finally kill -9, but none of them helped. When I tried to look at the output
> > of top and ps, exactly the same symptoms appeared: those processes didn't
> > finish and couldn't be killed either. When I run strace ps, the output ends
> > at:
> >
> > stat64("/proc/16515", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
> > open("/proc/16515/stat", O_RDONLY) = 7
> > read(7,
>
> I've been having this problem too, for a long time. It's usually associated with
> big loads (big for my machine, of course, a PII-233). It has happened while opening
> lots of pages with opera, but has also happened while compiling 3 kernels at
> the same time and playing a video with xine or aviplay.
>
>
> The behavior is the same: ps blocks, gtop blocks, killall blocks, anything that
> tries to get the process information blocks too.
>
>
> The machine can be used as long as a program does not try to call the
> problematic function, which I wasn't able to track down.
>
>
> I haven't had this problem for a while, basically because I try not to stress
> these ``hanging'' applications anymore, so that I can work, but I'll try to see
> if I can reproduce the bug with the new VM.
>
>
> The problem is: what can we do to investigate the problem once ps starts to
> block?

Try using Alt-SysRq and find the address in the kernel where the processes
are blocking. Then you should be able to trace back and figure out which
lock things are blocking on.
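
The task dump (presumably Alt-SysRq-T, which needs magic SysRq support compiled in) goes to the kernel log and, depending on the kernel, may show raw addresses rather than function names. A rough sketch of turning such an address into a symbol plus offset, assuming a System.map file that matches the running kernel; the function names and the sample map lines below are hypothetical:

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the closest symbol at or below address."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    return name, address - start


# Hypothetical map lines; a real run would read the System.map file instead.
example_map = [
    "c0100000 T first_symbol",
    "c0100400 T second_symbol",
    "c0100900 t third_symbol",
]
symbols = load_system_map(example_map)
print(resolve(symbols, 0xC0100420))
```

With a real map, the lines would come from something like open("/boot/System.map"), where the path is an assumption that varies between installations. The function reported for each blocked task is then the starting point for tracing back to the lock.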

I have only seen this once, on buggy hardware (at least on a recent kernel).
Earlier kernels had a case where ps contended with a process that, in
certain circumstances, held the lock nearly all the time, so ps never
managed to grab it.

Additionally, there are a few other pieces in 2.4.14, like spin lock debugging,
that you might want to compile in as well.

Eric