2001-07-29 07:12:36

by csmall

Subject: Re: strange problem with reiserfs and /proc fs

On Sat, Jul 28, 2001 at 10:04:05PM +0200, Massimo Dal Zotto wrote:
> I've found a strange problem with reiserfs. In some situations it interferes
> with the /proc filesystem and makes all processes unreadable to top. After
> a few seconds the situation returns to normal. To verify the problem try the
> following procedure:
This has to be one of the strangest bugs I've seen. Makes me feel a bit lucky
that reiserfs oopses on my machine, so I cannot use it...
I have also passed this bug onto the procps author, who may be able to
shed a bit more light on the problem.

> 3) type a few characters and save the file with C-x C-s. After the
> file is saved top will show 0 processes. Sometimes it will show
> only a few processes for an instant and then nothing. Sometimes
> it will work fine. After a few seconds the missing processes
> will show again. Modifying and saving the file again will show
> the same behavior.
When you say top prints nothing do you mean it only prints the header
and no processes in the list? Does this problem happen with any other
program, say vi, or only in emacs? Does ps have this behaviour?

> In the attachments you will find two traces of the running top, one behaving
> normally and one exhibiting the problem, and my kernel config.
The interesting difference is that the good program does
stat64,open,read,close...
But the bad program does just stat64.
I get 96 stat64s for both programs in that loop.

So obviously top doesn't like whatever stat64 is telling it.
Looking at the code (in readproc() in proc/readproc.c if anyone is
interested) I cannot see much that should upset it. We know stat is
returning 0, so that is OK; about the only other thing is an alloc.
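
For anyone following along, the shape of that loop can be sketched roughly
as below. This is only an illustration in Python, not the procps source:
the real readproc() is C, and the names scan_proc and proc_root are made up
for this sketch. It shows why a trace would contain a stat with no
open/read/close after it whenever an entry is dropped at the stat step.

```python
import os


def scan_proc(proc_root="/proc"):
    """Walk the numeric entries under proc_root in the order the traces
    suggest: stat the per-process directory, then open/read/close its
    stat file.

    Hypothetical helper for illustration only; not taken from procps.
    Returns (read_ok, skipped): PIDs whose stat file was read, and PIDs
    that were dropped at the stat() step or could not be read.
    """
    read_ok, skipped = [], []
    # Only /proc/<pid> style names are of interest.
    pids = sorted(int(n) for n in os.listdir(proc_root) if n.isdecimal())
    for pid in pids:
        pid_dir = os.path.join(proc_root, str(pid))
        try:
            os.stat(pid_dir)  # corresponds to the stat64 in the trace
        except OSError:
            skipped.append(pid)  # no open/read/close follows this entry
            continue
        try:
            # open, read, close: the part missing from the bad trace
            with open(os.path.join(pid_dir, "stat")) as f:
                f.read()
        except OSError:
            skipped.append(pid)
            continue
        read_ok.append(pid)
    return read_ok, skipped
```

Calling scan_proc() with no arguments walks the real /proc. If every entry
ended up in the second list, that would match a trace consisting of nothing
but stat64 calls.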

If you like, you can submit this as a bug report into the Debian Bug
Tracking System, but I suspect there is a kernel problem here giving
weird stat returns for proc.
- Craig
--
Craig Small VK2XLZ GnuPG:1C1B D893 1418 2AF4 45EE 95CB C76C E5AC 12CA DFA5
Eye-Net Consulting http://www.eye-net.com.au/ <[email protected]>
MIEEE <[email protected]> Debian developer <[email protected]>


2001-07-29 13:25:24

by Massimo Dal Zotto

Subject: Re: strange problem with reiserfs and /proc fs

> On Sat, Jul 28, 2001 at 10:04:05PM +0200, Massimo Dal Zotto wrote:
> > I've found a strange problem with reiserfs. In some situations it interferes
> > with the /proc filesystem and makes all processes unreadable to top. After
> > a few seconds the situation returns to normal. To verify the problem try the
> > following procedure:
> This has to be one of the strangest bugs I've seen. Makes me feel a bit lucky
> that reiserfs oopses on my machine, so I cannot use it...
> I have also passed this bug onto the procps author, who may be able to
> shed a bit more light on the problem.
>
> > 3) type a few characters and save the file with C-x C-s. After the
> > file is saved top will show 0 processes. Sometimes it will show
> > only a few processes for an instant and then nothing. Sometimes
> > it will work fine. After a few seconds the missing processes
> > will show again. Modifying and saving the file again will show
> > the same behavior.
> When you say top prints nothing do you mean it only prints the header
> and no processes in the list? Does this problem happen with any other
> program, say vi, or only in emacs? Does ps have this behaviour?

It prints the header with 0 processes and 100% idle:

09:40:24 up 24 min, 10 users, load average: 0.16, 0.26, 0.34
0 processes: 0 sleeping, 0 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: 126332K total, 121072K used, 5260K free, 38592K buffers
Swap: 257032K total, 73508K used, 183524K free, 24860K cached

I have been able to reproduce the bug only with emacs. Another thing I have
discovered is that if there is intense disk activity (for example a find)
the problem disappears, so the fact that it disappears by itself after a
few seconds is probably caused by some other process accessing the disk.
Also ps shows the same behavior:

$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$

>
> > In the attachments you will find two traces of the running top, one behaving
> > normally and one exhibiting the problem, and my kernel config.
> The interesting difference is that the good program does
> stat64,open,read,close...
> But the bad program does just stat64.
> I get 96 stat64s for both programs in that loop.
>
> So obviously top doesn't like whatever stat64 is telling it.
> Looking at the code (in readproc() in proc/readproc.c if anyone is
> interested) I cannot see much that should upset it. We know stat is
> returning 0, so that is OK; about the only other thing is an alloc.
>
> If you like, you can submit this as a bug report into the Debian Bug
> Tracking System, but I suspect there is a kernel problem here giving
> weird stat returns for proc.

I haven't submitted a bug because I'm not sure it is a procps problem.

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
| Massimo Dal Zotto email: [email protected] |
| Via Marconi, 141 phone: ++39-0461534251 |
| 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
| Italy pgp: see my www home page |
+----------------------------------------------------------------------+

2001-07-30 09:24:18

by Massimo Dal Zotto

Subject: Re: strange problem with reiserfs and /proc fs

> On Sat, Jul 28, 2001 at 10:04:05PM +0200, Massimo Dal Zotto wrote:
> > I've found a strange problem with reiserfs. In some situations it interferes
> > with the /proc filesystem and makes all processes unreadable to top. After
> > a few seconds the situation returns to normal. To verify the problem try the
> > following procedure:
> This has to be one of the strangest bugs I've seen. Makes me feel a bit lucky
> that reiserfs oopses on my machine, so I cannot use it...
> I have also passed this bug onto the procps author, who may be able to
> shed a bit more light on the problem.
>
> > 3) type a few characters and save the file with C-x C-s. After the
> > file is saved top will show 0 processes. Sometimes it will show
> > only a few processes for an instant and then nothing. Sometimes
> > it will work fine. After a few seconds the missing processes
> > will show again. Modifying and saving the file again will show
> > the same behavior.


I have added some debugging code to ps. I get the following output:

$ /tmp/procps-2.0.7 > ./ps/ps # ok
PID TTY TIME CMD
5360 pts/1 00:00:00 bash
5783 pts/1 00:00:33 xterm
10642 pts/1 00:00:00 ps
ps_readproc: !ent

$ /tmp/procps-2.0.7 > ./ps/ps # error
ps_readproc: stat(/proc/1)=-1
ps_readproc: stat(/proc/2)=-1
ps_readproc: stat(/proc/3)=-1
...
ps_readproc: !ent
PID TTY TIME CMD

I have also mounted the procfs on an ext2 partition (with / on reiserfs)
and it gives the same error. It seems that the problem is having / on the
reiserfs.
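
The debug output above only shows that stat() returned -1, not why. A small
sketch like the following could record which errno comes back while the
problem is visible. It is Python rather than a patch to ps, and the names
stat_errors and proc_root are assumptions of this sketch, not anything from
procps.

```python
import errno
import os


def stat_errors(proc_root="/proc"):
    """Return {pid: errno_name} for every numeric entry whose stat() fails.

    Hypothetical diagnostic helper; the directory is listed once and each
    /proc/<pid> style entry is then stat'ed individually.
    """
    failures = {}
    for name in os.listdir(proc_root):
        if not name.isdecimal():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            os.stat(os.path.join(proc_root, name))
        except OSError as exc:
            # Map the numeric errno to its symbolic name where one is known.
            failures[int(name)] = errno.errorcode.get(exc.errno, str(exc.errno))
    return failures
```

This would need to be run in a loop (say once a second) while saving the
file in emacs. A process that exits between the directory listing and the
stat will legitimately show up as ENOENT, so only a burst of failures across
many PIDs at once would be interesting.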

--
Massimo Dal Zotto

+----------------------------------------------------------------------+
| Massimo Dal Zotto email: [email protected] |
| Via Marconi, 141 phone: ++39-0461534251 |
| 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
| Italy pgp: see my www home page |
+----------------------------------------------------------------------+