2001-10-19 00:05:57

by David E. Weekly

Subject: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

Hey all,

I was trying to speed up kernel compiles experimentally by moving the source
tree into tmpfs and compiling there. It seemed to work okay and crunched
through the dep phase and most of the main build phase just fine, but then
it hit a file, got an internal segfault, and stopped. I tried again -- this
time make itself segfaulted. Three more times of make segfaulting -- a
strace on make didn't reveal what was failing. Then strace started
segfaulting. Eventually "ls" segfaulted and the machine needed to be
manually rebooted. Ouch!

I ran the full memtest86 suite on the machine, and it passed with flying
colors. So the memory proper is okay.

I come to one of two conclusions: this is a weird problem with my north
bridge, or there's something funky going on with tmpfs.

Is tmpfs stable?

Yours,
-david



2001-10-19 00:15:57

by Davide Libenzi

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Thu, 18 Oct 2001, David E. Weekly wrote:

> Hey all,
>
> I was trying to speed up kernel compiles experimentally by moving the source
> tree into tmpfs and compiling there. It seemed to work okay and crunched
> through the dep phase and most of the main build phase just fine, but then
> it hit a file, got an internal segfault, and stopped. I tried again -- this
> time make itself segfaulted. Three more times of make segfaulting -- a
> strace on make didn't reveal what was failing. Then strace started
> segfaulting. Eventually "ls" segfaulted and the machine needed to be
> manually rebooted. Ouch!
>
> I ran the full memtest86 suite on the machine, and it passed with flying
> colors. So the memory proper is okay.
>
> I come to one of two conclusions: this is a weird problem with my north
> bridge, or there's something funky going on with tmpfs.
>
> Is tmpfs stable?

Or, is /dev/epoll stable? :)
I've been running it on both UP and 2-way SMP without problems since July.
Just try without /dev/epoll.



- Davide


2001-10-19 01:26:31

by Ed Sweetman

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Thursday 18 October 2001 20:22, Davide Libenzi wrote:
> On Thu, 18 Oct 2001, David E. Weekly wrote:
> > Hey all,
> >
> > I was trying to speed up kernel compiles experimentally by moving the
> > source tree into tmpfs and compiling there. It seemed to work okay and
> > crunched through the dep phase and most of the main build phase just
> > fine, but then it hit a file, got an internal segfault, and stopped. I
> > tried again -- this time make itself segfaulted. Three more times of make
> > segfaulting -- a strace on make didn't reveal what was failing. Then
> > strace started segfaulting. Eventually "ls" segfaulted and the machine
> > needed to be manually rebooted. Ouch!
> >
> > I ran the full memtest86 suite on the machine, and it passed with flying
> > colors. So the memory proper is okay.
> >
> > I come to one of two conclusions: this is a weird problem with my north
> > bridge, or there's something funky going on with tmpfs.
> >
> > Is tmpfs stable?
>
> Or, is /dev/epoll stable ? :)
> I'm running it both on UP and 2 way SMP w/o problems from July.
> Just try w/o /dev/epoll
>
>
>
> - Davide

It works fine here, except that I now get this error:
ld: bvmlinux: Not enough room for program headers (allocated 2, need 3)
ld: final link failed: Bad value
when I compile on tmpfs, which I didn't get when I compiled on the
hdd with 2.4.10-acX. It has only started happening since I began using
2.4.12-ac3. I've only used this kernel, so I don't know whether it's 2.4.12
or the ac3 part. Anyway, it got to that point in about 3:30, which is about
5 seconds faster than disk.

2001-10-19 05:12:20

by Dan Chen

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

Bug in yesterday's build of binutils (2.11.92.0.5-3), which is fixed in
today's build (2.11.92.0.7-1).

---
Dan Chen [email protected]
GPG key: http://www.cs.unc.edu/~chenda/pubkey.gpg.asc

On Thu, 18 Oct 2001, safemode wrote:

> It works fine here, cept i get that damn
> ld: bvmlinux: Not enough room for program headers (allocated 2, need 3)
> ld: final link failed: Bad value
> error now when i compile on tmpfs that i didn't get when i compiled on the
> hdd with 2.4.10-acX. It's only started happening since using 2.4.12-ac3.
> I've only used this kernel so i dont know if it's 2.4.12 or the ac3 part.
> anyways it got to that point in about 3:30 seconds . . which is about 5
> seconds faster than disk.

2001-10-19 08:28:46

by Christoph Rohland

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

Hi David,

On Thu, 18 Oct 2001, David E. Weekly wrote:
> Hey all,
>
> I was trying to speed up kernel compiles experimentally by moving
> the source tree into tmpfs and compiling there. It seemed to work
> okay and crunched through the dep phase and most of the main build
> phase just fine, but then it hit a file, got an internal segfault,
> and stopped. I tried again -- this time make itself
> segfaulted. Three more times of make segfaulting -- a strace on make
> didn't reveal what was failing. Then strace started
> segfaulting. Eventually "ls" segfaulted and the machine needed to be
> manually rebooted. Ouch!
>
> I ran the full memtest86 suite on the machine, and it passed with
> flying colors. So the memory proper is okay.
>
> I come to one of two conclusions: this is a weird problem with my
> north bridge, or there's something funky going on with tmpfs.
>
> Is tmpfs stable?

I merged the tmpfs from -ac into the stock kernel, so something did
change and may be broken. Were there any kernel messages or oopses?

A parallel kernel make is exactly one of my regression tests for
tmpfs. Furthermore, I do not see how tmpfs should interfere with the
library pages. make, ls etc use tmpfs pages. So I suspect it's
something else.

Greetings
Christoph


2001-10-20 15:17:42

by Jan-Frode Myklebust

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Fri, Oct 19, 2001 at 10:28:37AM +0200, Christoph Rohland wrote:
> >
> > Is tmpfs stable?
>
> I merged the tmpfs from -ac into the stock kernel. So there was
> something changed which maybe is broken. Were there any kernel
> messages, oopses?
>
> Exactly the parallel kernel make is one of my regression tests for
> tmpfs. Further on I do not see how tmpfs should interfere with the
> library pages. make, ls etc use tmpfs pages. So I suspect it's
> something else.
>


Running the BitKeeper regression tests fails for me on a tmpfs /tmp/. I have
reported it to the BitKeeper bug tracker, but I am not sure whether this is a
BitKeeper or a tmpfs bug. Any insight?

http://bitkeeper.bkserver.com/cgi-bin/bugview?open/2001-09-11-001

Last tested with Bitkeeper 2.0 on linux 2.4.10-xfs.


-jf

2001-10-21 08:18:48

by Christoph Rohland

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

Hi JF,

Jan-Frode Myklebust wrote:

> Running BitKeeper regression tests fails for me on tmpfs /tmp/. I have
> reported it to the bitkeeper bugtracking, but am not sure if this is a
> bitkeeper or tmpfs bug. Any insight?
>
> http://bitkeeper.bkserver.com/cgi-bin/bugview?open/2001-09-11-001
>
> Last tested with Bitkeeper 2.0 on linux 2.4.10-xfs.


Can you test it with 2.4.12?

Greetings
Christoph



2001-10-21 10:07:52

by Jan-Frode Myklebust

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Sun, Oct 21, 2001 at 10:25:23AM +0200, Christoph Rohland wrote:
> >
> > Last tested with Bitkeeper 2.0 on linux 2.4.10-xfs.
>
>
> Can you test it with 2.4.12?
>

It's a bit painful getting older versions from the XFS CVS server (no
tagging), so I fetched the latest 2.4.13-pre5-xfs kernel. It shows exactly the
same problem.


-jf

2001-10-21 10:30:57

by Kalyan

Subject: wild pointer!!!!!

Hi all,

I recently ported Linux kernel v2.4.11-pre5 to an MDPPro board (MPC860T
processor). The kernel dies with an Oops, each time at a different place.

I use ppcboot to load the kernel.

I am attaching a sample message below.

On backtracing, I found that at some point the kernel tries to
execute code in Letext (according to System.map).

It would be great if someone could explain to me what this Letext is and
why control is going there.

Thanks in advance,
_kalyan.

Linux version 2.4.11-pre5 ([email protected]) (gcc version 2.95.3 20010315 (release)) #1 Sun Oct 21 13:27:55 IST 2001
On node 0 totalpages: 4096
zone(0): 4096 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/nfs rw nfsroot=192.168.200.244:/tftpboot nfsaddrs=192.168.200.36:192.168.200.244
Decrementer Frequency = 187500000/60
Calibrating delay loop... 49.76 BogoMIPS
Memory: 14908k available (848k kernel code, 300k data, 44k init, 0k highmem)
Dentry-cache hash table entries: 2048 (order: 2, 16384 bytes)
Inode-cache hash table entries: 1024 (order: 1, 8192 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd
CPM UART driver version 0.03
ttyS00 at 0x0280 is a SMC
pty: 256 Unix98 ptys configured
block: 64 slots per queue, batch=8
eth0: CPM ENET Version 0.2 on SCC1, 00:12:23:34:34:45
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 1024 bind 1024)
IP-Config: Guessing netmask 255.255.255.0
IP-Config: Complete:
      device=eth0, addr=192.168.200.36, mask=255.255.255.0, gw=255.255.255.255,
      host=192.168.200.36, domain=, nis-domain=(none),
      bootserver=192.168.200.244, rootserver=192.168.200.244, rootpath=
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Looking up port of RPC 100003/2 on 192.168.200.244
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.200.244
Root-NFS: Unable to get mountd port number from server, using default
Oops: kernel access of bad area, sig: 11
NIP: C0014768 XER: 2000FF28 LR: C00145B4 SP: C00F5DC0 REGS: c00f5d10 TRAP: 0300
Not tainted
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 7C9FF303, DSISR: 0000092A
TASK = c00f3fa0[0] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
GPR00: C010F300 C00F5DC0 C00F3FA0 00000001 00001032 00000001 C0100000 C0107180
GPR08: 00000002 7FFBFDF7 7C9FF2FF C030F30F 22158222 FEE7DF32 00FFC700 007FFF5A
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00001032 000F5E70 C00D7060 C010F2F0
GPR24: C0110000 C0100000 C00FBD70 00000000 C0110000 00000001 D173C7FC 7FB97ED3
Call backtrace:
C0014538 C0012314 C00121D4 C0011E14 C0003FAC C00026E0 C0003CF0
C0003D04 C0002270 C00FC744 C0002138
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
<0>Rebooting in 180 seconds..




2001-10-21 12:15:17

by Ed Sweetman

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Sunday 21 October 2001 06:07, Jan-Frode Myklebust wrote:
> On Sun, Oct 21, 2001 at 10:25:23AM +0200, Christoph Rohland wrote:
> > > Last tested with Bitkeeper 2.0 on linux 2.4.10-xfs.
> >
> > Can you test it with 2.4.12?
>
> It's a bit painful getting older versions of the XFS cvs-server (no
> tagging), so I fetched the latest 2.4.13-pre5-xfs kernel. Shows exactly the
> same problem.
>
>
> -jf

As someone said a while ago, this is a binutils problem. Update to
a newer version.

2001-10-21 12:22:00

by Paul Mackerras

Subject: Re: wild pointer!!!!!

Kalyan writes:

> I recently ported the linux kernel v 2.4.11-pre5 to MDPPro (
> MPC860T processor ) board..the kernel dies with an Oops and everytime at a
> different place....

What code base did you start from? Linus' official 2.4.11-pre5
release, or the linuxppc_2_4_devel PPC development tree at
ppc.bkserver.net? See http://www.penguinppc.org/dev/kernel.shtml for
details on how to access the PPC development tree. That tree would
have the most up-to-date support for the MPC860T processor.

> upon back tracing i found that at some point the kernel tries to
> execute code in Letext ( according to System.map)....

Hmmm, I don't know where that symbol would have come from, sorry. It
doesn't appear anywhere in the kernel source that I can find. Did you
add it?

> it'd be great if someone can explain me what this Letext is and why
> is the control going there???

I suggest you ask on the [email protected] mailing list,
you will find other hackers working on Linux for MPC8xx processors
there.

Paul.

2001-10-21 12:34:53

by Jan-Frode Myklebust

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

>
> Like someone said before a while ago. This is a binutils problem. Update to
> a newer version.
>

Upgraded to binutils 2.11.92.0.7, but it didn't help.


-jf

2001-10-21 12:53:28

by Ed Sweetman

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Sunday 21 October 2001 08:34, Jan-Frode Myklebust wrote:
> > Like someone said before a while ago. This is a binutils problem.
> > Update to a newer version.
>
> Upgraded to binutils 2.11.92.0.7, but it didn't help.
>
>
> -jf
2.11.92.0.7-2 works fine.
And just to let you know, you won't see any gain in compile time unless your
drive is running in PIO mode, in which case I think kernel compile time is
the least of your hdd load time worries.

2001-10-21 13:10:22

by Jan-Frode Myklebust

Subject: bk regression fails on tmpfs /tmp, was: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Sun, Oct 21, 2001 at 08:53:38AM -0400, safemode wrote:
> On Sunday 21 October 2001 08:34, Jan-Frode Myklebust wrote:
> > > Like someone said before a while ago. This is a binutils problem.
> > > Update to a newer version.
> >
> > Upgraded to binutils 2.11.92.0.7, but it didn't help.
> >

[snip]


> 2.11.92.0.7-2 works fine
> and just to let you know, You wont see any gain in compile time unless your
> drive is running on Pio Mode. In which case I think compile time for the
> kernel is the least bit of your hdd load time worries.
>


oh.., I think you're confusing my problem with the problem mentioned in
the subject, sorry for not changing it before. Anyway, can't find any
2.11.92.0.7-2 on
ftp://ftp.no.kernel.org/linux/kernel.org/pub/linux/devel/binutils/


-jf

2001-10-21 13:36:46

by Ed Sweetman

Subject: Re: bk regression fails on tmpfs /tmp, was: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Sunday 21 October 2001 09:10, Jan-Frode Myklebust wrote:
> On Sun, Oct 21, 2001 at 08:53:38AM -0400, safemode wrote:
> > On Sunday 21 October 2001 08:34, Jan-Frode Myklebust wrote:
> > > > Like someone said before a while ago. This is a binutils problem.
> > > > Update to a newer version.
> > >
> > > Upgraded to binutils 2.11.92.0.7, but it didn't help.
>
> [snip]
>
> > 2.11.92.0.7-2 works fine
> > and just to let you know, You wont see any gain in compile time unless
> > your drive is running on Pio Mode. In which case I think compile time for
> > the kernel is the least bit of your hdd load time worries.
>
> oh.., I think you're confusing my problem with the problem mentioned in
> the subject, sorry for not changing it before. Anyway, can't find any
> 2.11.92.0.7-2 on
> ftp://ftp.no.kernel.org/linux/kernel.org/pub/linux/devel/binutils/
>
>

Since you were having trouble with the current build of binutils, I assumed
you were using Debian, since a fix in the build fixed the problem for me. I
was having the exact same error you submitted in your first mail.

2001-10-21 16:35:36

by Larry McVoy

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

> Jan-Frode Myklebust wrote:
>
> > Running BitKeeper regression tests fails for me on tmpfs /tmp/. I have
> > reported it to the bitkeeper bugtracking, but am not sure if this is a
> > bitkeeper or tmpfs bug. Any insight?
> >
> > http://bitkeeper.bkserver.com/cgi-bin/bugview?open/2001-09-11-001
> >
> > Last tested with Bitkeeper 2.0 on linux 2.4.10-xfs.

One of the engineers here has also seen this. The root cause is that
readdir() is returning a file multiple times. We've seen it on tmpfs.
We have also seen it on NFS and had a workaround, but the workaround
depended on the file being returned twice right next to each other,
and that's not the case on tmpfs. [email protected] can provide you
with the details of his machine config; here's the mail he sent a while
back about it:

> Date: Tue, 16 Oct 2001 17:32:32 -0500 (EST)
> To: [email protected]
> Subject: bug in tmpfs found by bitkeeper
> From: Wayne Scott <[email protected]>
> X-Mailer: Mew version 2.0.56 on Emacs 20.7 / Mule 4.1 (AOI)
>
>
> My new machine has a reiserfs filesystem for /tmp.
>
> Since BK has a bug that prevents it from working correctly on reiserfs,
> which I explained last week, I can't run the regressions locally.
>
> I thought I would work around the problem by mounting the kernel
> 'tmpfs' filesystem on /tmp. Now the regressions again fail, but this
> time I think the filesystem is to blame. A readdir() call is
> returning the same files multiple times.
>
> Look at this patch:
>
> --- /tmp/geta4199 Tue Oct 16 17:24:55 2001
> +++ sfiles.c Tue Oct 16 17:24:10 2001
> @@ -659,11 +659,13 @@
>  		return;
>  	}
>  	if (base[-1] != '/') *base++ = '/';
> +	fprintf(stderr, "dir = %s\n", path);
>  	while ((e = readdir(d)) != NULL) {
>  #ifndef WIN32	/* Linux 2.3.x NFS bug, skip repeats. */
>  		if (lastInode == e->d_ino) continue;
>  		lastInode = e->d_ino;
>  #endif
> +		fprintf(stderr, "%s: %x\n", e->d_name, e->d_ino);
>  		if (streq(e->d_name, ".") || streq(e->d_name, "..")) {
>  			continue;
>  		}
>
> And this output from running t.bk_basic:
>
> dir = ./BitKeeper/etc/SCCS/.bk_skip
> .: 1f72
> ..: 1f71
> s.config: 1f85
> x.dfile: 1f86
> s.ignore: 1f7f
> s.logging_ok: 1f7d
> s.ignore: 1f7f
> ROOTKEY
> [email protected]|BitKeeper/etc/ignore|20011016222415|54740|3065f497fd7
> +ed3bd
> used by BitKeeper/etc/SCCS/s.ignore
> and by BitKeeper/etc/SCCS/s.ignore
>
> The file s.ignore occurs more than once. And unlike the old 2.3 NFS
> bug, which I see already has a workaround, these files are never
> adjacent.
>
> However the tests that do complete do so very very fast. :)
> (Yes I know the value of fast and broken!)

--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-10-21 17:52:11

by Linus Torvalds

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

In article <[email protected]>,
Larry McVoy <[email protected]> wrote:
>
>One of the engineers here has also seen this. The root cause is that
>readdir() is returning a file multiple times. We've seen it on tmpfs.

Yes. "tmpfs" will consider the position in the dentry lists to be the
"offset" in the file, and if you remove files from the directory as you
do a readdir(), you can get the same file twice (or you can fail to see
files).

If somebody has a good suggestion for what could be used as a reasonably
efficient "cookie" for virtual filesystems like tmpfs, speak up. In the
meantime, one way to _mostly_ avoid this should be to give a big buffer
to readdir(), so that you end up getting all entries in one go (which
will be protected by the semaphore inside the kernel), rather than
having to do multiple readdir() calls.

(But we don't have an EOF cookie either, so..)

The logic, in case people care, is just "dcache_readdir()" in
fs/readdir.c, and that logic is used for all virtual filesystems, so
fixing that will fix not just tmpfs.

Now, that said it might be worthwhile to be more robust on an
application layer by simply just sorting the directory. As you point
out, NFS to some servers can have the same issues, for very similar
reasons - on many filesystems a directory "position" is not a stable
thing if you remove or add files at the same time.

So I would consider the current tmpfs behaviour a beauty wart and
something to be fixed, but at the same time I also think you're
depending on behaviour that is not in any way guaranteed, and I would
argue that the tmpfs behaviour (while bad) is not actually strictly a
bug but more a quality-of-implementation issue.

Linus
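
A toy model of the behaviour Linus describes above, as a minimal sketch and
not kernel code. It assumes the directory is a plain list with new entries
linked at the head, and that the readdir "offset" is only an index into that
list. The names toy_dir, toy_create, toy_remove and toy_readdir are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Toy directory: entries are kept in list order, newest first. */
struct toy_dir {
    const char *name[MAX_ENTRIES];
    int count;
};

/* Create: link the new entry at the head of the list (an assumption). */
void toy_create(struct toy_dir *d, const char *name)
{
    assert(d->count < MAX_ENTRIES);
    memmove(&d->name[1], &d->name[0], d->count * sizeof d->name[0]);
    d->name[0] = name;
    d->count++;
}

/* Remove the entry at list position pos; later entries move up by one. */
void toy_remove(struct toy_dir *d, int pos)
{
    assert(pos >= 0 && pos < d->count);
    memmove(&d->name[pos], &d->name[pos + 1],
            (d->count - pos - 1) * sizeof d->name[0]);
    d->count--;
}

/* Readdir: *offset is a bare list position. Returns NULL at the end. */
const char *toy_readdir(const struct toy_dir *d, int *offset)
{
    if (*offset >= d->count)
        return NULL;
    return d->name[(*offset)++];
}
```

In this model, a create between two toy_readdir() calls shifts every entry
down by one, so the entry just returned is returned again; a remove of an
already-read entry shifts everything up, so one unread entry is skipped.
Which effect a real filesystem shows depends on where it links and unlinks
entries.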

2001-10-21 20:15:52

by Daniel Phillips

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On October 21, 2001 07:50 pm, Linus Torvalds wrote:
> In article <[email protected]>,
> Larry McVoy <[email protected]> wrote:
> >
> >One of the engineers here has also seen this. The root cause is that
> >readdir() is returning a file multiple times. We've seen it on tmpfs.
>
> Yes. "tmpfs" will consider the position in the dentry lists to be the
> "offset" in the file, and if you remove files from the directory as you
> do a readdir(), you can get the same file twice (or you can fail to see
> files).
>
> If somebody has a good suggestion for what could be used as a reasonably
> efficient "cookie" for virtual filesystems like tmpfs, speak up. In the
> meantime, one way to _mostly_ avoid this should be to give a big buffer
> to readdir(), so that you end up getting all entries in one go (which
> will be protected by the semaphore inside the kernel), rather than
> having to do multiple readdir() calls.

Assuming the "cookie" is an ordinal position in some predictable traversal of
a directory, e.g., storage order, then the problem can be resolved with some
cooperation from create and delete. For a create, the cookie should be
incremented if the position of the create is before the cookie. Likewise,
for a delete, the cookie should be decremented if the delete is before the
cookie.

> (But we don't have an EOF cookie either, so..)
>
> The logic, in case people care is just "dcache_readdir()" in
> fs/readdir.c, and that logic is used for all virtual filesystems, so
> fixing that will fix not just tmpfs..
>
> Now, that said it might be worthwhile to be more robust on an
> application layer by simply just sorting the directory. As you point
> out, NFS to some servers can have the same issues, for very similar
> reasons - on many filesystems a directory "position" is not a stable
> thing if you remove or add files at the same time.
>
> So I would consider the current tmpfs behaviour a beauty wart and
> something to be fixed, but at the same time I also think you're
> depending on behaviour that is not in any way guaranteed, and I would
> argue that the tmpfs behaviour (while bad) is not actually strictly a
> bug but more a quality-of-implementation issue.
>
> Linus

--
Daniel
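
Daniel's rule (bump the cookie on a create before it, drop it on a delete
before it) can be sketched on a toy array-backed directory. This is an
illustration under simplifying assumptions: there is a single reader, so one
cookie is stored in the directory itself, whereas a real implementation would
have to adjust the cookie of every open directory stream. All names here are
hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

struct toy_dir {
    const char *name[MAX_ENTRIES];
    int count;
    int cookie;     /* ordinal position of the single reader */
};

/* Insert name at position pos and keep the reader's cookie in step. */
void toy_create_at(struct toy_dir *d, int pos, const char *name)
{
    assert(d->count < MAX_ENTRIES && pos >= 0 && pos <= d->count);
    memmove(&d->name[pos + 1], &d->name[pos],
            (d->count - pos) * sizeof d->name[0]);
    d->name[pos] = name;
    d->count++;
    if (pos < d->cookie)    /* created before the cookie: increment */
        d->cookie++;
}

/* Delete the entry at position pos and keep the cookie in step. */
void toy_delete_at(struct toy_dir *d, int pos)
{
    assert(pos >= 0 && pos < d->count);
    memmove(&d->name[pos], &d->name[pos + 1],
            (d->count - pos - 1) * sizeof d->name[0]);
    d->count--;
    if (pos < d->cookie)    /* deleted before the cookie: decrement */
        d->cookie--;
}

/* Return the next unread entry, or NULL at the end. */
const char *toy_readdir(struct toy_dir *d)
{
    if (d->cookie >= d->count)
        return NULL;
    return d->name[d->cookie++];
}
```

With the adjustment in place, the reader always resumes at the first entry it
has not yet returned, whatever happens in front of it.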

2001-10-22 09:45:31

by Christoph Rohland

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

Hi Larry,

On Sun, 21 Oct 2001, Larry McVoy wrote:
> One of the engineers here has also seen this. The root cause is
> that readdir() is returning a file multiple times. We've seen it on
> tmpfs. We also have seen it in NFS and had a workaround, the
> workaround depended that the file would be returned twice right next
> to each other and that's not the case in tmpfs. [email protected]
> can provide you with the details of his machine config, here's the
> mail he sent a while back about it:

tmpfs does not know anything about directory handling. It uses
generic_read_dir and dcache_readdir. So this must be a bug in the vfs
layer. Al, what do you say?

Greetings
Christoph


2001-10-22 10:01:23

by Alexander Viro

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch



On 22 Oct 2001, Christoph Rohland wrote:

> Hi Larry,
>
> On Sun, 21 Oct 2001, Larry McVoy wrote:
> > One of the engineers here has also seen this. The root cause is
> > that readdir() is returning a file multiple times. We've seen it on
> > tmpfs. We also have seen it in NFS and had a workaround, the
> > workaround depended that the file would be returned twice right next
> > to each other and that's not the case in tmpfs. [email protected]

That's not guaranteed for NFS, BTW.

> > can provide you with the details of his machine config, here's the
> > mail he sent a while back about it:
>
> tmpfs does not know anything about directory handling. It uses
> generic_read_dir and dcache_readdir. So this must be a bug in the vfs
> layer. Al, what do you say?

If you are changing directory between the calls of getdents(2) - you have
no warranty that offsets will stay stable. It's not just Linux.

Frankly, I don't see what could be done, short of doing qsort() by inumber
or something equivalent...
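
The "qsort() by inumber" idea can be sketched in user space as follows. This
is a minimal illustration, not BitKeeper's code: struct ent and dedup_by_ino
are hypothetical names, and the sketch assumes the caller has already
collected (inode, name) pairs from readdir(). Entries are compared on both
inode and name, so hard links to the same inode under different names are
kept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* One collected directory entry (hypothetical layout). */
struct ent {
    ino_t ino;
    char name[256];
};

/* Order by inode number first, then by name. */
static int cmp_ent(const void *a, const void *b)
{
    const struct ent *x = a, *y = b;

    if (x->ino != y->ino)
        return x->ino < y->ino ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* Sort v[0..n) and squeeze out exact repeats. Returns the new count. */
size_t dedup_by_ino(struct ent *v, size_t n)
{
    size_t in, out = 0;

    assert(v != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(v, n, sizeof *v, cmp_ent);
    for (in = 1; in < n; in++)
        if (cmp_ent(&v[out], &v[in]) != 0)
            v[++out] = v[in];
    return out + 1;
}
```

Unlike the "skip if same inode as the previous entry" check in the patch
quoted earlier in the thread, this does not depend on the repeated entries
being adjacent in readdir() order.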

2001-10-22 13:29:53

by Wayne Scott

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

From: Alexander Viro <[email protected]>
> > tmpfs does not know anything about directory handling. It uses
> > generic_read_dir and dcache_readdir. So this must be a bug in the vfs
> > layer. Al, what do you say?
>
> If you are changing directory between the calls of getdents(2) - you have
> no warranty that offsets will stay stable. It's not just Linux.
>
> Frankly, I don't see what could be done, short of doing qsort() by inumber
> or something equivalent...

So if I am adding files while reading the directory the directory
structure gets rewritten and I might return files more than once?
What happens if files are being deleted? Can files be skipped?!?

Any reason we have never seen this on ext2 or on other filesystems on 10+
versions of UNIX? BitKeeper is pretty paranoid and includes a lot of
sanity checks.

Does this only happen when the subdirectory I am reading changes, or
on tmpfs will changing any directory cause this?

I am looking at coding a workaround, but I need to know how bad the
problem can be.

-Wayne

2001-10-22 17:02:58

by Bill Davidsen

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

In article <[email protected]> [email protected] wrote:

| Yes. "tmpfs" will consider the position in the dentry lists to be the
| "offset" in the file, and if you remove files from the directory as you
| do a readdir(), you can get the same file twice (or you can fail to see
| files).
|
| If somebody has a good suggestion for what could be used as a reasonably
| efficient "cookie" for virtual filesystems like tmpfs, speak up. In the
| meantime, one way to _mostly_ avoid this should be to give a big buffer
| to readdir(), so that you end up getting all entries in one go (which
| will be protected by the semaphore inside the kernel), rather than
| having to do multiple readdir() calls.

Generally "do it all at one go" solutions don't scale, and sooner or
later break on a large case. It's a useful work-around, but likely to
bite. And the semaphore being locked for too long, such as on a slow
machine with a large list, is probably not desirable either.

Short of having an entry in a linked list (if I glanced at the code
correctly) include a synthetic position, I don't see any better
solution. Failing to see files is less of a problem, that seems possible
with any directory structure which reuses entries, but seeing the same
entry twice is not expected behaviour.

--
bill davidsen <[email protected]>
His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.
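
The "synthetic position" idea Bill mentions can be sketched like this. It is
a toy model under stated assumptions, not a proposal for the kernel: every
entry gets a sequence number that is never reused, the list is kept in
increasing sequence order, and the reader's cookie is the sequence number of
the last entry it returned. The names seq_dir, seq_create, seq_remove and
seq_readdir are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16

struct seq_ent {
    unsigned long seq;      /* synthetic position, never reused */
    const char *name;
};

struct seq_dir {
    struct seq_ent ent[MAX_ENTRIES];    /* increasing seq order */
    int count;
    unsigned long next_seq;             /* last number handed out */
};

/* Append a new entry with a fresh sequence number. */
void seq_create(struct seq_dir *d, const char *name)
{
    assert(d->count < MAX_ENTRIES);
    d->ent[d->count].seq = ++d->next_seq;
    d->ent[d->count].name = name;
    d->count++;
}

/* Remove the entry with the given number; 0 on success, -1 if absent. */
int seq_remove(struct seq_dir *d, unsigned long seq)
{
    int i;

    for (i = 0; i < d->count; i++) {
        if (d->ent[i].seq == seq) {
            memmove(&d->ent[i], &d->ent[i + 1],
                    (d->count - i - 1) * sizeof d->ent[0]);
            d->count--;
            return 0;
        }
    }
    return -1;
}

/* Return the first entry newer than *cookie (0 = start), or NULL. */
const char *seq_readdir(const struct seq_dir *d, unsigned long *cookie)
{
    int i;

    for (i = 0; i < d->count; i++) {
        if (d->ent[i].seq > *cookie) {
            *cookie = d->ent[i].seq;
            return d->ent[i].name;
        }
    }
    return NULL;
}
```

Because the cookie names an entry rather than a list position, creates and
deletes elsewhere in the list cannot make the reader repeat or skip an entry
that existed for the whole scan. The linear search here is only for brevity.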

2001-10-22 17:11:58

by Larry McVoy

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Mon, Oct 22, 2001 at 01:03:16PM -0400, bill davidsen wrote:
> In article <[email protected]> [email protected] wrote:
> | If somebody has a good suggestion for what could be used as a reasonably
> | efficient "cookie" for virtual filesystems like tmpfs, speak up. In the
> | meantime, one way to _mostly_ avoid this should be to give a big buffer
> | to readdir(), so that you end up getting all entries in one go (which
> | will be protected by the semaphore inside the kernel), rather than
> | having to do multiple readdir() calls.
>
> Generally "do it all at one go" solutions don't scale, and sooner or
> later break on a large case.

OK, here's what we are proposing to do in BitKeeper as a work around:

replace readdir() with an internal getdir() function

getdir() returns all directory entries in a sorted list
getdir() works by doing

    for (;;) {
        lstat(dir);
        while (e = readdir(..)) save(e->d_name);
        lstat(dir);
        if (dir size && dir mtime have NOT changed) break;
        cleanup the array and go start over
    }
    sort entries
    return sorted list

The basic idea being that we first of all narrow the race window and
second of all detect the race in all cases where the mods to the dir
result in either a changed mtime or a changed size. So yes, that leaves
us open to cases where the size didn't change but the contents did but
I'll be ding danged if I can see a way around that.

As for the sorting, we want deterministic ordering of the entries for
our own reasons. It also means that we can do the duplicate suppression
in the list processing.

Anyone see a fixable flaw in this approach?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
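
Larry's outline could look roughly like this in C. It is a sketch of the
pseudocode above, not BitKeeper's actual getdir(): the signature, the helper
names read_once and free_names, and the choice to skip "." and ".." are
assumptions made for the illustration. It shares the weaknesses discussed in
the thread: a change that alters neither st_size nor st_mtime goes unnoticed,
and a directory that never stops changing keeps the loop retrying.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

static int cmp_name(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

static void free_names(char **v, size_t n)
{
    while (n > 0)
        free(v[--n]);
    free(v);
}

/* One readdir() pass over dir. Returns 0 on success, -1 on error. */
static int read_once(const char *dir, char ***out, size_t *count)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    char **v = NULL, **tmp;
    size_t n = 0, cap = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        if (n == cap) {
            cap = cap ? cap * 2 : 16;
            tmp = realloc(v, cap * sizeof *v);
            if (tmp == NULL)
                goto fail;
            v = tmp;
        }
        if ((v[n] = strdup(e->d_name)) == NULL)
            goto fail;
        n++;
    }
    closedir(d);
    *out = v;
    *count = n;
    return 0;
fail:
    free_names(v, n);
    closedir(d);
    return -1;
}

/* Sorted, duplicate-free listing of dir. Returns 0 on success, -1 on error. */
int getdir(const char *dir, char ***names, size_t *count)
{
    struct stat before, after;
    char **v;
    size_t n, in, out;

    assert(dir != NULL && names != NULL && count != NULL);
    for (;;) {
        if (lstat(dir, &before) != 0 || read_once(dir, &v, &n) != 0)
            return -1;
        if (lstat(dir, &after) != 0) {
            free_names(v, n);
            return -1;
        }
        if (before.st_size == after.st_size &&
            before.st_mtime == after.st_mtime)
            break;
        free_names(v, n);   /* directory changed under us: start over */
    }
    if (n > 1) {
        qsort(v, n, sizeof *v, cmp_name);
        for (in = 1, out = 0; in < n; in++) {
            if (strcmp(v[out], v[in]) == 0)
                free(v[in]);            /* duplicate entry */
            else
                v[++out] = v[in];
        }
        n = out + 1;
    }
    *names = v;
    *count = n;
    return 0;
}
```

The caller owns the returned array and each string in it. Sorting first makes
duplicate suppression a single pass over adjacent entries, which matches the
"duplicate suppression in the list processing" Larry mentions.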

2001-10-22 17:29:18

by Alexander Viro

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch



On Mon, 22 Oct 2001, Larry McVoy wrote:

>     for (;;) {
>         lstat(dir);
>         while (e = readdir(..)) save(e->d_name);
>         lstat(dir);
>         if (dir size && dir mtime have NOT changed) break;
>         cleanup the array and go start over
>     }
>     sort entries
>     return sorted list
>
> The basic idea being that we first of all narrow the race window and
> second of all detect the race in all cases where the mods to the dir
> result in either a changed mtime or a changed size. So yes, that leaves
> us open to cases where the size didn't change but the contents did but
> I'll be ding danged if I can see a way around that.

There's none. Notice that if you want a coherent view of directory,
you need something like the variant above anyway - even with FFS. New
entry could've been added into slack in one of the entries you've
already seen.

2001-10-22 17:30:58

by Bill Davidsen

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

In article <[email protected]> [email protected] asked:

| So if I am adding files while reading the directory the directory
| structure gets rewritten and I might return files more than once?
| What happens if files are being deleted? Can files be skipped?!?
|
| Any reason we have never seen this on ext2 or on other filesystems on 10+
| versions of UNIX? BitKeeper is pretty paranoid and includes a lot of
| sanity checks.

I think you can see this on any typical UNIX directory. Assume that a
directory is created with some number of entries. Assume then that the
first ten or so are deleted. Then start your program doing readdir().
After reading a few entries, create a file in that directory. When the
deleted directory entry is reused you have already read past it, and
will not see it until the next time you read the directory.

Obviously there is a timing window here, but I used to see this with
usenet news when each article was in a separate file and files were
created and deleted in large numbers. Missing a file doesn't seem to be
a problem, because the behaviour is the same as if the file were created
after the readdir() pass, but getting the same directory entry more than
once is likely to produce anything from confusing output to serious
malfunction, depending on what's done with the information.

--
bill davidsen <[email protected]>
His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.

2001-10-23 05:25:17

by Keith Owens

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Mon, 22 Oct 2001 10:12:12 -0700,
Larry McVoy <[email protected]> wrote:
>The basic idea being that we first of all narrow the race window and
>second of all detect the race in all cases where the mods to the dir
>result in either a changed mtime or a changed size.

Aren't there some file systems where the directory size is always 0? I
vaguely remember a change to "make mrproper" to work around that "feature".

2001-10-31 05:37:51

by Theodore Ts'o

Subject: Re: Kernel Compile in tmpfs crumples in 2.4.12 w/epoll patch

On Mon, Oct 22, 2001 at 06:01:32AM -0400, Alexander Viro wrote:
>
> If you are changing directory between the calls of getdents(2) - you have
> no warranty that offsets will stay stable. It's not just Linux.
>

Umm... it's not Linux, but it is POSIX. POSIX states that if a file
is removed or created in a directory in the middle of a readdir() scan,
it's undefined whether or not the file which has been removed or
created will be returned by readdir(). But you're not allowed to
randomly shuffle things around and make files disappear or be returned
multiple times. Otherwise, it becomes impossible for readdir() to be
used reliably --- after all, even if an individual process isn't
deleting or creating files while doing a readdir(), it can't protect
itself from other processes happening to create or delete files while
it's doing a readdir() scan.

> Frankly, I don't see what could be done, short of doing qsort() by inumber
> or something equivalent...

Yup, that's what you'd have to do. Readdir() semantics are a bitch,
and a pain in the *ss for filesystems that are doing something other
than a FFS-style linear directory. Telldir()/seekdir() semantics make
things even worse....

- Ted