2001-02-11 20:17:31

by Olaf Hering

[permalink] [raw]
Subject: race in autofs / nfs

Hi,

there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs.
When the cwd is on the nfs mounted server (== busy) and you try to
reboot, the shutdown hangs in "rcautofs stop". I can reproduce it every time.

I attach a screen log and a decoded sysrq output. It is not related to
that killproc; with kernel 2.4.1 it currently hangs in automount <->
kupdate:

kupdate 6 0 0 -1 ?00000000 c307a000 0 current
automoun 1525 31480 503680 54 u0176a368 df3e6000 1 current

there is a spinlock message:
_spin_lock(c021f6e4) CPU#1 NIP c01b8220 holder: cpu 1 pc E126BF28
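The `_spin_lock(...) CPU#n NIP ... holder: cpu m pc ...` line comes from spinlock debugging that remembers which CPU took a lock and from where, so that a CPU stuck spinning can report the owner. What follows is a minimal user-space sketch of that idea, not the actual ppc kernel code. `struct dbg_spinlock`, `dbg_spin_lock()`, `dbg_spin_unlock()` and the spin limit are hypothetical. Unlike a kernel lock, this version gives up and returns -1 after printing its report.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical debug spinlock: besides the lock word it records which
 * "CPU" took the lock and from which caller address. The holder fields
 * are advisory and used only for the diagnostic. */
struct dbg_spinlock {
    atomic_flag locked;
    atomic_int holder_cpu;            /* -1 when free */
    _Atomic(const void *) holder_pc;  /* caller recorded at acquisition */
};

#define DBG_SPINLOCK_INIT { ATOMIC_FLAG_INIT, -1, NULL }

/* Try up to max_spins times. On success record the holder and return 0;
 * otherwise print who holds the lock and return -1. */
int dbg_spin_lock(struct dbg_spinlock *l, int cpu, const void *pc,
                  long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire)) {
            atomic_store(&l->holder_cpu, cpu);
            atomic_store(&l->holder_pc, pc);
            return 0;
        }
    }
    fprintf(stderr, "_spin_lock(%p) CPU#%d NIP %p holder: cpu %d pc %p\n",
            (void *)l, cpu, (void *)pc, atomic_load(&l->holder_cpu),
            (void *)atomic_load(&l->holder_pc));
    return -1;
}

void dbg_spin_unlock(struct dbg_spinlock *l)
{
    /* unlocking a lock nobody holds is a bug */
    assert(atomic_load(&l->holder_cpu) >= 0);
    atomic_store(&l->holder_cpu, -1);
    atomic_store(&l->holder_pc, NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

If the same CPU number takes the lock a second time without unlocking, the printed report shows the spinning CPU and the recorded holder as the same CPU, as in the 2.4.1 message above. In real code the caller address would typically come from something like GCC's `__builtin_return_address(0)`.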

c021f6e4 is not in the System.map, but c01b8220 is; it falls inside atomic_dec_and_lock:
c01b817c T memparse
c01b8200 T atomic_dec_and_lock
c01b9a5c t cleanup_proc_rtas
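The System.map reasoning above follows the usual rule. An address belongs to the symbol with the largest start address that is still less than or equal to it. That puts c01b8220 inside atomic_dec_and_lock (c01b8200), before cleanup_proc_rtas (c01b9a5c). Here is a small C sketch of that lookup. It assumes the map has already been parsed into a table sorted by address; `struct sym` and `sym_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uintptr_t addr;    /* symbol start address */
    const char *name;
};

/* Return the entry with the largest start address <= addr, or NULL if
 * addr lies below the first symbol. tab must be sorted by addr. */
const struct sym *sym_lookup(const struct sym *tab, size_t n, uintptr_t addr)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;            /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo == number of entries whose start is <= addr */
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

Any address past the last entry is attributed to the last symbol, so a real tool would also need to know where the kernel text ends.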


The attached log is from kernel 2.4.2-pre3:
killproc 1701 1030738 16491808 106 u0178d5c8 c7de6000 0 current
automoun 1705 1036120 16577920 54 u0176a368 c7b46000 1 current
_spin_lock(c022279c) CPU#1 NIP c01b7c5c holder: cpu 1 pc E126BF28
SysRq: Emergency Sync

_spin_lock(c022279c) CPU#0 NIP c005d920 holder: cpu 1 pc E126BF28
_spin_lock(c022279c) CPU#1 NIP c01b7c5c holder: cpu 1 pc E126BF28

c01b7c5c is again atomic_dec_and_lock and c005d920 is d_lookup.
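atomic_dec_and_lock() decrements a counter and returns with the lock held only if the counter reached zero. As far as I know, 2.4's dput() uses it on a dentry's d_count together with dcache_lock, and d_lookup() also takes dcache_lock. That may be why both functions show up spinning in the traces above, but treat it as background rather than a diagnosis. Below is a user-space model of those semantics. It uses a pthread mutex in place of a spinlock; the name `dec_and_lock()` is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement *cnt. If it drops to zero, return true with *lock held
 * (the caller must unlock); otherwise return false without the lock. */
bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old = atomic_load(cnt);
    /* Fast path: while the count is above 1 the decrement cannot reach
     * zero, so do it locklessly with compare-and-swap. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
        /* CAS failed: old now holds the current value, retry */
    }
    /* Slow path: the count may hit zero, so take the lock first; anyone
     * looking the object up under the same lock cannot grab a new
     * reference while we drop the last one. */
    pthread_mutex_lock(lock);
    int prev = atomic_fetch_sub(cnt, 1);
    assert(prev > 0);   /* dropping a reference nobody holds is a bug */
    if (prev == 1)
        return true;    /* caller now owns the lock */
    pthread_mutex_unlock(lock);
    return false;
}
```

The lockless fast path is the point of the primitive: the lock is taken only when the count might reach zero.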




The system is a 2-way RS/6000 with 2 GB of RAM.

Linux cantaloupe 2.4.1-SMP #1 SMP Sun Feb 11 13:02:05 GMT 2001 ppc unknown
Kernel modules 2.4.1
Gnu C 2.95.2
Gnu Make 3.79.1
Binutils 2.10.0.33
Linux C Library x 1 root root 1499361 Feb 9 02:50 /lib/libc.so.6
Dynamic linker ldd (GNU libc) 2.2
Procps 2.0.7
Mount 2.10q
Net-tools 1.57
Kbd 1.02
Sh-utils 2.0
Modules Loaded nfsd autofs4 ipv6





Any ideas what could be wrong?


Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...


Attachments:
(No filename) (1.82 kB)
screenlog.0.crash.gz (8.48 kB)
blah.txt.gz (1.33 kB)
Download all attachments

2001-02-11 20:32:32

by H. Peter Anvin

[permalink] [raw]
Subject: Re: race in autofs / nfs

Olaf Hering wrote:
>
> Hi,
>
> there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs.
> When the cwd is on the nfs mounted server (== busy) and you try to
> reboot, the shutdown hangs in "rcautofs stop". I can reproduce it every time.
>

Sounds like an NFS bug in umount.

-=hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-02-12 08:57:43

by Trond Myklebust

[permalink] [raw]
Subject: Re: race in autofs / nfs

>>>>> " " == H Peter Anvin <[email protected]> writes:

> Olaf Hering wrote:
>>
>> Hi,
>>
>> there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs. When
>> the cwd is on the nfs mounted server (== busy) and you try to
>> reboot, the shutdown hangs in "rcautofs stop". I can reproduce
>> it every time.
>>

> Sounds like an NFS bug in umount.

Or a dcache bug: the above points to a corruption of the mnt_count
which is supposed to be > 0 if the partition is in use. I'm seeing a
similar leak for ext2 partitions (not involving autofs or NFS).
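Trond's point can be illustrated with a toy reference count. Every user of a mount (for example a process whose cwd is on it) holds a reference, and umount must refuse while any reference is outstanding. The sketch below is a simplification under that assumption. `struct toy_mount` and the `toy_*` functions are hypothetical, and the model does not reproduce the real 2.4 `vfsmount` accounting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mount record: only the count of active users matters. */
struct toy_mount {
    atomic_int mnt_count;   /* > 0 while the mount is in use */
};

void toy_mount_init(struct toy_mount *m) { atomic_init(&m->mnt_count, 0); }

/* Take a reference, e.g. when a process changes its cwd into the mount. */
void toy_mntget(struct toy_mount *m) { atomic_fetch_add(&m->mnt_count, 1); }

/* Drop a reference; an unbalanced put would drive the count negative. */
void toy_mntput(struct toy_mount *m)
{
    int prev = atomic_fetch_sub(&m->mnt_count, 1);
    assert(prev > 0);
}

/* umount may proceed only when nobody uses the mount. */
bool toy_may_umount(struct toy_mount *m)
{
    return atomic_load(&m->mnt_count) == 0;
}
```

If a reference is dropped once too often, toy_may_umount() reports an in-use mount as idle. If one is leaked, the mount can never be released. Either kind of corruption fits the mnt_count suspicion, though the sketch says nothing about where the real count goes wrong.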

Cheers,
Trond

2001-02-12 09:35:39

by Olaf Hering

[permalink] [raw]
Subject: Re: race in autofs / nfs

On Mon, Feb 12, Trond Myklebust wrote:

> >>>>> " " == H Peter Anvin <[email protected]> writes:
>
> > Olaf Hering wrote:
> >>
> >> Hi,
> >>
> >> there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs. When
> >> the cwd is on the nfs mounted server (== busy) and you try to
> >> reboot, the shutdown hangs in "rcautofs stop". I can reproduce
> >> it every time.
> >>
>
> > Sounds like an NFS bug in umount.
>
> Or a dcache bug: the above points to a corruption of the mnt_count
> which is supposed to be > 0 if the partition is in use. I'm seeing a
> similar leak for ext2 partitions (not involving autofs or NFS).

Send me patches :)
autofs is the latest, btw.
http://www.de.kernel.org/pub/linux/daemons/autofs/testing-v4/autofs-4.0.0pre9.tar.bz2


Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...

2001-02-12 10:15:11

by Olaf Hering

[permalink] [raw]
Subject: Re: race in autofs / nfs

On Mon, Feb 12, Trond Myklebust wrote:

> >>>>> " " == H Peter Anvin <[email protected]> writes:
>
> > Olaf Hering wrote:
> >>
> >> Hi,
> >>
> >> there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs. When
> >> the cwd is on the nfs mounted server (== busy) and you try to
> >> reboot, the shutdown hangs in "rcautofs stop". I can reproduce
> >> it every time.
> >>
>
> > Sounds like an NFS bug in umount.
>
> Or a dcache bug: the above points to a corruption of the mnt_count
> which is supposed to be > 0 if the partition is in use. I'm seeing a
> similar leak for ext2 partitions (not involving autofs or NFS).

Some more updates:

These machines run the upcoming 7.1-ppc with autofs-4.0.0pre9.
When I install 7.0-ppc, it runs rock solid with the same kernel binary.
7.0 came with autofs-4.0.0pre7:

Linux plum 2.4.1 #1 Sun Feb 11 11:56:01 GMT 2001 ppc unknown
Kernel modules 2.4.1
Gnu C 2.95.3
Binutils 2.9.5.0.24
Linux C Library x 1 root root 4209204 Oct 4 21:42 /lib/libc.so.6
Dynamic linker ldd (GNU libc) 2.1.3
Procps 2.0.6
Mount 2.10m
Net-tools 1.56
Kbd 0.99
Sh-utils 2.0
Modules Loaded snd-card-awacs snd-pcm snd-timer snd-mixer snd soundcore ipv6 netlink_dev nfsd autofs bmac pcmcia_core

(hmm, it loads autofs and not autofs4 on 7.0?)


It hangs again after a while, in this case:
cc1plus 19126 181495 2903920 4 u01783c18 c44ca000 1 current
automoun 19129 183289 2932624 54 u0176a368 c42ca000 0 current
_spin_lock(c021b754) CPU#1 NIP c0072f18 holder: cpu 0 pc C0057A18


I will try the older autofs package now.




Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...

2001-02-12 11:51:46

by Olaf Hering

[permalink] [raw]
Subject: Re: race in autofs / nfs

On Mon, Feb 12, Olaf Hering wrote:

> On Mon, Feb 12, Trond Myklebust wrote:
>
> > >>>>> " " == H Peter Anvin <[email protected]> writes:
> >
> > > Olaf Hering wrote:
> > >>
> > >> Hi,
> > >>
> > >> there is a race in 2.4.1 and 2.4.2-pre3 in autofs/nfs. When
> > >> the cwd is on the nfs mounted server (== busy) and you try to
> > >> reboot, the shutdown hangs in "rcautofs stop". I can reproduce
> > >> it every time.
> > >>
> >
> > > Sounds like an NFS bug in umount.
> >
> > Or a dcache bug: the above points to a corruption of the mnt_count
> > which is supposed to be > 0 if the partition is in use. I'm seeing a
> > similar leak for ext2 partitions (not involving autofs or NFS).
>
> (hmm, it loads autofs and not autofs4 on 7.0?)

The autofs4.o module is the culprit; it works perfectly with autofs.o.

What would happen if I stick with autofs.o now?
The documentation recommends autofs4 in modules.conf.



Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...

2001-02-12 17:39:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: race in autofs / nfs

Olaf Hering wrote:
>
> The autofs4.o module is the culprit; it works perfectly with autofs.o.
>
> What would happen if I stick with autofs.o now?
> The documentation recommends autofs4 in modules.conf.
>

I don't know who came up with that idea. You should use the module that
matches your daemon, and not try to hack around so that there is a
module/daemon mismatch.
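A daemon and a kernel module can catch the kind of mismatch hpa warns about by having each side advertise the range of protocol versions it supports. Either side then refuses to run when the ranges do not overlap. The sketch below shows only that range check. `struct proto_range`, `negotiate_proto()` and the version numbers are hypothetical and are not taken from the autofs sources.

```c
#include <assert.h>

/* Hypothetical contiguous range of supported protocol versions. */
struct proto_range {
    int min, max;
};

/* Return the highest version both sides support, or -1 if the ranges
 * do not overlap (a daemon/module mismatch). */
int negotiate_proto(struct proto_range daemon, struct proto_range kernel)
{
    assert(daemon.min <= daemon.max && kernel.min <= kernel.max);
    int lo = daemon.min > kernel.min ? daemon.min : kernel.min;
    int hi = daemon.max < kernel.max ? daemon.max : kernel.max;
    return lo <= hi ? hi : -1;
}
```

With these made-up numbers, a daemon supporting versions 3 to 4 paired with a module that only knows version 3 settles on 3. A daemon that needs version 5 is rejected outright.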

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-02-12 18:16:06

by Olaf Hering

[permalink] [raw]
Subject: Re: race in autofs / nfs

On Mon, Feb 12, H. Peter Anvin wrote:

> Olaf Hering wrote:
> >
> > The autofs4.o module is the culprit; it works perfectly with autofs.o.
> >
> > What would happen if I stick with autofs.o now?
> > The documentation recommends autofs4 in modules.conf.
> >
>
> I don't know who came up with that idea. You should use the module that
> matches your daemon, and not try to hack around so that there is a
> module/daemon mismatch.

cantaloupe:~ # /usr/sbin/automount -v
Linux automount version 4.0.0


We had 4.0pre7 in 7.0 and 4.0pre9 in 7.1.
I would really like to know _where_ it hangs. Trond sent me a printk
patch, but its printk was never triggered.
I will try to get an i386 SMP machine to see if it's ppc-specific.


Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...

2001-02-13 15:59:09

by Olaf Hering

[permalink] [raw]
Subject: Re: race in autofs / nfs

On Mon, Feb 12, Olaf Hering wrote:

> On Mon, Feb 12, H. Peter Anvin wrote:
>
> > Olaf Hering wrote:
> > >
> > > The autofs4.o module is the culprit; it works perfectly with autofs.o.
> > >
> > > What would happen if I stick with autofs.o now?
> > > The documentation recommends autofs4 in modules.conf.
> > >
> >
> > I don't know who came up with that idea. You should use the module that
> > matches your daemon, and not try to hack around so that there is a
> > module/daemon mismatch.
>
> cantaloupe:~ # /usr/sbin/automount -v
> Linux automount version 4.0.0
>
>
> We had 4.0pre7 in 7.0 and 4.0pre9 in 7.1.
> I would really like to know _where_ it hangs. Trond sent me a printk
> patch, but its printk was never triggered.

Any ideas where to start with the debugging?


> I will try to get an i386 SMP machine to see if it's ppc-specific.

I'm unable to reproduce it on a PIII 750 with SuSE 7.1.

guillory:/usr/src/OLAF/linux-2.4.2-pre3 # sh scripts/ver_linux
-- Versions installed: (if some fields are empty or look
-- unusual then possibly you have very old versions)
Linux guillory 2.4.2-pre3-SMP #3 SMP Tue Feb 13 14:50:47 CET 2001 i686 unknown
Kernel modules 2.4.1
Gnu C 2.95.2
Gnu Make 3.79.1
Binutils 2.10.0.33
Linux C Library x 1 root root 1382179 Jan 19 07:14 /lib/libc.so.6
Dynamic linker ldd (GNU libc) 2.2
Procps 2.0.7
Mount 2.10q
Net-tools 1.57
Kbd 1.02
Sh-utils 2.0
Modules Loaded nfsd ipv6 mousedev hid input usbcore eepro100


Both 2.4.1ac10 and 2.4.2-pre3 boot fine.

This machine oopses in the USB stack, but that's another issue.



Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...