2002-01-14 20:48:32

by Dylan Griffiths

Subject: Linux 2.4 NFS bug (annoying symlink breakage)

I've noticed this bug before. Between two hosts on a 100Mbps switched LAN,
symlinks are trashed into garbage. Based on the output, I'm guessing a
string loses its terminating null somewhere.

Client is 2.4.14. Server is 2.4.10. Server has RAID 5 IDE softraid and
an hpt370 driver patch provided by Tim Hockin to fix the deadlocks and
oopsies of the hpt366 driver on my hpt370.

Both are configured with NFSv3 client and server support. Both use Intel
EEPro 100s.

I've been working on Mozilla again lately, but I don't have enough free HD
space sitting on my main workstation to build on. Luckily I have a large
NFS home directory. However, after configure has been run, most of the
symlinks are mangled:

dylang@shadowgate:/builds/mozilla$ ls -l /builds/mozilla/dist/include/nspr
total 12
drwxr-xr-x 2 dylang web 4096 Jan 14 14:24 md/
lrwxrwxrwx 1 dylang web 1184 Jan 14 14:24 nspr.h ->
../../../nsprpub/pr/include/./nspr.h\
$(MOD_DEPTH)/config/autoconf.mk\n\nHEADERS\ \=\ $(wildcard\
$(srcdir)/\*.h)\nCONFIGS\ \=\ $(wildcard\ $(srcdir)/\*.cfg)\n\ninclude\
$(topsrcdir)/config/rules.mk\n\nexport::\ $(MDCPUCFG_H)\n\t$(INSTALL)\ -m\
444\ $(srcdir)/$(MDCPUCFG_H)\ $(dist_includedir)\n\t$(INSTALL)\ -m\ 444\
$(CONFIGS)\ $(HEADERS)\ $(dist_includedir)/md\nifneq\
($(OS_ARCH),OpenVMS)\n\tmv\ -f\ $(dist_includedir)/$(MDCPUCFG_H)\
$(dist_includedir)/prcpucfg.h\nelse\n#\ mv'ing\ a\ link\ causes\ the\
file\ itself\ to\ move,\ not\ the\ link.\n\trm\ -f\
$(dist_includedir)/$(MDCPUCFG_H)\n\trm\ -f\
$(dist_includedir)/prcpucfg.h\n\tln\ -fs\ $(srcdir)/$(MDCPUCFG_H)\
$(dist_includedir)/prcpucfg.h\nendif\n\nreal_install::\n\t$(NSINSTALL)\
-D\ $(DESTDIR)$(includedir)/md\n\tcp\ $(srcdir)/$(MDCPUCFG_H)\
$(DESTDIR)$(includedir)/prcpucfg.h\n\t$(NSINSTALL)\ -t\ -m\ 644\
$(CONFIG)\ $(HEADERS)\ $(DESTDIR)$(includedir)/md\n\nrelease::\
export\n\t\@echo\ "Copying\ machine-dependent\ prcpucfg.h"\n\t\@if\ test\
-z\ "$(BUILD_NUMBER)";\ then\ \\\n\t\techo\ "BUILD_NUMBER\ must\ be\
defined";\ \\\n\t\tfalse;\ \\\n\tfi\n\t\@if\ test\ !\ -d\
$(RELEASE_INCLUDE_DIR);\ then\ \\\n\t\trm\ -rf\ $(RELEASE_INCLUDE_DIR);\
\\\n\t\t$(NSINSTALL)\ -D\ $(RELEASE_INCLUDE_DIR);\\\n\tfi\n\tcp\
$(srcdir)/$(MDCPUCFG_H)\ $(RELEASE_INCLUDE_DIR)/prcpucfg.h\n

...

Some of them are fine:
lrwxrwxrwx 1 dylang web 37 Jan 14 14:24 prenv.h ->
../../../nsprpub/pr/include/./prenv.h
lrwxrwxrwx 1 dylang web 37 Jan 14 14:24 prerr.h ->
../../../nsprpub/pr/include/./prerr.h
lrwxrwxrwx 1 dylang web 39 Jan 14 14:24 prerror.h ->
../../../nsprpub/pr/include/./prerror.h
lrwxrwxrwx 1 dylang web 38 Jan 14 14:24 prinet.h ->
../../../nsprpub/pr/include/./prinet.h
lrwxrwxrwx 1 dylang web 38 Jan 14 14:24 prinit.h ->
../../../nsprpub/pr/include/./prinit.h

I figured it was a script error in the latest code, so I started to fix
the symlinks manually. But then, as I was fixing the fourth one:

dylang@shadowgate:/builds/mozilla/dist/include/nspr$ ls -l pripcsem.h
lrwxrwxrwx 1 dylang web 41 Jan 14 14:24 pripcsem.h ->
../../../nsprpub/pr/include/./pripcsem.h\200
dylang@shadowgate:/builds/mozilla/dist/include/nspr$ rm pripcsem.h; ln -s
../../../nsprpub/pr/include/./pripcsem.h

dylang@shadowgate:/builds/mozilla/dist/include/nspr$ ls -l pripcsem.h
lrwxrwxrwx 1 dylang web 234 Jan 14 14:32 pripcsem.h ->
../../../nsprpub/pr/include/./pripcsem.h_PC_PRIO_IO:11,_PC_SOCK_MAXBUF:12,_PC_FILESIZEBITS:13,_PC_REC_INCR_XFER_SIZE:14,_PC_REC_MAX_XFER_SIZE:15,_PC_REC_MIN_XFER_SIZE:16,_PC_REC_XFER_ALIGN:17,_PC_ALLOC_SIZE_MIN:18,_PC_SYMLINK_MAX:19,;

Aha! NFSv3 somehow is not handling this well at all.
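
One way to find the damaged links without reading every ls line by hand
is sketched below. It is only a heuristic, and the names
(target_is_printable, check_symlink, LINK_BUF_SIZE) are hypothetical,
not taken from any existing tool. It flags a link whose target contains
control or high-bit bytes (the "\200" and embedded "\n" cases above),
and otherwise checks whether the target resolves at all, which should
catch the case where the appended garbage happens to be printable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define LINK_BUF_SIZE 4096  /* assumed upper bound for this sketch */

/* Return 1 if the target is non-empty and every byte is printable
 * ASCII, 0 otherwise. Legitimate non-ASCII file names would also be
 * flagged, so treat a 0 as "worth a look", not as proof of damage. */
static int target_is_printable(const char *target, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)target[i];
        if (c < 0x20 || c >= 0x7f)  /* control, DEL, or high-bit byte */
            return 0;
    }
    return 1;
}

/* Classify the symlink at path:
 *   0  target looks fine and resolves
 *   1  target contains suspicious bytes
 *   2  target is printable but does not resolve (dangling)
 *  -1  path could not be read as a symlink
 */
static int check_symlink(const char *path)
{
    char buf[LINK_BUF_SIZE];
    struct stat st;

    /* readlink() does not NUL-terminate; it returns the byte count. */
    ssize_t n = readlink(path, buf, sizeof(buf));
    if (n < 0)
        return -1;
    if (!target_is_printable(buf, (size_t)n))
        return 1;
    if (stat(path, &st) != 0)       /* stat() follows the link */
        return 2;
    return 0;
}
```

Calling check_symlink() on each entry of dist/include/nspr and printing
the names that return non-zero would give a list of links to recreate.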

Suggestions?

(Note: please CC me as I'm not on the list; BCC will bounce)
--
http://www.kuro5hin.org -- technology and culture, from the trenches.
-=-=-=-=-=-
Those that give up liberty to obtain safety deserve neither.
-- Benjamin Franklin
http://www.zdnet.com/zdnn/stories/news/0,4586,2812463,00.html
http://slashdot.org/article.pl?sid=01/09/16/1647231
-=-=-=-=-=-


2002-01-14 21:13:52

by Andreas Dilger

Subject: Re: Linux 2.4 NFS bug (annoying symlink breakage)

On Jan 14, 2002 14:50 -0600, Dylan Griffiths wrote:
> I've noticed this bug before. Between two hosts on a 100Mbps switched LAN,
> symlinks are trashed into garbage. Based on the output, I'm guessing a
> string loses its terminating null somewhere.
>
> Client is 2.4.14. Server is 2.4.10. Server has RAID 5 IDE softraid and
> an hpt370 driver patch provided by Tim Hockin to fix the deadlocks and
> oopsies of the hpt366 driver on my hpt370.

Upgrade your kernel before reporting such bugs. I'm pretty sure it has
already been fixed. Something about the NFSv3 calling an inappropriate
(but similarly named) function in the symlink path.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2002-01-14 21:54:23

by Dylan Griffiths

Subject: Re: Linux 2.4 NFS bug (annoying symlink breakage)

Andreas Dilger wrote:
> Upgrade your kernel before reporting such bugs. I'm pretty sure it has
> already been fixed. Something about the NFSv3 calling an inappropriate
> (but similarly named) function in the symlink path.

I've looked at the 2.4.14 NFS filesystem code, since the problem seemed to
be on the client side.


/* We place the length at the beginning of the page,
 * in host byte order, followed by the string. The
 * XDR response verification will NULL terminate it.
 */


I'm guessing nfs3xdr.c does not provide the behaviour this code relies on.
I will grab 2.4.17 and see if the code/behaviour is different.
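
To illustrate the layout that comment describes, here is a minimal
user-space sketch. It is not the kernel code: store_link, terminate_link,
link_string and BUF_SIZE are hypothetical names made up for the example,
and the buffer merely stands in for a page. The point is that the string
is stored by length only, so anything that later reads it as a C string
depends on a separate step writing the terminator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 256  /* hypothetical stand-in for a page */

/* Store a 32-bit length in host byte order, followed by the raw
 * bytes of the link target. Deliberately writes no terminator.
 * Returns 0 on success, -1 if the target does not fit. */
static int store_link(unsigned char *buf, const char *target, uint32_t len)
{
    assert(buf != NULL && target != NULL);
    if (len > BUF_SIZE - sizeof(uint32_t) - 1)
        return -1;                  /* keep one byte free for the NUL */
    memcpy(buf, &len, sizeof(len));
    memcpy(buf + sizeof(len), target, len);
    return 0;
}

/* The step the comment attributes to XDR response verification:
 * write a zero byte immediately after the stored string. */
static void terminate_link(unsigned char *buf)
{
    uint32_t len;
    memcpy(&len, buf, sizeof(len));
    buf[sizeof(len) + len] = '\0';
}

/* Pointer to the string part, as a consumer expecting a C string
 * would see it. */
static const char *link_string(const unsigned char *buf)
{
    return (const char *)buf + sizeof(uint32_t);
}
```

If the buffer held stale data before store_link() ran, and
terminate_link() is never called, strlen(link_string(buf)) keeps going
past the 40-byte target until it happens to reach a zero byte. That would
be consistent with link targets that start out correct and then continue
with unrelated text, though whether this is what actually happens in
2.4.14 is only a guess.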
