2006-02-16 06:12:00

by Dave Robertson

Subject: Fwd: for your comments

Hi,

We'd like to report a problem we are seeing with NFS clients (Fedora
Core 4 using kernel 2.6.15) served by OSX Server 10.4.4.
The clients randomly get an I/O error when attempting to list
the contents of the current directory. We are also seeing "RPC:
error 5 connecting to server xxx.xxx.xxx.xxx" messages, but
"strace ls" shows that the EIO is returned by the getdents64 call,
as shown below.

No errors appear on the server.

The I/O error can be corrected by changing to another directory and
then listing the directory where the error occurred.
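Because the failure is sporadic and clears once the directory is re-entered, one way to catch it is a watcher that stays inside the affected directory and lists "." at intervals. It logs the time of each failure so that failures can be matched against the client's "RPC: error 5 connecting to server" messages and against a network trace. This is a minimal sketch, assuming bash (or a POSIX shell that supports `local`); the function names and the default 5-second interval are hypothetical.

```shell
#!/bin/bash
# Hypothetical watcher: stays inside one directory on the NFS mount and
# lists "." periodically, logging a timestamp each time the listing fails.

# check_dir DIR: list DIR; on failure print a timestamped error line to
# stderr and return 1, otherwise return 0.
check_dir() {
    local dir=$1
    local err
    # "2>&1 >/dev/null" keeps only ls's stderr in $err and discards stdout.
    if ! err=$(ls -a -- "$dir" 2>&1 >/dev/null); then
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$err" >&2
        return 1
    fi
    return 0
}

# watch_dir DIR [INTERVAL]: cd into DIR once, then check "." every
# INTERVAL seconds (default 5) until interrupted. The subshell body
# keeps the cd from changing the caller's working directory.
watch_dir() (
    cd -- "$1" || exit 1
    while true; do
        check_dir .
        sleep "${2:-5}"
    done
)
```

For example, `watch_dir /home/students/astudent 10 2>>nfs-eio.log` would append a timestamped line to `nfs-eio.log` for each failed listing. The watcher deliberately changes into the directory only once, because leaving and re-entering the directory is what clears the error.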

The network had been running well for two years prior to the upgrades.
The clients were upgraded from FC3 to FC4, and the server from OSX
10.3.9 to 10.4.3.

We are not seeing this on our other client OS (Tru-64).
We've also tried combinations of OSX Server 10.4.5 and FC5 Test 2
(Linux 2.6.15), but the problem still occurs.

We are about to try rolling back kernel versions on the Linux clients,
and then the server OS version, to isolate when this issue first
appeared, but this will take time because these are operational
systems.

Are there any other tests we could do in the meantime to help isolate
the cause of this?


Thanks,

Dave Robertson
System Administrator
University of Otago, New Zealand


--------------------------------------------------

[/home/students/astudent]$ ls
ls: reading directory .: Input/output error

[/home/students/astudent]$ strace ls
execve("/bin/ls", ["ls"], [/* 26 vars */]) = 0
brk(0) = 0x845e000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f00000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=100498, ...}) = 0
old_mmap(NULL, 100498, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ee7000
close(3) = 0
open("/lib/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\2200
\307"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=49428, ...}) = 0
old_mmap(0xc71000, 81656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xc71000
old_mmap(0xc79000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xc79000
old_mmap(0xc7b000, 40696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xc7b000
close(3) = 0
open("/lib/libacl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\203"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=25892, ...}) = 0
old_mmap(0x5f7000, 27248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x5f7000
old_mmap(0x5fd000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x5fd000
close(3) = 0
open("/lib/libselinux.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\20E\302"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=68864, ...}) = 0
old_mmap(0xc22000, 68592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xc22000
old_mmap(0xc32000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0xc32000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\212\16"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1485672, ...}) = 0
old_mmap(0x69c000, 1215452, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x69c000
old_mmap(0x7bf000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x123000) = 0x7bf000
old_mmap(0x7c3000, 7132, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7c3000
close(3) = 0
open("/lib/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\204G
\217"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=101600, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ee6000
old_mmap(0x8f0000, 70084, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x8f0000
old_mmap(0x8fe000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x8fe000
old_mmap(0x900000, 4548, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x900000
close(3) = 0
open("/lib/libattr.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`+\337
\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=13532, ...}) = 0
old_mmap(0xdf2000, 14904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xdf2000
old_mmap(0xdf5000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0xdf5000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ee5000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7ee56c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xc79000, 4096, PROT_READ) = 0
mprotect(0x7bf000, 8192, PROT_READ) = 0
mprotect(0x8fe000, 4096, PROT_READ) = 0
mprotect(0x698000, 4096, PROT_READ) = 0
munmap(0xb7ee7000, 100498) = 0
set_tid_address(0xb7ee5708) = 26584
rt_sigaction(SIGRTMIN, {0x8f4340, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x8f43a8, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
_sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xbfbfdca0, 35, (nil), 0}) = 0
access("/etc/selinux/", F_OK) = 0
brk(0) = 0x845e000
brk(0x847f000) = 0x847f000
open("/etc/selinux/config", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=447, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eff000
read(3, "# This file controls the state o"..., 4096) = 447
close(3) = 0
munmap(0xb7eff000, 4096) = 0
open("/proc/mounts", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eff000
read(3, "rootfs / rootfs rw 0 0\n/dev /dev"..., 1024) = 843
read(3, "", 1024) = 0
close(3) = 0
munmap(0xb7eff000, 4096) = 0
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=49610736, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7ce5000
close(3) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=49, ws_col=100, ws_xpixel=600, ws_ypixel=686}) = 0
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0750, st_size=1836, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
getdents64(3, 0x8461d64, 32768) = -1 EIO (Input/output error)   <<<--------------------------------
close(3) = 0
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eff000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
read(3, "", 4096) = 0
close(3) = 0
munmap(0xb7eff000, 4096) = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "ls: ", 4ls: ) = 4
write(2, "reading directory .", 19reading directory .) = 19
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": Input/output error", 20: Input/output error) = 20
write(2, "\n", 1
) = 1
exit_group(1) = ?





_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-02-16 14:09:12

by Chuck Lever

Subject: Re: Fwd: for your comments

Dave Robertson wrote:
> Hi,
>
> We'd like to report a problem we are seeing with NFS clients ( Fedora
> Core 4 using kernel 2.6.15) served by OSX Server 10.4.4.
> The clients randomly get an I/O error when attempting to list
> the contents of the current directory, and we are also seeing "RPC:
> error 5 connecting to server xxx.xxx.xxx.xxx" but
>
> "strace ls" shows the EIO is generated by the call to getdents64 as
> shown below.
>
> No errors appear on the server.

I know this is a sporadic error, but capturing a network trace while the
error occurs would show why the client is having trouble connecting to
the server.

I use Ethereal, but "tcpdump -s1536 -vv host yourserver" would also work.
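Because the error is sporadic, it may be easier to write the capture to a file, leave it running until the next failure, and open the file in Ethereal afterwards. This is a minimal sketch, assuming tcpdump is run as root on the client. The function names and file-naming scheme are hypothetical, and `yourserver` stands for the server address, as in the command above.

```shell
#!/bin/bash
# Hypothetical helpers: save client<->server traffic to a pcap file that
# Ethereal (or "tcpdump -r FILE") can read back after the next failure.

# capture_file SERVER: print a timestamped capture file name for SERVER.
capture_file() {
    printf 'nfs-%s-%s.pcap\n' "$1" "$(date '+%Y%m%d-%H%M%S')"
}

# start_capture SERVER: run tcpdump (needs root) until interrupted,
# keeping up to 1536 bytes of each packet so RPC/NFS headers survive.
start_capture() {
    local server=$1
    local file
    file=$(capture_file "$server")
    echo "writing capture to $file" >&2
    tcpdump -s 1536 -w "$file" host "$server"
}
```

Run `start_capture yourserver` in a spare terminal and stop it with Ctrl-C once the error has been seen. `-vv` is left out here because, with `-w`, tcpdump saves the raw packets instead of printing a decode.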

> The I/O error can be corrected by changing to another directory and
> then listing the directory where the error occurred.

Usually this means that the client was able to reconnect to the
server properly and continue normal operation.

> The network has been running well for two years prior to the upgrades.
> Clients upgraded from FC3 to FC4 and the server from OSX 10.3.9 to 10.4.3.
>
> We are not seeing this on our other client OS (Tru-64)
> We've also tried combinations of OSX Server 10.4.5 and FC5 Test 2
> (Linux 2.6.15) but the problem still surfaces.

