2003-01-09 22:49:19

by Nix

Subject: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

When I rebooted my systems into 2.4.20 (from 2.4.19), I started seeing
EIO write() errors to files in my ext3 home directory (NFS-mounted,
exported async).

So I knocked up a test program (included below) to try to track the
failing writes down, and got more confused.

The properties of the failing writes that I've been able to determine
thus far are as follows; look out, they're weird as hell:

- the failures are definitely from write(), not open().

- writes from sparc64 to one filesystem, and only one filesystem, on
i586, both running 2.4.20, UDP NFSv3; rquotad and quotas are on, but
I am well within my quota (quota 3.06, nfs-utils 1.0). Writes to
other filesystems on the same machine never fail, even if they too are
using ext3, even if they too have user quotas for the same user.

What differs between filesystems that work and the one that fails I
can't tell; other FSen *on the same block device* work... (the block
device is an un-RAIDed SCSI disk.)

- local writes to the same filesystem, with the same test program,
never fail.

- writes from another IA32 box (all these boxes are near-clones of
each other as far as software is concerned) to the NFS server box
never fail.

- It happens if I mount the fs with -o soft (my default for all NFS
mounts, for robustness-in-the-presence-of-machine-failure reasons),
but also if I mount with -o hard :(( besides, the errors come back far
too fast for major-timeout expiry to be what causes the EIOs.

- The failure always occurs for writes that cross the 2^21-byte
boundary, but not all such writes fail. You seem to need to have
done a lot of write()s beforehand, perhaps even starting with O_TRUNC
and write()ing like mad from there on up (the WRITES_PER_OPEN
#define is a way to test that; I've never had a failure for a file
opened with O_APPEND, even if it crossed the 2^21-byte boundary).
There's a small boundary-check sketch just after this list.

- It happens whether _LARGEFILE_SOURCE / _FILE_OFFSET_BITS are
defined or not (I'd be amazed if this affected it, actually, but
it never hurts to check).

- Despite the EIO, the write actually *succeeds* most of the time
(perhaps not all the time; again, I'm not sure yet). In fact...

- It is quite thoroughly inconsistent. If you #define REPRODUCE to 1
in the test program and fill out sizes_to_reproduce[] with a set
of write() sizes that have caused the error in the past, the error
happens again, but not always:

done
done
Input/output error writing buffer size 246392, total size 2253462, iteration 9, writes since open() 9
Sizes used: (278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392)
done
Input/output error writing buffer size 246392, total size 2253462, iteration 9, writes since open() 9
Sizes used: (278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392)
done
Input/output error writing buffer size 246392, total size 2253462, iteration 9, writes since open() 9
Sizes used: (278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392)
done
Input/output error writing buffer size 246392, total size 2253462, iteration 9, writes since open() 9
Sizes used: (278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392)
done
done
done
done
done
done
done
Input/output error writing buffer size 246392, total size 2253462, iteration 9, writes since open() 9
Sizes used: (278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392)
done

(This is pretty much exactly what running the test program looks like if
REPRODUCE is off, as well, except that with it off the failing numbers
differ each time. :) )
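
(For reference, 2^21 bytes is 2,097,152; in the failing runs above the
final 246392-byte write starts at offset 2253462 - 246392 = 2007070 and
ends at 2253462, so it does straddle that mark.) Here's the small
boundary-check sketch mentioned above: a hypothetical helper, not part
of the test program as posted, that you could call just before each
write() to log exactly which ones cross the 2^21 boundary:

/* Return nonzero if a write of LEN bytes starting at OFFSET would cross
   the 2^21 (2,097,152-byte) mark.  Call it with the current file
   offset, e.g. lseek (foo, 0, SEEK_CUR), and the buffer size about to
   be written.  */
static int crosses_2mb_boundary (off_t offset, size_t len)
{
  const off_t boundary = (off_t) 1 << 21;    /* 2097152 */
  return offset < boundary && offset + (off_t) len > boundary;
}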

The kernel logs are quite empty of anything at the time these errors
occur; there's not even an NFS timeout error.

I'm using gcc-3.2.1 and glibc-2.2.5; the kernel compilers were
egcs64-19980921.1 on the sparc64 side and a gcc-2.95.4 prerelease (as of
2001-10-02) on the i586 side.

Anyway, here's the test program; --std=gnu99 required unless you make a
few tiny changes:

#define _FILE_OFFSET_BITS 64
#define _LARGEFILE_SOURCE 1
#define _GNU_SOURCE 1

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <time.h>

#define MAXSIZE 400000
#define ITERATIONS 10
#define WRITES_PER_OPEN 10
#define FNAME "/home/nix/tmp/test-big-write"
#define REPRODUCE 0

#if REPRODUCE
size_t sizes_to_reproduce[ITERATIONS]={278831, 368095, 327921, 177734, 341464, 74705, 104763, 266303, 313646, 246392};
#endif

int main (void)
{
  int foo;

  errno=0;

  if ((foo=open (FNAME, O_WRONLY | O_CREAT | O_TRUNC | O_LARGEFILE, 0644))<0)
    {
      perror ("opening file");
      exit (1);
    }

  srandom (time (NULL));

  int i=0;
  int writes_since_open=0;
  size_t size;
  char *buf = NULL;
  size_t sizes_used[ITERATIONS];

  for (i=0; i<ITERATIONS; ++i, ++writes_since_open)
    {
      /* Pick a write size: either replayed from a previous failing run,
         or roughly uniform in [0, MAXSIZE].  (Note `#if', not `#ifdef':
         REPRODUCE is always defined, so `#ifdef' would always take this
         branch and fail to compile when REPRODUCE is 0.)  */
#if REPRODUCE
      size=sizes_to_reproduce[i];
#else
      size=(size_t)(((double)random()/RAND_MAX)*MAXSIZE);
#endif
      sizes_used[i]=size;

      /* Get a fresh zero-filled buffer of the chosen size; free (NULL)
         is harmless on the first iteration.  */
      free (buf);
      errno=0;
      if ((buf=calloc (size,1))==NULL)
        {
          perror ("allocating memory");
          close (foo);
          unlink (FNAME);
          exit (2);
        }

      /* Short writes are not retried; we only care about write()
         returning -1 (the EIO case).  */
      errno=0;
      if (write (foo, buf, size)<0)
        {
          struct stat foostat;

          fstat (foo, &foostat);
          fprintf (stderr,
                   "%s writing buffer size %lu, total size %lli, iteration %i, writes since open() %i\n",
                   strerror (errno), (unsigned long) size,
                   (long long) foostat.st_size, i, writes_since_open);
          fprintf (stderr, "Sizes used: (");
          for (int j=0; j<i; ++j)
            fprintf (stderr, "%lu, ", (unsigned long) sizes_used[j]);
          fprintf (stderr, "%lu)\n", (unsigned long) sizes_used[i]);
        }

      /* After more than WRITES_PER_OPEN writes, reopen the file with
         O_APPEND to see whether the open mode matters.  */
      if (writes_since_open > WRITES_PER_OPEN)
        {
          writes_since_open=0;
          close (foo);
          if ((foo=open (FNAME, O_WRONLY | O_APPEND | O_LARGEFILE, 0644))<0)
            {
              perror ("reopening file");
              exit (1);
            }
        }
    }

  free (buf);
  close (foo);
  unlink (FNAME);

  printf ("done\n");

  return 0;
}

Change FNAME to point to a file you don't mind losing ;) then set
ITERATIONS, MAXSIZE and WRITES_PER_OPEN so that some of the
randomly-sized sets of writes go over the 2^21-byte limit, and see if
you can reproduce it.
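
For scale: with the defaults above, the average write is MAXSIZE/2 =
200000 bytes, so ITERATIONS of them come to roughly 10 * 200000 =
2000000 bytes, just under 2^21 = 2097152; only runs whose random sizes
come out a little above average will cross the boundary at all (the
failing run shown earlier totals 2253462 bytes, for instance). Bumping
MAXSIZE or ITERATIONS slightly makes boundary crossings, and hence
potential failures, much more frequent. Compile with --std=gnu99 as
noted above.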

Here's the .config on the sparc64 side (there's more below this, scroll
down):

CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_BBC_I2C=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SPARC64=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_SBUS=y
CONFIG_SBUSCHAR=y
CONFIG_BUSMOUSE=y
CONFIG_SUN_MOUSE=y
CONFIG_SERIAL=y
CONFIG_SUN_SERIAL=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_SUN_KEYBOARD=y
CONFIG_SUN_CONSOLE=y
CONFIG_SUN_AUXIO=y
CONFIG_SUN_IO=y
CONFIG_PCI=y
CONFIG_RTC=y
CONFIG_PCI_NAMES=y
CONFIG_SUN_OPENPROMFS=m
CONFIG_NET=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_SPARC32_COMPAT=y
CONFIG_BINFMT_ELF32=y
CONFIG_BINFMT_ELF=m
CONFIG_BINFMT_MISC=m
CONFIG_SOLARIS_EMUL=m
CONFIG_ENVCTRL=m
CONFIG_WATCHDOG_CP1XXX=m
CONFIG_PROM_CONSOLE=y
CONFIG_SUN_OPENPROMIO=m
CONFIG_SUN_MOSTEK_RTC=y
CONFIG_SAB82532=y
CONFIG_OBP_FLASH=m
CONFIG_SUN_AURORA=m
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_RARP=y
CONFIG_IP_MROUTE=y
CONFIG_SYN_COOKIES=y
CONFIG_IDE=m
CONFIG_BLK_DEV_IDE=m
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_NS87415=y
CONFIG_IDEDMA_AUTO=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SD_EXTRA_DEVS=10
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_ETHERNET=y
CONFIG_HAPPYMEAL=y
CONFIG_SUNBMAC=m
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=1024
CONFIG_QUOTA=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_EXT2_FS=y
CONFIG_UFS_FS=m
CONFIG_INTERMEZZO_FS=m
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_SUN_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_15=m
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m

And here's the .config on the i586 side:

CONFIG_X86=y
CONFIG_UID16=y
CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_M586MMX=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_USE_STRING_486=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_MCE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
CONFIG_NOHIGHMEM=y
CONFIG_X86_TSC=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_ISA=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_PM=y
CONFIG_APM=y
CONFIG_APM_RTC_IS_GMT=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_PC_CML1=m
CONFIG_PARPORT_PC_FIFO=y
CONFIG_PNP=y
CONFIG_ISAPNP=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_PIIX_TUNING=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SD_EXTRA_DEVS=10
CONFIG_BLK_DEV_SR=y
CONFIG_SR_EXTRA_DEVS=2
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_DEBUG_QUEUES=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_SYM53C8XX_2=y
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_PLIP=m
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_PROC=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_QUOTA=y
CONFIG_REISERFS_FS=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_UMSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_MINIX_FS=m
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_EXT2_FS=y
CONFIG_INTERMEZZO_FS=m
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_15=m
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS=m
CONFIG_SOUND_SB=m
CONFIG_SOUND_AWE32_SYNTH=m
CONFIG_SOUND_YM3812=m
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m

Modules loaded, i586 side:

Module                  Size  Used by    Not tainted
opl3                   11040   0
sb                      7348   1
sb_lib                 32334   0  [sb]
uart401                 6020   0  [sb_lib]
sound                  51756   1  [opl3 sb_lib uart401]
nls_iso8859-1           2844   1  (autoclean)
nls_cp437               4348   1  (autoclean)

(No modules are loaded on the sparc64 side.)


Anyone got any ideas? This is really, really weird.

--
`I knew that there had to be aliens somewhere in the universe. What I
did not know until now was that they read USENET.' --- Mark Hughes,
on those who unaccountably fail to like _A Fire Upon The Deep_


2003-01-10 06:55:30

by Denis Vlasenko

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

On 10 January 2003 00:56, Nix wrote:
> When I rebooted my systems into 2.4.20 (from 2.4.19), I started
> seeing EIO write() errors to files in my ext3 home directory
> (NFS-mounted, exported async).
>
> So I knocked up a test program (included below) to try to track the
> failing writes down, and got more confused.
>
> The properties of the failing writes that I've been able to determine
> thus far are as follows; look out, they're weird as hell:
>
> - the failures are definitely from write(), not open().
>
> - writes from sparc64 to one filesystem, and only one filesystem, on
> i586, both running 2.4.20, UDP NFSv3; rquotad and quotas are on,
> but I am well within my quota (quota 3.06, nfs-utils 1.0). Writes to
> other filesystems on the same machine never fail, even if they too
> are using ext3, even if they too have user quotas for the same user.
>
> What differs between filesystems that work and the one that fails
> I can't tell; other FSen *on the same block device* work... (the
> block device is an un-RAIDed SCSI disk.)
>
> - local writes to the same filesystem, with the same test program,
> never fail.
>
> - writes from another IA32 box (all these boxes are near-clones of
> each other as far as software is concerned) to the NFS server box
> never fail.
>
> - It happens if I mount the fs with -o soft (my default for all NFS
> mounts, for robustness-in-the-presence-of-machine-failure reasons),
> but also if I mount with -o hard :(( besides, the errors come back
> far too fast for major-timeout expiry to be what causes the EIOs.
>
> - The failure always occurs for writes that cross the 2^21 byte
> boundary, but not all such writes fail. You seem to need to have
> done a lot of write()s before, perhaps even starting with O_TRUNC
> and write()ing like mad from there on up (the WRITES_PER_OPEN
> #define is a way to test that; I've never had a failure for a file
> opened with O_APPEND, even if it crossed the 2^21 byte boundary).
>
> - It happens whether _LARGEFILE_SOURCE / _FILE_OFFSET_BITS are
> defined or not (I'd be amazed if this affected it, actually, but
> it never hurts to check).
>
> - Despite the EIO, the write actually *succeeds* most of the time
> (perhaps not all the time; again, I'm not sure yet). In fact...
>
> - It is quite thoroughly inconsistent. If you #define REPRODUCE to 1
> in the test program and fill out sizes_to_reproduce[] with a set
> of write() sizes that have caused the error in the past, the error
> happens again, but not always:

This beast is most probably Sparc64 or 64-bit arch specific.

Try to pin down the first 2.4.20-preN where it appears.
Then inform NFS and Sparc64 folks.

If you don't get any useful hints by then, you can keep playing
patch-o-rama: read the changelogs and slowly mutate the last working
2.4.20-preN into the first non-working pre.
--
vda

2003-01-19 20:13:30

by Nix

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

[Large amount of quoting to provide useful context; see below.]

On Fri, 10 Jan 2003, Denis Vlasenko recommended:
> On 10 January 2003 00:56, Nix wrote:
>> When I rebooted my systems into 2.4.20 (from 2.4.19), I started
>> seeing EIO write() errors to files in my ext3 home directory
>> (NFS-mounted, exported async).
>>
>> So I knocked up a test program (included below) to try to track the
>> failing writes down, and got more confused.
>>
>> The properties of the failing writes that I've been able to determine
>> thus far are as follows; look out, they're weird as hell:
>>
>> - the failures are definitely from write(), not open().
>>
>> - writes from sparc64 to one filesystem, and only one filesystem, on
>> i586, both running 2.4.20, UDP NFSv3; rquotad and quotas are on,
>> but I am well within my quota (quota 3.06, nfs-utils 1.0). Writes to
>> other filesystems on the same machine never fail, even if they too
>> are using ext3, even if they too have user quotas for the same user.
>>
>> What differs between filesystems that work and the one that fails
>> I can't tell; other FSen *on the same block device* work... (the
>> block device is an un-RAIDed SCSI disk.)
>>
>> - local writes to the same filesystem, with the same test program,
>> never fail.
>>
>> - writes from another IA32 box (all these boxes are near-clones of
>> each other as far as software is concerned) to the NFS server box
>> never fail.
>>
>> - It happens if I mount the fs with -o soft (my default for all NFS
>> mounts, for robustness-in-the-presence-of-machine-failure reasons),
>> but also if I mount with -o hard :(( besides, the errors come back
>> far too fast for major-timeout expiry to be what causes the EIOs.
>>
>> - The failure always occurs for writes that cross the 2^21 byte
>> boundary, but not all such writes fail. You seem to need to have
>> done a lot of write()s before, perhaps even starting with O_TRUNC
>> and write()ing like mad from there on up (the WRITES_PER_OPEN
>> #define is a way to test that; I've never had a failure for a file
>> opened with O_APPEND, even if it crossed the 2^21 byte boundary).
>>
>> - It happens whether _LARGEFILE_SOURCE / _FILE_OFFSET_BITS are
>> defined or not (I'd be amazed if this affected it, actually, but
>> it never hurts to check).
>>
>> - Despite the EIO, the write actually *succeeds* most of the time
>> (perhaps not all the time; again, I'm not sure yet). In fact...
>>
>> - It is quite thoroughly inconsistent. If you #define REPRODUCE to 1
>> in the test program and fill out sizes_to_reproduce[] with a set
>> of write() sizes that have caused the error in the past, the error
>> happens again, but not always:

[and more; see <http://www.uwsg.iu.edu/hypermail/linux/kernel/0301.1/0597.html>;
note that I have seen errors on writes to files of total size <2Mb --- e.g.,
my mail overview databases --- but I can't reproduce them with a test program
yet.]

> This beast is most probably Sparc64 or 64-bit arch specific.

That seems very likely.

> Try to pin down the first 2.4.20-preN where it appears.
> Then inform NFS and Sparc64 folks.

Done. Sorry for the delay; this box is quite hard to arrange to reboot :(

Anyway, the problem appears in 2.4.20-pre10; I suspect

Trond Myklebust <[email protected]>:
o Workaround NFS hangs introduced in 2.4.20-pre

(so Cc:ed)

Does anyone have a pointer to this patch so I can try reversing it out
of 2.4.20-pre10? (I can't see it on l-k, and since I don't know what it
looks like it's hard to find in the archives; I don't have BitKeeper on
this machine, and can't install it, as one of my current projects
involves version-control filesystems.)

Anyone got any other ideas, suggestions, or anything?


Thanks!

--
`I knew that there had to be aliens somewhere in the universe. What I
did not know until now was that they read USENET.' --- Mark Hughes,
on those who unaccountably fail to like _A Fire Upon The Deep_

2003-01-19 20:51:19

by Trond Myklebust

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

>>>>> " " == nix <[email protected]> writes:


> Anyway, the problem appears in 2.4.20-pre10; I suspect

> Trond Myklebust <[email protected]>:
> o Workaround NFS hangs introduced in 2.4.20-pre

> (so Cc:ed)

> Does anyone have a pointer to this patch so I can try reversing
> it from 2.4.20pre10? (I can't see it on l-k, but since I don't
> know what it looks like it's hard to find it in the archives; I
> don't have bitkeeper on this machine, and can't, as one of my
> current projects involves version-control filesystems).

It sounds rather strange that this particular patch should introduce
an EIO, but here it is (fresh from BitKeeper)

Cheers,
Trond

# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.717.1.6 -> 1.717.1.7
# net/sunrpc/xprt.c 1.30 -> 1.31
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/07 [email protected] 1.717.1.7
# [PATCH] Workaround NFS hangs introduced in 2.4.20-pre
#
# Alan,
#
# Does the following patch make any difference to the hangs?
#
# It reverts one effect of my changes, and ensures that requests get resent
# immediately after the timeout, even if doing so would violate the current
# congestion window size.
# Doing this ensures that we keep probing the `connection' to the server
# rather than just waiting for the entire window to time out. The latter
# can be very expensive due to the exponential backoff rule...
#
# Cheers,
# Trond
# --------------------------------------------
#
diff -Nru a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
--- a/net/sunrpc/xprt.c Sun Jan 19 21:56:20 2003
+++ b/net/sunrpc/xprt.c Sun Jan 19 21:56:20 2003
@@ -171,10 +171,10 @@

if (xprt->snd_task)
return;
- if (!xprt->nocong && RPCXPRT_CONGESTED(xprt))
- return;
task = rpc_wake_up_next(&xprt->resend);
if (!task) {
+ if (!xprt->nocong && RPCXPRT_CONGESTED(xprt))
+ return;
task = rpc_wake_up_next(&xprt->sending);
if (!task)
return;
@@ -1013,7 +1013,6 @@
}
rpc_inc_timeo(&task->tk_client->cl_rtt);
xprt_adjust_cwnd(req->rq_xprt, -ETIMEDOUT);
- __xprt_put_cong(xprt, req);
}
req->rq_nresend++;

@@ -1150,10 +1149,7 @@
req->rq_bytes_sent = 0;
}
out_release:
- spin_lock_bh(&xprt->sock_lock);
- __xprt_release_write(xprt, task);
- __xprt_put_cong(xprt, req);
- spin_unlock_bh(&xprt->sock_lock);
+ xprt_release_write(xprt, task);
return;
out_receive:
dprintk("RPC: %4d xmit complete\n", task->tk_pid);

2003-01-20 05:52:13

by David Miller

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

On Sun, 2003-01-19 at 13:00, Trond Myklebust wrote:
> It sounds rather strange that this particular patch should introduce
> an EIO, but here it is (fresh from BitKeeper)

It is possible in the context of the change causing a miscompile.
Another possibility is just timing differences causing different
sequences of events to occur than before.

2003-01-20 20:51:45

by Nix

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

On 19 Jan 2003, David S. Miller muttered drunkenly:
> On Sun, 2003-01-19 at 13:00, Trond Myklebust wrote:
>> It sounds rather strange that this particular patch should introduce
>> an EIO, but here it is (fresh from BitKeeper)
>
> It is possible in the context of the change causing a miscompile.
> Another possibility is just timing differences causing different
> sequences of events to occur than before.

I'll be rebooting with -pre10-minus-Trond's-patch later tonight, and
tcpdumping the NFS conversation to death if it hasn't been cured.

FWIW, the subnet that both these machines are on has an MTU of 576. I
really doubt this affects anything; I mention it merely because it's a
bit unusual on an Ethernet network. And all these FSen are mounted with
-o defaults,rsize=8192,wsize=8192.

I think the problem *must* be somehow timing-related; I can't see
anything else that could cause writes to fail when writing to one
NFS-exported filesystem but not to another (of the same type) on the
same machine, save that the two fsen are on different physical disks
which have rather different timing characteristics.

(Either that or the machine just hates me.)

--
`I knew that there had to be aliens somewhere in the universe. What I
did not know until now was that they read USENET.' --- Mark Hughes,
on those who unaccountably fail to like _A Fire Upon The Deep_

2003-02-03 17:50:44

by Nix

Subject: Re: strange sparc64 -> i586 intermittent but reproducible NFS write errors to one and only one fs

On Sun, 19 Jan 2003, Trond Myklebust said:
> It sounds rather strange that this particular patch should introduce
> an EIO, but here it is (fresh from BitKeeper)

... and indeed it doesn't.

The problem still exists in -pre9, but is very much rarer and harder to
replicate; I've seen it only half a dozen times in two weeks, in each
case during an ftp retrieval. I'm assuming there's something about the
write patterns used by ncftp (lots of few-KB appends, far apart in time)
that triggers it.

So it really is merely a timing change that has brought a pre-existing
problem into the light.

I'm going to try to come up with something that consistently reproduces
this as well, so I can track down the origins of this bug more
correctly.

--
2003-02-01: the day the STS died.