2002-05-14 12:11:29

by Ulrich Hochholdinger

[permalink] [raw]
Subject: Stale NFS file handle with 2.4.17 / 2.4.18

Hi,
I'm having problems with stale NFS file handles.


2002-05-17 09:58:36

by Ryan Sweet

[permalink] [raw]
Subject: Re: 2.4.18 disk i/o load spikes was: re: knfsd load spikes


I did some additional testing, and in my case I do not think the problem
I am having is nfs related. Thus perhaps we can move this discussion to
lkml. I will probably post a summary there later today.

I can reproduce the issue at will, even when the file server is otherwise
not busy, using the slowspeed.c program attached in a previous message.

If I run it with 10 streams at 65k against the external RAID array
(adaptec 29160 controller), it will eventually (within 20 minutes) spiral
into severe pain (load > 30).

Looking at /proc/scsi/aic7xxx/2, I can see that Commands Active is
always pegged at 8. Command Openings reads 245 (the controller depth
of 253, minus the 8 active). Looking at the kernel config, the aic7xxx
driver was built with the old default TCQ depth of 8, but it should really
be 253 (I think).

I tested another system, slower, only single CPU, but with the same
controller. I used the same kernel and could easily reproduce the problem
with about 6 streams. Then I rebuilt the same kernel, changing only the TCQ
depth to 253. In that configuration the system does very well up to
about 20-25 streams, at which point it starts to wait too long. Looking
in /proc/scsi/aic7xxx on that system, Commands Active is pegged at 64,
and Command Openings at 0. When the system is idle, Command Openings is
at 64. Note that I can still cause the problem to happen with 20+ streams
of I/O. That hardly seems optimal.
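As an aside, the two counters above are easy to scrape mechanically. Here is a hedged sketch (the `Commands Active` / `Command Openings` field names are taken from the report above; the exact /proc/scsi/aic7xxx layout varies between driver versions), run against a canned sample of the numbers quoted earlier rather than the live file:

```shell
# Summarize tag-queue usage from a saved copy of /proc/scsi/aic7xxx/N.
# Canned sample using the values reported above:
cat > /tmp/aic7xxx_sample <<'EOF'
Commands Active 8
Command Openings 245
EOF

awk '
    /Commands Active/  { active = $NF }  # commands currently queued to the device
    /Command Openings/ { open   = $NF }  # remaining queue slots
    END { printf "active=%s openings=%s depth=%d\n", active, open, active + open }
' /tmp/aic7xxx_sample
```

Against a live system you would point awk at /proc/scsi/aic7xxx/2 itself instead of the sample file.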

So first on my list is to reboot the filer with the aic7xxx set to TCQ
depth of 253.
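For reference, the 2.4-era aic7xxx driver also accepted a `tag_info` kernel argument to set the tag depth without rebuilding. The syntax below is an assumption from memory of that driver's documentation (one inner brace group per controller, one number per target) and should be checked against the aic7xxx README in the kernel tree before use:

```
# /etc/lilo.conf fragment (hypothetical; verify syntax in drivers/scsi docs)
image=/boot/vmlinuz-2.4.18
        label=linux
        append="aic7xxx=tag_info:{{253,253,253,253,253,253,253,253}}"
```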

My questions are still:
1) Why, if the kernel (as reported by dmesg) has the TCQ depth set to 253,
does it cap at 64?

2) What causes it to spiral to unusable loads when the TCQ is full?



--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl




_______________________________________________________________

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: [email protected]
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-05-14 12:56:48

by Ryan Sweet

[permalink] [raw]
Subject: Re: OOPS: in kernel rpc.mountd with IRIX client patch


I have been running a server with 2.4.2+xfs-1.0.2 and the IRIX client
patch (posted on this list a while back, to fix problems with IRIX
clients and 64bit filehandles, included below) successfully for quite a
while.

The server is a dual PIII 733/256MB system with an adaptec 29xx UW160 card
and an external SkyRAID array.

Recently, after adding several new and _very_ fast client machines
(several dual Xeon 2.2 GHz systems, running the 2.4.9-31 Red Hat kernel)
that are doing thousands of small writes, all at once, the performance has
started to really suck for periods of two or three minutes at a time. The
load will go up to 30+, the kernel will be thrashing in bdflush mostly,
and then eventually the load will come back down again.

Updating to 2.4.18+XFS-1.1 appears to have solved that problem (hard to
say for sure, since it is intermittent); however, if we apply the IRIX
client workaround patch then we get almost immediate oopses all over the
NFS server components. Here are some:
May 13 22:53:47 ats-data-1 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000
May 13 22:53:47 ats-data-1 kernel: printing eip:
May 13 22:53:47 ats-data-1 kernel: c01831e3
May 13 22:53:47 ats-data-1 kernel: *pde = 00000000
May 13 22:53:47 ats-data-1 kernel: Oops: 0002
May 13 22:53:47 ats-data-1 kernel: CPU: 1
May 13 22:53:47 ats-data-1 kernel: EIP: 0010:[fh_compose+483/788]
Not tainted
May 13 22:53:47 ats-data-1 kernel: EFLAGS: 00010203
May 13 22:53:47 ats-data-1 kernel: eax: 00000040 ebx: d83dc094 ecx:
d83dc0a4 edx: 00000004
May 13 22:53:47 ats-data-1 kernel: esi: dd535eb8 edi: 00000000 ebp:
dd6b59e0 esp: dd535e7c
May 13 22:53:47 ats-data-1 kernel: ds: 0018 es: 0018 ss: 0018
May 13 22:53:47 ats-data-1 kernel: Process nfsd (pid: 723,
stackpage=dd535000)
May 13 22:53:47 ats-data-1 kernel: Stack: 00000006 dd6b59e0 cb0980c8
d83dc004 dd6b59e0 cb0980ce 84e67838 cb0980c8
May 13 22:53:47 ats-data-1 kernel: d83dc004 c014242f dd535eb4
da053820 00000006 d83dc0a4 d8985b60 0000000d
May 13 22:53:47 ats-data-1 kernel: c0183cf1 d83dc094 d6b2d000
dd6b59e0 d83dc004 d83dc004 00000006 cb0980c8
May 13 22:53:47 ats-data-1 kernel: Call Trace: [lookup_one_len+87/104]
[nfsd_lookup+945/1000] [nfsd3_proc_lookup+331/348]
[nfs3svc_decode_diropargs+152/260] [nfsd_dispatch+203
/402]
May 13 22:53:48 ats-data-1 kernel: [svc_process+653/1308]
[nfsd+428/856] [kernel_thread+35/48]
May 13 22:53:48 ats-data-1 kernel:
May 13 22:53:48 ats-data-1 kernel: Code: c7 07 00 00 00 00 83 c7 04 4a 79
f4 8b 55 08 8b 4c 24 48 8b

May 14 14:02:17 ats-data-0 rpc.mountd: authenticated mount request from
iapp-0:749 for /exportB/home (/exportB)
May 14 14:02:17 ats-data-0 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000
May 14 14:02:17 ats-data-0 kernel: printing eip:
May 14 14:02:17 ats-data-0 kernel: c01831e3
May 14 14:02:17 ats-data-0 kernel: *pde = 00000000
May 14 14:02:17 ats-data-0 kernel: Oops: 0002
May 14 14:02:17 ats-data-0 kernel: CPU: 0
May 14 14:02:17 ats-data-0 kernel: EIP: 0010:[fh_compose+483/788]
Not tainted
May 14 14:02:17 ats-data-0 kernel: EFLAGS: 00010203
May 14 14:02:17 ats-data-0 kernel: eax: 00000040 ebx: de9c5e8c ecx:
de9c5e9c edx: 00000004
May 14 14:02:17 ats-data-0 kernel: esi: de9c5e50 edi: 00000000 ebp:
dd1b9940 esp: de9c5e14
May 14 14:02:17 ats-data-0 kernel: ds: 0018 es: 0018 ss: 0018
May 14 14:02:17 ats-data-0 kernel: Process rpc.mountd (pid: 751,
stackpage=de9c5000)
May 14 14:02:17 ats-data-0 kernel: Stack: de9c5e8c 00000083 de9c5f1c
cac02800 019cc780 dd1b9940 c0141f18 de694320
May 14 14:02:18 ats-data-0 kernel: de9c5f1c 00000000 cc3b2000
cac02800 c0186582 de9c5e9c dd1679a0 0000000d
May 14 14:02:18 ats-data-0 kernel: c0186d06 de9c5e8c cac02800
dd1b9940 00000000 0000041c cc3b2004 cc3b2000
May 14 14:02:18 ats-data-0 kernel: Call Trace: [link_path_walk+1872/2072]
[exp_parent+50/68] [exp_rootfh+538/632] [sys_nfsservctl+878/1028] [filp_cl
ose+156/168]
May 14 14:02:18 ats-data-0 kernel: [sys_close+91/112]
[system_call+51/56]
May 14 14:02:18 ats-data-0 kernel:
May 14 14:02:18 ats-data-0 kernel: Code: c7 07 00 00 00 00 83 c7 04 4a 79
f4 8b 55 08 8b 4c 24 48 8b

I assume then that this patch (below) needs to be updated somewhere for
2.4.18. I tried diving in to see if I could figure out where/why/etc...,
but I have to admit that I do not see what is broken.

Is there a newer version of the IRIX nfs client patch (IIRC Neil has said
it would not ever go into the kernel because it was a temporary workaround
for a bug in IRIX - the problem does not occur with IRIX 6.5.14+)?

If not, does someone see what needs to be changed/fixed, etc...?

Here is the patch:
*** fs/nfsd/nfsfh.c 2001/02/14 03:20:12 1.1
--- fs/nfsd/nfsfh.c 2001/02/14 04:23:40
***************
*** 699,705 ****
* an inode. In this case a call to fh_update should be made
* before the fh goes out on the wire ...
*/
! inline int _fh_update(struct dentry *dentry, struct svc_export *exp,
__u32 **datapp, int maxsize)
{
__u32 *datap= *datapp;
--- 699,705 ----
* an inode. In this case a call to fh_update should be made
* before the fh goes out on the wire ...
*/
! inline int _fh_update2(struct dentry *dentry, struct svc_export *exp,
__u32 **datapp, int maxsize)
{
__u32 *datap= *datapp;
***************
*** 717,723 ****
*datapp = datap;
return 2;
}
!
int
fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry)
{
--- 717,733 ----
*datapp = datap;
return 2;
}
! inline int _fh_update(struct dentry *dentry, struct svc_export *exp,
! __u32 **datapp, int maxsize)
! {
! __u32 *datap = *datapp;
! int i;
! for (i=3;i<8;i++)
! *datap++ = 0;
! i = _fh_update2(dentry, exp, datapp, maxsize);
! *datapp = datap;
! return i;
! }
int
fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry)
{


--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl



2002-05-15 12:53:23

by Ryan Sweet

[permalink] [raw]
Subject: 2.4.18 knfsd load spikes


I didn't get any responses to the message below, but I _did_ bite the
bullet and update the IRIX systems, and now the 64bit filehandle problem
is solved.

However, the performance problem is not. With 2.4.18+xfs1.1, it is
definitely better (the load spikes to 7 or 8, sometimes 10, instead of 20
or 30...), but I still get periods where suddenly the system will
respond _very_ slowly: cpu is mostly idle, memory is all used, but only
for cache, the system is not swapping at all, but the load climbs up and
up. It then gradually falls back down. The top processes are usually
bdflush and kupdated, with kupdated always in the dead wait (DW) state.
It is basically the same behaviour that we saw with 2.4.[2|5]+xfs1.0.2,
though not as painful. The problem usually lasts for three or four
minutes, then subsides.
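Since bdflush keeps showing up at the top, one knob worth noting: on 2.4 kernels its parameters are exposed through /proc/sys/vm/bdflush. The sketch below only reads them; the field meanings and defaults are version-dependent, so treat the `nfract` description as an assumption to be checked against Documentation/sysctl/vm.txt for the running kernel:

```
# Inspect the nine bdflush tunables on a 2.4 kernel (read-only, safe):
cat /proc/sys/vm/bdflush
# The first field (nfract) is believed to be the percentage of dirty
# buffers that wakes bdflush; lowering it should spread writeback out
# instead of letting dirty data pile up and flush in one long burst.
# To change it, write back all nine values with only the first reduced:
#   echo "20 <remaining eight fields unchanged>" > /proc/sys/vm/bdflush
```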

The problem seemed to begin around the time we added a few new, really
fast compute workstations, each of which is periodically doing thousands
of small writes/reads. I cannot yet make a direct correlation, however,
until I can get a decent tcpdump.

Does anyone have any pointers on where to begin looking? Have other
people seen this behaviour?

thanks,
-Ryan

On Tue, 14 May 2002, Ryan Sweet wrote:

>
> I have been running a server with 2.4.2+xfs-1.0.2 and the IRIX client
> patch (posted on this list a while back, to fix problems with IRIX
> clients and 64bit filehandles, included below) successfully for quite a
> while.
>
> The server is a dual PIII 733/256MB system with an adaptec 29xx UW160 card
> and an external SkyRAID array.
>
> Recently, after adding several new and _very_ fast client machines
> (several dual xeon 2.2 gigahertz systems, running 2.4.9-31 redhat kernel)
> that are doing thousands of small writes, all at once, the performance has
> started to really suck for periods of two/three minutes at a time. the
> load will go up to 30+, the kernel will be thrashing in bdflush mostly,
> and then eventually the load will come back down again.
>
> Updating to 2.4.18+XFS-1.1 appears to have solved that problem (hard to
> say for sure since it is intermittent, however if we apply the IRIX client
> workaround patch then we get almost immediate oopses all over the nfs
> server components. Here are some:
> [oops traces and workaround patch snipped; they appear in full in the
> original message earlier in the thread]
>

--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl




2002-05-15 21:44:19

by Jeff L. Smith

[permalink] [raw]
Subject: Re: 2.4.18 knfsd load spikes

/* */
/* Written by Roger Heflin [email protected] [email protected] */
/* */

/* Simulates an application writing multiple data streams to several */
/* files, to duplicate an application IO issue. The code tries to note */
/* when a write takes a lot longer than expected, and does appear to be */
/* able to sometimes detect the bdflush daemon under the correct */
/* conditions. */

/* Quite a bit more error checking could be done at various points but
   is not done */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>     /* for strerror() */
#include <errno.h>
#include <sys/time.h>
#include <unistd.h>

double my_tod()
{
    struct timeval t1;

    gettimeofday(&t1, NULL);

    return (t1.tv_sec + (double)t1.tv_usec / 1e6);
}

int main(int argc, char **argv)
{
    char *directory;
    int write_size;
    char filename[30][256];
    int *writebuffer;
    long long num_writes;
    long long max_writes;
    FILE *fn[30];
    int nfiles;
    int i;
    char hostname1[32];
    unsigned long sleep_usec;
    double delayt;
    double write_time;
    double bflush_time;
    double start_time, end_time, start_time1, end_time1, start_time2, end_time2;
    int cnt;
    int num_slowwrite;

    num_slowwrite = 0;

    setlinebuf(stdout);

    if (argc != 5)
    {
        fprintf(stderr, "Usage: %s directory size sleep_time numberoffiles\n", argv[0]);
        fprintf(stderr, " directory - the directory to work in\n");
        fprintf(stderr, " size - the block size to use for the writes\n");
        fprintf(stderr, " sleep_time - the time in seconds to sleep after\n");
        fprintf(stderr, " writing to all files; really small numbers below\n");
        fprintf(stderr, " the kernel resolution will not result in smaller\n");
        fprintf(stderr, " times - decimals are allowed\n");
        fprintf(stderr, " numberoffiles - number of write streams\n");
        exit(-1);
    }

    write_size = atoi(argv[2]);
    delayt = atof(argv[3]);
    directory = argv[1];
    nfiles = atoi(argv[4]);

    printf("write size is %d sleep time is %f\n", write_size, delayt);
    printf("with %d files\n", nfiles);

    sleep_usec = delayt * 1000000;
    writebuffer = malloc(write_size + 4);

    if (writebuffer == 0)
    {
        fprintf(stderr, "Malloc of %d bytes failed - error is %s\n", write_size, strerror(errno));
        exit(-1);
    }
    chdir(directory);

    for (i = 0; i < write_size / 4; i++)
    {
        writebuffer[i] = i;
    }
    gethostname(hostname1, 32);
    for (i = 0; i < nfiles; i++)
    {
        sprintf(filename[i], "%s/%s_%d.tmp_%d", directory, hostname1, getpid(), i);
        fprintf(stderr, "Using filename %s\n", filename[i]);
    }

    /* max number of writes to do before doing a rewind */
    /* the 2000 number is basically the max file size - adjusting
       it smaller will allow testing with more files on small disks */
    max_writes = 2000 * 1024 * 1024 / write_size;
    num_writes = 0;
    write_time = 0;

    for (i = 0; i < nfiles; i++)
    {
        fn[i] = fopen(filename[i], "wb");
        if (fn[i] == NULL)
        {
            fprintf(stderr, "Failure opening file %s - error is %s\n", filename[i], strerror(errno));
            exit(-1);
        }
        setvbuf(fn[i], malloc(write_size), _IOFBF, write_size);
    }

    cnt = 0;
    start_time1 = my_tod();
    start_time2 = start_time1;
    bflush_time = start_time1;
    while (1)
    {
        for (i = 0; i < nfiles; i++)
        {
            if ((num_writes % max_writes) == 0)
            {
                fprintf(stderr, "Rewinding file %d\n", i);
                fseek(fn[i], 0, SEEK_SET);
            }

            start_time = my_tod();
            if (fwrite(writebuffer, write_size, 1, fn[i]) != 1)
            {
                fprintf(stderr, "Error fwrite failure - error was %s\n", strerror(errno));
                exit(-1);
            }
            end_time = my_tod();
            write_time += (end_time - start_time);

            if (i == 0)
                num_writes++;

            if (end_time - start_time > 1.0)
            {
                num_slowwrite++;
            }
        }

        /* Only sleep once per every nfiles writes of write_size */

        if (sleep_usec != 0)
            usleep(sleep_usec);

        /* Print out a rate every xxx writes for each file */
        if (cnt == 10)
        {
            end_time1 = my_tod();
            end_time2 = end_time1;
            printf("%s %8.1f secs - last wrt speed %8.2f MB/sec %10.2f GB written %8.2f MB/sec overall average %d slowwrites",
                   hostname1,
                   (end_time2 - start_time2),
                   ((cnt * write_size * nfiles) / (end_time1 - start_time1)) / (1024 * 1024),
                   ((double)num_writes * (double)write_size * nfiles) / (1024 * 1024 * 1024),
                   (((double)num_writes * (double)write_size * nfiles) / (end_time2 - start_time2)) / (1024 * 1024),
                   num_slowwrite);
            if ((end_time1 - start_time1) > 5)
            {
                printf(" Buffer flush %5.2f - last was %f seconds ago\n", end_time1 - start_time1, start_time1 - bflush_time);
                bflush_time = start_time1;
            }
            else
            {
                printf("\n");
            }

            start_time1 = my_tod();
            cnt = 0;
            num_slowwrite = 0;
        }

        /* if (access("QUIT_NOW", F_OK) == 0)
        {
            printf("Average time per write %s %f\n", hostname1, write_time / num_writes);
            for (i = 0; i < nfiles; i++)
                fclose(fn[i]);
            exit(0);
        } */
        cnt++;
    }

}


Attachments:
slowspeed.c (4.89 kB)

2002-05-16 08:04:20

by Ryan Sweet

[permalink] [raw]
Subject: Re: 2.4.18 knfsd load spikes


Hmm, I'm not convinced that we have _the_ same problem, but possibly they
are related. In particular, my cpu utilisation (dual PIII733) is minimal
when this happens. What filesystems/NICs are you using? My server is
using an Intel e1000.

I will test the program on a local disk to see if it also causes the
problem.

-ryan

On Wed, 15 May 2002, Jeff Smith wrote:

> Ahhhh... Welcome to my hell. I'm experiencing something similar but
> have no resolution. Here is the exchange I had with Roger Heflin who
> also had a similar problem. I was hoping this would go away with 2.4,
> but your experience leaves me very worried...
>
>
> "Heflin, Roger A." wrote:
>
> Compile it,
> run it with ./slowspeed . 65536 .0002 10
>
> This will write 10 files in a round robin fashion; it will rewind just
> before it hits 2GB and start over again. 65536 is the block size, which
> should eliminate any disk head thrash issues. The .0002 is a sleep time
> to use, and may not really be sleeping much at all during this test.
>
> You will need about 20GB (10x2GB per file) to run this test, and the IO
> rates will be pretty good for a while and then will slowly start to drop
> over the next few hours until things become pretty bad. It appears to
> happen over NFS or on local disk; it does not appear to happen if you
> decrease the number of files being written to at the same time.
>
> Our machines are 440GX/BX's for the disk nodes with ASUS P2D's;
> we have been using the older, slower machines for the disk as they
> seem to have no real issues until this happens, and then the faster
> machines appear to do no better. The disk nodes have 1GB ram.
>
> I went to eXtreme 3000 controllers and I like them more than the
> LVD scsi controllers (2000,1100); they appear to be less sensitive
> to cabling issues with the copper fibre channel.
>
> Roger
>
> > -----Original Message-----
> > From: Jeff Smith [SMTP:[email protected]]
> > Sent: 3/ 08/ 2002 12:41 PM
> > To: Heflin, Roger A.
> > Subject: Re: [NFS] IO write rate problem with multiple writers to
> > different files
> >
> > Is it possible to send me the test as well so that I can verify that I'm
> > experiencing the same problem?
> >
> > Thanks,
> > Jeff
> >
> > "Heflin, Roger A." wrote:
> > >
> > > I am talking to Alan Cox and he seems interested in the problem.
> > > I have figured out that running the same job on the local machine with
> > > multiple writers also kills the IO rate, and I have a fairly small test
> > > job that nicely duplicates the problem. I will be sending this to Alan
> > > to see if it occurs on other kernels, and if so whether it can be fixed
> > > on the other kernel and maybe on the 2.2 series.
> > >
> > > I am pretty leery of the 2.4 kernels, as 2.2.19 is very very stable and
> > > I don't know if 2.4 has this kind of stability.
> > >
> > > Roger
> > >
> > > > -----Original Message-----
> > > > From: Jeff Smith [SMTP:[email protected]]
> > > > Sent: 3/ 08/ 2002 10:40 AM
> > > > To: Heflin, Roger A.; Stephen Padnos
> > > > Subject: Re: [NFS] IO write rate problem with multiple writers to
> > > > different files
> > > >
> > > > Be comforted that you are not alone. Every time we go through a chip
> > > > tapeout, the number of large jobs rises, causing our NFS servers to
> > > > suddenly fall off a cliff and exhibit the same symptoms (the IO rate
> > > > plummets and the CPU utilization goes to 100%, all of it taken by the
> > > > nfsd's). We are running 2.2.18.
> > > >
> > > > We've been trying for six months to find a window where we can
> > > > upgrade to 2.4.X and pray that this resolves the problem, but these
> > > > are production servers and cannot afford any downtime.
> > > >
> > > > Let me know if you get any unposted responses. I posted a query a few
> > > > months back, but no solutions were forthcoming. I would like to feel
> > > > confident that whatever we try next will actually resolve the
> > > > problem.
> > > >
> > > > Jeff
> > > >
> > > >
> > > >
> > > > "Heflin, Roger A." wrote:
> > > > >
> > > > > Any ideas on increasing write IO rates in this situation?
> > > > >
> > > > > I am running 2.2.19 with the NFS patches released around the time
> > > > > 2.2.19 was released, and the IO writes slow down massively when
> > > > > there are multiple write streams; it seems to require several files
> > > > > being written to at the same time. The same behavior is not noticed
> > > > > with only 1 or 2 files being open and written to. For the behavior
> > > > > to happen it takes 60+ minutes of sustained IO; the buffer cache
> > > > > fills in the expected 2-4 minutes, then things look pretty good for
> > > > > quite a while, and around 60 minutes the IO rates start to fall
> > > > > until they hit about 1/4-1/8 of the rate seen after the buffercache
> > > > > was filled. The machines are being run with sync exports and sync
> > > > > mounts, but the problem was also observed with sync mounts and
> > > > > async exports.
> > > > >
> > > > > The nfsd's go to using 60-80% of a dual cpu 600mhz PIII, the IO
> > > > > rate falls down to around 1.1-1.8 MB/second, and machine response
> > > > > generally falls apart. I don't understand why the nfsd's are using
> > > > > this sort of cpu to do this low an IO rate.
> > > > >
> > > > > The application is writing the data in 128kb chunks, and the duty
> > > > > cycle on the disk lights is under 50%.
> > > > >
> > > > > How does NFS interact with the kernel buffercache, and could the
> > > > > buffercache be causing the problem?
> > > > > Roger
> > > > >
> > > > > _______________________________________________
> > > > > NFS maillist - [email protected]
> > > > > https://lists.sourceforge.net/lists/listinfo/nfs
> > > >
> > > > --
> > > > Jeff Smith            Atheros Communications, Inc.
> > > > Hardware Manager      529 Almanor Avenue
> > > > (408) 773-5257        Sunnyvale, CA 94086
> >
> > --
> > Jeff Smith            Atheros Communications, Inc.
> > Hardware Manager      529 Almanor Avenue
> > (408) 773-5257        Sunnyvale, CA 94086
>
> Ryan Sweet wrote:
> >
> > [earlier message snipped; it appears in full above]
>
>

--
Ryan Sweet <[email protected]>
Atos Origin Engineering Services
http://www.aoes.nl




2002-05-16 16:30:52

by Jeff L. Smith

[permalink] [raw]
Subject: Re: 2.4.18 knfsd load spikes

We are running ext2 filesystems on a Supermicro dual P3 with Serverworks HE
chipset. As best I can tell (which probably does not count for much), the CPU
load comes from all the nfsd's holding off requests while waiting for a cache
flush. It happens whenever a particular job is run which slowly reads and
extends a very large file. When we suspend the job, everything returns to
normal. When we resume the job, everything continues to run normally for a
while, but soon begins to bog down the fileserver again.

Anyway, I hope you are right that you are experiencing a different problem.
Scheduling downtime around here is difficult, but hopefully in the next few
weeks I will be able to upgrade the fileservers to 2.4.18 (or 2.4.19?). In the
meantime, I'm trying to build a test machine to replicate the problem (and,
hopefully, verify the fix).

Jeff

Ryan Sweet wrote:
>
> hmm, I'm not convinced that we have _the_ same problem, but possibly they
> are related. In particular my cpu utilisation (dual PIII733) is minimal
> when this happens. What filesystems/NICS are you using? My server is
> using an intel e1000.
>
> I will test the program on a local disk to see if it also causes the
> problem.
>
> -ryan
>
> On Wed, 15 May 2002, Jeff Smith wrote:
>
> > Ahhhh... Welcome to my hell. I'm experiencing something similar but
> > have no resolution. Here is the exchange I had with Roger Heflin who
> > also had a similar problem. I was hoping this would go away with 2.4,
> > but your experience leaves me very worried...
> >
> >
> > "Heflin, Roger A." wrote:
> >
> > Compile it,
> > run it with ./slowspeed . 65536 .0002 10
> >
> > This will write 10 files in a round robbin fashion, it will rewind just
> > before
> > it hits 2GB and start over again. 65536 is the block size which should
> > eliminate any disk head thrash issues. The .0002 is a sleep time to
> > use and may not really be sleeping much at all during this test.
> >
> > You will need about 20GB (10x2GB per file) to run this test, and the
> > IO rates will be pretty good for a while and then will slowly start to
> > drop over the next few hours until things become pretty bad. It
> > appears
> > to work over NFS or on local disk, it does not appear to work if you
> > decrease the number of files to write to at the same time.
> >
> > Our disk nodes are 440GX/BX machines with ASUS P2D's; we have been
> > using the older, slower machines for the disk nodes since they seem to
> > have no real issues until this happens, and then the faster machines
> > appear to do no better. The disk nodes have 1GB of RAM.
> >
> > I went to eXtreme 3000 controllers and I like them more than the LVD
> > SCSI controllers (2000, 1100); they appear to be less sensitive to
> > cabling issues with the copper Fibre Channel.
> >
> > Roger
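[The slowspeed.c source itself is not attached in this part of the thread. A minimal sketch of what Roger's description implies follows; the function name, fill pattern, error handling, and bounded pass count are guesses, not his actual code.]

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Round-robin multi-stream writer in the spirit of slowspeed.c: one block
 * goes to each file per pass, and a file is rewound just before it would
 * cross `limit` bytes, per Roger's description. */
static int run_streams(const char *dir, size_t bs, unsigned sleep_us,
                       int nfiles, off_t limit, long passes)
{
    int fds[nfiles];
    off_t written[nfiles];
    char *buf = malloc(bs);
    if (buf == NULL)
        return -1;
    memset(buf, 0xAA, bs);

    for (int i = 0; i < nfiles; i++) {
        char path[512];
        snprintf(path, sizeof path, "%s/stream.%d", dir, i);
        fds[i] = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fds[i] < 0)
            return -1;
        written[i] = 0;
    }
    for (long p = 0; p < passes; p++) {
        for (int i = 0; i < nfiles; i++) {
            if (written[i] + (off_t)bs > limit) {   /* rewind before the cap */
                lseek(fds[i], 0, SEEK_SET);
                written[i] = 0;
            }
            if (write(fds[i], buf, bs) != (ssize_t)bs)
                return -1;
            written[i] += bs;
            if (sleep_us)
                usleep(sleep_us);   /* the ".0002" argument, in microseconds */
        }
    }
    for (int i = 0; i < nfiles; i++)
        close(fds[i]);
    free(buf);
    return 0;
}
```

Under these assumptions, his invocation `./slowspeed . 65536 .0002 10` would map to `run_streams(".", 65536, 200, 10, <2GB>, ...)` with the pass count left effectively unbounded.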
> >
> > > -----Original Message-----
> > > From: Jeff Smith [SMTP:[email protected]]
> > > Sent: 3/08/2002 12:41 PM
> > > To: Heflin, Roger A.
> > > Subject: Re: [NFS] IO write rate problem with multiple writers to
> > > different files
> > >
> > > Is it possible to send me the test as well, so that I can verify
> > > that I'm experiencing the same problem?
> > >
> > > Thanks,
> > > Jeff
> > >
> > > "Heflin, Roger A." wrote:
> > > >
> > > > I have been talking to Alan Cox and he seems interested in the
> > > > problem. I have figured out that running the same job on the local
> > > > machine with multiple writers also kills the IO rate, and I have a
> > > > fairly small test job that nicely duplicates the problem. I will be
> > > > sending it to Alan to see if it occurs on other kernels, and if so,
> > > > whether it can be fixed there and maybe in the 2.2 series.
> > > >
> > > > I am pretty leery of the 2.4 kernels, as 2.2.19 is very, very
> > > > stable and I don't know if 2.4 has that kind of stability.
> > > >
> > > > Roger
> > > >
> > > > > -----Original Message-----
> > > > > From: Jeff Smith [SMTP:[email protected]]
> > > > > Sent: 3/08/2002 10:40 AM
> > > > > To: Heflin, Roger A.; Stephen Padnos
> > > > > Subject: Re: [NFS] IO write rate problem with multiple
> > > writers to
> > > > > different files
> > > > >
> > > > > Be comforted that you are not alone. Every time we go through a
> > > > > chip tapeout, the number of large jobs rises, causing our NFS
> > > > > servers to suddenly fall off a cliff and exhibit the same
> > > > > symptoms (the IO rate plummets and CPU utilization goes to 100%,
> > > > > all of it taken by the nfsd's). We are running 2.2.18.
> > > > >
> > > > > We've been trying for six months to find a window where we can
> > > > > upgrade to 2.4.x and pray that this resolves the problem, but
> > > > > these are production servers and cannot afford any downtime.
> > > > >
> > > > > Let me know if you get any unposted responses. I posted a query
> > > > > a few months back, but no solutions were forthcoming. I would
> > > > > like to feel confident that whatever we try next will actually
> > > > > resolve the problem.
> > > > >
> > > > > Jeff
> > > > >
> > > > >
> > > > >
> > > > > "Heflin, Roger A." wrote:
> > > > > >
> > > > > > Any ideas on increasing write IO rates in this situation?
> > > > > >
> > > > > > I am running 2.2.19 with the NFS release from about the time
> > > > > > 2.2.19 came out, and IO writes slow down massively when there
> > > > > > are multiple write streams; it seems to require several files
> > > > > > being written to at the same time. The same behavior is not
> > > > > > seen with only 1 or 2 files open and being written to. For the
> > > > > > behavior to appear it takes 60+ minutes of sustained IO: the
> > > > > > buffer cache fills in the expected 2-4 minutes, things look
> > > > > > pretty good for quite a while, and around 60 minutes the IO
> > > > > > rates start to fall until they hit about 1/4-1/8 of the rate
> > > > > > seen just after the buffer cache was filled. The machines are
> > > > > > run with sync exports and sync mounts, but the problem was
> > > > > > also observed with sync mounts and async exports.
> > > > > >
> > > > > > The nfsd's go to using 60-80% of a dual-CPU 600MHz PIII, the
> > > > > > IO rate falls to around 1.1-1.8 MB/second, and machine
> > > > > > response generally falls apart. I don't understand why the
> > > > > > nfsd's are using this much CPU to do this low an IO rate.
> > > > > >
> > > > > > The application is writing the data in 128KB chunks, and the
> > > > > > duty cycle on the disk lights is under 50%.
> > > > > >
> > > > > > How does NFS interact with the kernel buffer cache, and could
> > > > > > the buffer cache be causing the problem?
> > > > > > Roger
> > > > > >
> > > > > > _______________________________________________
> > > > > > NFS maillist - [email protected]
> > > > > > https://lists.sourceforge.net/lists/listinfo/nfs
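[For reference, the sync/async distinction Roger mentions lives in /etc/exports on the server and in the client mount options. A minimal sketch follows; the paths, hostnames, and sizes are invented for illustration, not taken from his setup.]

```shell
# /etc/exports on the server -- "sync" makes nfsd commit data to disk
# before replying; "async" lets it reply out of the buffer cache:
#   /export/scratch   node*(rw,sync)      # Roger's first configuration
#   /export/scratch   node*(rw,async)     # the async-export variant he also tried
exportfs -ra    # re-export after editing /etc/exports

# On a client, a sync mount pushes each write() through to the server:
mount -t nfs -o sync,wsize=8192 server:/export/scratch /mnt/scratch
```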
> > > > >
> > > > > --
> > > > > Jeff Smith        Atheros Communications, Inc.
> > > > > Hardware Manager  529 Almanor Avenue
> > > > > (408) 773-5257    Sunnyvale, CA 94086
> > >
> > > --
> > > Jeff Smith        Atheros Communications, Inc.
> > > Hardware Manager  529 Almanor Avenue
> > > (408) 773-5257    Sunnyvale, CA 94086
> >
> > Ryan Sweet wrote:
> > >
> > > I didn't get any responses to the message below, but I _did_ bite the
> > > bullet and update the IRIX systems, and now the 64bit filehandle problem
> > > is solved.
> > >
> > > However, the performance problem is not. With 2.4.18+xfs1.1 it is
> > > definitely better (the load spikes to 7 or 8, sometimes 10, instead
> > > of 20 or 30...), but I still get periods where suddenly the system
> > > responds _very_ slowly: the CPU is mostly idle, memory is all used
> > > but only for cache, the system is not swapping at all, yet the load
> > > climbs up and up. It then gradually falls back down. The top
> > > processes are usually bdflush and kupdated, with kupdated always in
> > > the uninterruptible disk-wait (DW) state. It is basically the same
> > > behaviour we saw with 2.4.[2|5]+xfs1.0.2, though not as painful.
> > > The problem usually lasts three or four minutes, then subsides.
> > >
> > > The problem seemed to begin around the time we added a few new, really
> > > fast compute workstations, each of which is periodically doing thousands
> > > of small writes/reads. I cannot yet make a direct correlation, however,
> > > until I can get a decent tcpdump.
> > >
> > > does anyone have any pointers on where to begin looking? Have other
> > > people seen this behaviour?
> > >
> > > thanks,
> > > -Ryan
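[Since bdflush/kupdated top the list here, one starting point is the 2.4 writeback tunables in /proc/sys/vm/bdflush. A hedged sketch follows; the field layout is from 2.4-era Documentation/sysctl/vm.txt, and the specific values below are illustrative, not recommendations.]

```shell
# Show the current 2.4 bdflush tunables (nine integers; the first, nfract,
# is the percentage of dirty buffers that wakes bdflush):
cat /proc/sys/vm/bdflush

# Lowering nfract (here to 10%) starts writeback earlier, which may spread
# out the flush storms; the remaining fields are kept as the running kernel
# reports them by stripping only the first field:
old=$(cat /proc/sys/vm/bdflush)
echo "10 ${old#* }" > /proc/sys/vm/bdflush
```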
> >
> > ...
> >
> > > Ryan Sweet <[email protected]>
> > > Atos Origin Engineering Services
> > > http://www.aoes.nl
> > >
> > > _______________________________________________________________
> > >
> > > Have big pipes? SourceForge.net is looking for download mirrors. We supply
> > > the hardware. You get the recognition. Email Us: [email protected]
> > > _______________________________________________
> > > NFS maillist - [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/nfs
> >
> >
>
> --
> Ryan Sweet <[email protected]>
> Atos Origin Engineering Services
> http://www.aoes.nl
>

--
Jeff Smith Atheros Communications, Inc.
Hardware Manager 529 Almanor Avenue
(408) 773-5257 Sunnyvale, CA 94086


2002-05-16 17:55:49

by Eric Whiting

[permalink] [raw]
Subject: Re: 2.4.18 knfsd load spikes

I see the load spikes as well. A ps shows the nfsd processes in the 'DW'
state. DW isn't bad in itself, but when they sit there a long time the
load jumps up (it could be disk- or network-related, I think?). This
seems similar to what you describe here. Does the load average ramp up
to the number of nfsd threads?
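[One way to check is to count blocked nfsd threads against the load average. A sketch; 'D' in the STAT column is uninterruptible sleep, and the 2.4-era 'W' flag meant the process had no resident pages.]

```shell
# List nfsd threads currently stuck in uninterruptible (D*) sleep, along
# with the kernel function they are waiting in (wchan):
ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/ && $4 == "nfsd"'

# If that count approaches the total nfsd thread count while the 1-minute
# load average climbs toward the same number, all the threads are blocked
# on the same slow resource:
uptime
```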

eric


Jeff Smith wrote:
>
> We are running ext2 filesystems on a Supermicro dual P3 with Serverworks HE
> chipset. As best I can tell (which probably does not count for much), the CPU
> load comes from all the nfsd's holding off requests while waiting for a cache
> flush. It happens whenever a particular job is run which slowly reads and
> extends a very large file. When we suspend the job, everything returns to
> normal. When we resume the job, everything continues to run normally for a
> while, but soon begins to bog down the fileserver again.
>
> Anyway, I hope you are right that you are experiencing a different problem.
> Scheduling downtime around here is difficult, but hopefully in the next few
> weeks I will be able to upgrade the fileservers to 2.4.18 (or 2.4.19?). In the
> meantime, I'm trying to build a test machine to replicate the problem (and,
> hopefully, verify the fix).
>


2002-05-16 18:19:27

by Jeff L. Smith

[permalink] [raw]
Subject: Re: 2.4.18 knfsd load spikes

It is exactly as you describe. Before this started happening, I would run 16
nfsd threads, and when it started the load would creep up to 16 as the
server ground to a halt. To mitigate this, I've dropped down to 2 nfsd
threads so that the machine does not die before I can locate and kill the
"offending" job.
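[The thread count can be changed without a reboot. A sketch using nfs-utils follows; the init-script variable name and file location vary by distribution.]

```shell
# Ask knfsd to run only 2 server threads (rpc.nfsd <nthreads>):
rpc.nfsd 2

# Confirm: the "th" line in /proc/net/rpc/nfsd begins with the current
# thread count, followed by thread-utilisation histogram buckets:
grep ^th /proc/net/rpc/nfsd

# To make it persist, most init scripts read something like RPCNFSDCOUNT=2
# from /etc/sysconfig/nfs or /etc/default/nfs-kernel-server.
```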

Jeff

Eric Whiting wrote:
>
> I see the load spikes as well. A ps shows the nfsd processes in the 'DW'
> state. DW isn't bad in itself, but when they sit there a long time the
> load jumps up (it could be disk- or network-related, I think?). This
> seems similar to what you describe here. Does the load average ramp up
> to the number of nfsd threads?
>
> eric
>
> Jeff Smith wrote:
> >
> > We are running ext2 filesystems on a Supermicro dual P3 with Serverworks HE
> > chipset. As best I can tell (which probably does not count for much), the CPU
> > load comes from all the nfsd's holding off requests while waiting for a cache
> > flush. It happens whenever a particular job is run which slowly reads and
> > extends a very large file. When we suspend the job, everything returns to
> > normal. When we resume the job, everything continues to run normally for a
> > while, but soon begins to bog down the fileserver again.
> >
> > Anyway, I hope you are right that you are experiencing a different problem.
> > Scheduling downtime around here is difficult, but hopefully in the next few
> > weeks I will be able to upgrade the fileservers to 2.4.18 (or 2.4.19?). In the
> > meantime, I'm trying to build a test machine to replicate the problem (and,
> > hopefully, verify the fix).
> >

--
Jeff Smith Atheros Communications, Inc.
Hardware Manager 529 Almanor Avenue
(408) 773-5257 Sunnyvale, CA 94086
