2008-12-10 00:38:21

by Ray Lee

Subject: Fwd: 2.6.26.3 kernel - progressive slowdown over NFS

[ adding netdev, linux-nfs to cc: -- rbl ]

Hi,


We have a simple Python program that repeatedly runs a small C program
which lstat()s NFS-mounted directories. We are seeing some weird behavior
w.r.t. the run-time of this program on a 2.6.26.3 kernel vs a 2.6.24 kernel.

The run-time of the following code increases over time on the 2.6.26.3
kernel, whereas it remains flat (as expected) on the 2.6.24 kernel.
[See attached graphs - B1.jpg and B2.jpg] Once the 2.6.26.3 machine
gets into this state, we need to restart the box to get back to
reasonable run-times. Is this a known issue?

Setup:

Machine A (2.6.26): exports the NFS directory /a/baz, which contains
10,000 directories bar0 ... bar9999
/a/baz *(rw,sync,no_root_squash,no_all_squash,subtree_check)

Machine B1 (2.6.26.3): mounts the NFS dir read-only from A - graph B1.jpg
10.x.x.x:/a/baz on /baz type nfs
(ro,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nointr,nolock,proto=udp,timeo=11,retrans=2,sec=sys,mountproto=udp,addr=10.x.x.x)

Machine B2 (2.6.24): mounts the NFS dir read-only from A - graph B2.jpg
10.x.x.x:/a/baz on /baz type nfs
(ro,vers=3,rsize=4096,wsize=4096,hard,nointr,nolock,proto=udp,timeo=11,retrans=2,sec=sys,addr=10.x.x.x)
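The two option lines above are easy to misread side by side. As a quick check (a sketch; the helper name diff_mount_options is hypothetical and not part of any NFS utility), the lists can be split on commas and compared as sets, using the option strings exactly as quoted, including the elided 10.x.x.x address:

```python
# Mount options exactly as listed in the report for each client.
B1_OPTS = ("ro,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nointr,nolock,"
           "proto=udp,timeo=11,retrans=2,sec=sys,mountproto=udp,addr=10.x.x.x")
B2_OPTS = ("ro,vers=3,rsize=4096,wsize=4096,hard,nointr,nolock,"
           "proto=udp,timeo=11,retrans=2,sec=sys,addr=10.x.x.x")


def diff_mount_options(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of option strings."""
    set_a = set(a.split(","))
    set_b = set(b.split(","))
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    only_b1, only_b2 = diff_mount_options(B1_OPTS, B2_OPTS)
    print("only on B1:", only_b1)  # only on B1: ['mountproto=udp', 'namlen=255']
    print("only on B2:", only_b2)  # only on B2: []
```

On these two lines the only options that differ are namlen=255 and mountproto=udp, both present only in B1's listing; rsize, wsize, proto, timeo and retrans are identical.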

Repro:

On both B1 and B2, run the following Python program:
{{{
#!/usr/bin/env python

import os
import sys
import time

while True:
    t1 = time.time()
    # lstat the 10000 directories mounted via NFS; discard a.out's output
    rv = os.system("/a.out 10000 > /dev/null 2>&1")
    t2 = time.time()
    # one line per iteration: elapsed seconds, then a.out's wait status
    sys.stderr.write("%.3f %d\n" % (t2 - t1, rv))
}}}

where a.out is compiled from the following C code:

{{{
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int i, n;
	char filename[64];
	struct stat statbuf;

	if (argc < 2) {
		fprintf(stderr, "usage: %s count\n", argv[0]);
		return 1;
	}
	n = atoi(argv[1]);

	/* lstat /baz/bar0 .. /baz/bar<n-1>; the results are discarded */
	for (i = 0; i < n; i++) {
		snprintf(filename, sizeof(filename), "/baz/bar%d", i);
		lstat(filename, &statbuf);
	}

	printf("done\n");
	return 0;
}
}}}
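The harness times the whole os.system() call, so each sample includes shell start-up and fork/exec of a.out as well as the 10,000 lstat() calls. A sketch that times only the lstat() pass, inside a single process, may help show whether the growth is in the NFS lookups themselves. It assumes Python 3.3+ (for time.perf_counter()); time_lstat_pass and watch are hypothetical names, and like the C code it only walks root/bar0 ... root/bar<n-1>:

```python
import os
import sys
import time


def time_lstat_pass(root, n):
    """lstat root/bar0 .. root/bar{n-1} once; return (seconds, failures).

    Mirrors the C loop, but the timing covers only the lstat() calls,
    not process creation or interpreter start-up.
    """
    failures = 0
    start = time.perf_counter()
    for i in range(n):
        try:
            os.lstat(os.path.join(root, "bar%d" % i))
        except OSError:
            failures += 1
    return time.perf_counter() - start, failures


def watch(root, n, passes):
    """Run `passes` timed passes, writing '<pass> <seconds> <failures>'."""
    for p in range(passes):
        elapsed, failures = time_lstat_pass(root, n)
        sys.stderr.write("%d %.3f %d\n" % (p, elapsed, failures))
```

For example, watch("/baz", 10000, 1000) prints one line per pass to stderr. A pass in which every lstat() fails (for instance if the mount has gone away) can also be very fast, which is why the failure count is printed next to the time.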

Attachments:

B1.jpg and B2.jpg - run-times on the two client machines
X-axis: iteration number
Y-axis: run-time

Config files for machines B1, B2 and A:

b1.2.6.26.3.config
b2.2.6.24.config
a.2.6.26.config
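The graphs are read by eye. To put a number on "increases over time", one option is to fit a least-squares line of run-time against iteration number and look at its slope. A minimal sketch, assuming the harness's stderr was saved to a file and each line has the "<seconds> <rv>" form the Python program prints; parse_runtimes and slowdown_slope are hypothetical helpers:

```python
def parse_runtimes(lines):
    """Extract run-times (first field) from harness lines such as '12.345 0'."""
    runtimes = []
    for line in lines:
        fields = line.split()
        if fields:
            runtimes.append(float(fields[0]))
    return runtimes


def slowdown_slope(runtimes):
    """Least-squares slope of run-time against iteration number.

    A value near zero means flat run-times; a clearly positive value
    means each iteration is taking longer than the one before.
    """
    n = len(runtimes)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(runtimes) / float(n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(runtimes))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

For example, slowdown_slope(parse_runtimes(open("b1.log"))) gives the extra seconds of run-time per iteration; b1.log is a hypothetical file name.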


- P


Attachments:
(No filename) (2.06 kB)
B1.jpg (45.75 kB)
B2.jpg (33.16 kB)
b1.2.6.26.3.config (38.09 kB)
b2.2.6.24.config (27.29 kB)
a.2.6.26.config (37.59 kB)

2008-12-10 01:06:51

by Trond Myklebust

Subject: Re: Fwd: 2.6.26.3 kernel - progressive slowdown over NFS

On Tue, 2008-12-09 at 16:38 -0800, Ray Lee wrote:
> [ adding netdev, linux-nfs to cc: -- rbl ]
>
> Hi,
>
>
> We have a simple Python program that repeatedly runs a small C program
> which lstat()s NFS-mounted directories. We are seeing some weird behavior
> w.r.t. the run-time of this program on a 2.6.26.3 kernel vs a 2.6.24 kernel.
>
> The run-time of the following code increases over time on the 2.6.26.3
> kernel, whereas it remains flat (as expected) on the 2.6.24 kernel.
> [See attached graphs - B1.jpg and B2.jpg] Once the 2.6.26.3 machine
> gets into this state, we need to restart the box to get back to
> reasonable run-times. Is this a known issue?

Could you try applying the following patch?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=23918b03060f6e572168fdde1798a905679d2e06

Trond

PS: note that your 2.6.26 config file has _very_ different memory models
and kernel debugging options enabled compared to your 2.6.24 kernel.
That may also affect performance.
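Trond's point about the differing configs can be checked mechanically. A rough sketch (not the kernel's own tooling; parse_config, diff_configs and print_config_diff are hypothetical names) that reads two .config files, treats "# CONFIG_FOO is not set" as the value n, and lists every symbol whose value differs:

```python
import re

# Matches "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=123", 'CONFIG_FOO="str"'.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
# Matches the comment form used for disabled options.
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(lines):
    """Map each CONFIG_ symbol to its value; disabled symbols map to 'n'."""
    config = {}
    for line in lines:
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            config[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            config[m.group(1)] = "n"
    return config


def diff_configs(old, new):
    """Yield (symbol, old_value, new_value) for each symbol that differs.

    A symbol absent from one file is reported with the value None.
    """
    for sym in sorted(set(old) | set(new)):
        a, b = old.get(sym), new.get(sym)
        if a != b:
            yield sym, a, b


def print_config_diff(old_path, new_path):
    """Print one 'SYMBOL: old -> new' line per differing symbol."""
    with open(old_path) as f_old, open(new_path) as f_new:
        old, new = parse_config(f_old), parse_config(f_new)
    for sym, a, b in diff_configs(old, new):
        print("%s: %s -> %s" % (sym, a, b))
```

For example, print_config_diff("b2.2.6.24.config", "b1.2.6.26.3.config"). Symbols that exist in only one kernel version show up with None on the other side, so the output between 2.6.24 and 2.6.26 will be long; filtering it for DEBUG, for instance, narrows it to the debugging options Trond mentions.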


2008-12-10 01:12:53

by Ray Lee

Subject: Re: Fwd: 2.6.26.3 kernel - progressive slowdown over NFS

[ sigh, adding back original reporter. Oops. -- rbl ]

Priyank Patel, please see Trond's response below and try the patch
he suggests.

On Tue, Dec 9, 2008 at 5:06 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, 2008-12-09 at 16:38 -0800, Ray Lee wrote:
>> [ adding netdev, linux-nfs to cc: -- rbl ]
>>
>> Hi,
>>
>>
>> We have a simple Python program that repeatedly runs a small C program
>> which lstat()s NFS-mounted directories. We are seeing some weird behavior
>> w.r.t. the run-time of this program on a 2.6.26.3 kernel vs a 2.6.24 kernel.
>>
>> The run-time of the following code increases over time on the 2.6.26.3
>> kernel, whereas it remains flat (as expected) on the 2.6.24 kernel.
>> [See attached graphs - B1.jpg and B2.jpg] Once the 2.6.26.3 machine
>> gets into this state, we need to restart the box to get back to
>> reasonable run-times. Is this a known issue?
>
> Could you try applying the following patch?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=23918b03060f6e572168fdde1798a905679d2e06
>
> Trond
>
> PS: note that your 2.6.26 config file has _very_ different memory models
> and kernel debugging options enabled compared to your 2.6.24 kernel.
> That may also affect performance.