2011-08-20 00:15:43

by Kelsey Cummings

Subject: Odd locking behavior

I'm looking for some suggestions as to where to look next with an odd
NFS file locking problem with Linux clients (RHEL6) and a NetApp server.
FSC is enabled on these.

The rather simple example code and output say it all. It works as
expected when a single process is run. When a second process is started,
it works for a while and then something breaks and we start seeing it
take multiples of 30 seconds for the lock request to succeed. The more
processes that are run, the more likely it is to occur; it is also more
likely if a process is started on a second client.

When the issue is exhibiting itself, the NetApp's lock status displays the
granted and waiting locks as expected(?).

..
6681 0x00085ec8:0x71082409 0:0 1 GWAITING (0x64ea6728)
6645 0x00085ec8:0x71082409 0:0 1 GRANTED (0x523bb598)


# perl t.pl
locking lock-test.1 0.000672
locking lock-test.1 0.000229
locking lock-test.1 0.000244
locking lock-test.1 1.000247 #second process started, expected 1s
locking lock-test.1 0.999786
locking lock-test.1 1.000069
locking lock-test.1 31.000492 #third process is started, at worst we expect 2s for the lock
locking lock-test.1 60.000472
locking lock-test.1 90.001829
locking lock-test.1 89.999942
locking lock-test.1 90.001407
locking lock-test.1 90.00091
locking lock-test.1 90.000044
locking lock-test.1 60.000306
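
To make the "multiples of 30 seconds" pattern easier to see, the output can be
post-processed. The following is a minimal sketch, not part of the original
test: the helper name classify_waits and the 30 second period passed to it are
assumptions chosen for illustration. It splits each wait into the nearest
whole number of periods plus a remainder.

```perl
use strict;
use warnings;

# Split each wait from "locking <file> <seconds>" lines into the nearest
# whole number of $period-second intervals plus the leftover.
# classify_waits is a hypothetical helper, not part of t.pl.
sub classify_waits {
    my ($period, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        # Skip anything that is not a timing line; trailing notes are ignored.
        next unless $line =~ /^locking\s+\S+\s+([0-9.]+)/;
        my $secs     = $1;
        my $periods  = int($secs / $period + 0.5);    # nearest multiple
        my $residual = $secs - $periods * $period;
        push @rows, { secs => $secs, periods => $periods, residual => $residual };
    }
    return @rows;
}

# A few lines taken from the output above.
my @sample = (
    "locking lock-test.1 1.000247",
    "locking lock-test.1 31.000492",
    "locking lock-test.1 60.000472",
    "locking lock-test.1 89.999942",
);

for my $r (classify_waits(30, @sample)) {
    printf "%10.6f s = %d x 30 s %+.6f s\n",
        $r->{secs}, $r->{periods}, $r->{residual};
}
```

A remainder close to 0 or 1 second on top of a whole number of periods is what
would be expected if something in the path retries on a fixed 30 second timer.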

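#!/usr/bin/perl

use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(tv_interval gettimeofday usleep);

$| = 1;    # unbuffered output

my $file = "lock-test";

# create the test file
open (T, ">", $file) || die "Couldn't open file.\n";
print T $file;
close(T);

#now run lock tests
while ( 1 )
{
    open (F, "<", $file) || die "Couldn't open $file: $!\n";
    my $start = [gettimeofday];
    print "locking $file ";
    flock(F, LOCK_EX) || die "flock failed: $!\n";
    my $elapsed = tv_interval($start, [gettimeofday]);
    print "$elapsed\n";
    # usleep() takes microseconds; the ~1 s waits in the output above
    # suggest that run held the lock for longer than this.
    usleep(10);
    close(F);    # closing the handle releases the lock
}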


--
Kelsey Cummings - [email protected] sonic.net, inc.
System Architect 2260 Apollo Way
707.522.1000 Santa Rosa, CA 95407


2011-08-21 01:14:37

by Kelsey Cummings

Subject: Re: Odd locking behavior

On 8/20/2011 9:59 AM, Trond Myklebust wrote:
> Does the netapp filer have multiple NICs? There is a known bug with that
> in which the filer sends GRANTED calls from the wrong IP address: see
> the description at
> http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=237430

Yes, it does. I ran out of time but was starting to suspect that might
be a problem. It looks like maybe some older 2.4 clients are okay with
this? Don't ask.... :)

The ONTAP version is suitably ancient due to heroic uptimes and (until
now) no apparent bugs. I doubt it has been upgraded (or crashed) since
it was deployed when the 3020 was the hot new thing. Looks like it's
time to catch up. ;)

Thanks for the tip!

--
Kelsey Cummings - [email protected] sonic.net, inc.
System Architect 2260 Apollo Way
707.522.1000 Santa Rosa, CA 95407

2011-08-20 16:59:19

by Myklebust, Trond

Subject: Re: Odd locking behavior

On Fri, 2011-08-19 at 16:01 -0700, Kelsey Cummings wrote:
> I'm looking for some suggestions as to where to look next with an odd
> NFS file locking problem with Linux clients (RHEL6) and a NetApp server.
> FSC is enabled on these.
>
> The rather simple example code and output say it all. It works as
> expected when a single process is run. When a second process is started,
> it works for a while and then something breaks and we start seeing it
> take multiples of 30 seconds for the lock request to succeed. The more
> processes that are run, the more likely it is to occur; it is also more
> likely if a process is started on a second client.
>
> When the issue is exhibiting itself, the NetApp's lock status displays the
> granted and waiting locks as expected(?).

Does the netapp filer have multiple NICs? There is a known bug with that
in which the filer sends GRANTED calls from the wrong IP address: see
the description at
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=237430
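
As a rough illustration of how a GRANTED call from an unexpected address could
turn into waits that are multiples of 30 seconds, here is a toy model. It is a
sketch under stated assumptions, not a description of the actual client code:
it assumes the client ignores a callback whose source address differs from the
address it mounted, and that it then retries the blocked lock on a fixed
30 second interval (consistent with the timings reported, but not confirmed in
this thread). The function name, parameters, and addresses are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $RETRY_INTERVAL = 30;    # assumed client retry interval, in seconds

# Toy model: seconds a waiter spends blocked before it holds the lock.
#   $free_at      - seconds until the current holder releases the lock
#   $mount_addr   - server address the client mounted (hypothetical)
#   $callback_src - source address of the GRANTED callback (hypothetical)
sub lock_wait {
    my ($free_at, $mount_addr, $callback_src) = @_;

    # Callback arrives from the expected address: the waiter is woken
    # as soon as the lock becomes free.
    return $free_at if $callback_src eq $mount_addr;

    # Callback ignored: the waiter only notices at its first retry
    # that falls at or after the release.
    my $retries = ceil($free_at / $RETRY_INTERVAL);
    $retries = 1 if $retries < 1;
    return $retries * $RETRY_INTERVAL;
}

print lock_wait(1,  "192.0.2.10", "192.0.2.10"), "\n";    # prints 1
print lock_wait(1,  "192.0.2.10", "192.0.2.11"), "\n";    # prints 30
print lock_wait(31, "192.0.2.10", "192.0.2.11"), "\n";    # prints 60
```

In this model a single-homed server never triggers the retry path, which would
fit the symptom appearing only when the callback leaves from a different NIC.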

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-08-20 18:24:01

by Haynes, Tom

Subject: Re: Odd locking behavior



On Aug 20, 2011, at 11:59 AM, "Trond Myklebust" <[email protected]> wrote:

> On Fri, 2011-08-19 at 16:01 -0700, Kelsey Cummings wrote:
>> I'm looking for some suggestions as to where to look next with an odd
>> NFS file locking problem with Linux clients (RHEL6) and a NetApp server.
>> FSC is enabled on these.
>>
>> The rather simple example code and output say it all. It works as
>> expected when a single process is run. When a second process is started,
>> it works for a while and then something breaks and we start seeing it
>> take multiples of 30 seconds for the lock request to succeed. The more
>> processes that are run, the more likely it is to occur; it is also more
>> likely if a process is started on a second client.
>>
>> When the issue is exhibiting itself, the NetApp's lock status displays the
>> granted and waiting locks as expected(?).
>

Also, what version of ONTAP is the filer running?


> Does the netapp filer have multiple NICs? There is a known bug with that
> in which the filer sends GRANTED calls from the wrong IP address: see
> the description at
> http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=237430