2004-08-20 15:08:18

by Ara.T.Howard

Subject: file system read locks


i have a perfectly functioning filesystem based write lock algorithm
(link(2)). has anyone out there come up with an algorithm to make __read__
locks using file system primitives?

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-20 15:39:25

by Olaf Kirch

Subject: Re: file system read locks

On Fri, Aug 20, 2004 at 09:08:15AM -0600, Ara.T.Howard wrote:
> i have a perfectly functioning filesystem based write lock algorithm
> (link(2)).

Except that these FS based approaches don't support blocking; you
always have to poll.

> has anyone out there come up with an algorithm to make __read__
> locks using file system primitives?

Take a directory X. If the directory exists and is empty, the lock is not
taken by anyone. To take a read lock, create a file in that directory.
To take a write lock, remove the directory.

(This scheme has the drawback that it's highly unfair to writers, but you
can probably make it favor writers if you start to move it around rather
than rmdir it)

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-08-20 16:15:48

by Ara.T.Howard

Subject: Re: file system read locks

On Fri, 20 Aug 2004, Olaf Kirch wrote:

> On Fri, Aug 20, 2004 at 09:08:15AM -0600, Ara.T.Howard wrote:
>> i have a perfectly functioning filesystem based write lock algorithm
>> (link(2)).
>
> Except that these FS based approaches don't support blocking; you
> always have to poll.

yes - yet my filesystem locks give much, much, much better performance than
lockd does when the lock is under heavy contention. the algorithm is a
glorified poll but works really well:

it is controlled by these configurable values:

poll_attempts :

how many rapid attempts we'll make in a row

min_poll_time :

minimum amount of time we'll sleep between rapid 'polling' attempts

max_poll_time :

maximum amount of time we'll sleep between rapid 'polling' attempts

min_sleep_time :

minimum amount of time we'll sleep between sessions of rapid 'polling'
attempts

max_sleep_time :

maximum amount of time we'll sleep between sessions of rapid 'polling'
attempts


sleep_time and sleep_inc start at min_sleep_time.

to get the lock the link is attempted rapidly poll_attempts times, sleeping a
random amount of time between min_poll_time and max_poll_time after each
failed attempt. poll_attempts, min_poll_time and max_poll_time are typically
something like 16, 0.01 and 0.10 respectively.

if unsuccessful and sleep_time < max_sleep_time, increment sleep_time by
sleep_inc and sleep that much before retrying. if unsuccessful and sleep_time
has reached max_sleep_time, decrement sleep_time by sleep_inc and sleep that
much before retrying.

in other words, there is a repeating cycle of attempts to grab the lock. the
initial phase of the cycle has the requester backing off - being patient.
however the requester eventually becomes impatient and starts waiting less
time until the minimum is reached, then he becomes patient again, and the
cycle repeats. each 'attempt' is actually a bunch of attempts in rapid
succession. you can picture a sine wave punctuated with dots where many rapid
polling attempts are made at closely spaced but random intervals separated by
periods of fluctuating timeouts.
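
The cycle described above can be sketched roughly as follows. This is only an illustration, not the code from the lockfile package: the method name `acquire`, the `sleeper` hook, and the default values for min_sleep_time and max_sleep_time are assumptions (the post does not give them), and the block passed in stands for a single link attempt that returns true on success.

```ruby
# Sketch of the polling cycle: bursts of rapid attempts separated by a
# sleep that ramps up to max_sleep_time and then back down again.
# The sleep-time defaults and the `sleeper` hook are assumptions.
def acquire(poll_attempts: 16, min_poll_time: 0.01, max_poll_time: 0.10,
            min_sleep_time: 2, max_sleep_time: 32,
            sleeper: Kernel.method(:sleep))
  sleep_time = sleep_inc = min_sleep_time
  direction = 1 # +1 while backing off, -1 while growing impatient
  loop do
    # a burst of rapid attempts at closely spaced random intervals
    poll_attempts.times do
      return true if yield
      sleeper.call(min_poll_time + rand * (max_poll_time - min_poll_time))
    end
    # between bursts, sleep for the current (fluctuating) timeout
    sleeper.call(sleep_time)
    direction = -1 if sleep_time >= max_sleep_time
    direction = 1 if sleep_time <= min_sleep_time
    sleep_time += direction * sleep_inc
  end
end
```

A caller would wrap one lock attempt in the block, for example `acquire { try_once }`, where `try_once` is whatever single non-blocking attempt is in use.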

i'm very happy with this algorithm as it seems to provide very very good
performance under heavy loads. i've tested with 30 nodes all competing to
update a file and see min, max, and avg sleep times of about 0, 2, and 2
seconds respectively. when i repeat the test using lockd i see min, max, and
avg of about 0, 300, and 30 seconds respectively. so perhaps polling is not
that bad!
;-)

i've not read the locking code, but the performance seems to indicate long
timeouts which never change (plateau) - so when requesters make a single
timeout they will wait a very, very long time to get the lock if it's under
heavy contention since, chances are, it will be held when they next ask and
then sleep for the same long time again. like i said, i routinely see
timeouts in my test of 300, even 900 seconds. this is using 30 nodes
competing to do a .2 second update to a file!

> Take a directory X. If the directory exists and is empty, the lock is not
> taken by anyone. To take a read lock, create a file in that directory. To
> take a write lock, remove the directory.

just to clarify (we assert that mkdir AND rmdir are atomic and report the
correct error code on clients), something like this:

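  require 'fileutils'
  require 'socket'

  # take a read lock by creating a per-process file inside dir
  def read_lock dir
    FileUtils.touch "#{ dir }/#{ Socket.gethostname }.#{ Process.pid }"
    true
  rescue Errno::ENOENT   # dir is gone - a writer holds the lock
    false
  end

  # take the write lock by removing dir, which only works if it is empty
  def write_lock dir
    Dir.rmdir dir
    true
  rescue Errno::ENOTEMPTY, Errno::ENOENT   # readers present, or a writer has it
    false
  end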
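
Releasing the locks is not spelled out in the thread. Assuming the scheme as stated (an existing, empty directory means "unlocked"), a release would presumably be the mirror image of the acquire: a reader removes its own file, and a writer recreates the directory. The names `read_unlock` and `write_unlock` below are hypothetical.

```ruby
require 'fileutils'
require 'socket'

# Drop a read lock by removing this process's file from the lock directory.
# rm_f does not complain if the file is already gone.
def read_unlock(dir)
  FileUtils.rm_f File.join(dir, "#{Socket.gethostname}.#{Process.pid}")
end

# Drop the write lock by recreating the (empty) lock directory.
# Returns false if the directory already exists, i.e. the lock was not held.
def write_unlock(dir)
  Dir.mkdir dir
  true
rescue Errno::EEXIST
  false
end
```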

what gotchas are there? for instance when using link(2) you cannot trust the
return codes and then must use stat - do similar problems exist? for which
nfs impls do you think this might work?
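
For readers unfamiliar with the link(2) caveat mentioned above, the usual workaround is to link a uniquely named file to the lock name and then stat the unique file: a link count of 2 means the link took effect, whatever the call reported. A minimal sketch, assuming a hypothetical helper name `try_link_lock` and a temp-file naming scheme of hostname plus pid:

```ruby
require 'fileutils'
require 'socket'

# Make one attempt to take an exclusive lock at `lockfile`.
# Returns true if the lock was taken, false otherwise.
def try_link_lock(lockfile)
  tmp = "#{lockfile}.#{Socket.gethostname}.#{Process.pid}"
  FileUtils.touch tmp
  begin
    File.link tmp, lockfile
  rescue SystemCallError
    # ignored on purpose: the result of link is not trusted,
    # the stat below decides whether the lock is ours
  end
  File.stat(tmp).nlink == 2
ensure
  # the unique file is only a stepping stone; the lock is `lockfile` itself
  FileUtils.rm_f tmp if tmp
end
```

Releasing such a lock is then just a matter of deleting `lockfile`.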

thanks a __lot__ for the ideas, i have been wondering how to do read locks for
a while - should've thought about how semaphores work for a while and might have
come up with this myself but i was hung up thinking in terms of link! i will
begin implementing your ideas in a LockDirectory class and add it to my
LockFile package (algorithm attributed to you!). my package, which has both
ruby api and command line tools can be found at

http://raa.ruby-lang.org/project/lockfile/

though the download server will be down today. i'd be happy for any testers!

> Olaf Kirch | The Hardware Gods hate me.

me too - i'm burning out disks at the rate of once per week! ;-)

-a



2004-08-20 19:09:34

by Ara.T.Howard

Subject: Re: file system read locks

On Fri, 20 Aug 2004, Olaf Kirch wrote:

> On Fri, Aug 20, 2004 at 09:08:15AM -0600, Ara.T.Howard wrote:
>> i have a perfectly functioning filesystem based write lock algorithm
>> (link(2)).
>
> Except that these FS based approaches don't support blocking; you
> always have to poll.
>
>> has anyone out there come up with an algorithm to make __read__
>> locks using file system primitives?
>
> Take a directory X. If the directory exists and is empty, the lock is not
> taken by anyone. To take a read lock, create a file in that directory.
> To take a write lock, remove the directory.
>
> (This scheme has the drawback that it's highly unfair to writers, but you
> can probably make it favor writers if you start to move it around rather
> than rmdir it)


seems to work well - the problem is when/how to create the directory in case
of an aborted writer... the dir will not exist - how to know when it's o.k to
create it...

i'm thinking that initially one could create two dirs

dir
.dir

while holding the lock it will be the responsibility of the lock holder to
keep .dir fresh via touching it. if the .dir directory is ever found to be
stale we can guess that a process holding the lock has died without recreating
the lock directory dir. at that point the process can lock the .dir (rmdir)
and recreate dir. obviously only one process could succeed at this. of
course, if this process died between removing .dir and creating dir this would
be bad. however it seems this would be very, very rare since it would require
first one client dying w/o cleaning up, followed closely by another.
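
A rough sketch of that recovery step, under several assumptions not stated in the post: the helper name `recover_stale_lock`, the `stale_after` threshold in seconds, and the decision to recreate .dir after dir are all hypothetical. A lock holder would keep the marker fresh with something like `FileUtils.touch marker`.

```ruby
require 'fileutils'

# Recreate a lock directory left missing by a dead writer.
# Returns true only for the one process that performed the recovery.
def recover_stale_lock(dir, stale_after: 60, now: Time.now)
  marker = File.join(File.dirname(dir), ".#{File.basename(dir)}")
  return false if File.directory?(dir)                    # nothing to recover
  return false if now - File.mtime(marker) < stale_after  # holder is still alive
  Dir.rmdir marker  # "lock" the marker; only one process can succeed at this
  Dir.mkdir dir     # recreate the lock directory (now unlocked)
  Dir.mkdir marker  # assumption: put the marker back for the next holder
  true
rescue SystemCallError
  # lost the race, or the marker is missing: leave recovery to someone else
  false
end
```

The window between the rmdir and the mkdir calls is exactly the rare failure case described above.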

thoughts?

-a



2004-08-21 21:04:29

by Goutham Kurra

Subject: strange behavior with gigE switch and 2.4.22 kernel



I have the following configuration:

1. linux nfs client kernel 2.4.22 with nfs (client,
sunrpc, lockd) patches applied from trond's 2.4.22
patches. (have run connectathon on this with linux nfs
server backend successfully)

2. netapp nfs server (F760).

There's a gigE switch in between the client and server.

With a gigE switch in between, the 2.4.22 client
mounts the filesystem, but hangs indefinitely on a
'ls' on the mount. tcpdump shows that the readdir
request is being retransmitted over and over again.

When I replace the gigE switch with a 100/FE switch,
everything works great.

When I keep the gigE switch and use a stock redhat 9
(kernel 2.4.20-6) linux client, everything seems to
work well too.

So here's the problematic combination: linux 2.4.22
(with or without trond's patches), gigE switch in
between, and netapp nfs filer.

Any ideas on what's going on, or how I should begin to
troubleshoot/debug it?

thanks,
goutham





