2004-06-24 23:11:22

by Ken Preslan

[permalink] [raw]
Subject: [RFC] Patch to allow distributed flock

Hi,

I'd like to start a discussion about changing the VFS so it allows
flocks to be enforced between machines in a cluster filesystem (such as
GFS). The purpose of GFS it so allow local filesystem semantics
to a filesystem shared between the nodes of a cluster of tightly-coupled
machines. As such, flock is probably expected to work across the cluster.

What are everyone's thoughts on a patch such as this?

Thanks.


# Make the VFS call down into the FS on flock calls.
diff -urN -p linux-2.6.7/fs/locks.c linux/fs/locks.c
--- linux-2.6.7/fs/locks.c 2004-06-16 12:00:44.567463632 -0500
+++ linux/fs/locks.c 2004-06-16 12:01:58.844205936 -0500
@@ -1294,6 +1294,27 @@ out_unlock:
return error;
}

+/*
+ * Wrapper function around the file_operations lock routine when called for
+ * flock(). The lock routine is called for both fcntl() and flock(), so
+ * the flock parameters must be translated to an equivalent fcntl()-like
+ * lock.
+ *
+ * Don't use locks_alloc_lock() (or flock_make_lock()) here, as
+ * this is just a temporary lock structure. We especially don't
+ * want to fail because we couldn't allocate a lock structure if
+ * this is an unlock operation.
+ */
+int flock_fs_file(struct file *filp, int type, int wait)
+{
+ struct file_lock fl = { .fl_flags = FL_FLOCK,
+ .fl_type = type };
+
+ return filp->f_op->lock(filp,
+ (wait) ? F_SETLKW : F_SETLK,
+ &fl);
+}
+
/**
* sys_flock: - flock() system call.
* @fd: the file descriptor to lock.
@@ -1342,6 +1363,50 @@ asmlinkage long sys_flock(unsigned int f
if (error)
goto out_free;

+ /*
+ * Execute any filesystem-specific flock routines. The filesystem may
+ * maintain supplemental locks. This code allows the supplemental locks
+ * to be kept in sync with the vfs flock lock. If flock() is called on
+ * a lock already held for the given filp, the current flock lock is
+ * dropped before obtaining the requested lock. This unlock operation
+ * must be completed for the any filesystem specific locks and the vfs
+ * flock lock before proceeding with obtaining the requested lock. When
+ * the filesystem routine drops a lock for such a request, it must
+ * return -EDEADLK, allowing the vfs lock to be dropped, and the
+ * filesystem code is then re-executed to obtain the lock.
+ *
+ * A non-blocking request that returns EWOULDBLOCK also causes any vfs
+ * flock lock to be released, but then returns the error to the caller.
+ */
+ if (filp->f_op && filp->f_op->lock) {
+ repeat:
+ error = flock_fs_file(filp, lock->fl_type, can_sleep);
+ if (error < 0) {
+ /*
+ * We may have dropped a lock. We need to
+ * finish unlocking before returning or
+ * continuing with lock acquisition.
+ */
+ if (error != -ENOLCK)
+ flock_lock_file(filp, &(struct file_lock){.fl_type = F_UNLCK});
+
+ /*
+ * We already held the lock in some mode, and
+ * had to drop filesystem-specific locks before
+ * proceeding. We come back through this
+ * routine to unlock the vfs flock lock. Now go
+ * back and try again. Using EAGAIN as the
+ * error here would be better, but the one valid
+ * error value defined for flock(), EWOULDBLOCK,
+ * is defined as EAGAIN.
+ */
+ if (error == -EDEADLK)
+ goto repeat;
+
+ goto out_free;
+ }
+ }
+
for (;;) {
error = flock_lock_file(filp, lock);
if ((error != -EAGAIN) || !can_sleep)
@@ -1354,6 +1419,13 @@ asmlinkage long sys_flock(unsigned int f
break;
}

+ /*
+ * If we failed to get the vfs flock, we need to clean up any
+ * filesystem-specific lock state that we previously obtained.
+ */
+ if (error && filp->f_op && filp->f_op->lock)
+ flock_fs_file(filp, F_UNLCK, 1);
+
out_free:
if (list_empty(&lock->fl_link)) {
locks_free_lock(lock);
@@ -1714,6 +1786,8 @@ void locks_remove_flock(struct file *fil
if (fl->fl_file == filp) {
if (IS_FLOCK(fl)) {
locks_delete_lock(before);
+ if (filp->f_op && filp->f_op->lock)
+ flock_fs_file(filp, F_UNLCK, 1);
continue;
}
if (IS_LEASE(fl)) {

--
Ken Preslan <[email protected]>


2004-06-24 23:54:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC] Patch to allow distributed flock

P? to , 24/06/2004 klokka 19:10, skreiv Ken Preslan:
> Hi,
>
> I'd like to start a discussion about changing the VFS so it allows
> flocks to be enforced between machines in a cluster filesystem (such as
> GFS). The purpose of GFS it so allow local filesystem semantics
> to a filesystem shared between the nodes of a cluster of tightly-coupled
> machines. As such, flock is probably expected to work across the cluster.
>
> What are everyone's thoughts on a patch such as this?

If you defer updating the VFS until after the ->lock() call returns,
then it makes it difficult to protect yourself against races (as I
argued about the POSIX lock interface on the list yesterday).

If you have the underlying filesystem call flock_lock_file() itself,
then that gives it the freedom to implement its own locking scheme
around that call.
For instance NFS has a thread that is supposed to reclaim locks if the
server reboots. We take a non-exclusive lock on an rwsem to ensure that
we block it while there are outstanding locking RPC calls, however that
rwsem has to be released before we return from the ->lock() call, and so
there exists a race after the rwsem was released until the
inode->i_flock list is updated.

Cheers,
Trond

2004-06-25 00:07:26

by Ken Preslan

[permalink] [raw]
Subject: Re: [RFC] Patch to allow distributed flock

On Thu, Jun 24, 2004 at 07:52:12PM -0400, Trond Myklebust wrote:
> If you defer updating the VFS until after the ->lock() call returns,
> then it makes it difficult to protect yourself against races (as I
> argued about the POSIX lock interface on the list yesterday).
>
> If you have the underlying filesystem call flock_lock_file() itself,
> then that gives it the freedom to implement its own locking scheme
> around that call.
> For instance NFS has a thread that is supposed to reclaim locks if the
> server reboots. We take a non-exclusive lock on an rwsem to ensure that
> we block it while there are outstanding locking RPC calls, however that
> rwsem has to be released before we return from the ->lock() call, and so
> there exists a race after the rwsem was released until the
> inode->i_flock list is updated.

Ah, good idea.

Something else I've been wondering:

If the FS is managing the posix locks and/or flocks, is there really a
reason to acquire the VFS versions of the locks too? As long as there is
some bit set that tells the VFS to call down into the FS to unlock the
locks on process exit, keeping both sets of locks seems wasteful.
What am I missing?

--
Ken Preslan <[email protected]>

2004-06-25 01:37:38

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC] Patch to allow distributed flock

P? to , 24/06/2004 klokka 20:07, skreiv Ken Preslan:

> If the FS is managing the posix locks and/or flocks, is there really a
> reason to acquire the VFS versions of the locks too? As long as there is
> some bit set that tells the VFS to call down into the FS to unlock the
> locks on process exit, keeping both sets of locks seems wasteful.
> What am I missing?

The only reason we care in NFS is in order to be able to deal with
server reboot recovery -- which requires you to have an extensive list
of all locks that are held -- and, as you note above, in order to ensure
that locks are all cleared upon process exit.

For flock() locks, I agree: you can probably whittle that information
down to a single bit per open file. For POSIX locks, you need the byte
ranges and lockowner/pid information too...

Cheers,
Trond

2004-06-25 01:47:16

by Trond Myklebust

[permalink] [raw]
Subject: Re: [RFC] Patch to allow distributed flock

P? to , 24/06/2004 klokka 21:37, skreiv Trond Myklebust:
> P? to , 24/06/2004 klokka 20:07, skreiv Ken Preslan:
>
> > If the FS is managing the posix locks and/or flocks, is there really a
> > reason to acquire the VFS versions of the locks too? As long as there is
> > some bit set that tells the VFS to call down into the FS to unlock the
> > locks on process exit, keeping both sets of locks seems wasteful.
> > What am I missing?

...Oh I forgot to mention --- If you want to support mandatory locks (?)
then the VFS interface can serve as a billboard for posting those
regions of the file that are off-limits to other processes.

Cheers,
Trond