NFSv4 uses leases to offer clients better caching. Unfortunately that
doesn't work well on a cluster filesystem, since leases are acquired
only locally.
So, add the following patches add new lease and break methods, and a
simple lease implementation for GFS2 that just refuses to grant any
leases for now.
Opinions?
--b.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
From: J. Bruce Fields <[email protected]>
Currently leases are only kept locally, so there's no way for a distributed
filesystem to enforce them against multiple clients. We're particularly
interested in the case of nfsd exporting a cluster filesystem, in which
case nfsd needs cluster-coherent leases in order to implement delegations
correctly.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 5 ++++-
include/linux/fs.h | 4 ++++
2 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 3f366e1..40a7f39 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1444,7 +1444,10 @@ int setlease(struct file *filp, long arg, struct file_lock **lease)
return error;
lock_kernel();
- error = __setlease(filp, arg, lease);
+ if (filp->f_op && filp->f_op->set_lease)
+ error = filp->f_op->set_lease(filp, arg, lease);
+ else
+ error = __setlease(filp, arg, lease);
unlock_kernel();
return error;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7cf0c54..09aefb4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1112,6 +1112,7 @@ struct file_operations {
int (*flock) (struct file *, int, struct file_lock *);
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+ int (*set_lease)(struct file *, long, struct file_lock **);
};
struct inode_operations {
@@ -1137,6 +1138,7 @@ struct inode_operations {
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*removexattr) (struct dentry *, const char *);
void (*truncate_range)(struct inode *, loff_t, loff_t);
+ int (*break_lease)(struct inode *, unsigned int);
};
struct seq_file;
@@ -1463,6 +1465,8 @@ static inline int locks_verify_truncate(struct inode *inode,
static inline int break_lease(struct inode *inode, unsigned int mode)
{
+ if (inode->i_op && inode->i_op->break_lease)
+ return inode->i_op->break_lease(inode, mode);
if (inode->i_flock)
return __break_lease(inode, mode);
return 0;
--
1.5.2.rc3
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
From: J. Bruce Fields <[email protected]>
Share more code between setlease (used by nfsd) and fcntl.
Also some minor cleanup.
Signed-off-by: "J. Bruce Fields" <[email protected]>
---
fs/locks.c | 14 +++-----------
1 files changed, 3 insertions(+), 11 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 431a8b8..3f366e1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1469,14 +1469,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
struct inode *inode = dentry->d_inode;
int error;
- if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
- return -EACCES;
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
- error = security_file_lock(filp, arg);
- if (error)
- return error;
-
locks_init_lock(&fl);
error = lease_init(filp, arg, &fl);
if (error)
@@ -1484,15 +1476,15 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
lock_kernel();
- error = __setlease(filp, arg, &flp);
+ error = setlease(filp, arg, &flp);
if (error || arg == F_UNLCK)
goto out_unlock;
error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
if (error < 0) {
- /* remove lease just inserted by __setlease */
+ /* remove lease just inserted by setlease */
flp->fl_type = F_UNLCK | F_INPROGRESS;
- flp->fl_break_time = jiffies- 10;
+ flp->fl_break_time = jiffies - 10;
time_out_leases(inode);
goto out_unlock;
}
--
1.5.2.rc3
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
From: Marc Eshel <[email protected]>
Since gfs2 can't prevent conflicting opens or leases on other nodes, we
probably shouldn't allow it to give out leases at all.
Put the newly defined lease operation into use in gfs2 by turning off
lease, unless we're using the "nolock' locking module (in which case all
locking is local anyway).
Signed-off-by: Marc Eshel <[email protected]>
---
fs/gfs2/ops_file.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)
diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index 064df88..78ac4ac 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -489,6 +489,31 @@ static int gfs2_fsync(struct file *file, struct dentry *dentry, int datasync)
}
/**
+ * gfs2_set_lease - acquire/release a file lease
+ * @file: the file pointer
+ * @arg: lease type
+ * @fl: file lock
+ *
+ * Returns: errno
+ */
+
+static int gfs2_set_lease(struct file *file, long arg, struct file_lock **fl)
+{
+ struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host);
+ int ret = EAGAIN;
+
+ if (sdp->sd_args.ar_localflocks) {
+ return setlease(file, arg, fl);
+ }
+
+ /* For now fail the delegation request. Cluster file system can not
+ allow any node in the cluster to get a local lease until it can
+ be managed centrally by the cluster file system.
+ */
+ return ret;
+}
+
+/**
* gfs2_lock - acquire/release a posix lock on a file
* @file: the file pointer
* @cmd: either modify or retrieve lock state, possibly wait
@@ -639,6 +664,7 @@ const struct file_operations gfs2_file_fops = {
.flock = gfs2_flock,
.splice_read = generic_file_splice_read,
.splice_write = generic_file_splice_write,
+ .set_lease = gfs2_set_lease,
};
const struct file_operations gfs2_dir_fops = {
--
1.5.2.rc3
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2007-05-31 at 17:40 -0400, J. Bruce Fields wrote:
> From: J. Bruce Fields <[email protected]>
>
> Share more code between setlease (used by nfsd) and fcntl.
>
> Also some minor cleanup.
>
> Signed-off-by: "J. Bruce Fields" <[email protected]>
> ---
> fs/locks.c | 14 +++-----------
> 1 files changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 431a8b8..3f366e1 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1469,14 +1469,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> struct inode *inode = dentry->d_inode;
> int error;
>
> - if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
> - return -EACCES;
> - if (!S_ISREG(inode->i_mode))
> - return -EINVAL;
> - error = security_file_lock(filp, arg);
> - if (error)
> - return error;
> -
> locks_init_lock(&fl);
> error = lease_init(filp, arg, &fl);
> if (error)
> @@ -1484,15 +1476,15 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
>
> lock_kernel();
>
> - error = __setlease(filp, arg, &flp);
> + error = setlease(filp, arg, &flp);
> if (error || arg == F_UNLCK)
> goto out_unlock;
>
> error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
> if (error < 0) {
> - /* remove lease just inserted by __setlease */
> + /* remove lease just inserted by setlease */
> flp->fl_type = F_UNLCK | F_INPROGRESS;
> - flp->fl_break_time = jiffies- 10;
> + flp->fl_break_time = jiffies - 10;
> time_out_leases(inode);
> goto out_unlock;
> }
Why not move the security checks from setlease() into __setlease()? That
way you can continue to avoid the calls to (re)take the BKL, which are
redundant as far as fcntl_setlease() is concerned.
Cheers
Trond
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
I have had some previous communications with Bruce on these topics, and I am
generally pleased with the proposed modifications that are presented here.
I am working on a clustered file system and there is one small additional
modification that would be of great use to me. That would be to export the
__setlease symbol.
In my implementation, my file system specific set_lease() function first
determines whether the granting of the requested lease on the given inode is
compatible with the cluster state of this inode, and then if it is, I simply
invoke the __setlease() routine and have the kernel build the associated
infrastructure.
Having this symbol globally available greatly simplifies things.
On 6/4/07, J. Bruce Fields <[email protected]> wrote:
>
> On Sat, Jun 02, 2007 at 02:21:22PM -0400, Trond Myklebust wrote:
> > Currently, the lease handling is done all in the VFS, and is done prior
> > to calling any filesystem operations. Bruce's break_lease() inode
> > operation allows the VFS to notify the filesystem that some operation is
> > going to be called that requires the lease to be broken.
> >
> > My point is that in doing so, you are not atomic with the operation that
> > requires the lease to be broken. Some different node may re-establish a
> > lease while we're calling back down into the filesystem to perform the
> > operation.
> > So I agree with you. The break_lease() inode operation isn't going to
> > work. The filesystem is going to have to figure out for itself when it
> > needs to notify other nodes that the lease needs breaking, and it needs
> > to figure out its own methods for ensuring atomicity.
>
> OK, I agree with you both, thanks for the explanations.
>
> It looks to me like there's probably a race in the existing code that
> will allow conflicting opens and leases to be granted simultaneously if
> the lease request is handled just after may_open() is called. These
> checks at the beginning of __setlease() are an attempt to prevent that
> race:
>
> if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> goto out;
> if ((arg == F_WRLCK)
> && ((atomic_read(&dentry->d_count) > 1)
> || (atomic_read(&inode->i_count) > 1)))
> goto out;
>
> But, for example, in the case of a simultaneous write open and RDLCK
> lease request, I believe the call to setlease could come after the
> may_open() but before the call to get_write_access() that bumps
> i_writecount.
>
> --b.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Why isn't the existing setlease() export sufficient? The plan is to more
or less get rid of __setlease().
Trond
On Tue, 2007-06-05 at 18:04 -0400, Robert Rappaport wrote:
> I have had some previous communications with Bruce on these topics,
> and I am generally pleased with the proposed modifications that are
> presented here.
>
> I am working on a clustered file system and there is one small
> additional modification that would be of great use to me. That would
> be to export the __setlease symbol.
>
> In my implementation, my file system specific set_lease() function
> first determines whether the granting of the requested lease on the
> given inode is compatible with the cluster state of this inode, and
> then if it is, I simply invoke the __setlease() routine and have the
> kernel build the associated infrastructure.
>
> Having this symbol globally available greatly simplifies things.
>
> On 6/4/07, J. Bruce Fields <[email protected]> wrote:
> On Sat, Jun 02, 2007 at 02:21:22PM -0400, Trond Myklebust
> wrote:
> > Currently, the lease handling is done all in the VFS, and is
> done prior
> > to calling any filesystem operations. Bruce's break_lease()
> inode
> > operation allows the VFS to notify the filesystem that some
> operation is
> > going to be called that requires the lease to be broken.
> >
> > My point is that in doing so, you are not atomic with the
> operation that
> > requires the lease to be broken. Some different node may
> re-establish a
> > lease while we're calling back down into the filesystem to
> perform the
> > operation.
> > So I agree with you. The break_lease() inode operation isn't
> going to
> > work. The filesystem is going to have to figure out for
> itself when it
> > needs to notify other nodes that the lease needs breaking,
> and it needs
> > to figure out its own methods for ensuring atomicity.
>
> OK, I agree with you both, thanks for the explanations.
>
> It looks to me like there's probably a race in the existing
> code that
> will allow conflicting opens and leases to be granted
> simultaneously if
> the lease request is handled just after may_open() is
> called. These
> checks at the beginning of __setlease() are an attempt to
> prevent that
> race:
>
> if ((arg == F_RDLCK) &&
> (atomic_read(&inode->i_writecount) > 0))
> goto out;
> if ((arg == F_WRLCK)
> && ((atomic_read(&dentry->d_count) > 1)
> || (atomic_read(&inode->i_count) > 1)))
> goto out;
>
> But, for example, in the case of a simultaneous write open and
> RDLCK
> lease request, I believe the call to setlease could come after
> the
> may_open() but before the call to get_write_access() that
> bumps
> i_writecount.
>
> --b.
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info
> at http://vger.kernel.org/majordomo-info.html
>
Hi Bruce,
The file system does need to keep the local state up to date, like it does
with posix locks, so it might need to call __setlease(). The why we had it
before was that the call to the file system was done from outside of
setlease() and the file system was able to call setlease() which is
exported. Now that the call to the fs moved into setlease() the file
system can not call it anymore so one possible solution would be to export
__setlease().
Marc.
[email protected] wrote on 06/05/2007 03:04:57 PM:
> I have had some previous communications with Bruce on these topics,
> and I am generally pleased with the proposed modifications that are
> presented here.
>
> I am working on a clustered file system and there is one small
> additional modification that would be of great use to me. That
> would be to export the __setlease symbol.
>
> In my implementation, my file system specific set_lease() function
> first determines whether the granting of the requested lease on the
> given inode is compatible with the cluster state of this inode, and
> then if it is, I simply invoke the __setlease() routine and have the
> kernel build the associated infrastructure.
>
> Having this symbol globally available greatly simplifies things.
> On 6/4/07, J. Bruce Fields <[email protected] > wrote:
> On Sat, Jun 02, 2007 at 02:21:22PM -0400, Trond Myklebust wrote:
> > Currently, the lease handling is done all in the VFS, and is done
prior
> > to calling any filesystem operations. Bruce's break_lease() inode
> > operation allows the VFS to notify the filesystem that some operation
is
> > going to be called that requires the lease to be broken.
> >
> > My point is that in doing so, you are not atomic with the operation
that
> > requires the lease to be broken. Some different node may re-establish
a
> > lease while we're calling back down into the filesystem to perform the
> > operation.
> > So I agree with you. The break_lease() inode operation isn't going to
> > work. The filesystem is going to have to figure out for itself when it
> > needs to notify other nodes that the lease needs breaking, and it
needs
> > to figure out its own methods for ensuring atomicity.
>
> OK, I agree with you both, thanks for the explanations.
>
> It looks to me like there's probably a race in the existing code that
> will allow conflicting opens and leases to be granted simultaneously if
> the lease request is handled just after may_open() is called. These
> checks at the beginning of __setlease() are an attempt to prevent that
> race:
>
> if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> goto out;
> if ((arg == F_WRLCK)
> && ((atomic_read(&dentry->d_count) > 1)
> || (atomic_read(&inode->i_count) > 1)))
> goto out;
>
> But, for example, in the case of a simultaneous write open and RDLCK
> lease request, I believe the call to setlease could come after the
> may_open() but before the call to get_write_access() that bumps
> i_writecount.
>
> --b.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
-------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> NFS maillist - [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfs
On Tue, 2007-06-05 at 15:56 -0700, Marc Eshel wrote:
> Hi Bruce,
> The file system does need to keep the local state up to date, like it does
> with posix locks, so it might need to call __setlease(). The why we had it
> before was that the call to the file system was done from outside of
> setlease() and the file system was able to call setlease() which is
> exported. Now that the call to the fs moved into setlease() the file
> system can not call it anymore so one possible solution would be to export
> __setlease().
> Marc.
Please just make a vfs_setlease() which has the ability to call down
into the filesystem and leave the exported setlease() as a generic
method that can continue to be called by the filesystems themselves (and
acts as a fallback for vfs_setlease()). That would be closer to the VFS
naming conventions.
Trond
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
It appears that there is agreement that the functionality currently
implemented in __setlease() should be exported, even though the exported
name may not be __setlease().
That is just fine with me.
The question that I have now is when do you think it likely that these
changes get into the released code? I hope that the plan is to get it there
fairly soon.
- Robert Rappaport
On 6/5/07, Trond Myklebust <[email protected]> wrote:
>
> On Tue, 2007-06-05 at 15:56 -0700, Marc Eshel wrote:
> > Hi Bruce,
> > The file system does need to keep the local state up to date, like it
> does
> > with posix locks, so it might need to call __setlease(). The why we had
> it
> > before was that the call to the file system was done from outside of
> > setlease() and the file system was able to call setlease() which is
> > exported. Now that the call to the fs moved into setlease() the file
> > system can not call it anymore so one possible solution would be to
> export
> > __setlease().
> > Marc.
>
> Please just make a vfs_setlease() which has the ability to call down
> into the filesystem and leave the exported setlease() as a generic
> method that can continue to be called by the filesystems themselves (and
> acts as a fallback for vfs_setlease()). That would be closer to the VFS
> naming conventions.
>
> Trond
>
>
My interpretation of the preceeding is that there is agreement that
the functionality currently implemented in __setlease() should be
exported, even though the exported name may not be __setlease(). Is
this correct?
If so, that is just fine with me.
The question that I have now is when do you think it likely that these
changes get into the released code? I hope that the plan is to get it
there fairly soon.
- Robert Rappaport
P.S. I previously intended to send to this thread a request to export
the __setlease() routine and also the above query. However I
inadvertently forgot to send the messages in plain text and so it did
not appear in the thread, however some of you may have received the
messages. Sorry, if this is redundant.
>
> On 6/5/07, Trond Myklebust <[email protected]> wrote:
> > On Tue, 2007-06-05 at 15:56 -0700, Marc Eshel wrote:
> > > Hi Bruce,
> > > The file system does need to keep the local state up to date, like it
> does
> > > with posix locks, so it might need to call __setlease(). The why we had
> it
> > > before was that the call to the file system was done from outside of
> > > setlease() and the file system was able to call setlease() which is
> > > exported. Now that the call to the fs moved into setlease() the file
> > > system can not call it anymore so one possible solution would be to
> export
> > > __setlease().
> > > Marc.
> >
> > Please just make a vfs_setlease() which has the ability to call down
> > into the filesystem and leave the exported setlease() as a generic
> > method that can continue to be called by the filesystems themselves (and
> > acts as a fallback for vfs_setlease()). That would be closer to the VFS
> > naming conventions.
> >
> > Trond
> >
> >
>
>
On Thu, Jun 07, 2007 at 10:43:51AM -0400, Robert Rappaport wrote:
> My interpretation of the preceeding is that there is agreement that
> the functionality currently implemented in __setlease() should be
> exported, even though the exported name may not be __setlease(). Is
> this correct?
Yes.
> If so, that is just fine with me.
OK, good. I'll revise and post a new series. (Do people prefer another
mailbomb or a git url?)
> The question that I have now is when do you think it likely that these
> changes get into the released code? I hope that the plan is to get it
> there fairly soon.
It would seem reasonable to me to put off the question of how to do proper
distributed lease-breaking for now, in which case the remaining patches
seem straightforward enough to me that they could go in now.
My main question is whether the partial disabling of leases looks to the
GFS2 people like reasonable behavior.
--b.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
J. Bruce Fields <[email protected]> wrote:
> OK, good. I'll revise and post a new series. (Do people prefer
> another mailbomb or a git url?)
OK, I went for the former; if you'd rather get this out of git, you can
git clone http://www.linux-nfs.org/~bfields/linux.git
git checkout server-cluster-lease-api
The changes from the last version seem pretty trivial, but I've
compile-tested this only for now.
I'm ignoring the problem of breaking leases on unlink and rename. I
think we should go ahead and do this part now--it's adequate for the
current lease semantics, and even more so for our current application
(just turning off leases selectively on some filesystems)--but I'd
really like to solve that problem eventually.
That's probably not going to happen until we get a cluster filesystem
with real lease support into the kernel....
Changes:
- do away with the break_lease method.
- rename __setlease to setlease, setlease to vfs_setlease, and
make sure it's setlease (the one that doesn't call into the
filesystem) that's exported.
- rename ->set_lease to ->setlease. (I don't really care which
we go with, it just seemed confusing when everything else was
already named without the underscore.)
- Add a trivial patch that disables leases on nfs (as suggested
by a patch elsewhere from Peter Staubach)
--b.
J. Bruce Fields <[email protected]> wrote:
> OK, good. I'll revise and post a new series. (Do people prefer
> another mailbomb or a git url?)
OK, I went for the former; if you'd rather get this out of git, you can
git clone http://www.linux-nfs.org/~bfields/linux.git
git checkout server-cluster-lease-api
The changes from the last version seem pretty trivial, but I've
compile-tested this only for now.
I'm ignoring the problem of breaking leases on unlink and rename. I
think we should go ahead and do this part now--it's adequate for the
current lease semantics, and even more so for our current application
(just turning off leases selectively on some filesystems)--but I'd
really like to solve that problem eventually.
That's probably not going to happen until we get a cluster filesystem
with real lease support into the kernel....
Changes:
- do away with the break_lease method.
- rename __setlease to setlease, setlease to vfs_setlease, and
make sure it's setlease (the one that doesn't call into the
filesystem) that's exported.
- rename ->set_lease to ->setlease. (I don't really care which
we go with, it just seemed confusing when everything else was
already named without the underscore.)
- Add a trivial patch that disables leases on nfs (as suggested
by a patch elsewhere from Peter Staubach)
--b.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2007-05-31 at 17:40 -0400, J. Bruce Fields wrote:
> From: J. Bruce Fields <[email protected]>
>
> Currently leases are only kept locally, so there's no way for a distributed
> filesystem to enforce them against multiple clients. We're particularly
> interested in the case of nfsd exporting a cluster filesystem, in which
> case nfsd needs cluster-coherent leases in order to implement delegations
> correctly.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/locks.c | 5 ++++-
> include/linux/fs.h | 4 ++++
> 2 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 3f366e1..40a7f39 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1444,7 +1444,10 @@ int setlease(struct file *filp, long arg, struct file_lock **lease)
> return error;
>
> lock_kernel();
> - error = __setlease(filp, arg, lease);
> + if (filp->f_op && filp->f_op->set_lease)
> + error = filp->f_op->set_lease(filp, arg, lease);
> + else
> + error = __setlease(filp, arg, lease);
> unlock_kernel();
>
> return error;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7cf0c54..09aefb4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1112,6 +1112,7 @@ struct file_operations {
> int (*flock) (struct file *, int, struct file_lock *);
> ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
> + int (*set_lease)(struct file *, long, struct file_lock **);
> };
>
> struct inode_operations {
> @@ -1137,6 +1138,7 @@ struct inode_operations {
> ssize_t (*listxattr) (struct dentry *, char *, size_t);
> int (*removexattr) (struct dentry *, const char *);
> void (*truncate_range)(struct inode *, loff_t, loff_t);
> + int (*break_lease)(struct inode *, unsigned int);
Splitting the lease into a file_operation part and an inode_operation
part looks really ugly. It also means that you're calling twice down
into the filesystem for every call to may_open() (once for
vfs_permission() and once for break_lease()) and 3 times in
do_sys_truncate().
Would it perhaps make sense to package up the call to vfs_permission()
and break_lease() as a single 'may_open()' inode operation that could be
called by may_open(), do_sys_truncate() and nfsd?
Trond
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
J. Bruce Fields wrote:
> From: J. Bruce Fields <[email protected]>
>
> Currently leases are only kept locally, so there's no way for a distributed
> filesystem to enforce them against multiple clients. We're particularly
> interested in the case of nfsd exporting a cluster filesystem, in which
> case nfsd needs cluster-coherent leases in order to implement delegations
> correctly.
A couple of naive questions --
Do the semantics of these leases really match the requirements from NFSv4?
Does the internal NFSv4 server state also need to be shared?
Thanx...
ps
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, May 31, 2007 at 05:51:37PM -0400, Trond Myklebust wrote:
> Why not move the security checks from setlease() into __setlease()? That
> way you can continue to avoid the calls to (re)take the BKL, which are
> redundant as far as fcntl_setlease() is concerned.
Sure. Then setlease() just becomes a simple BKL-taking wrapper around
__setlease(); is that what you were thinking of?
--b.
>From 4ce02551ab16d2812299ac0ced43652f1af0deb1 Mon Sep 17 00:00:00 2001
From: J. Bruce Fields <[email protected]>
Date: Thu, 31 May 2007 17:03:46 -0400
Subject: [PATCH] locks: share more common lease code
Share more code between setlease (used by nfsd) and fcntl.
Also some minor cleanup.
Signed-off-by: "J. Bruce Fields" <[email protected]>
---
fs/locks.c | 30 ++++++++++--------------------
1 files changed, 10 insertions(+), 20 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 431a8b8..6ad3c7b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1346,6 +1346,14 @@ static int __setlease(struct file *filp, long arg, struct file_lock **flp)
struct inode *inode = dentry->d_inode;
int error, rdlease_count = 0, wrlease_count = 0;
+ if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
+ return -EACCES;
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+ error = security_file_lock(filp, arg);
+ if (error)
+ return error;
+
time_out_leases(inode);
error = -EINVAL;
@@ -1431,18 +1439,8 @@ out:
int setlease(struct file *filp, long arg, struct file_lock **lease)
{
- struct dentry *dentry = filp->f_path.dentry;
- struct inode *inode = dentry->d_inode;
int error;
- if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
- return -EACCES;
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
- error = security_file_lock(filp, arg);
- if (error)
- return error;
-
lock_kernel();
error = __setlease(filp, arg, lease);
unlock_kernel();
@@ -1469,14 +1467,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
struct inode *inode = dentry->d_inode;
int error;
- if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
- return -EACCES;
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
- error = security_file_lock(filp, arg);
- if (error)
- return error;
-
locks_init_lock(&fl);
error = lease_init(filp, arg, &fl);
if (error)
@@ -1490,9 +1480,9 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
if (error < 0) {
- /* remove lease just inserted by __setlease */
+ /* remove lease just inserted by setlease */
flp->fl_type = F_UNLCK | F_INPROGRESS;
- flp->fl_break_time = jiffies- 10;
+ flp->fl_break_time = jiffies - 10;
time_out_leases(inode);
goto out_unlock;
}
--
1.5.2.rc3
On Fri, 2007-06-01 at 12:30 -0400, J. Bruce Fields wrote:
> On Thu, May 31, 2007 at 05:51:37PM -0400, Trond Myklebust wrote:
> > Why not move the security checks from setlease() into __setlease()? That
> > way you can continue to avoid the calls to (re)take the BKL, which are
> > redundant as far as fcntl_setlease() is concerned.
>
> Sure. Then setlease() just becomes a simple BKL-taking wrapper around
> __setlease(); is that what you were thinking of?
>
> --b.
Yep
Trond
>
> >From 4ce02551ab16d2812299ac0ced43652f1af0deb1 Mon Sep 17 00:00:00 2001
> From: J. Bruce Fields <[email protected]>
> Date: Thu, 31 May 2007 17:03:46 -0400
> Subject: [PATCH] locks: share more common lease code
>
> Share more code between setlease (used by nfsd) and fcntl.
>
> Also some minor cleanup.
>
> Signed-off-by: "J. Bruce Fields" <[email protected]>
> ---
> fs/locks.c | 30 ++++++++++--------------------
> 1 files changed, 10 insertions(+), 20 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 431a8b8..6ad3c7b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1346,6 +1346,14 @@ static int __setlease(struct file *filp, long arg, struct file_lock **flp)
> struct inode *inode = dentry->d_inode;
> int error, rdlease_count = 0, wrlease_count = 0;
>
> + if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
> + return -EACCES;
> + if (!S_ISREG(inode->i_mode))
> + return -EINVAL;
> + error = security_file_lock(filp, arg);
> + if (error)
> + return error;
> +
> time_out_leases(inode);
>
> error = -EINVAL;
> @@ -1431,18 +1439,8 @@ out:
>
> int setlease(struct file *filp, long arg, struct file_lock **lease)
> {
> - struct dentry *dentry = filp->f_path.dentry;
> - struct inode *inode = dentry->d_inode;
> int error;
>
> - if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
> - return -EACCES;
> - if (!S_ISREG(inode->i_mode))
> - return -EINVAL;
> - error = security_file_lock(filp, arg);
> - if (error)
> - return error;
> -
> lock_kernel();
> error = __setlease(filp, arg, lease);
> unlock_kernel();
> @@ -1469,14 +1467,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> struct inode *inode = dentry->d_inode;
> int error;
>
> - if ((current->fsuid != inode->i_uid) && !capable(CAP_LEASE))
> - return -EACCES;
> - if (!S_ISREG(inode->i_mode))
> - return -EINVAL;
> - error = security_file_lock(filp, arg);
> - if (error)
> - return error;
> -
> locks_init_lock(&fl);
> error = lease_init(filp, arg, &fl);
> if (error)
> @@ -1490,9 +1480,9 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
>
> error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
> if (error < 0) {
> - /* remove lease just inserted by __setlease */
> + /* remove lease just inserted by setlease */
> flp->fl_type = F_UNLCK | F_INPROGRESS;
> - flp->fl_break_time = jiffies- 10;
> + flp->fl_break_time = jiffies - 10;
> time_out_leases(inode);
> goto out_unlock;
> }
On Fri, Jun 01, 2007 at 09:14:53AM -0400, Peter Staubach wrote:
> J. Bruce Fields wrote:
> >From: J. Bruce Fields <[email protected]>
> >
> >Currently leases are only kept locally, so there's no way for a distributed
> >filesystem to enforce them against multiple clients. We're particularly
> >interested in the case of nfsd exporting a cluster filesystem, in which
> >case nfsd needs cluster-coherent leases in order to implement delegations
> >correctly.
>
> A couple of naive questions --
>
> Do the semantics of these leases really match the requirements from NFSv4?
The only problem I'm aware of is that leases aren't broken on rename,
link, and unlink. This is kind of tricky to fix. David Richter (cc'd)
and I sketched out a few different approaches, and I think he has some
patches implementing at least one of them.
This may require defining some new type of lease, to avoid changing the
documented behavior of the fcntl lease operations (which only break on
open). Although actually I believe Samba needs the same behavior we do,
and they're probably the most important user of leases....
> Does the internal NFSv4 server state also need to be shared?
Not that I can see. The setup we're trying to deal with here is one
where we have a cluster filesystem mounted by several nodes, at least
two of which are running nfs servers exporting that filesystem. The
servers don't share clients--so no clientid's or stateid's are
shared--but they need leases, locks, etc., to be coherent across all the
servers.
--b.
On Thu, May 31, 2007 at 06:34:09PM -0400, Trond Myklebust wrote:
> On Thu, 2007-05-31 at 17:40 -0400, J. Bruce Fields wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 7cf0c54..09aefb4 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1112,6 +1112,7 @@ struct file_operations {
> > int (*flock) (struct file *, int, struct file_lock *);
> > ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
> > ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
> > + int (*set_lease)(struct file *, long, struct file_lock **);
> > };
> >
> > struct inode_operations {
> > @@ -1137,6 +1138,7 @@ struct inode_operations {
> > ssize_t (*listxattr) (struct dentry *, char *, size_t);
> > int (*removexattr) (struct dentry *, const char *);
> > void (*truncate_range)(struct inode *, loff_t, loff_t);
> > + int (*break_lease)(struct inode *, unsigned int);
>
> Splitting the lease into a file_operation part and an inode_operation
> part looks really ugly.
Well, we could stick them both in the file_operations, of course, but I
think it really would be a bug for the behavior of the break_lease
operation to ever vary based on the file passed in. (And we eventually
want to break leases in places like rename where we don't even have a
file.)
> It also means that you're calling twice down
> into the filesystem for every call to may_open() (once for
> vfs_permission() and once for break_lease()) and 3 times in
> do_sys_truncate().
>
> Would it perhaps make sense to package up the call to vfs_permission()
> and break_lease() as a single 'may_open()' inode operation that could be
> called by may_open(), do_sys_truncate() and nfsd?
Could be, but this may no longer make sense when we try to do
lease-breaking on rename and unlink.
For the purposes of the current limited GFS2 implementation, we don't
even need a break operation yet. So maybe that should just be split out
into a separate patch for now, and postpone dealing with this part until
we figure out how to do a full GFS2 lease implementation.
--b.
On Fri, Jun 01, 2007 at 12:44:16PM -0400, J. Bruce Fields wrote:
> The only problem I'm aware of is that leases aren't broken on rename,
> link, and unlink. This is kind of tricky to fix. David Richter (cc'd)
> and I sketched out a few different approaches, and I think he has some
> patches implementing at least one of them.
>
> This may require defining some new type of lease, to avoid changing the
> documented behavior of the fcntl lease operations (which only break on
> open). Although actually I believe Samba needs the same behavior we do,
> and they're probably the most important user of leases....
Samba internally prohibits renaming or deleting an open file, to match
Windows semantics. So it won't notice the difference. At least, that's
what I remember from a discussion with Tridge when we were implementing
leases back in 2000.
I think it's an acceptable change in Linux semantics to break leases on
rename/delete/link. Though I'm not quite sure why you need to -- nobody
else is touching the contents of the file, so it's not like you need to
write the data back to it, or discard your cached copy of it in the case
of a read-only lease.
On Fri, Jun 01, 2007 at 11:41:23AM -0600, Matthew Wilcox wrote:
> Samba internally prohibits renaming or deleting an open file, to match
> Windows semantics. So it won't notice the difference. At least, that's
> what I remember from a discussion with Tridge when we were implementing
> leases back in 2000.
NFSv4 should be doing the same thing (err, for some reason I thought it
already did--I remember writing patches to do it--but I must not have
finished them). But there's still the problem of renames done by local
users, other file servers in the multiprotocol case, etc.
> I think it's an acceptable change in Linux semantics to break leases on
> rename/delete/link. Though I'm not quite sure why you need to -- nobody
> else is touching the contents of the file, so it's not like you need to
> write the data back to it, or discard your cached copy of it in the case
> of a read-only lease.
NFSv4 delegations are defined to cover the file's metadata (including
the name it was opened under) as well as its data. Otherwise you
wouldn't be able to perform opens locally, because you wouldn't know
whether the open path still works, or whether the file permissions still
allow the same access.
And other parts of the protocol make use of this assumption (e.g. OPEN
with CLAIM_DELEGATE_CUR or CLAIM_DELEGATE_PREV, which expect to be able
to re-open a file under its original name).
--b.
On Fri, 2007-06-01 at 12:53 -0400, J. Bruce Fields wrote:
> > It also means that you're calling twice down
> > into the filesystem for every call to may_open() (once for
> > vfs_permission() and once for break_lease()) and 3 times in
> > do_sys_truncate().
> >
> > Would it perhaps make sense to package up the call to vfs_permission()
> > and break_lease() as a single 'may_open()' inode operation that could be
> > called by may_open(), do_sys_truncate() and nfsd?
>
> Could be, but this may no longer make sense when we try to do
> lease-breaking on rename and unlink.
Hmm... Clustered lease breaking on rename and unlink really needs to be
done _atomically_ with the actual operation. It really needs to be done
inside the ->rename() and ->unlink() operation because only the
filesystem can guarantee atomicity.
Thinking about it, perhaps the may_open() thing is wrong too: again, you
have atomicity requirements inside the open() call. A break_lease()
inode operation won't help either...
Trond
[email protected] wrote on 06/02/2007 10:39:10 AM:
> On Fri, 2007-06-01 at 12:53 -0400, J. Bruce Fields wrote:
> > > It also means that you're calling twice down
> > > into the filesystem for every call to may_open() (once for
> > > vfs_permission() and once for break_lease()) and 3 times in
> > > do_sys_truncate().
> > >
> > > Would it perhaps make sense to package up the call to
vfs_permission()
> > > and break_lease() as a single 'may_open()' inode operation that
could be
> > > called by may_open(), do_sys_truncate() and nfsd?
> >
> > Could be, but this may no longer make sense when we try to do
> > lease-breaking on rename and unlink.
>
> Hmm... Clustered lease breaking on rename and unlink really needs to be
> done _atomically_ with the actual operation. It really needs to be done
> inside the ->rename() and ->unlink() operation because only the
> filesystem can guarantee atomicity.
A cluster filesystem don't need the inode operation breake_lease. The fs
must be able to detect the conflict and break the lease using
__break_lease() call on all the nodes that hold a conflicting leases.
Marc.
> Thinking about it, perhaps the may_open() thing is wrong too: again, you
> have atomicity requirements inside the open() call. A break_lease()
> inode operation won't help either...
>
> Trond
>
>
>
-------------------------------------------------------------------------
> This SF.net email is sponsored by DB2 Express
> Download DB2 Express C - the FREE version of DB2 express and take
> control of your XML. No limits. Just data. Click to get it now.
> http://sourceforge.net/powerbar/db2/
> _______________________________________________
> NFS maillist - [email protected]
> https://lists.sourceforge.net/lists/listinfo/nfs
On Sat, 2007-06-02 at 11:09 -0700, Marc Eshel wrote:
> A cluster filesystem don't need the inode operation breake_lease. The fs
> must be able to detect the conflict and break the lease using
> __break_lease() call on all the nodes that hold a conflicting leases.
> Marc.
Currently, the lease handling is done all in the VFS, and is done prior
to calling any filesystem operations. Bruce's break_lease() inode
operation allows the VFS to notify the filesystem that some operation is
going to be called that requires the lease to be broken.
My point is that in doing so, you are not atomic with the operation that
requires the lease to be broken. Some different node may re-establish a
lease while we're calling back down into the filesystem to perform the
operation.
So I agree with you. The break_lease() inode operation isn't going to
work. The filesystem is going to have to figure out for itself when it
needs to notify other nodes that the lease needs breaking, and it needs
to figure out its own methods for ensuring atomicity.
Cheers
Trond
On Sat, 2007-06-02 at 11:09 -0700, Marc Eshel wrote:
> [email protected] wrote on 06/02/2007 10:39:10 AM:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BTW: your mailer is seriously broken.
Trond
On Sat, Jun 02, 2007 at 02:21:22PM -0400, Trond Myklebust wrote:
> Currently, the lease handling is done all in the VFS, and is done prior
> to calling any filesystem operations. Bruce's break_lease() inode
> operation allows the VFS to notify the filesystem that some operation is
> going to be called that requires the lease to be broken.
>
> My point is that in doing so, you are not atomic with the operation that
> requires the lease to be broken. Some different node may re-establish a
> lease while we're calling back down into the filesystem to perform the
> operation.
> So I agree with you. The break_lease() inode operation isn't going to
> work. The filesystem is going to have to figure out for itself when it
> needs to notify other nodes that the lease needs breaking, and it needs
> to figure out its own methods for ensuring atomicity.
OK, I agree with you both, thanks for the explanations.
It looks to me like there's probably a race in the existing code that
will allow conflicting opens and leases to be granted simultaneously if
the lease request is handled just after may_open() is called. These
checks at the beginning of __setlease() are an attempt to prevent that
race:
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
goto out;
if ((arg == F_WRLCK)
&& ((atomic_read(&dentry->d_count) > 1)
|| (atomic_read(&inode->i_count) > 1)))
goto out;
But, for example, in the case of a simultaneous write open and RDLCK
lease request, I believe the call to setlease could come after the
may_open() but before the call to get_write_access() that bumps
i_writecount.
--b.