At LSF this year, there was a discussion about the "wishlist" for
userland file servers. One of the things brought up was the goofy and
problematic behavior of POSIX locks when a file is closed. Boaz started
a thread on it here:
http://permalink.gmane.org/gmane.linux.file-systems/73364
Userland fileservers often need to maintain more than one open file
descriptor on a file. The POSIX spec says:
"All locks associated with a file for a given process shall be removed
when a file descriptor for that file is closed by that process or the
process holding that file descriptor terminates."
This is problematic since you can't close any file descriptor without
dropping all your POSIX locks. Most userland file servers therefore
end up opening the file with more access than is really necessary, and
keeping fd's open for longer than is necessary to work around this.
This patchset is a first stab at an approach to address this problem by
adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
for "private" -- I'm open to changing that if you have a better
mnemonic).
For all intents and purposes these lock types act just like their
"non-P" counterpart. The difference is that they are only implicitly
released when the fd against which they were acquired is closed. As a
side effect, these locks cannot be merged with "non-P" locks since they
have different semantics on close.
I've given this patchset some very basic smoke testing and it seems to
do the right thing, but it is still pretty rough. If this looks
reasonable I'll plan to do some documentation updates and will take a
stab at trying to get these new lock types added to the POSIX spec (as
HCH recommended).
At this point, my main questions are:
1) does this look useful, particularly for fileserver implementors?
2) does this look OK API-wise? We could consider different "cmd" values
or even different syscalls, but I figured this makes it clearer that
"P" and "non-P" locks will still conflict with one another.
Jeff Layton (5):
locks: consolidate checks for compatible filp->f_mode values in setlk
handlers
locks: add definitions for F_RDLCKP and F_WRLCKP
locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
correct filp
locks: handle merging of locks when FL_FILP_PRIVATE is set
locks: show private lock types in /proc/locks
fs/locks.c | 121 ++++++++++++++++++++++++++-------------
include/linux/fs.h | 1 +
include/uapi/asm-generic/fcntl.h | 9 +++
3 files changed, 91 insertions(+), 40 deletions(-)
--
1.8.3.1
If the request has FL_CLOSE set, then we know we're releasing locks in
response to a fd being closed. When iterating over locks, skip any
locks that have FL_FILP_PRIVATE set and that don't have the same fl_file
pointer.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/locks.c b/fs/locks.c
index 086c3b4..7186b9a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -135,6 +135,7 @@
#define IS_POSIX(fl) (fl->fl_flags & FL_POSIX)
#define IS_FLOCK(fl) (fl->fl_flags & FL_FLOCK)
#define IS_LEASE(fl) (fl->fl_flags & FL_LEASE)
+#define IS_PRIVATE(fl) (fl->fl_flags & FL_FILP_PRIVATE)
static bool lease_breaking(struct file_lock *fl)
{
@@ -1009,6 +1010,14 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
/* Process locks with this owner. */
while ((fl = *before) && posix_same_owner(request, fl)) {
+ /*
+ * For fd-private locks, only release locks that are associated
+ * with this filp.
+ */
+ if ((request->fl_flags & FL_CLOSE) && IS_PRIVATE(fl) &&
+ request->fl_file != fl->fl_file)
+ goto next_lock;
+
/* Detect adjacent or overlapping regions (if same lock type)
*/
if (request->fl_type == fl->fl_type) {
--
1.8.3.1
Show "private" locks in /proc/locks with a 'P' suffix.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 3bde157..31abb70 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2393,8 +2393,8 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
} else {
seq_printf(f, "%s ",
(lease_breaking(fl))
- ? (fl->fl_type == F_UNLCK) ? "UNLCK" : "READ "
- : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
+ ? (fl->fl_type == F_UNLCK) ? "UNLCK " : (IS_PRIVATE(fl) ? "READP " : "READ ")
+ : (fl->fl_type == F_WRLCK) ? (IS_PRIVATE(fl) ? "WRITEP" : "WRITE ") : IS_PRIVATE(fl) ? "READP " : "READ ");
}
if (inode) {
#ifdef WE_CAN_BREAK_LSLK_NOW
--
1.8.3.1
Since FL_FILP_PRIVATE locks have different semantics on close, we can't
merge them with "normal" locks or with other FL_FILP_PRIVATE locks that
were acquired on different file descriptors. Consolidate the logic that
determines this into its own function.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 7186b9a..3bde157 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -934,6 +934,39 @@ out:
return error;
}
+/**
+ * posix_locks_mergeable - are two posix locks able to be merged?
+ * @fl1: first posix lock
+ * @fl2: second posix lock
+ *
+ * Determine whether two posix locks are potentially able to be merged. If
+ * they are of different types or have different settings of FL_FILP_PRIVATE
+ * then they cannot be. Ditto if they are FL_FILP_PRIVATE locks and are
+ * associated with different file descriptors.
+ */
+static bool
+posix_locks_mergeable(struct file_lock *fl1, struct file_lock *fl2)
+{
+ unsigned int p1, p2;
+
+ /* Different types are not mergeable */
+ if (fl1->fl_type != fl2->fl_type)
+ return false;
+
+ p1 = IS_PRIVATE(fl1);
+ p2 = IS_PRIVATE(fl2);
+
+ /* Different settings of FL_FILP_PRIVATE aren't mergeable */
+ if (p1 != p2)
+ return false;
+
+ /* When FL_FILP_PRIVATE is set, fl_file must be same */
+ if (p1 && fl1->fl_file != fl2->fl_file)
+ return false;
+
+ return true;
+}
+
static int __posix_lock_file(struct inode *inode, struct file_lock *request, struct file_lock *conflock)
{
struct file_lock *fl;
@@ -1018,9 +1051,8 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
request->fl_file != fl->fl_file)
goto next_lock;
- /* Detect adjacent or overlapping regions (if same lock type)
- */
- if (request->fl_type == fl->fl_type) {
+ /* Detect adjacent or overlapping regions (if same lock type) */
+ if (posix_locks_mergeable(fl, request)) {
/* In all comparisons of start vs end, use
* "start - 1" rather than "end + 1". If end
* is OFFSET_MAX, end + 1 will become negative.
--
1.8.3.1
No arch seems to use these values for anything, so defining them
globally should be ok. When they are specified, translate them into
their non-P variants, and set the FL_FILP_PRIVATE flag to indicate
that they should get the proper behavior on close.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 11 +++++++++++
include/linux/fs.h | 1 +
include/uapi/asm-generic/fcntl.h | 9 +++++++++
3 files changed, 21 insertions(+)
diff --git a/fs/locks.c b/fs/locks.c
index 389382c..086c3b4 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -348,6 +348,17 @@ static int posix_assign_type(struct file_lock *fl, long type)
{
int err;
+ switch(type) {
+ case F_RDLCKP:
+ type = F_RDLCK;
+ fl->fl_flags |= FL_FILP_PRIVATE;
+ break;
+ case F_WRLCKP:
+ type = F_WRLCK;
+ fl->fl_flags |= FL_FILP_PRIVATE;
+ break;
+ }
+
err = assign_type(fl, type);
if (err)
return err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b46bfa5..df502bf 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -889,6 +889,7 @@ static inline int file_check_writeable(struct file *filp)
#define FL_SLEEP 128 /* A blocking lock */
#define FL_DOWNGRADE_PENDING 256 /* Lease is being downgraded */
#define FL_UNLOCK_PENDING 512 /* Lease is being broken */
+#define FL_FILP_PRIVATE 1024 /* only release on close of filp on which it was set */
/*
* Special return value from posix_lock_file() and vfs_lock_file() for
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index 95e46c8..0a4be0d 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -151,6 +151,15 @@ struct f_owner_ex {
#define F_UNLCK 2
#endif
+/*
+ * fd "private" POSIX locks. Usually POSIX locks held by a process are
+ * released on *any* close. If they are acquired using these l_type
+ * values, then they are only released on close of the fd on which they
+ * were acquired, similar to how BSD locks work.
+ */
+#define F_RDLCKP 5
+#define F_WRLCKP 6
+
/* for old implementation of bsd flock () */
#ifndef F_EXLCK
#define F_EXLCK 4 /* or 3 */
--
1.8.3.1
Add a wrapper around assign_type that does this instead of duplicating
this check in two places. This also fixes a minor wart in the code where
we continue referring to the struct flock after converting it to struct
file_lock.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 61 +++++++++++++++++++++++++------------------------------------
1 file changed, 25 insertions(+), 36 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index bd88c04..389382c 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -344,6 +344,29 @@ static int assign_type(struct file_lock *fl, long type)
return 0;
}
+static int posix_assign_type(struct file_lock *fl, long type)
+{
+ int err;
+
+ err = assign_type(fl, type);
+ if (err)
+ return err;
+
+ /* Ensure that fl->fl_filp has compatible f_mode */
+ switch (fl->fl_type) {
+ case F_RDLCK:
+ if (!(fl->fl_file->f_mode & FMODE_READ))
+ return -EBADF;
+ break;
+ case F_WRLCK:
+ if (!(fl->fl_file->f_mode & FMODE_WRITE))
+ return -EBADF;
+ break;
+ }
+
+ return 0;
+}
+
/* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
* style lock.
*/
@@ -393,7 +416,7 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
- return assign_type(fl, l->l_type);
+ return posix_assign_type(fl, l->l_type);
}
#if BITS_PER_LONG == 32
@@ -439,7 +462,7 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
- return assign_type(fl, l->l_type);
+ return posix_assign_type(fl, l->l_type);
}
#endif
@@ -2025,23 +2048,6 @@ again:
file_lock->fl_flags |= FL_SLEEP;
}
- error = -EBADF;
- switch (flock.l_type) {
- case F_RDLCK:
- if (!(filp->f_mode & FMODE_READ))
- goto out;
- break;
- case F_WRLCK:
- if (!(filp->f_mode & FMODE_WRITE))
- goto out;
- break;
- case F_UNLCK:
- break;
- default:
- error = -EINVAL;
- goto out;
- }
-
error = do_lock_file_wait(filp, cmd, file_lock);
/*
@@ -2143,23 +2149,6 @@ again:
file_lock->fl_flags |= FL_SLEEP;
}
- error = -EBADF;
- switch (flock.l_type) {
- case F_RDLCK:
- if (!(filp->f_mode & FMODE_READ))
- goto out;
- break;
- case F_WRLCK:
- if (!(filp->f_mode & FMODE_WRITE))
- goto out;
- break;
- case F_UNLCK:
- break;
- default:
- error = -EINVAL;
- goto out;
- }
-
error = do_lock_file_wait(filp, cmd, file_lock);
/*
--
1.8.3.1
On Fri, 11 Oct 2013 08:25:17 -0400
Jeff Layton <[email protected]> wrote:
> At LSF this year, there was a discussion about the "wishlist" for
> userland file servers. One of the things brought up was the goofy and
> problematic behavior of POSIX locks when a file is closed. Boaz started
> a thread on it here:
>
> http://permalink.gmane.org/gmane.linux.file-systems/73364
>
> Userland fileservers often need to maintain more than one open file
> descriptor on a file. The POSIX spec says:
>
> "All locks associated with a file for a given process shall be removed
> when a file descriptor for that file is closed by that process or the
> process holding that file descriptor terminates."
>
> This is problematic since you can't close any file descriptor without
> dropping all your POSIX locks. Most userland file servers therefore
> end up opening the file with more access than is really necessary, and
> keeping fd's open for longer than is necessary to work around this.
>
> This patchset is a first stab at an approach to address this problem by
> adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> for "private" -- I'm open to changing that if you have a better
> mnemonic).
>
> For all intents and purposes these lock types act just like their
> "non-P" counterpart. The difference is that they are only implicitly
> released when the fd against which they were acquired is closed. As a
> side effect, these locks cannot be merged with "non-P" locks since they
> have different semantics on close.
>
> I've given this patchset some very basic smoke testing and it seems to
> do the right thing, but it is still pretty rough. If this looks
> reasonable I'll plan to do some documentation updates and will take a
> stab at trying to get these new lock types added to the POSIX spec (as
> HCH recommended).
>
> At this point, my main questions are:
>
> 1) does this look useful, particularly for fileserver implementors?
>
> 2) does this look OK API-wise? We could consider different "cmd" values
> or even different syscalls, but I figured this makes it clearer that
> "P" and "non-P" locks will still conflict with one another.
>
> Jeff Layton (5):
> locks: consolidate checks for compatible filp->f_mode values in setlk
> handlers
> locks: add definitions for F_RDLCKP and F_WRLCKP
> locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> correct filp
> locks: handle merging of locks when FL_FILP_PRIVATE is set
> locks: show private lock types in /proc/locks
>
> fs/locks.c | 121 ++++++++++++++++++++++++++-------------
> include/linux/fs.h | 1 +
> include/uapi/asm-generic/fcntl.h | 9 +++
> 3 files changed, 91 insertions(+), 40 deletions(-)
>
...and I went ahead and opened a bug at the Austin Group to begin
discussion on adding whatever interface we come up with to the spec.
That should help open the floor for other players in the POSIX standard
community to add input:
http://austingroupbugs.net/view.php?id=768
Cheers,
--
Jeff Layton <[email protected]>
> > At LSF this year, there was a discussion about the "wishlist" for
> > userland file servers. One of the things brought up was the goofy and
> > problematic behavior of POSIX locks when a file is closed. Boaz
> > started a thread on it here:
> >
> > http://permalink.gmane.org/gmane.linux.file-systems/73364
> >
> > Userland fileservers often need to maintain more than one open file
> > descriptor on a file. The POSIX spec says:
> >
> > "All locks associated with a file for a given process shall be removed
> > when a file descriptor for that file is closed by that process or the
> > process holding that file descriptor terminates."
> >
> > This is problematic since you can't close any file descriptor without
> > dropping all your POSIX locks. Most userland file servers therefore
> > end up opening the file with more access than is really necessary, and
> > keeping fd's open for longer than is necessary to work around this.
> >
> > This patchset is a first stab at an approach to address this problem
> > by adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is
> > short for "private" -- I'm open to changing that if you have a better
> > mnemonic).
> >
> > For all intents and purposes these lock types act just like their
> > "non-P" counterpart. The difference is that they are only implicitly
> > released when the fd against which they were acquired is closed. As a
> > side effect, these locks cannot be merged with "non-P" locks since
> > they have different semantics on close.
> >
> > I've given this patchset some very basic smoke testing and it seems to
> > do the right thing, but it is still pretty rough. If this looks
> > reasonable I'll plan to do some documentation updates and will take a
> > stab at trying to get these new lock types added to the POSIX spec (as
> > HCH recommended).
> >
> > At this point, my main questions are:
> >
> > 1) does this look useful, particularly for fileserver implementors?
> >
> > 2) does this look OK API-wise? We could consider different "cmd" values
> > or even different syscalls, but I figured this makes it clearer that
> > "P" and "non-P" locks will still conflict with one another.
This is a good start.
I'd prefer a model where the private locks are maintained even if all file
descriptors are closed and released on garbage collection when the process
terminates. The model presented would require a server to potentially have
at least two file descriptors open (the descriptor originally used for the
locks, and a descriptor used for current access mode needed for some I/O
operation). The server will also need to "remember" to do all locks using
the first file descriptor.
Another thing that would be very useful for servers is to be able to specify
an arbitrary lock owner. Currently, Ganesha has to manage a union of all
locks held on a file and carefully pick it apart when a client does an
unlock. Allowing a process specified owner would allow Ganesha (or other
servers) to have separate locks for each client lock owner.
Frank
On Fri, 11 Oct 2013 08:20:59 -0700
"Frank Filz" <[email protected]> wrote:
> > > At LSF this year, there was a discussion about the "wishlist" for
> > > userland file servers. One of the things brought up was the goofy and
> > > problematic behavior of POSIX locks when a file is closed. Boaz
> > > started a thread on it here:
> > >
> > > http://permalink.gmane.org/gmane.linux.file-systems/73364
> > >
> > > Userland fileservers often need to maintain more than one open file
> > > descriptor on a file. The POSIX spec says:
> > >
> > > "All locks associated with a file for a given process shall be removed
> > > when a file descriptor for that file is closed by that process or the
> > > process holding that file descriptor terminates."
> > >
> > > This is problematic since you can't close any file descriptor without
> > > dropping all your POSIX locks. Most userland file servers therefore
> > > end up opening the file with more access than is really necessary, and
> > > keeping fd's open for longer than is necessary to work around this.
> > >
> > > This patchset is a first stab at an approach to address this problem
> > > by adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is
> > > short for "private" -- I'm open to changing that if you have a better
> > > mnemonic).
> > >
> > > For all intents and purposes these lock types act just like their
> > > "non-P" counterpart. The difference is that they are only implicitly
> > > released when the fd against which they were acquired is closed. As a
> > > side effect, these locks cannot be merged with "non-P" locks since
> > > they have different semantics on close.
> > >
> > > I've given this patchset some very basic smoke testing and it seems to
> > > do the right thing, but it is still pretty rough. If this looks
> > > reasonable I'll plan to do some documentation updates and will take a
> > > stab at trying to get these new lock types added to the POSIX spec (as
> > > HCH recommended).
> > >
> > > At this point, my main questions are:
> > >
> > > 1) does this look useful, particularly for fileserver implementors?
> > >
> > > 2) does this look OK API-wise? We could consider different "cmd" values
> > > or even different syscalls, but I figured this makes it clearer that
> > > "P" and "non-P" locks will still conflict with one another.
>
> This is a good start.
>
> I'd prefer a model where the private locks are maintained even if all file
> descriptors are closed and released on garbage collection when the process
> terminates. The model presented would require a server to potentially have
> at least two file descriptors open (the descriptor originally used for the
> locks, and a descriptor used for current access mode needed for some I/O
> operation). The server will also need to "remember" to do all locks using
> the first file descriptor.
>
That's sort of a non-starter, I think at least in Linux. If you have no
open file descriptor then you have nothing to hang the lock off of.
That sort of interface sounds error-prone and "leaky" too. A long
running process could easily end up leaking POSIX locks over time if
you forget to explicitly unlock them.
> Another thing that would be very useful for servers is to be able to specify
> an arbitrary lock owner. Currently, Ganesha has to manage a union of all
> locks held on a file and carefully pick it apart when a client does an
> unlock. Allowing a process specified owner would allow Ganesha (or other
> servers) to have separate locks for each client lock owner.
>
The trivial answer there would be to give each lockowner its own file
descriptor, right?
--
Jeff Layton <[email protected]>
> > > > At LSF this year, there was a discussion about the "wishlist" for
> > > > userland file servers. One of the things brought up was the goofy
> > > > and problematic behavior of POSIX locks when a file is closed.
> > > > Boaz started a thread on it here:
> > > >
> > > > http://permalink.gmane.org/gmane.linux.file-systems/73364
> > > >
> > > > Userland fileservers often need to maintain more than one open
> > > > file descriptor on a file. The POSIX spec says:
> > > >
> > > > "All locks associated with a file for a given process shall be
> > > > removed when a file descriptor for that file is closed by that
> > > > process or the process holding that file descriptor terminates."
> > > >
> > > > This is problematic since you can't close any file descriptor
> > > > without dropping all your POSIX locks. Most userland file servers
> > > > therefore end up opening the file with more access than is really
> > > > necessary, and keeping fd's open for longer than is necessary to
work
> around this.
> > > >
> > > > This patchset is a first stab at an approach to address this
> > > > problem by adding two new l_type values -- F_RDLCKP and F_WRLCKP
> > > > (the 'P' is short for "private" -- I'm open to changing that if
> > > > you have a better mnemonic).
> > > >
> > > > For all intents and purposes these lock types act just like their
> > > > "non-P" counterpart. The difference is that they are only
> > > > implicitly released when the fd against which they were acquired
> > > > is closed. As a side effect, these locks cannot be merged with
> > > > "non-P" locks since they have different semantics on close.
> > > >
> > > > I've given this patchset some very basic smoke testing and it
> > > > seems to do the right thing, but it is still pretty rough. If this
> > > > looks reasonable I'll plan to do some documentation updates and
> > > > will take a stab at trying to get these new lock types added to
> > > > the POSIX spec (as HCH recommended).
> > > >
> > > > At this point, my main questions are:
> > > >
> > > > 1) does this look useful, particularly for fileserver implementors?
> > > >
> > > > 2) does this look OK API-wise? We could consider different "cmd"
> values
> > > > or even different syscalls, but I figured this makes it clearer
that
> > > > "P" and "non-P" locks will still conflict with one another.
> >
> > This is a good start.
> >
> > I'd prefer a model where the private locks are maintained even if all
> > file descriptors are closed and released on garbage collection when
> > the process terminates. The model presented would require a server to
> > potentially have at least two file descriptors open (the descriptor
> > originally used for the locks, and a descriptor used for current
> > access mode needed for some I/O operation). The server will also need
> > to "remember" to do all locks using the first file descriptor.
> >
>
> That's sort of a non-starter, I think at least in Linux. If you have no
open file
> descriptor then you have nothing to hang the lock off of.
> That sort of interface sounds error-prone and "leaky" too. A long running
> process could easily end up leaking POSIX locks over time if you forget to
> explicitly unlock them.
There is a point there, however see below for discussion of file descriptor
resources.
> > Another thing that would be very useful for servers is to be able to
> > specify an arbitrary lock owner. Currently, Ganesha has to manage a
> > union of all locks held on a file and carefully pick it apart when a
> > client does an unlock. Allowing a process specified owner would allow
> > Ganesha (or other
> > servers) to have separate locks for each client lock owner.
> >
>
> The trivial answer there would be to give each lockowner its own file
> descriptor, right?
Hmm, that would be a solution (of course that would imply that private locks
held by the same process but by different file descriptors would conflict
appropriately).
There is a resource issue though of how many file descriptors we have open.
Is there any practical limit on the number of file descriptors a process has
open? Can the kernel support 1000s of descriptors? How much resource does a
file descriptor take? Looks like a struct file isn't tiny, not quite sure
just how big it is.
There is also some consideration of how this interacts with share
reservations (where is that proposal going BTW?). But I don't think this
really introduces anything new. We still have to guess the best access mode
to open a file descriptor that will be used for locks no matter how we
implement this.
So I guess my big concern is the resource impact of lots of file
descriptors.
Frank
On Fri, 11 Oct 2013 10:07:30 -0700
"Frank Filz" <[email protected]> wrote:
> > > > > At LSF this year, there was a discussion about the "wishlist" for
> > > > > userland file servers. One of the things brought up was the goofy
> > > > > and problematic behavior of POSIX locks when a file is closed.
> > > > > Boaz started a thread on it here:
> > > > >
> > > > > http://permalink.gmane.org/gmane.linux.file-systems/73364
> > > > >
> > > > > Userland fileservers often need to maintain more than one open
> > > > > file descriptor on a file. The POSIX spec says:
> > > > >
> > > > > "All locks associated with a file for a given process shall be
> > > > > removed when a file descriptor for that file is closed by that
> > > > > process or the process holding that file descriptor terminates."
> > > > >
> > > > > This is problematic since you can't close any file descriptor
> > > > > without dropping all your POSIX locks. Most userland file servers
> > > > > therefore end up opening the file with more access than is really
> > > > > necessary, and keeping fd's open for longer than is necessary to
> work
> > around this.
> > > > >
> > > > > This patchset is a first stab at an approach to address this
> > > > > problem by adding two new l_type values -- F_RDLCKP and F_WRLCKP
> > > > > (the 'P' is short for "private" -- I'm open to changing that if
> > > > > you have a better mnemonic).
> > > > >
> > > > > For all intents and purposes these lock types act just like their
> > > > > "non-P" counterpart. The difference is that they are only
> > > > > implicitly released when the fd against which they were acquired
> > > > > is closed. As a side effect, these locks cannot be merged with
> > > > > "non-P" locks since they have different semantics on close.
> > > > >
> > > > > I've given this patchset some very basic smoke testing and it
> > > > > seems to do the right thing, but it is still pretty rough. If this
> > > > > looks reasonable I'll plan to do some documentation updates and
> > > > > will take a stab at trying to get these new lock types added to
> > > > > the POSIX spec (as HCH recommended).
> > > > >
> > > > > At this point, my main questions are:
> > > > >
> > > > > 1) does this look useful, particularly for fileserver implementors?
> > > > >
> > > > > 2) does this look OK API-wise? We could consider different "cmd"
> > values
> > > > > or even different syscalls, but I figured this makes it clearer
> that
> > > > > "P" and "non-P" locks will still conflict with one another.
> > >
> > > This is a good start.
> > >
> > > I'd prefer a model where the private locks are maintained even if all
> > > file descriptors are closed and released on garbage collection when
> > > the process terminates. The model presented would require a server to
> > > potentially have at least two file descriptors open (the descriptor
> > > originally used for the locks, and a descriptor used for current
> > > access mode needed for some I/O operation). The server will also need
> > > to "remember" to do all locks using the first file descriptor.
> > >
> >
> > That's sort of a non-starter, I think at least in Linux. If you have no
> open file
> > descriptor then you have nothing to hang the lock off of.
> > That sort of interface sounds error-prone and "leaky" too. A long running
> > process could easily end up leaking POSIX locks over time if you forget to
> > explicitly unlock them.
>
> There is a point there, however see below for discussion of file descriptor
> resources.
>
> > > Another thing that would be very useful for servers is to be able to
> > > specify an arbitrary lock owner. Currently, Ganesha has to manage a
> > > union of all locks held on a file and carefully pick it apart when a
> > > client does an unlock. Allowing a process specified owner would allow
> > > Ganesha (or other
> > > servers) to have separate locks for each client lock owner.
> > >
> >
> > The trivial answer there would be to give each lockowner its own file
> > descriptor, right?
>
> Hmm, that would be a solution (of course that would imply that private locks
> held by the same process but by different file descriptors would conflict
> appropriately).
>
Good point. In the implementation I have so far, POSIX locks held by
the same process don't conflict, just like normal POSIX locks do. For
these sorts of locks, I think it would make sense to have more
flock()-like behavior there, such that locks held by the same process
on different file descriptors will still conflict. I'll plan to make
that change on the next pass.
> There is a resource issue though of how many file descriptors we have open.
> Is there any practical limit on the number of file descriptors a process has
> open? Can the kernel support 1000s of descriptors? How much resource does a
> file descriptor take? Looks like a struct file isn't tiny, not quite sure
> just how big it is.
>
> There is also some consideration of how this interacts with share
> reservations (where is that proposal going BTW?). But I don't think this
> really introduces anything new. We still have to guess the best access mode
> to open a file descriptor that will be used for locks no matter how we
> implement this.
>
At least in the currently proposed patchset by Pavel, share
reservations are orthogonal to these since they're based on LOCK_MAND
flock() locks.
> So I guess my big concern is the resource impact of lots of file
> descriptors.
>
That's understandable. I'm not clear on how big an overhead there is on
maintaining an open file descriptor. OTOH, people use flock() and such
and it has similar requirements.
I guess my main concern is that while I'm interested in adding
interfaces that make it _easier_ to implement fileservers, I'm not
terribly interested in adding interfaces that are _specific_ to
implementing them.
Whatever interface we add needs to be generic enough to be useful to
other applications as well. The changes you're suggesting sound rather
specific to a particular use-case.
--
Jeff Layton <[email protected]>
> > > > > > At LSF this year, there was a discussion about the "wishlist"
> > > > > > for userland file servers. One of the things brought up was
> > > > > > the goofy and problematic behavior of POSIX locks when a file is
> closed.
> > > > > > Boaz started a thread on it here:
> > > > > >
> > > > > > http://permalink.gmane.org/gmane.linux.file-systems/73364
> > > > > >
> > > > > > Userland fileservers often need to maintain more than one open
> > > > > > file descriptor on a file. The POSIX spec says:
> > > > > >
> > > > > > "All locks associated with a file for a given process shall be
> > > > > > removed when a file descriptor for that file is closed by that
> > > > > > process or the process holding that file descriptor terminates."
> > > > > >
> > > > > > This is problematic since you can't close any file descriptor
> > > > > > without dropping all your POSIX locks. Most userland file
> > > > > > servers therefore end up opening the file with more access
> > > > > > than is really necessary, and keeping fd's open for longer
> > > > > > than is necessary to
> > work
> > > around this.
> > > > > >
> > > > > > This patchset is a first stab at an approach to address this
> > > > > > problem by adding two new l_type values -- F_RDLCKP and
> > > > > > F_WRLCKP (the 'P' is short for "private" -- I'm open to
> > > > > > changing that if you have a better mnemonic).
> > > > > >
> > > > > > For all intents and purposes these lock types act just like
> > > > > > their "non-P" counterpart. The difference is that they are
> > > > > > only implicitly released when the fd against which they were
> > > > > > acquired is closed. As a side effect, these locks cannot be
> > > > > > merged with "non-P" locks since they have different semantics on
> close.
> > > > > >
> > > > > > I've given this patchset some very basic smoke testing and it
> > > > > > seems to do the right thing, but it is still pretty rough. If
> > > > > > this looks reasonable I'll plan to do some documentation
> > > > > > updates and will take a stab at trying to get these new lock
> > > > > > types added to the POSIX spec (as HCH recommended).
> > > > > >
> > > > > > At this point, my main questions are:
> > > > > >
> > > > > > 1) does this look useful, particularly for fileserver
implementors?
> > > > > >
> > > > > > 2) does this look OK API-wise? We could consider different "cmd"
> > > values
> > > > > > or even different syscalls, but I figured this makes it
> > > > > > clearer
> > that
> > > > > > "P" and "non-P" locks will still conflict with one another.
> > > >
> > > > This is a good start.
> > > >
> > > > I'd prefer a model where the private locks are maintained even if
> > > > all file descriptors are closed and released on garbage collection
> > > > when the process terminates. The model presented would require a
> > > > server to potentially have at least two file descriptors open (the
> > > > descriptor originally used for the locks, and a descriptor used
> > > > for current access mode needed for some I/O operation). The server
> > > > will also need to "remember" to do all locks using the first file
> descriptor.
> > > >
> > >
> > > That's sort of a non-starter, I think at least in Linux. If you have
> > > no
> > open file
> > > descriptor then you have nothing to hang the lock off of.
> > > That sort of interface sounds error-prone and "leaky" too. A long
> > > running process could easily end up leaking POSIX locks over time if
> > > you forget to explicitly unlock them.
> >
> > There is a point there, however see below for discussion of file
> > descriptor resources.
> >
> > > > Another thing that would be very useful for servers is to be able
> > > > to specify an arbitrary lock owner. Currently, Ganesha has to
> > > > manage a union of all locks held on a file and carefully pick it
> > > > apart when a client does an unlock. Allowing a process specified
> > > > owner would allow Ganesha (or other
> > > > servers) to have separate locks for each client lock owner.
> > > >
> > >
> > > The trivial answer there would be to give each lockowner its own
> > > file descriptor, right?
> >
> > Hmm, that would be a solution (of course that would imply that private
> > locks held by the same process but by different file descriptors would
> > conflict appropriately).
> >
>
> Good point. In the implementation I have so far, POSIX locks held by the
> same process don't conflict, just like normal POSIX locks do. For these
sorts
> of locks, I think it would make sense to have more flock()-like behavior
there,
> such that locks held by the same process on different file descriptors
will still
> conflict. I'll plan to make that change on the next pass.
Yea, and I think the "private" being part of the descriptor of these locks
will make those semantics make sense. These locks are private to the
particular file descriptor.
> > There is a resource issue though of how many file descriptors we have
> open.
> > Is there any practical limit on the number of file descriptors a
> > process has open? Can the kernel support 1000s of descriptors? How
> > much resource does a file descriptor take? Looks like a struct file
> > isn't tiny, not quite sure just how big it is.
> >
> > There is also some consideration of how this interacts with share
> > reservations (where is that proposal going BTW?). But I don't think
> > this really introduces anything new. We still have to guess the best
> > access mode to open a file descriptor that will be used for locks no
> > matter how we implement this.
> >
>
> At least in the currently proposed patchset by Pavel, share reservations
are
> orthogonal to these since they're based on LOCK_MAND
> flock() locks.
Sure, and I don't think they'll cause an issue. Hmm, what about delegations?
An out of tree file system I used to be associated with had issues with
Ganesha's opening files read/write because of the POSIX behavior caused fits
with delegations.
> > So I guess my big concern is the resource impact of lots of file
> > descriptors.
> >
>
> That's understandable. I'm not clear on how big an overhead there is on
> maintaining an open file descriptor. OTOH, people use flock() and such and
it
> has similar requirements.
Let's mull on that some more. Does anyone have a quick answer to the amount
of memory used by an open file descriptor, and the related question of how
many open file descriptors it's reasonable for one process to have?
> I guess my main concern is that while I'm interested in adding interfaces
that
> make it _easier_ to implement fileservers, I'm not terribly interested in
> adding interfaces that are _specific_ to implementing them.
>
> Whatever interface we add needs to be generic enough to be useful to other
> applications as well. The changes you're suggesting sound rather specific
to a
> particular use-case.
Sure, understood, though the Samba folks may have some input here also
(though at least they have a process per connection still right? So the
trouble would be a client with lots of processes running on it, not lots of
clients running one (or a few) process each.
Frank
On Fri, Oct 11, 2013 at 08:25:17AM -0400, Jeff Layton wrote:
> At LSF this year, there was a discussion about the "wishlist" for
> userland file servers. One of the things brought up was the goofy and
> problematic behavior of POSIX locks when a file is closed. Boaz started
> a thread on it here:
>
> http://permalink.gmane.org/gmane.linux.file-systems/73364
>
> Userland fileservers often need to maintain more than one open file
> descriptor on a file. The POSIX spec says:
>
> "All locks associated with a file for a given process shall be removed
> when a file descriptor for that file is closed by that process or the
> process holding that file descriptor terminates."
>
> This is problematic since you can't close any file descriptor without
> dropping all your POSIX locks. Most userland file servers therefore
> end up opening the file with more access than is really necessary, and
> keeping fd's open for longer than is necessary to work around this.
>
> This patchset is a first stab at an approach to address this problem by
> adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> for "private" -- I'm open to changing that if you have a better
> mnemonic).
>
> For all intents and purposes these lock types act just like their
> "non-P" counterpart. The difference is that they are only implicitly
> released when the fd against which they were acquired is closed. As a
> side effect, these locks cannot be merged with "non-P" locks since they
> have different semantics on close.
They also can't be merged with "private" locks belonging to other file
descriptors. (If you merge lock A, then lock B into a single lock C,
what do you do when B's owner goes away?)
But based on other discussion in the thread it sounds like we'll
actually want to go farther and make them conflict, so, fine.
Seems like a good thing to have, anyway.
--b.
> I've given this patchset some very basic smoke testing and it seems to
> do the right thing, but it is still pretty rough. If this looks
> reasonable I'll plan to do some documentation updates and will take a
> stab at trying to get these new lock types added to the POSIX spec (as
> HCH recommended).
>
> At this point, my main questions are:
>
> 1) does this look useful, particularly for fileserver implementors?
>
> 2) does this look OK API-wise? We could consider different "cmd" values
> or even different syscalls, but I figured this makes it clearer that
> "P" and "non-P" locks will still conflict with one another.
>
> Jeff Layton (5):
> locks: consolidate checks for compatible filp->f_mode values in setlk
> handlers
> locks: add definitions for F_RDLCKP and F_WRLCKP
> locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> correct filp
> locks: handle merging of locks when FL_FILP_PRIVATE is set
> locks: show private lock types in /proc/locks
>
> fs/locks.c | 121 ++++++++++++++++++++++++++-------------
> include/linux/fs.h | 1 +
> include/uapi/asm-generic/fcntl.h | 9 +++
> 3 files changed, 91 insertions(+), 40 deletions(-)
>
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 11, 2013 at 08:25:17AM -0400, Jeff Layton wrote:
> At LSF this year, there was a discussion about the "wishlist" for
> userland file servers. One of the things brought up was the goofy and
> problematic behavior of POSIX locks when a file is closed. Boaz started
> a thread on it here:
>
> http://permalink.gmane.org/gmane.linux.file-systems/73364
>
> Userland fileservers often need to maintain more than one open file
> descriptor on a file. The POSIX spec says:
>
> "All locks associated with a file for a given process shall be removed
> when a file descriptor for that file is closed by that process or the
> process holding that file descriptor terminates."
>
> This is problematic since you can't close any file descriptor without
> dropping all your POSIX locks. Most userland file servers therefore
> end up opening the file with more access than is really necessary, and
> keeping fd's open for longer than is necessary to work around this.
>
> This patchset is a first stab at an approach to address this problem by
> adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> for "private" -- I'm open to changing that if you have a better
> mnemonic).
>
> For all intents and purposes these lock types act just like their
> "non-P" counterpart. The difference is that they are only implicitly
> released when the fd against which they were acquired is closed. As a
> side effect, these locks cannot be merged with "non-P" locks since they
> have different semantics on close.
It isn't explicitly stated here, but will the POSIX and non-POSIX
locks interact with each other? For example, if one process is using
the old POSIX semantics for lock release, but another process is using
the new non-POSIX semantics for lock release, will the two locks
otherwise behave as expected and conflict with each other if needed?
Cheers, Andreas
> I've given this patchset some very basic smoke testing and it seems to
> do the right thing, but it is still pretty rough. If this looks
> reasonable I'll plan to do some documentation updates and will take a
> stab at trying to get these new lock types added to the POSIX spec (as
> HCH recommended).
>
> At this point, my main questions are:
>
> 1) does this look useful, particularly for fileserver implementors?
>
> 2) does this look OK API-wise? We could consider different "cmd" values
> or even different syscalls, but I figured this makes it clearer that
> "P" and "non-P" locks will still conflict with one another.
Cheers, Andreas
On Fri, 11 Oct 2013 15:36:43 -0600
Andreas Dilger <[email protected]> wrote:
> On Fri, Oct 11, 2013 at 08:25:17AM -0400, Jeff Layton wrote:
> > At LSF this year, there was a discussion about the "wishlist" for
> > userland file servers. One of the things brought up was the goofy and
> > problematic behavior of POSIX locks when a file is closed. Boaz started
> > a thread on it here:
> >
> > http://permalink.gmane.org/gmane.linux.file-systems/73364
> >
> > Userland fileservers often need to maintain more than one open file
> > descriptor on a file. The POSIX spec says:
> >
> > "All locks associated with a file for a given process shall be removed
> > when a file descriptor for that file is closed by that process or the
> > process holding that file descriptor terminates."
> >
> > This is problematic since you can't close any file descriptor without
> > dropping all your POSIX locks. Most userland file servers therefore
> > end up opening the file with more access than is really necessary, and
> > keeping fd's open for longer than is necessary to work around this.
> >
> > This patchset is a first stab at an approach to address this problem by
> > adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> > for "private" -- I'm open to changing that if you have a better
> > mnemonic).
> >
> > For all intents and purposes these lock types act just like their
> > "non-P" counterpart. The difference is that they are only implicitly
> > released when the fd against which they were acquired is closed. As a
> > side effect, these locks cannot be merged with "non-P" locks since they
> > have different semantics on close.
>
> It isn't explicitly stated here, but will the POSIX and non-POSIX
> locks interact with each other? For example, if one process is using
> the old POSIX semantics for lock release, but another process is using
> the new non-POSIX semantics for lock release, will the two locks
> otherwise behave as expected and conflict with each other if needed?
>
> Cheers, Andreas
>
Yes, they'll still conflict with "legacy" POSIX locks. The idea is to
keep mutual exclusion with "legacy" POSIX locks, but give them more
flock()-like semantics with respect to dealing with multiple open file
descriptors. By doing this, we allow these programs to keep working in
parallel with those that don't use the new lock types.
> > I've given this patchset some very basic smoke testing and it seems to
> > do the right thing, but it is still pretty rough. If this looks
> > reasonable I'll plan to do some documentation updates and will take a
> > stab at trying to get these new lock types added to the POSIX spec (as
> > HCH recommended).
> >
> > At this point, my main questions are:
> >
> > 1) does this look useful, particularly for fileserver implementors?
> >
> > 2) does this look OK API-wise? We could consider different "cmd" values
> > or even different syscalls, but I figured this makes it clearer that
> > "P" and "non-P" locks will still conflict with one another.
>
>
> Cheers, Andreas
>
>
>
>
>
--
Jeff Layton <[email protected]>
On Fri, 11 Oct 2013 15:36:43 -0600 Andreas Dilger <[email protected]> wrote:
> >
> > At this point, my main questions are:
> >
> > 1) does this look useful, particularly for fileserver implementors?
Yes from the Samba perspective. We'll have to keep the old
code around for compatibility with non-Linux OS'es, but this
will allow Linux Samba to short-circuit a bunch of logic
we have to get around the insane POSIX locking semantics
on close.
Jeremy.
On Oct 11, 2013, at 19:49, Jeremy Allison <[email protected]> wrote:
> On Fri, 11 Oct 2013 15:36:43 -0600 Andreas Dilger <[email protected]> wrote:
>>>
>>> At this point, my main questions are:
>>>
>>> 1) does this look useful, particularly for fileserver implementors?
>
> Yes from the Samba perspective. We'll have to keep the old
> code around for compatibility with non-Linux OS'es, but this
> will allow Linux Samba to short-circuit a bunch of logic
> we have to get around the insane POSIX locking semantics
> on close.
>
> Jeremy.
>From the peanut gallery, IIRC from college a few years back, wasn't the POSIX file locking stuff passed by all parties because they intended to do their own thing regardless of the standard? The reason that all locks are blown on a release is mostly because there were already implementations and no one wanted to push the issue, or am I misunderstanding/forgetting the history of file locks in POSIX?-
On Fri, 11 Oct 2013 20:18:58 -0400
Scott Lovenberg <[email protected]> wrote:
>
> On Oct 11, 2013, at 19:49, Jeremy Allison <[email protected]> wrote:
>
> > On Fri, 11 Oct 2013 15:36:43 -0600 Andreas Dilger <[email protected]> wrote:
> >>>
> >>> At this point, my main questions are:
> >>>
> >>> 1) does this look useful, particularly for fileserver implementors?
> >
> > Yes from the Samba perspective. We'll have to keep the old
> > code around for compatibility with non-Linux OS'es, but this
> > will allow Linux Samba to short-circuit a bunch of logic
> > we have to get around the insane POSIX locking semantics
> > on close.
> >
> > Jeremy.
>
> From the peanut gallery, IIRC from college a few years back, wasn't the POSIX file locking stuff passed by all parties because they intended to do their own thing regardless of the standard? The reason that all locks are blown on a release is mostly because there were already implementations and no one wanted to push the issue, or am I misunderstanding/forgetting the history of file locks in POSIX?
This blog post of Jeremy's explains some of the history:
http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
See the section entitled "First Implementation Past the Post".
--
Jeff Layton <[email protected]>
Hi Jeff,
> At LSF this year, there was a discussion about the "wishlist" for
> userland file servers. One of the things brought up was the goofy and
> problematic behavior of POSIX locks when a file is closed. Boaz started
> a thread on it here:
>
> http://permalink.gmane.org/gmane.linux.file-systems/73364
>
> Userland fileservers often need to maintain more than one open file
> descriptor on a file. The POSIX spec says:
>
> "All locks associated with a file for a given process shall be removed
> when a file descriptor for that file is closed by that process or the
> process holding that file descriptor terminates."
>
> This is problematic since you can't close any file descriptor without
> dropping all your POSIX locks. Most userland file servers therefore
> end up opening the file with more access than is really necessary, and
> keeping fd's open for longer than is necessary to work around this.
>
> This patchset is a first stab at an approach to address this problem by
> adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> for "private" -- I'm open to changing that if you have a better
> mnemonic).
>
> For all intents and purposes these lock types act just like their
> "non-P" counterpart. The difference is that they are only implicitly
> released when the fd against which they were acquired is closed. As a
> side effect, these locks cannot be merged with "non-P" locks since they
> have different semantics on close.
>
> I've given this patchset some very basic smoke testing and it seems to
> do the right thing, but it is still pretty rough. If this looks
> reasonable I'll plan to do some documentation updates and will take a
> stab at trying to get these new lock types added to the POSIX spec (as
> HCH recommended).
>
> At this point, my main questions are:
>
> 1) does this look useful, particularly for fileserver implementors?
>
> 2) does this look OK API-wise? We could consider different "cmd" values
> or even different syscalls, but I figured this makes it clearer that
> "P" and "non-P" locks will still conflict with one another.
>
> Jeff Layton (5):
> locks: consolidate checks for compatible filp->f_mode values in setlk
> handlers
> locks: add definitions for F_RDLCKP and F_WRLCKP
> locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> correct filp
> locks: handle merging of locks when FL_FILP_PRIVATE is set
> locks: show private lock types in /proc/locks
I haven't looked at the patches, but it would be very good to have locks
per "open" and not per "fd".
What happens in this example?
fd1 = open("/somefile", ...);
fd2 = open("/somefile", ...);
fd3 = dup(fd1);
lock(fd1, range1)
lock(fd2, range2)
lock(fd3, range3)
lock(fd2, range1) // => error already locked?
lock(fd3, range1) // stacked lock?
close(fd1)
lock(fd2, range1) // is range1 still locked by fd3 ?
What about fd-passing, will the locks be transferred/shared with the
other process?
metze
On Fri, Oct 11, 2013 at 11:53:30AM -0700, Frank Filz wrote:
> > I guess my main concern is that while I'm interested in adding interfaces
> that
> > make it _easier_ to implement fileservers, I'm not terribly interested in
> > adding interfaces that are _specific_ to implementing them.
> >
> > Whatever interface we add needs to be generic enough to be useful to other
> > applications as well. The changes you're suggesting sound rather specific
> to a
> > particular use-case.
>
> Sure, understood, though the Samba folks may have some input here also
> (though at least they have a process per connection still right? So the
Yes, correct. With the changes we have to make to support
SMB3 Multichannel this might change in the near future
tough.
Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]
On Sat, 12 Oct 2013 11:20:33 +0200
"Stefan (metze) Metzmacher" <[email protected]> wrote:
> Hi Jeff,
>
> > At LSF this year, there was a discussion about the "wishlist" for
> > userland file servers. One of the things brought up was the goofy and
> > problematic behavior of POSIX locks when a file is closed. Boaz started
> > a thread on it here:
> >
> > http://permalink.gmane.org/gmane.linux.file-systems/73364
> >
> > Userland fileservers often need to maintain more than one open file
> > descriptor on a file. The POSIX spec says:
> >
> > "All locks associated with a file for a given process shall be removed
> > when a file descriptor for that file is closed by that process or the
> > process holding that file descriptor terminates."
> >
> > This is problematic since you can't close any file descriptor without
> > dropping all your POSIX locks. Most userland file servers therefore
> > end up opening the file with more access than is really necessary, and
> > keeping fd's open for longer than is necessary to work around this.
> >
> > This patchset is a first stab at an approach to address this problem by
> > adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is short
> > for "private" -- I'm open to changing that if you have a better
> > mnemonic).
> >
> > For all intents and purposes these lock types act just like their
> > "non-P" counterpart. The difference is that they are only implicitly
> > released when the fd against which they were acquired is closed. As a
> > side effect, these locks cannot be merged with "non-P" locks since they
> > have different semantics on close.
> >
> > I've given this patchset some very basic smoke testing and it seems to
> > do the right thing, but it is still pretty rough. If this looks
> > reasonable I'll plan to do some documentation updates and will take a
> > stab at trying to get these new lock types added to the POSIX spec (as
> > HCH recommended).
> >
> > At this point, my main questions are:
> >
> > 1) does this look useful, particularly for fileserver implementors?
> >
> > 2) does this look OK API-wise? We could consider different "cmd" values
> > or even different syscalls, but I figured this makes it clearer that
> > "P" and "non-P" locks will still conflict with one another.
> >
> > Jeff Layton (5):
> > locks: consolidate checks for compatible filp->f_mode values in setlk
> > handlers
> > locks: add definitions for F_RDLCKP and F_WRLCKP
> > locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> > correct filp
> > locks: handle merging of locks when FL_FILP_PRIVATE is set
> > locks: show private lock types in /proc/locks
>
> I haven't looked at the patches, but it would be very good to have locks
> per "open" and not per "fd".
>
My intent is to make it "per-filp" (aka "struct file") in the same way
that flock() locks are today. Note that the patchset posted so far
doesn't quite have the right semantics yet.
Currently, I think that we want to give these locks flock-like
inheritance and close semantics, but to allow them to conflict with
"legacy" POSIX range locks.
> What happens in this example?
>
As I said, I haven't sat down to change the implementation yet, but
I'll try to answer this in the way that I think we'll want to do it...
> fd1 = open("/somefile", ...);
> fd2 = open("/somefile", ...);
> fd3 = dup(fd1);
>
At this point:
fd1 = filp1
fd2 = filp2
fd3 = filp1
...fd1 and fd3 both hold a reference to filp1.
> lock(fd1, range1)
> lock(fd2, range2)
> lock(fd3, range3)
>
I'll assume that lock() means setting a F_SETLK with F_WRLCKP
> lock(fd2, range1) // => error already locked?
>
Right. fd1 will hold the lock on range1 so -EAGAIN.
> lock(fd3, range1) // stacked lock?
>
Not stacked per-se, but replaced. Since fd1 == fd3, this lock() call
won't conflict and the new lock will replace the old one. Since the
range is the same though, there will be no real difference in the
outcome.
> close(fd1)
>
fput(filp1), but fd3 still has a reference so the lock won't be
released.
> lock(fd2, range1) // is range1 still locked by fd3 ?
>
Yep, still locked.
> What about fd-passing, will the locks be transferred/shared with the
> other process?
>
Yes, I think so. Locks will be passed to the other process in the same
way that flock() locks are today. AIUI, when you fork() you basically
dup() all the file descriptors of the parent so that's basically the
same as what happens above.
Again though, I'm still trying to settle on what the semantics should
be. None of this is etched in stone yet.
--
Jeff Layton <[email protected]>
> > > At LSF this year, there was a discussion about the "wishlist" for
> > > userland file servers. One of the things brought up was the goofy
> > > and problematic behavior of POSIX locks when a file is closed. Boaz
> > > started a thread on it here:
> > >
> > > http://permalink.gmane.org/gmane.linux.file-systems/73364
> > >
> > > Userland fileservers often need to maintain more than one open file
> > > descriptor on a file. The POSIX spec says:
> > >
> > > "All locks associated with a file for a given process shall be
> > > removed when a file descriptor for that file is closed by that
> > > process or the process holding that file descriptor terminates."
> > >
> > > This is problematic since you can't close any file descriptor
> > > without dropping all your POSIX locks. Most userland file servers
> > > therefore end up opening the file with more access than is really
> > > necessary, and keeping fd's open for longer than is necessary to work
> around this.
> > >
> > > This patchset is a first stab at an approach to address this problem
> > > by adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is
> > > short for "private" -- I'm open to changing that if you have a
> > > better mnemonic).
> > >
> > > For all intents and purposes these lock types act just like their
> > > "non-P" counterpart. The difference is that they are only implicitly
> > > released when the fd against which they were acquired is closed. As
> > > a side effect, these locks cannot be merged with "non-P" locks since
> > > they have different semantics on close.
> > >
> > > I've given this patchset some very basic smoke testing and it seems
> > > to do the right thing, but it is still pretty rough. If this looks
> > > reasonable I'll plan to do some documentation updates and will take
> > > a stab at trying to get these new lock types added to the POSIX spec
> > > (as HCH recommended).
> > >
> > > At this point, my main questions are:
> > >
> > > 1) does this look useful, particularly for fileserver implementors?
> > >
> > > 2) does this look OK API-wise? We could consider different "cmd"
values
> > > or even different syscalls, but I figured this makes it clearer
that
> > > "P" and "non-P" locks will still conflict with one another.
> > >
> > > Jeff Layton (5):
> > > locks: consolidate checks for compatible filp->f_mode values in
setlk
> > > handlers
> > > locks: add definitions for F_RDLCKP and F_WRLCKP
> > > locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> > > correct filp
> > > locks: handle merging of locks when FL_FILP_PRIVATE is set
> > > locks: show private lock types in /proc/locks
> >
> > I haven't looked at the patches, but it would be very good to have
> > locks per "open" and not per "fd".
> >
>
> My intent is to make it "per-filp" (aka "struct file") in the same way
that
> flock() locks are today. Note that the patchset posted so far doesn't
quite
> have the right semantics yet.
>
> Currently, I think that we want to give these locks flock-like inheritance
and
> close semantics, but to allow them to conflict with "legacy" POSIX range
> locks.
>
> > What happens in this example?
> >
>
> As I said, I haven't sat down to change the implementation yet, but I'll
try to
> answer this in the way that I think we'll want to do it...
>
> > fd1 = open("/somefile", ...);
> > fd2 = open("/somefile", ...);
> > fd3 = dup(fd1);
> >
>
> At this point:
>
> fd1 = filp1
> fd2 = filp2
> fd3 = filp1
>
> ...fd1 and fd3 both hold a reference to filp1.
>
> > lock(fd1, range1)
> > lock(fd2, range2)
> > lock(fd3, range3)
> >
>
> I'll assume that lock() means setting a F_SETLK with F_WRLCKP
>
> > lock(fd2, range1) // => error already locked?
> >
>
> Right. fd1 will hold the lock on range1 so -EAGAIN.
>
> > lock(fd3, range1) // stacked lock?
> >
>
> Not stacked per-se, but replaced. Since fd1 == fd3, this lock() call won't
> conflict and the new lock will replace the old one. Since the range is the
same
> though, there will be no real difference in the outcome.
>
> > close(fd1)
> >
>
> fput(filp1), but fd3 still has a reference so the lock won't be released.
>
> > lock(fd2, range1) // is range1 still locked by fd3 ?
> >
>
> Yep, still locked.
>
> > What about fd-passing, will the locks be transferred/shared with the
> > other process?
> >
>
> Yes, I think so. Locks will be passed to the other process in the same way
that
> flock() locks are today. AIUI, when you fork() you basically
> dup() all the file descriptors of the parent so that's basically the same
as what
> happens above.
>
> Again though, I'm still trying to settle on what the semantics should be.
None
> of this is etched in stone yet.
At a quick read, that sounds right to me, connect the locks to the kernel
struct file (filp) and we will get the desirable semantics you describe and
I think it will be easy to document the behavior.
Frank
> This blog post of Jeremy's explains some of the history:
>
>
> http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2
> .html
>
> See the section entitled "First Implementation Past the Post".
Interesting that Jeremy actually suggested the implementation should have
had an arbitrary lock owner as part of the flock structure:
"This is an example of a POSIX interface not being future-proofed against
modern techniques such as threading. A simple amendment to the original
primitive allowing a user-defined "locking context" (like a process id) to
be entered in the struct flock structure used to define the lock would have
fixed this problem, along with extra flags allowing the number of locks per
context to be recorded if needed."
But I'm happy with the lock context per kernel struct file as a solution,
especially since that will allow locks to be sensibly passed to a forked
process.
Another next step would be an asynchronous blocking lock...
Frank
On Fri, Oct 11, 2013 at 8:42 PM, Jeff Layton <[email protected]> wrote:
> On Fri, 11 Oct 2013 20:18:58 -0400
> Scott Lovenberg <[email protected]> wrote:
>
>>
>> On Oct 11, 2013, at 19:49, Jeremy Allison <[email protected]> wrote:
>>
>> > On Fri, 11 Oct 2013 15:36:43 -0600 Andreas Dilger <[email protected]> wrote:
>> >>>
>> >>> At this point, my main questions are:
>> >>>
>> >>> 1) does this look useful, particularly for fileserver implementors?
>> >
>> > Yes from the Samba perspective. We'll have to keep the old
>> > code around for compatibility with non-Linux OS'es, but this
>> > will allow Linux Samba to short-circuit a bunch of logic
>> > we have to get around the insane POSIX locking semantics
>> > on close.
>> >
>> > Jeremy.
>>
>> From the peanut gallery, IIRC from college a few years back, wasn't the POSIX file locking stuff passed by all parties because they intended to do their own thing regardless of the standard? The reason that all locks are blown on a release is mostly because there were already implementations and no one wanted to push the issue, or am I misunderstanding/forgetting the history of file locks in POSIX?
>
> This blog post of Jeremy's explains some of the history:
>
> http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
>
> See the section entitled "First Implementation Past the Post".
>
> --
> Jeff Layton <[email protected]>
Thanks, Jeff. That was actually the exact article I was referencing
but forgetting the details of. Jeremy, thanks for writing that up so
many years ago (I used to eat that stuff up in college).
--
Peace and Blessings,
-Scott.
On Sat, Oct 12, 2013 at 11:12:03AM -0700, Frank Filz wrote:
> > This blog post of Jeremy's explains some of the history:
> >
> >
> > http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2
> > .html
> >
> > See the section entitled "First Implementation Past the Post".
>
> Interesting that Jeremy actually suggested the implementation should have
> had an arbitrary lock owner as part of the flock structure:
>
> "This is an example of a POSIX interface not being future-proofed against
> modern techniques such as threading. A simple amendment to the original
> primitive allowing a user-defined "locking context" (like a process id) to
> be entered in the struct flock structure used to define the lock would have
> fixed this problem, along with extra flags allowing the number of locks per
> context to be recorded if needed."
>
> But I'm happy with the lock context per kernel struct file as a solution,
> especially since that will allow locks to be sensibly passed to a forked
> process.
>
> Another next step would be an asynchronous blocking lock...
Yes, please :-)
Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]
> http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2
> > > .html
> > >
> > > See the section entitled "First Implementation Past the Post".
> >
> > Interesting that Jeremy actually suggested the implementation should
> > have had an arbitrary lock owner as part of the flock structure:
> >
> > "This is an example of a POSIX interface not being future-proofed
> > against modern techniques such as threading. A simple amendment to the
> > original primitive allowing a user-defined "locking context" (like a
> > process id) to be entered in the struct flock structure used to define
> > the lock would have fixed this problem, along with extra flags
> > allowing the number of locks per context to be recorded if needed."
> >
> > But I'm happy with the lock context per kernel struct file as a
> > solution, especially since that will allow locks to be sensibly passed
> > to a forked process.
> >
> > Another next step would be an asynchronous blocking lock...
>
> Yes, please :-)
What model would be useful to you (and for what project)? One thing I could
think of is since we have a file descriptor for each lock owner/file pair,
we could do something like select on those descriptors, got to think about
how that would actually work though... The vfs lock layer does inherently
support a kernel call back when a blocked lock can be unblocked, so we just
need to figure out the best way to hook that up to user space in a way that
doesn't require a thread per blocked lock.
Frank
On Mon, Oct 14, 2013 at 08:23:03AM -0700, Frank Filz wrote:
> > http://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2
> > > > .html
> > > >
> > > > See the section entitled "First Implementation Past the Post".
> > >
> > > Interesting that Jeremy actually suggested the implementation should
> > > have had an arbitrary lock owner as part of the flock structure:
> > >
> > > "This is an example of a POSIX interface not being future-proofed
> > > against modern techniques such as threading. A simple amendment to the
> > > original primitive allowing a user-defined "locking context" (like a
> > > process id) to be entered in the struct flock structure used to define
> > > the lock would have fixed this problem, along with extra flags
> > > allowing the number of locks per context to be recorded if needed."
> > >
> > > But I'm happy with the lock context per kernel struct file as a
> > > solution, especially since that will allow locks to be sensibly passed
> > > to a forked process.
> > >
> > > Another next step would be an asynchronous blocking lock...
> >
> > Yes, please :-)
>
> What model would be useful to you (and for what project)? One thing I could
It's ctdb that would be mainly interested in this. ctdb
deals a lot with out tdb files, a shared mmap key/value
database protected by fcntl locks. ctdb is the database
daemon distributing records in a cluster. It is a
single-threaded async event loop, and it has to fork helper
processes waiting for locks.
> think of is since we have a file descriptor for each lock owner/file pair,
> we could do something like select on those descriptors, got to think about
> how that would actually work though... The vfs lock layer does inherently
> support a kernel call back when a blocked lock can be unblocked, so we just
> need to figure out the best way to hook that up to user space in a way that
> doesn't require a thread per blocked lock.
A model that would probably work for us is one file
descriptor that becomes readable when one of the blocking
lock states changes. To signal which one changed, I think
passing an opaque uint64 (usable as a pointer) for the
blocking lock would be great, or possibly something like
epoll_data_t. We would pass this in the fcntl call and read
it from the signal, possibly together with an errno
(EDEADLK?). Not sure if this is feasible kernel-side, but I
believe this is something that would work for us user-side.
Volker
--
SerNet GmbH, Bahnhofsallee 1b, 37081 G?ttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG G?ttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:[email protected]