2014-08-12 14:48:19

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

In the days of yore, the file locking code was primarily protected by
the BKL. That changed in commit 72f98e72551fa (locks: turn lock_flocks
into a spinlock), at which point the code was changed to be protected
by a conventional spinlock (mostly due to a push to finally eliminate
the BKL). Since then, the code has been changed to use the i_lock
instead of a global spinlock, but it's still under a spinlock.

With that change, several functions now no longer can block when they
originally could. This is a particular problem with the
fl_release_private operation. In NFSv4, that operation is used to kick
off a RELEASE_LOCKOWNER or FREE_STATEID call, and that requires being
able to do an allocation.

This was reported by Josh Stone here:

https://bugzilla.redhat.com/show_bug.cgi?id=1089092

My initial stab at fixing this involved moving this to a workqueue, but
Trond pointed out the above change was technically a regression with the
way the spinlocking in the file locking code works, and suggested an
alternate approach to fixing it.

This set focuses on moving most of the locks_release_private calls
outside of the inode->i_lock. There are still a few that are done
under the i_lock in the lease handling code. Cleaning up the use of
the i_lock in the lease code is a larger project which we'll have to
tackle at some point, but there are some other cleanups that will
need to happen first.

Absent any objections, I'll plan to merge these for 3.18.

Jeff Layton (5):
locks: don't call locks_release_private from locks_copy_lock
locks: don't reuse file_lock in __posix_lock_file
locks: defer freeing locks in locks_delete_lock until after i_lock has
been dropped
locks: move locks_free_lock calls in do_fcntl_add_lease outside
spinlock
locks: update Locking documentation to clarify fl_release_private
behavior

Documentation/filesystems/Locking | 6 ++-
fs/locks.c | 80 +++++++++++++++++++++++++--------------
2 files changed, 57 insertions(+), 29 deletions(-)

--
1.9.3



2014-08-12 14:48:23

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/5] locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped

In commit 72f98e72551fa (locks: turn lock_flocks into a spinlock), we
moved from using the BKL to a global spinlock. With this change, we lost
the ability to block in the fl_release_private operation.

This is problematic for NFS (and probably some other filesystems as
well). Add a new list_head argument to locks_delete_lock. If that
argument is non-NULL, then queue any locks that we want to free to the
list instead of freeing them.

Then, add a new locks_dispose_list function that will walk such a list
and call locks_free_lock on them after the i_lock has been dropped.

Finally, change all of the callers of locks_delete_lock to pass in a
list_head, except for lease_modify. That function can be called long
after the i_lock has been acquired. Deferring the freeing of a lease
after unlocking it in that function is non-trivial until we overhaul
some of the spinlocking in the lease code.

Currently though, no filesystem that sets fl_release_private supports
leases, so this is not currently a problem. We'll eventually want to
make the same change in the lease code, but it needs a lot more work
before we can reasonably do so.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 7dd4defb4d8d..4ce087cca501 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -247,6 +247,18 @@ void locks_free_lock(struct file_lock *fl)
}
EXPORT_SYMBOL(locks_free_lock);

+static void
+locks_dispose_list(struct list_head *dispose)
+{
+ struct file_lock *fl;
+
+ while (!list_empty(dispose)) {
+ fl = list_first_entry(dispose, struct file_lock, fl_block);
+ list_del_init(&fl->fl_block);
+ locks_free_lock(fl);
+ }
+}
+
void locks_init_lock(struct file_lock *fl)
{
memset(fl, 0, sizeof(struct file_lock));
@@ -651,12 +663,16 @@ static void locks_unlink_lock(struct file_lock **thisfl_p)
*
* Must be called with i_lock held!
*/
-static void locks_delete_lock(struct file_lock **thisfl_p)
+static void locks_delete_lock(struct file_lock **thisfl_p,
+ struct list_head *dispose)
{
struct file_lock *fl = *thisfl_p;

locks_unlink_lock(thisfl_p);
- locks_free_lock(fl);
+ if (dispose)
+ list_add(&fl->fl_block, dispose);
+ else
+ locks_free_lock(fl);
}

/* Determine if lock sys_fl blocks lock caller_fl. Common functionality
@@ -812,6 +828,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
struct inode * inode = file_inode(filp);
int error = 0;
int found = 0;
+ LIST_HEAD(dispose);

if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
new_fl = locks_alloc_lock();
@@ -834,7 +851,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
if (request->fl_type == fl->fl_type)
goto out;
found = 1;
- locks_delete_lock(before);
+ locks_delete_lock(before, &dispose);
break;
}

@@ -881,6 +898,7 @@ out:
spin_unlock(&inode->i_lock);
if (new_fl)
locks_free_lock(new_fl);
+ locks_dispose_list(&dispose);
return error;
}

@@ -894,6 +912,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
struct file_lock **before;
int error;
bool added = false;
+ LIST_HEAD(dispose);

/*
* We may need two file_lock structures for this operation,
@@ -989,7 +1008,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
else
request->fl_end = fl->fl_end;
if (added) {
- locks_delete_lock(before);
+ locks_delete_lock(before, &dispose);
continue;
}
request = fl;
@@ -1019,7 +1038,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
* one (This may happen several times).
*/
if (added) {
- locks_delete_lock(before);
+ locks_delete_lock(before, &dispose);
continue;
}
/*
@@ -1035,7 +1054,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
locks_copy_lock(new_fl, request);
request = new_fl;
new_fl = NULL;
- locks_delete_lock(before);
+ locks_delete_lock(before, &dispose);
locks_insert_lock(before, request);
added = true;
}
@@ -1097,6 +1116,7 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
locks_free_lock(new_fl);
if (new_fl2)
locks_free_lock(new_fl2);
+ locks_dispose_list(&dispose);
return error;
}

@@ -1272,7 +1292,7 @@ int lease_modify(struct file_lock **before, int arg)
printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync);
fl->fl_fasync = NULL;
}
- locks_delete_lock(before);
+ locks_delete_lock(before, NULL);
}
return 0;
}
@@ -2324,6 +2344,7 @@ void locks_remove_file(struct file *filp)
struct inode * inode = file_inode(filp);
struct file_lock *fl;
struct file_lock **before;
+ LIST_HEAD(dispose);

if (!inode->i_flock)
return;
@@ -2369,12 +2390,13 @@ void locks_remove_file(struct file *filp)
fl->fl_type, fl->fl_flags,
fl->fl_start, fl->fl_end);

- locks_delete_lock(before);
+ locks_delete_lock(before, &dispose);
continue;
}
before = &fl->fl_next;
}
spin_unlock(&inode->i_lock);
+ locks_dispose_list(&dispose);
}

/**
--
1.9.3


2014-08-12 14:48:22

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/5] locks: don't reuse file_lock in __posix_lock_file

Currently in the case where a new file lock completely replaces the old
one, we end up overwriting the existing lock with the new info. This
means that we have to call fl_release_private inside i_lock. Change the
code to instead copy the info to new_fl, insert that lock into the
correct spot and then delete the old lock. In a later patch, we'll defer
the freeing of the old lock until after the i_lock has been dropped.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 2c2d4f5022a7..7dd4defb4d8d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1022,18 +1022,21 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
locks_delete_lock(before);
continue;
}
- /* Replace the old lock with the new one.
- * Wake up anybody waiting for the old one,
- * as the change in lock type might satisfy
- * their needs.
+ /*
+ * Replace the old lock with new_fl, and
+ * remove the old one. It's safe to do the
+ * insert here since we know that we won't be
+ * using new_fl later, and that the lock is
+ * just replacing an existing lock.
*/
- locks_wake_up_blocks(fl);
- fl->fl_start = request->fl_start;
- fl->fl_end = request->fl_end;
- fl->fl_type = request->fl_type;
- locks_release_private(fl);
- locks_copy_private(fl, request);
- request = fl;
+ error = -ENOLCK;
+ if (!new_fl)
+ goto out;
+ locks_copy_lock(new_fl, request);
+ request = new_fl;
+ new_fl = NULL;
+ locks_delete_lock(before);
+ locks_insert_lock(before, request);
added = true;
}
}
--
1.9.3


2014-08-12 17:43:15

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

On Tue, 12 Aug 2014 10:48:08 -0400
Jeff Layton <[email protected]> wrote:

> In the days of yore, the file locking code was primarily protected by
> the BKL. That changed in commit 72f98e72551fa (locks: turn lock_flocks
> into a spinlock), at which point the code was changed to be protected
> by a conventional spinlock (mostly due to a push to finally eliminate
> the BKL). Since then, the code has been changed to use the i_lock
> instead of a global spinlock, but it's still under a spinlock.
>
> With that change, several functions now no longer can block when they
> originally could. This is a particular problem with the
> fl_release_private operation. In NFSv4, that operation is used to kick
> off a RELEASE_LOCKOWNER or FREE_STATEID call, and that requires being
> able to do an allocation.
>
> This was reported by Josh Stone here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1089092
>
> My initial stab at fixing this involved moving this to a workqueue, but
> Trond pointed out the above change was technically a regression with the
> way the spinlocking in the file locking code works, and suggested an
> alternate approach to fixing it.
>
> This set focuses on moving most of the locks_release_private calls
> outside of the inode->i_lock. There are still a few that are done
> under the i_lock in the lease handling code. Cleaning up the use of
> the i_lock in the lease code is a larger project which we'll have to
> tackle at some point, but there are some other cleanups that will
> need to happen first.
>
> Absent any objections, I'll plan to merge these for 3.18.
>

Erm...make that v3.17...

As Trond points out, the fact that we end up doing sleeping allocations
under spinlock can allow an unprivileged user to crash a NFSv4 client.
So I may see about merging these sooner rather than later after they've
had a little soak time in linux-next.

--
Jeff Layton <[email protected]>

2014-08-13 02:09:16

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

Hi Jeff,

On Tue, 12 Aug 2014 19:47:36 -0400 Jeff Layton <[email protected]> wrote:
>
> On Wed, 13 Aug 2014 08:28:27 +1000
> Stephen Rothwell <[email protected]> wrote:
>
> > On Tue, 12 Aug 2014 10:48:08 -0400 Jeff Layton <[email protected]> wrote:
> > >
> > > Absent any objections, I'll plan to merge these for 3.18.
> >
> > This means that this patch set should *not* be in linux-next until after
> > (at least) v3.17-rc1 is released ... This s reinforced by the lack of
> > Acked-by, Reviewed-by and Tested-by tags ... (the addition of which would
> > presumably require the rebase (or rewrite) of a published git tree.)
>
> It would, but I sent a later reply that revised that statement.
>
> Trond pointed out that this problem can cause a user-triggerable oops,
> so we may have to merge it for 3.17 after all. With that in mind, I
> added these to my linux-next branch and will probably send a pull
> request before the window closes (assuming that there are no glaring
> problems with it).

OK, fine. I have merged it today in any case.

> While we're on the subject, we probably ought to rename my tree in your
> "Trees" file from "file-private-locks" to "file-locks" or something.
> File private locks (aka OFD locks) got merged in v3.15, but I have been
> collecting patches that touch fs/locks.c

OK, I will do that tomorrow.

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
signature.asc (819.00 B)

2014-08-12 16:21:20

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

On Tue, 12 Aug 2014 08:32:29 -0700
Christoph Hellwig <[email protected]> wrote:

> Btw, I might be missing something here, but wouldn't it be better
> to reference count the file_lock structure and grab a reference to
> it where we currently call (__)locks_copy_lock?
>

It's not really possible with the way this code works at the moment.
The problem there is that struct file_lock can represent any of:

- a lock request (copied from userland into a kernel structure)
- a lock record (that lives on the i_flock list)
- a conflicting lock record (returned by GETLK-type request)

In many cases we call (__)locks_copy_lock to copy from one "use-type" of
struct file_lock to another, and doing that with refcounts makes that a
bit difficult to manage.

If I were designing this code from scratch, I'd have probably made each
use-type a distinct structure. Maybe we should do that anyway at
some point -- I'm not sure.

Now, all that said...I think we will end up needing to do some sort of
refcounting to fix the leasing code at some point. Currently,
->setlease operations can't block, which is a significant impediment to
adding leases to clustered filesystems and the like. It would be nice
to lift that limitation and that may require making leases be
refcounted (or maybe RCU-managed).

--
Jeff Layton <[email protected]>

2014-08-12 15:32:30

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

Btw, I might be missing something here, but wouldn't it be better
to reference count the file_lock structure and grab a reference to
it where we currently call (__)locks_copy_lock?


2014-08-12 14:48:26

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 5/5] locks: update Locking documentation to clarify fl_release_private behavior

Signed-off-by: Jeff Layton <[email protected]>
---
Documentation/filesystems/Locking | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index b18dd1779029..f1997e9da61f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -349,7 +349,11 @@ prototypes:
locking rules:
inode->i_lock may block
fl_copy_lock: yes no
-fl_release_private: maybe no
+fl_release_private: maybe maybe[1]
+
+[1]: ->fl_release_private for flock or POSIX locks is currently allowed
+to block. Leases however can still be freed while the i_lock is held and
+so fl_release_private called on a lease should not block.

----------------------- lock_manager_operations ---------------------------
prototypes:
--
1.9.3


2014-08-12 14:48:24

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 4/5] locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock

There's no need to call locks_free_lock here while still holding the
i_lock. Defer that until the lock has been dropped.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 4ce087cca501..cb66fb05ad4a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1761,13 +1761,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
ret = fl;
spin_lock(&inode->i_lock);
error = __vfs_setlease(filp, arg, &ret);
- if (error) {
- spin_unlock(&inode->i_lock);
- locks_free_lock(fl);
- goto out_free_fasync;
- }
- if (ret != fl)
- locks_free_lock(fl);
+ if (error)
+ goto out_unlock;
+ if (ret == fl)
+ fl = NULL;

/*
* fasync_insert_entry() returns the old entry if any.
@@ -1779,9 +1776,10 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
new = NULL;

error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+out_unlock:
spin_unlock(&inode->i_lock);
-
-out_free_fasync:
+ if (fl)
+ locks_free_lock(fl);
if (new)
fasync_free(new);
return error;
--
1.9.3


2014-08-13 15:51:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

On Wed, Aug 13, 2014 at 08:28:27AM +1000, Stephen Rothwell wrote:
> This s reinforced by the lack of Acked-by, Reviewed-by and Tested-by
> tags ... (the addition of which would presumably require the rebase
> (or rewrite) of a published git tree.)

By the way, I reshuffled my branches recently so the one you pull has
incoming patches that I think are mature enough to be worth testing but
haven't finished review yet.

That was partly Jeff's request, as he wanted his patches to get some
exposure while waiting for review.

It also means I can fold in some minor fixes instead of piling up fixes
for nits found by the kbuild robot. (But maybe that unfairly denies it
some credit, I don't know.)

Anyway that means that branch is getting rewritten, say, weekly, as
opposed to never (or maybe for once-in-a-year level screwups).

Am I doing it wrong?

--b.

2014-08-12 22:28:35

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

Hi Jeff,

On Tue, 12 Aug 2014 10:48:08 -0400 Jeff Layton <[email protected]> wrote:
>
> Absent any objections, I'll plan to merge these for 3.18.

This means that this patch set should *not* be in linux-next until after
(at least) v3.17-rc1 is released ... This s reinforced by the lack of
Acked-by, Reviewed-by and Tested-by tags ... (the addition of which would presumably require the rebase (or rewrite) of a published git tree.)

/me suspects that many people do not read his daily release notes :-(
--
Cheers,
Stephen Rothwell [email protected]


Attachments:
signature.asc (819.00 B)

2014-08-12 17:53:05

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

On Tue, Aug 12, 2014 at 01:43:13PM -0400, Jeff Layton wrote:
> On Tue, 12 Aug 2014 10:48:08 -0400
> Jeff Layton <[email protected]> wrote:
>
> > In the days of yore, the file locking code was primarily protected by
> > the BKL. That changed in commit 72f98e72551fa (locks: turn lock_flocks
> > into a spinlock), at which point the code was changed to be protected
> > by a conventional spinlock (mostly due to a push to finally eliminate
> > the BKL). Since then, the code has been changed to use the i_lock
> > instead of a global spinlock, but it's still under a spinlock.
> >
> > With that change, several functions now no longer can block when they
> > originally could. This is a particular problem with the
> > fl_release_private operation. In NFSv4, that operation is used to kick
> > off a RELEASE_LOCKOWNER or FREE_STATEID call, and that requires being
> > able to do an allocation.
> >
> > This was reported by Josh Stone here:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1089092
> >
> > My initial stab at fixing this involved moving this to a workqueue, but
> > Trond pointed out the above change was technically a regression with the
> > way the spinlocking in the file locking code works, and suggested an
> > alternate approach to fixing it.
> >
> > This set focuses on moving most of the locks_release_private calls
> > outside of the inode->i_lock. There are still a few that are done
> > under the i_lock in the lease handling code. Cleaning up the use of
> > the i_lock in the lease code is a larger project which we'll have to
> > tackle at some point, but there are some other cleanups that will
> > need to happen first.
> >
> > Absent any objections, I'll plan to merge these for 3.18.
> >
>
> Erm...make that v3.17...
>
> As Trond points out, the fact that we end up doing sleeping allocations
> under spinlock can allow an unprivileged user to crash a NFSv4 client.
> So I may see about merging these sooner rather than later after they've
> had a little soak time in linux-next.

Looks reasonable to me--ACK on the set.

--b.

2014-08-12 14:48:20

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/5] locks: don't call locks_release_private from locks_copy_lock

All callers of locks_copy_lock pass in a brand new file_lock struct, so
there's no need to calls locks_release_private on it. Replace that with
a warning that fires in the event that we receive a target lock that
doesn't look like it's properly initialized.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/locks.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 356667a434c1..2c2d4f5022a7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -285,7 +285,8 @@ EXPORT_SYMBOL(__locks_copy_lock);

void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
{
- locks_release_private(new);
+ /* "new" must be a freshly-initialized lock */
+ WARN_ON_ONCE(new->fl_ops);

__locks_copy_lock(new, fl);
new->fl_file = fl->fl_file;
--
1.9.3


2014-08-12 23:47:40

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/5] locks: move most locks_release_private calls outside of i_lock

On Wed, 13 Aug 2014 08:28:27 +1000
Stephen Rothwell <[email protected]> wrote:

> Hi Jeff,
>
> On Tue, 12 Aug 2014 10:48:08 -0400 Jeff Layton <[email protected]> wrote:
> >
> > Absent any objections, I'll plan to merge these for 3.18.
>
> This means that this patch set should *not* be in linux-next until after
> (at least) v3.17-rc1 is released ... This s reinforced by the lack of
> Acked-by, Reviewed-by and Tested-by tags ... (the addition of which would presumably require the rebase (or rewrite) of a published git tree.)
>

It would, but I sent a later reply that revised that statement.

Trond pointed out that this problem can cause a user-triggerable oops,
so we may have to merge it for 3.17 after all. With that in mind, I
added these to my linux-next branch and will probably send a pull
request before the window closes (assuming that there are no glaring
problems with it).

While we're on the subject, we probably ought to rename my tree in your
"Trees" file from "file-private-locks" to "file-locks" or something.
File private locks (aka OFD locks) got merged in v3.15, but I have been
collecting patches that touch fs/locks.c

> /me suspects that many people do not read his daily release notes :-(

Guilty as charged. I'll try to keep up in the future.

Thanks,
--
Jeff Layton <[email protected]>


Attachments:
signature.asc (819.00 B)