This patchset fixes the problem that Bruce pointed out last week when
we were discussing the lockd-kthread patches.
The main problem is described in patch #1 and that patch also fixes the
DoS. The remaining patches clean up how GRANT_MSG callbacks handle an
unresponsive client. The goal in those is to make sure that we don't
end up with a ton of duplicate RPCs in the queue and that we try to handle
an invalidated block correctly.
Bruce, I'd like to see this fixed in 2.6.25 if at all possible.
Comments and suggestions are appreciated.
Signed-off-by: Jeff Layton <[email protected]>
On Wed, Feb 06, 2008 at 11:34:09AM -0500, Jeff Layton wrote:
> This patchset fixes the problem that Bruce pointed out last week when
> we were discussing the lockd-kthread patches.
>
> The main problem is described in patch #1 and that patch also fixes the
> DoS. The remaining patches clean up how GRANT_MSG callbacks handle an
> unresponsive client. The goal in those is to make sure that we don't
> end up with a ton of duplicate RPCs in the queue and that we try to handle
> an invalidated block correctly.
>
> Bruce, I'd like to see this fixed in 2.6.25 if at all possible.
>
> Comments and suggestions are appreciated.
>
> Signed-off-by: Jeff Layton <[email protected]>
I gave them another quick skim, compiled, and did a connectathon run.
(As always, I need better regression testing.) But I'll submit those
tomorrow, assuming no one finds a problem.
--b.
Now that it no longer does an RPC ping, lockd always ends up queueing
an RPC task for the GRANT_MSG callback. But it also requeues the block
for later attempts. Since these are hard RPC tasks, if the client we're
calling back goes unresponsive the GRANT_MSG callbacks can stack up in
the RPC queue.
Fix this by making server-side RPC clients default to soft RPC tasks.
lockd requeues the block anyway, so this should be OK.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/host.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index 00063ee..f1ef49f 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -243,11 +243,18 @@ nlm_bind_host(struct nlm_host *host)
.program = &nlm_program,
.version = host->h_version,
.authflavor = RPC_AUTH_UNIX,
- .flags = (RPC_CLNT_CREATE_HARDRTRY |
- RPC_CLNT_CREATE_NOPING |
+ .flags = (RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_AUTOBIND),
};
+ /*
+ * lockd retries server side blocks automatically so we want
+ * those to be soft RPC calls. Client side calls need to be
+ * hard RPC tasks.
+ */
+ if (!host->h_server)
+ args.flags |= RPC_CLNT_CREATE_HARDRTRY;
+
clnt = rpc_create(&args);
if (!IS_ERR(clnt))
host->h_rpcclnt = clnt;
--
1.5.3.8
With the current scheme in nlmsvc_grant_blocked, we can end up with more
than one GRANT_MSG callback for a block in flight. Right now, we requeue
the block unconditionally so that a GRANT_MSG callback is done again in
30s. If the client is unresponsive, it can take more than 30s for the
call already in flight to time out.
There's no benefit to having more than one GRANT_MSG RPC queued up at a
time, so put it on the list with a timeout of NLM_NEVER before doing the
RPC call. If the RPC call submission fails, we requeue it with a short
timeout. If it works, then nlmsvc_grant_callback will end up requeueing
it with a shorter timeout after it completes.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svclock.c | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 2f4d8fa..82db7b3 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -763,11 +763,20 @@ callback:
dprintk("lockd: GRANTing blocked lock.\n");
block->b_granted = 1;
- /* Schedule next grant callback in 30 seconds */
- nlmsvc_insert_block(block, 30 * HZ);
+ /* keep block on the list, but don't reattempt until the RPC
+ * completes or the submission fails
+ */
+ nlmsvc_insert_block(block, NLM_NEVER);
- /* Call the client */
- nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);
+ /* Call the client -- use a soft RPC task since nlmsvc_retry_blocked
+ * will queue up a new one if this one times out
+ */
+ error = nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,
+ &nlmsvc_grant_ops);
+
+ /* RPC submission failed, wait a bit and retry */
+ if (error < 0)
+ nlmsvc_insert_block(block, 10 * HZ);
}
/*
--
1.5.3.8
It's currently possible for an unresponsive NLM client to completely
lock up a server's lockd. The scenario is something like this:
1) client1 (or a process on the server) takes a lock on a file
2) client2 tries to take a blocking lock on the same file and
awaits the callback
3) client2 goes unresponsive (plug pulled, network partition, etc)
4) client1 releases the lock
...at that point the server's lockd will try to queue up a GRANT_MSG
callback for client2, but first it requeues the block with a timeout of
30s. nlm_async_call will attempt to bind the RPC client to client2 and
will call rpc_ping. rpc_ping entails a sync RPC call and if client2 is
unresponsive it will take around 60s for that to time out. Once it times
out, it's already time to retry the block and the whole process repeats.
Once in this situation, nlmsvc_retry_blocked will never return until
the host starts responding again. lockd won't service new calls.
Fix this by skipping the RPC ping on NLM RPC clients. This makes
nlm_async_call return quickly when called.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/host.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index ca6b16f..00063ee 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -244,6 +244,7 @@ nlm_bind_host(struct nlm_host *host)
.version = host->h_version,
.authflavor = RPC_AUTH_UNIX,
.flags = (RPC_CLNT_CREATE_HARDRTRY |
+ RPC_CLNT_CREATE_NOPING |
RPC_CLNT_CREATE_AUTOBIND),
};
--
1.5.3.8
It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
is in flight. If this happens we don't want lockd to insert the block
back into the nlm_blocked list.
This helps that situation, but there's still a possible race. Fixing
that will mean adding real locking for nlm_blocked.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svclock.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 82db7b3..fe9bdb4 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
dprintk("lockd: GRANT_MSG RPC callback\n");
+ /* if the block is not on a list at this point then it has
+ * been invalidated. Don't try to requeue it.
+ *
+ * FIXME: it's possible that the block is removed from the list
+ * after this check but before the nlmsvc_insert_block. In that
+ * case it will be added back. Perhaps we need better locking
+ * for nlm_blocked?
+ */
+ if (list_empty(&block->b_list))
+ return;
+
/* Technically, we should down the file semaphore here. Since we
* move the block towards the head of the queue only, no harm
* can be done, though. */
--
1.5.3.8
On 2/6/08, Jeff Layton <[email protected]> wrote:
> It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> is in flight. If this happens we don't want lockd to insert the block
> back into the nlm_blocked list.
>
> This helps that situation, but there's still a possible race. Fixing
> that will mean adding real locking for nlm_blocked.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svclock.c | 11 +++++++++++
> 1 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index 82db7b3..fe9bdb4 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
>
> dprintk("lockd: GRANT_MSG RPC callback\n");
>
> + /* if the block is not on a list at this point then it has
> + * been invalidated. Don't try to requeue it.
> + *
> + * FIXME: it's possible that the block is removed from the list
> + * after this check but before the nlmsvc_insert_block. In that
> + * case it will be added back. Perhaps we need better locking
> + * for nlm_blocked?
> + */
> + if (list_empty(&block->b_list))
> + return;
> +
> /* Technically, we should down the file semaphore here. Since we
> * move the block towards the head of the queue only, no harm
> * can be done, though. */
>
Hi Jeff,
Would the following patch take care of this race? Two questions I have:
1) Does my patch address your FIXME? I believe it should.
2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
nlm_blocked serialization? It is unclear to me if
nlmsvc_grant_deferred() is taking the BKL purely to protect
nlm_blocked.
As an aside, I'm not convinced that the SIGKILL-only assertion you're
making about the block not being on the list is valid considering it
looks to me like there is a race in nlmsvc_lock().
The block isn't added to nlm_blocked until after there is an opening
for the FS to call back via fl_grant(), which can cause
nlmsvc_grant_callback() to not find the block on the list.
Mike
On Fri, 14 Mar 2008 15:55:38 -0400
"Mike Snitzer" <[email protected]> wrote:
> On 2/6/08, Jeff Layton <[email protected]> wrote:
> > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > is in flight. If this happens we don't want lockd to insert the block
> > back into the nlm_blocked list.
> >
> > This helps that situation, but there's still a possible race. Fixing
> > that will mean adding real locking for nlm_blocked.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/lockd/svclock.c | 11 +++++++++++
> > 1 files changed, 11 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index 82db7b3..fe9bdb4 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> >
> > dprintk("lockd: GRANT_MSG RPC callback\n");
> >
> > + /* if the block is not on a list at this point then it has
> > + * been invalidated. Don't try to requeue it.
> > + *
> > + * FIXME: it's possible that the block is removed from the list
> > + * after this check but before the nlmsvc_insert_block. In that
> > + * case it will be added back. Perhaps we need better locking
> > + * for nlm_blocked?
> > + */
> > + if (list_empty(&block->b_list))
> > + return;
> > +
> > /* Technically, we should down the file semaphore here. Since we
> > * move the block towards the head of the queue only, no harm
> > * can be done, though. */
> >
>
> Hi Jeff,
>
> Would the following patch take care of this race? Two questions I have:
> 1) Does my patch address your FIXME? I believe it should.
I don't think so. I think you'd have to have nlmsvc_grant_callback()
take and hold your nlm_blocked_mutex over the whole function.
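Something like this, maybe (just an untested sketch, assuming the
nlm_blocked_mutex from your patch is visible in svclock.c; the existing
timeout/requeue logic is elided):

static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
{
	struct nlm_rqst		*call = data;
	struct nlm_block	*block = call->a_block;

	dprintk("lockd: GRANT_MSG RPC callback\n");

	mutex_lock(&nlm_blocked_mutex);

	/* with the mutex held over the whole function, lockd can't pull
	 * the block off nlm_blocked between this check and the
	 * nlmsvc_insert_block() that requeues it
	 */
	if (list_empty(&block->b_list))
		goto out;

	/* ... existing timeout selection and nlmsvc_insert_block()
	 * requeue go here, unchanged ... */
out:
	mutex_unlock(&nlm_blocked_mutex);
}

Note that if your patch makes nlmsvc_insert_block() take the mutex
internally, the callback would need an unlocked variant here to avoid
deadlocking against itself.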
> 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> nlm_blocked serialization? It is unclear to me if
> nlmsvc_grant_deferred() is taking the BKL purely to protect
> nlm_blocked.
>
I'm not at all clear on why we have so much of this done under the BKL.
I'm not sure that anyone does. If you could figure it out and document
it, you'd be my hero. My understanding was that it had more to do with
VFS locking, but I could certainly be wrong on this.
Now that you mention it though, it does look like we do rpc_call_done
under the BKL. So you may be right that that's actually protecting
nlm_blocked here. My FIXME may be bogus.
The main user of nlm_blocked is lockd itself. I don't think anything
else generally touches that list. Since lockd is single-threaded by
design, locking the list may be unnecessary. The exception is the grant
callback here, which runs from rpciod. But if that's protected by the
BKL, it may not be a problem.
> As an aside, I'm not convinced that the SIGKILL-only assertion you're
> making about the block not being on the list is valid considering it
> looks to me like there is a race in nlmsvc_lock().
>
> The block isn't added to nlm_blocked until after there is an opening
> for the FS to call back via fl_grant(), which can cause
> nlmsvc_grant_callback() to not find the block on the list.
>
I'm not sure I see this race. nlmsvc_grant_callback() won't get called
until after nlmsvc_grant_blocked() has done the RPC call. That doesn't
happen until the block has been added to the list. Can you elaborate on
the race you're seeing?
--
Jeff Layton <[email protected]>
On Fri, 2008-03-14 at 15:55 -0400, Mike Snitzer wrote:
> On 2/6/08, Jeff Layton <[email protected]> wrote:
> > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > is in flight. If this happens we don't want lockd to insert the block
> > back into the nlm_blocked list.
> >
> > This helps that situation, but there's still a possible race. Fixing
> > that will mean adding real locking for nlm_blocked.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/lockd/svclock.c | 11 +++++++++++
> > 1 files changed, 11 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index 82db7b3..fe9bdb4 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> >
> > dprintk("lockd: GRANT_MSG RPC callback\n");
> >
> > + /* if the block is not on a list at this point then it has
> > + * been invalidated. Don't try to requeue it.
> > + *
> > + * FIXME: it's possible that the block is removed from the list
> > + * after this check but before the nlmsvc_insert_block. In that
> > + * case it will be added back. Perhaps we need better locking
> > + * for nlm_blocked?
> > + */
> > + if (list_empty(&block->b_list))
> > + return;
> > +
> > /* Technically, we should down the file semaphore here. Since we
> > * move the block towards the head of the queue only, no harm
> > * can be done, though. */
> >
>
> Hi Jeff,
>
> Would the following patch take care of this race? Two questions I have:
> 1) Does my patch address your FIXME? I believe it should.
> 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> nlm_blocked serialization? It is unclear to me if
> nlmsvc_grant_deferred() is taking the BKL purely to protect
> nlm_blocked.
Ugh... NACK!
We already have the file->f_mutex, so there should be no need to add a
global semaphore on top of that.
Trond
On 3/14/08, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2008-03-14 at 15:55 -0400, Mike Snitzer wrote:
> > On 2/6/08, Jeff Layton <[email protected]> wrote:
> > > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > > is in flight. If this happens we don't want lockd to insert the block
> > > back into the nlm_blocked list.
> > >
> > > This helps that situation, but there's still a possible race. Fixing
> > > that will mean adding real locking for nlm_blocked.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/lockd/svclock.c | 11 +++++++++++
> > > 1 files changed, 11 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > > index 82db7b3..fe9bdb4 100644
> > > --- a/fs/lockd/svclock.c
> > > +++ b/fs/lockd/svclock.c
> > > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> > >
> > > dprintk("lockd: GRANT_MSG RPC callback\n");
> > >
> > > + /* if the block is not on a list at this point then it has
> > > + * been invalidated. Don't try to requeue it.
> > > + *
> > > + * FIXME: it's possible that the block is removed from the list
> > > + * after this check but before the nlmsvc_insert_block. In that
> > > + * case it will be added back. Perhaps we need better locking
> > > + * for nlm_blocked?
> > > + */
> > > + if (list_empty(&block->b_list))
> > > + return;
> > > +
> > > /* Technically, we should down the file semaphore here. Since we
> > > * move the block towards the head of the queue only, no harm
> > > * can be done, though. */
> > >
> >
> > Hi Jeff,
> >
> > Would the following patch take care of this race? Two questions I have:
> > 1) Does my patch address your FIXME? I believe it should.
> > 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> > nlm_blocked serialization? It is unclear to me if
> > nlmsvc_grant_deferred() is taking the BKL purely to protect
> > nlm_blocked.
>
>
> Ugh... NACK!
>
> We already have the file->f_mutex, so there should be no need to add a
> global semaphore on top of that.
Um, nlm_blocked is global. file->f_mutex is local. What am I missing?
On Fri, 2008-03-14 at 17:21 -0400, Mike Snitzer wrote:
> On 3/14/08, Trond Myklebust <[email protected]> wrote:
> >
> > On Fri, 2008-03-14 at 15:55 -0400, Mike Snitzer wrote:
> > > On 2/6/08, Jeff Layton <[email protected]> wrote:
> > > > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > > > is in flight. If this happens we don't want lockd to insert the block
> > > > back into the nlm_blocked list.
> > > >
> > > > This helps that situation, but there's still a possible race. Fixing
> > > > that will mean adding real locking for nlm_blocked.
> > > >
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > ---
> > > > fs/lockd/svclock.c | 11 +++++++++++
> > > > 1 files changed, 11 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > > > index 82db7b3..fe9bdb4 100644
> > > > --- a/fs/lockd/svclock.c
> > > > +++ b/fs/lockd/svclock.c
> > > > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> > > >
> > > > dprintk("lockd: GRANT_MSG RPC callback\n");
> > > >
> > > > + /* if the block is not on a list at this point then it has
> > > > + * been invalidated. Don't try to requeue it.
> > > > + *
> > > > + * FIXME: it's possible that the block is removed from the list
> > > > + * after this check but before the nlmsvc_insert_block. In that
> > > > + * case it will be added back. Perhaps we need better locking
> > > > + * for nlm_blocked?
> > > > + */
> > > > + if (list_empty(&block->b_list))
> > > > + return;
> > > > +
> > > > /* Technically, we should down the file semaphore here. Since we
> > > > * move the block towards the head of the queue only, no harm
> > > > * can be done, though. */
> > > >
> > >
> > > Hi Jeff,
> > >
> > > Would the following patch take care of this race? Two questions I have:
> > > 1) Does my patch address your FIXME? I believe it should.
> > > 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> > > nlm_blocked serialization? It is unclear to me if
> > > nlmsvc_grant_deferred() is taking the BKL purely to protect
> > > nlm_blocked.
> >
> >
> > Ugh... NACK!
> >
> > We already have the file->f_mutex, so there should be no need to add a
> > global semaphore on top of that.
>
> Um, nlm_blocked is global. file->f_mutex is local. What am I missing?
The _race_ occurs when the block is temporarily removed and replaced on
the list.
This should, AFAIK, only occur while holding the file->f_mutex.
Similarly, the GRANTED code should be holding the same lock when looking
up the block.
On Fri, 14 Mar 2008 17:29:24 -0400
Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2008-03-14 at 17:21 -0400, Mike Snitzer wrote:
> > On 3/14/08, Trond Myklebust <[email protected]> wrote:
> > >
> > > On Fri, 2008-03-14 at 15:55 -0400, Mike Snitzer wrote:
> > > > On 2/6/08, Jeff Layton <[email protected]> wrote:
> > > > > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > > > > is in flight. If this happens we don't want lockd to insert the block
> > > > > back into the nlm_blocked list.
> > > > >
> > > > > This helps that situation, but there's still a possible race. Fixing
> > > > > that will mean adding real locking for nlm_blocked.
> > > > >
> > > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > > ---
> > > > > fs/lockd/svclock.c | 11 +++++++++++
> > > > > 1 files changed, 11 insertions(+), 0 deletions(-)
> > > > >
> > > > > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > > > > index 82db7b3..fe9bdb4 100644
> > > > > --- a/fs/lockd/svclock.c
> > > > > +++ b/fs/lockd/svclock.c
> > > > > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> > > > >
> > > > > dprintk("lockd: GRANT_MSG RPC callback\n");
> > > > >
> > > > > + /* if the block is not on a list at this point then it has
> > > > > + * been invalidated. Don't try to requeue it.
> > > > > + *
> > > > > + * FIXME: it's possible that the block is removed from the list
> > > > > + * after this check but before the nlmsvc_insert_block. In that
> > > > > + * case it will be added back. Perhaps we need better locking
> > > > > + * for nlm_blocked?
> > > > > + */
> > > > > + if (list_empty(&block->b_list))
> > > > > + return;
> > > > > +
> > > > > /* Technically, we should down the file semaphore here. Since we
> > > > > * move the block towards the head of the queue only, no harm
> > > > > * can be done, though. */
> > > > >
> > > >
> > > > Hi Jeff,
> > > >
> > > > Would the following patch take care of this race? Two questions I have:
> > > > 1) Does my patch address your FIXME? I believe it should.
> > > > 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> > > > nlm_blocked serialization? It is unclear to me if
> > > > nlmsvc_grant_deferred() is taking the BKL purely to protect
> > > > nlm_blocked.
> > >
> > >
> > > Ugh... NACK!
> > >
> > > We already have the file->f_mutex, so there should be no need to add a
> > > global semaphore on top of that.
> >
> > Um, nlm_blocked is global. file->f_mutex is local. What am I missing?
>
> The _race_ occurs when the block is temporarily removed and replaced on
> the list.
>
The race I had in mind was:
rpciod: nlmsvc_grant_callback checks to see if the block is on the list
lockd: lockd() removes the block from the list after getting a SIGKILL
rpciod: nlmsvc_grant_callback requeues the block with a new timeout
...though now that I've looked, it seems like the BKL should prevent
that particular race.
> This should, AFAIK, only occur while holding the file->f_mutex.
> Similarly, the GRANTED code should be holding the same lock when looking
> up the block.
>
It's not though, is it? I don't think nlmsvc_grant_callback is run
under the file->f_mutex (unless I'm missing something).
--
Jeff Layton <[email protected]>
On 3/14/08, Jeff Layton <[email protected]> wrote:
> On Fri, 14 Mar 2008 15:55:38 -0400
> "Mike Snitzer" <[email protected]> wrote:
>
> > On 2/6/08, Jeff Layton <[email protected]> wrote:
> > > It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
> > > is in flight. If this happens we don't want lockd to insert the block
> > > back into the nlm_blocked list.
> > >
> > > This helps that situation, but there's still a possible race. Fixing
> > > that will mean adding real locking for nlm_blocked.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/lockd/svclock.c | 11 +++++++++++
> > > 1 files changed, 11 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > > index 82db7b3..fe9bdb4 100644
> > > --- a/fs/lockd/svclock.c
> > > +++ b/fs/lockd/svclock.c
> > > @@ -795,6 +795,17 @@ static void nlmsvc_grant_callback(struct rpc_task *task, void *data)
> > >
> > > dprintk("lockd: GRANT_MSG RPC callback\n");
> > >
> > > + /* if the block is not on a list at this point then it has
> > > + * been invalidated. Don't try to requeue it.
> > > + *
> > > + * FIXME: it's possible that the block is removed from the list
> > > + * after this check but before the nlmsvc_insert_block. In that
> > > + * case it will be added back. Perhaps we need better locking
> > > + * for nlm_blocked?
> > > + */
> > > + if (list_empty(&block->b_list))
> > > + return;
> > > +
> > > /* Technically, we should down the file semaphore here. Since we
> > > * move the block towards the head of the queue only, no harm
> > > * can be done, though. */
> > >
> >
> > Hi Jeff,
> >
> > Would the following patch take care of this race? Two questions I have:
> > 1) Does my patch address your FIXME? I believe it should.
>
>
> I don't think so. I think you'd have to have nlmsvc_grant_callback()
> take and hold your nlm_blocked_mutex over the whole function.
OK.
> > 2) Can I get rid of the BKL in nlmsvc_grant_deferred() given this
> > nlm_blocked serialization? It is unclear to me if
> > nlmsvc_grant_deferred() is taking the BKL purely to protect
> > nlm_blocked.
> >
>
>
> I'm not at all clear on why we have so much of this done under the BKL.
> I'm not sure that anyone does. If you could figure it out and document
> it, you'd be my hero. My understanding was that it had more to do with
> VFS locking, but I could certainly be wrong on this.
OK, yes.. I was hoping someone else could be my hero ;)
> Now that you mention it though, it does look like we do rpc_call_done
> under the BKL. So you may be right that that's actually protecting
> nlm_blocked here. My FIXME may be bogus.
>
> The main user of nlm_blocked is lockd itself. I don't think anything
> else generally touches that list. Since lockd is single-threaded by
> design, locking the list may be unnecessary. The exception is the grant
> callback here, which runs from rpciod. But if that's protected by the
> BKL, it may not be a problem.
Correct, nlm_blocked is local to svclock.c
I'm concerned precisely with the ->fl_grant callback and its access
of nlm_blocked.
> > As an aside, I'm not convinced that the SIGKILL-only assertion you're
> > making about the block not being on the list is valid considering it
> > looks to me like there is a race in nlmsvc_lock().
> >
> > The block isn't added to nlm_blocked until after there is an opening
> > for the FS to call back via fl_grant(), which can cause
> > nlmsvc_grant_callback() to not find the block on the list.
> >
>
>
> I'm not sure I see this race. nlmsvc_grant_callback() won't get called
> until after nlmsvc_grant_blocked() has done the RPC call. That doesn't
> happen until the block has been added to the list. Can you elaborate on
> the race you're seeing?
I'm sorry, I meant nlmsvc_grant_deferred() (aka ->fl_grant), not
nlmsvc_grant_blocked().
The race appears to be that the BKL is not always held on entry into
nlmsvc_lock() and the call to vfs_lock_file() is made. If
vfs_lock_file() returns -EINPROGRESS for a F_SETLK request the block
will get added to nlm_blocked.
But there is a distinct possibility that ->fl_grant() can race against
vfs_lock_file() returning and nlmsvc_lock() adding the block to
nlm_blocked.
This would be a non-issue if both nlmsvc_lock() and
nlmsvc_grant_deferred() were _always_ called with the BKL held. But
again, it seems nlmsvc_lock() doesn't reliably have the BKL.
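To make the ordering concrete, here is a simplified sketch of the window
I mean (not the literal nlmsvc_lock() code, and the timeout is only
illustrative):

	error = vfs_lock_file(file->f_file, F_SETLK, &lock->fl, NULL);
	if (error == -EINPROGRESS) {
		/* from this point the filesystem can call ->fl_grant()
		 * (nlmsvc_grant_deferred) at any time, and it will go
		 * looking for this block on nlm_blocked ...
		 */

		/* ... but the block only becomes findable on nlm_blocked
		 * once we insert it here, afterwards
		 */
		nlmsvc_insert_block(block, NLM_NEVER);
		ret = nlm_lck_blocked;
	}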
Mike
On Fri, 2008-03-14 at 17:39 -0400, Jeff Layton wrote:
> The race I had in mind was:
>
> rpciod: nlmsvc_grant_callback checks to see if the block is on the list
> lockd: lockd() removes the block from the list after getting a SIGKILL
> rpciod: nlmsvc_grant_callback requeues the block with a new timeout
>
> ...though now that I've looked, it seems like the BKL should prevent
> that particular race.
In any case, this can be fixed very simply: add a new flag B_INVALIDATED
to nlm_block->b_flags which tells nlmsvc_grant_callback() that the block
has been invalidated, and must not be requeued. Then remove the test for
block->b_list...
In fact, it is quite arguable that we should probably have some form of
locking scheme to prevent GRANTED and CANCEL requests from colliding. We
might therefore also want to add a B_LOCKED flag, in order to tell
incoming CANCEL requests that they should be deferred until the GRANTED
call has completed. That would be a separate issue/patch, though.
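A rough, untested sketch of the B_INVALIDATED idea (the exact bit value
doesn't matter, just pick one that's unused in b_flags, and the flag has
to be set at every point where lockd currently invalidates the block):

	/* include/linux/lockd/lockd.h, next to the existing B_* flags */
	#define B_INVALIDATED	8	/* block was invalidated, don't requeue */

	/* wherever lockd invalidates the block (e.g. the SIGKILL teardown path) */
	block->b_flags |= B_INVALIDATED;

	/* nlmsvc_grant_callback(), replacing the list_empty() test */
	if (block->b_flags & B_INVALIDATED)
		return;

Since the flag sticks to the block itself, the callback's decision no
longer depends on whether the block happens to be on or off the list at
the instant of the check.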
> > This should, AFAIK, only occur while holding the file->f_mutex.
> > Similarly, the GRANTED code should be holding the same lock when looking
> > up the block.
> >
>
> It's not though, is it? I don't think nlmsvc_grant_callback is run
> under the file->f_mutex (unless I'm missing something).
Correct, but nlmsvc_grant_callback isn't actually trying to look up the
block, so my argument is invalid.
Trond