2008-06-10 19:58:35

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 2008-06-10 at 15:13 -0400, Jeff Layton wrote:

> I think you're basically correct, but it looks to me like the
> nfs_callback_mutex actually protects nfs_callback_info.task as well.
>
> If we're starting the thread, then we can't call kthread_stop on it
> until we release the mutex. So the thread can't exit until we release
> the mutex, and we can be guaranteed that this:
>
> nfs_callback_info.task = NULL;
>
> ...can't happen until after kthread_run returns and nfs_callback_up
> sets it.
>
> If that's right, then maybe this (untested, RFC only) patch would make sense?

Hmm... I suppose that is correct, but what if nfs_alloc_client() does

nfs_callback_up();
<kstrdup() fails>
nfs_callback_down();

AFAICS, if nfs_callback_down() gets called before the kthread() function
gets scheduled back in, then you can get left with a value of
nfs_callback_info.task != NULL, since nfs_callback_svc() will never be
called.

Wouldn't it therefore make more sense to clear nfs_callback_info.task in
nfs_callback_down()?

Cheers
Trond



2008-06-10 20:14:10

by Jeff Layton

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 10 Jun 2008 15:58:29 -0400
Trond Myklebust <[email protected]> wrote:

> On Tue, 2008-06-10 at 15:13 -0400, Jeff Layton wrote:
>
> > I think you're basically correct, but it looks to me like the
> > nfs_callback_mutex actually protects nfs_callback_info.task as well.
> >
> > If we're starting the thread, then we can't call kthread_stop on it
> > until we release the mutex. So the thread can't exit until we release
> > the mutex, and we can be guaranteed that this:
> >
> > nfs_callback_info.task = NULL;
> >
> > ...can't happen until after kthread_run returns and nfs_callback_up
> > sets it.
> >
> > If that's right, then maybe this (untested, RFC only) patch would make sense?
>
> Hmm... I suppose that is correct, but what if nfs_alloc_client() does
>
> nfs_callback_up();
> <kstrdup() fails>
> nfs_callback_down();
>
> AFAICS, if nfs_callback_down() gets called before the kthread() function
> gets scheduled back in, then you can get left with a value of
> nfs_callback_info.task != NULL, since nfs_callback_svc() will never be
> called.
>

I don't see this race.

We can't call nfs_callback_down() until after nfs_callback_up()
returns, so we're guaranteed to have "task" set to a valid task
(presuming that nfs_callback_up() doesn't return error). We also can't
return from nfs_callback_down() until after the nfs_callback_svc() has
exited. kthread_stop() will block until it does.

> Wouldn't it therefore make more sense to clear nfs_callback_info.task in
> nfs_callback_down()?
>

I suppose that makes just as much sense. It also seems more symmetrical
given that we also set the var in nfs_callback_up(). I'll roll that into
the BKL removal patch, and give it some testing. Look for it in a day
or two...

Thanks,
--
Jeff Layton <[email protected]>

2008-06-10 20:33:32

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 2008-06-10 at 16:13 -0400, Jeff Layton wrote:

> We can't call nfs_callback_down() until after nfs_callback_up()
> returns, so we're guaranteed to have "task" set to a valid task
> (presuming that nfs_callback_up() doesn't return error). We also can't
> return from nfs_callback_down() until after the nfs_callback_svc() has
> exited. kthread_stop() will block until it does.

The code I'm alluding to is in kthread():

/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
complete(&create->started);
schedule();

if (!kthread_should_stop())
ret = threadfn(data);

schedule() is called _after_ the complete() call, and _before_ we
execute threadfn() a.k.a. nfs_callback_svc(). If nfs_alloc_client() has
time to call nfs_callback_down() before the above thread gets scheduled
back in, then threadfn() doesn't get called at all, since
kthread_should_stop() is true.

Cheers
Trond

2008-06-10 20:41:47

by Jeff Layton

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 10 Jun 2008 16:33:32 -0400
Trond Myklebust <[email protected]> wrote:

> On Tue, 2008-06-10 at 16:13 -0400, Jeff Layton wrote:
>
> > We can't call nfs_callback_down() until after nfs_callback_up()
> > returns, so we're guaranteed to have "task" set to a valid task
> > (presuming that nfs_callback_up() doesn't return error). We also can't
> > return from nfs_callback_down() until after the nfs_callback_svc() has
> > exited. kthread_stop() will block until it does.
>
> The code I'm alluding to is in kthread():
>
> /* OK, tell user we're spawned, wait for stop or wakeup */
> __set_current_state(TASK_UNINTERRUPTIBLE);
> complete(&create->started);
> schedule();
>
> if (!kthread_should_stop())
> ret = threadfn(data);
>
> schedule() is called _after_ the complete() call, and _before_ we
> execute threadfn() a.k.a. nfs_callback_svc(). If nfs_alloc_client() has
> time to call nfs_callback_down() before the above thread gets scheduled
> back in, then threadfn() doesn't get called at all, since
> kthread_should_stop() is true.
>

That makes total sense. Thanks for clearing it up.

Cheers,
--
Jeff Layton <[email protected]>

2008-06-10 21:01:54

by Jeff Layton

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 10 Jun 2008 16:33:32 -0400
Trond Myklebust <[email protected]> wrote:

> On Tue, 2008-06-10 at 16:13 -0400, Jeff Layton wrote:
>
> > We can't call nfs_callback_down() until after nfs_callback_up()
> > returns, so we're guaranteed to have "task" set to a valid task
> > (presuming that nfs_callback_up() doesn't return error). We also can't
> > return from nfs_callback_down() until after the nfs_callback_svc() has
> > exited. kthread_stop() will block until it does.
>
> The code I'm alluding to is in kthread():
>
> /* OK, tell user we're spawned, wait for stop or wakeup */
> __set_current_state(TASK_UNINTERRUPTIBLE);
> complete(&create->started);
> schedule();
>
> if (!kthread_should_stop())
> ret = threadfn(data);
>
> schedule() is called _after_ the complete() call, and _before_ we
> execute threadfn() a.k.a. nfs_callback_svc(). If nfs_alloc_client() has
> time to call nfs_callback_down() before the above thread gets scheduled
> back in, then threadfn() doesn't get called at all, since
> kthread_should_stop() is true.
>

Hmm...this is a bigger problem than it looked at first glance. lockd
also has a similar problem...ditto for the new nfsd code wrt module
refcounts.

In practice, I think the thread generally runs immediately (at least
with current scheduler behavior), so we're probably not terribly
vulnerable to this race. Still, we shouldn't rely on that...

For lockd and the nfs4 callback thread, we'll also need to deal with
the fact that svc_exit_thread() doesn't get called if this happens. So
we'll need to call svc_exit_thread from the *_down() functions too
(I presume that's OK).

nfsd is a bigger problem since it exits on a signal. For that, perhaps
we should declare a completion variable and have svc_set_num_threads()
wait until nfsd() has actually run before continuing.

Thoughts?

--
Jeff Layton <[email protected]>

2008-06-10 21:37:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 2008-06-10 at 17:01 -0400, Jeff Layton wrote:

> In practice, I think the thread generally runs immediately (at least
> with current scheduler behavior), so we're probably not terribly
> vulnerable to this race. Still, we shouldn't rely on that...

In the code I showed you, the 'kthread' task is put to sleep, then
kthread_run() calls wake_up_process() on it, but the current task isn't
scheduled out. Rather, it continues to run, so in almost all UP cases,
the race is not only possible, it is actually rather likely if
nfs_alloc_client() fails.

> For lockd and the nfs4 callback thread, we'll also need to deal with
> the fact that svc_exit_thread() doesn't get called if this happens. So
> we'll need to call svc_exit_thread from the *_down() functions too
> (I presume that's OK).

These *_up()/*_down() functions are getting very complex. Any chance we
could hide some of this complexity in some helpers? Looking at the NFSv4
callback code and lockd, it seems that there might be a couple of
opportunities for merging code.

> nfsd is a bigger problem since it exits on a signal. For that, perhaps
> we should declare a completion variable and have svc_set_num_threads()
> wait until nfsd() has actually run before continuing.

nfsd doesn't use kthreads, does it?

2008-06-10 22:04:17

by Jeff Layton

[permalink] [raw]
Subject: Re: [2.6.26-rc4] mount.nfsv4/memory poisoning issues...

On Tue, 10 Jun 2008 17:37:56 -0400
Trond Myklebust <[email protected]> wrote:

> On Tue, 2008-06-10 at 17:01 -0400, Jeff Layton wrote:
>
> > In practice, I think the thread generally runs immediately (at least
> > with current scheduler behavior), so we're probably not terribly
> > vulnerable to this race. Still, we shouldn't rely on that...
>
> In the code I showed you, the 'kthread' task is put to sleep, then
> kthread_run() calls wake_up_process() on it, but the current task isn't
> scheduled out. Rather, it continues to run, so in almost all UP cases,
> the race is not only possible, it is actually rather likely if
> nfs_alloc_client() fails.
>

Ok, makes sense. It looks like with the kernel_thread interface, that
the function always runs, so we're not vulnerable to this race with
that.

> > For lockd and the nfs4 callback thread, we'll also need to deal with
> > the fact that svc_exit_thread() doesn't get called if this happens. So
> > we'll need to call svc_exit_thread from the *_down() functions too
> > (I presume that's OK).
>
> These *_up()/*_down() functions are getting very complex. Any chance we
> could hide some of this complexity in some helpers? Looking at the NFSv4
> callback code and lockd, it seems that there might be a couple of
> opportunities for merging code.
>

Possibly. I may start by just closing the races and then look and see what
can be consolidated.

> > nfsd is a bigger problem since it exits on a signal. For that, perhaps
> > we should declare a completion variable and have svc_set_num_threads()
> > wait until nfsd() has actually run before continuing.
>
> nfsd doesn't use kthreads, does it?
>

Bruce just took a patchset to make it use kthreads too. I'm thinking
2.6.27 for that, presuming that we get all of these issues worked out.

I think the only thing we can reasonably do there is make sure that
the thread actually starts before allowing svc_set_num_threads to
continue.

--
Jeff Layton <[email protected]>