LinuxLists.cc - Race condition in xprt

2004-04-02 08:32:39

Subject: Race condition in xprt_disconnect

Hi,

it seems bad things can happen if two RPC processes call xprt_disconnect
at the same time. I have two bug reports of ppc machines oopsing
somewhere deep in some SELinux functions because inode->i_security
is NULL. inode->i_security is allocated when the inode is created,
and cleared when the inode is destroyed.

The stack looks something like
xprt_disconnect->sock_release->...->selinux_foofah

The only way I can see this happening is if sock_release is called twice,
and indeed we do not seem to protect against this. Please take a look
at the attached patch. It should prevent these oopses, but I'm not
entirely sure it's safe.

Thanks,
Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+

Attachments:

(No filename) (737.00 B)
sunrpc-disconnect-race (2.78 kB)
Download all attachments

2004-04-05 20:39:38

by Trond Myklebust

[permalink] [raw]

Subject: RE: Race condition in xprt_disconnect

it seems bad things can happen if two RPC processes call
xprt_disconnect at the same time. I have two bug reports of ppc
machines oopsing somewhere deep in some SELinux functions
because inode->i_security is NULL. inode->i_security is
allocated when the inode is created, and cleared when the inode
is destroyed.

The stack looks something like
xprt_disconnect->sock_release->...->selinux_foofah

The only way I can see this happening is if sock_release is
called twice, and indeed we do not seem to protect against this.
Please take a look at the attached patch. It should prevent
these oopses, but I'm not entirely sure it's safe.

I initially thought that patch you sent out was correct, however the
more I think about it, the more I'm convinced it isn't sufficient.

The only way I see that we can have duplicate calls to sock_release() is
if we are scheduling xprt_socket_autoclose() or xprt_socket_connect()
more than once.
Normally xprt_lock_write() should protect us against this, and so AFAICS
any duplicates must imply that xprt_lock_write() is getting lost before
xprt_socket_connect() has had a chance to run. Presumably this is
occurring because xprt->snd_task is timing out and then being woken up
from xprt->pending.
Note that in that case we don't actually *want* to schedule a new
connect(), since the problem is not that the networking operation timed
out, but rather that keventd is being too slow...

The following patch should hopefully protect against this problem.

Cheers,
Trond

Attachments:

linux-2.6.5-34-sunrpc_disconnect.dif (4.15 kB)

2004-04-06 09:24:04

by Olaf Kirch

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

Hi Trond,

On Mon, Apr 05, 2004 at 04:39:30PM -0400, Trond Myklebust wrote:
> The only way I see that we can have duplicate calls to sock_release() is
> if we are scheduling xprt_socket_autoclose() or xprt_socket_connect()
> more than once.
> Normally xprt_lock_write() should protect us against this, and so AFAICS
> any duplicates must imply that xprt_lock_write() is getting lost before
> xprt_socket_connect() has had a chance to run.

Yes, that's what it looks like. I had another oops now happening
in some selinux function called from tcp_connect. This looks like
two processes in xprt_socket_connect; the first having just opened
the socket and calling connect, and the second one having released
the socket inbetween.

So yes, you're right. My patch is probably not sufficient.

> Presumably this is
> occurring because xprt->snd_task is timing out and then being woken up
> from xprt->pending.

I'm not sure it's a timeout, actually. Looking at the syslog, the NFS
Server closes the connection, and the oops happens one second later.

I've been looking around the code a little more, and here's a
different theory.

- server drops connection
- tcp_state_change is called and wakes up all tasks with ENOTCONN
- rpciod schedules all tasks. The first one gets the write lock,
and schedules the xprt_socket_connect worker. Goes to
sleep on xprt->pending.
- xprt_socket_connect runs and calls xprt_close->xprt_disconnect
which wakes up all tasks with ENOTCONN. While xprt_socket_connect
is calling various network functions to create a socket, bind
and connect it, the following happens on CPU B:
- rpciod wakes up, schedules the task that was waiting for the
connect. Oops, we already have the send lock (we're snd_task),
so go ahead and try to reconnect. Thus we schedule
xprt_socket_connect a second time, which gets run. Note that
the workqueue stuff resets the pending bit before calling
xprt_socket_connect. First thing xprt_socket_connect does
is close xprt->sock.
- Back on CPU A, we find the socket we're just trying to
connect is gone. Oops.

My patch proposed earlier to change xprt_close to not wake up all tasks
when called from xprt_socket_connect should also prevent this race. The
general case of snd_task waking up before the worker thread has run is
not covered by this, though. But that should only happen due to timeout,
and a 60 second timeout should be sufficient for keventd even on a slow
day.

> Note that in that case we don't actually *want* to schedule a new
> connect(), since the problem is not that the networking operation timed
> out, but rather that keventd is being too slow...

The additional XPRT_PENDING bit may help. It should be sufficient to
set it in xprt_connect and clear it in xprt_connect_status, I think.

Alternatively, it would also help if we canceled the pending work request
in __xprt_release_write. Unfortunately the workqueue api doesn't seem
to support cancellation. Alternatively we could just call
flush_pending_work.

Et ceterum censeo: sunrpc needs a rewrite :)
I'd like to spend some time on it this summer...

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 13:52:03

by Trond Myklebust

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, 2004-04-06 at 05:24, Olaf Kirch wrote:
> Hi Trond,
> Et ceterum censeo: sunrpc needs a rewrite :)
> I'd like to spend some time on it this summer...

I've already got some ideas for some short-term changes...

Here's a couple of patches I've been playing around with lately. The
idea is to make rpciod a little more generic, *and* to make it easier to
migrate rpc_tasks onto other threads when we have a need to perform
operations that might otherwise deadlock rpciod.

Cheers,
Trond

Attachments:

linux-2.6.5-36-rpc_workqueue.dif (29.04 kB)
linux-2.6.5-37-rpc_queue_lock.dif (8.17 kB)
Download all attachments

2004-04-06 14:11:30

by Trond Myklebust

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, 2004-04-06 at 05:24, Olaf Kirch wrote:
> > Presumably this is
> > occurring because xprt->snd_task is timing out and then being woken up
> > from xprt->pending.
>
> I'm not sure it's a timeout, actually. Looking at the syslog, the NFS
> Server closes the connection, and the oops happens one second later.
>
> I've been looking around the code a little more, and here's a
> different theory.
>
> - server drops connection
> - tcp_state_change is called and wakes up all tasks with ENOTCONN
> - rpciod schedules all tasks. The first one gets the write lock,
> and schedules the xprt_socket_connect worker. Goes to
> sleep on xprt->pending.
> - xprt_socket_connect runs and calls xprt_close->xprt_disconnect
> which wakes up all tasks with ENOTCONN. While xprt_socket_connect
> is calling various network functions to create a socket, bind
> and connect it, the following happens on CPU B:
> - rpciod wakes up, schedules the task that was waiting for the
> connect. Oops, we already have the send lock (we're snd_task),
> so go ahead and try to reconnect. Thus we schedule
> xprt_socket_connect a second time, which gets run. Note that
> the workqueue stuff resets the pending bit before calling
> xprt_socket_connect. First thing xprt_socket_connect does
> is close xprt->sock.
> - Back on CPU A, we find the socket we're just trying to
> connect is gone. Oops.
>
> My patch proposed earlier to change xprt_close to not wake up all tasks
> when called from xprt_socket_connect should also prevent this race. The
> general case of snd_task waking up before the worker thread has run is
> not covered by this, though. But that should only happen due to timeout,
> and a 60 second timeout should be sufficient for keventd even on a slow
> day.

Can't tcp_state_change() inject a few more wakeups when we call
xprt_close()?

Cheers,
Trond

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 14:30:35

by Olaf Kirch

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, Apr 06, 2004 at 10:11:29AM -0400, Trond Myklebust wrote:
> Can't tcp_state_change() inject a few more wakeups when we call
> xprt_close()?

I don't think I understand. tcp_state_change can't call xprt_close.

Or are you referring to your alternative patch that simply removed
the xprt_disconnect() from xprt_close? Yes, that would cure both
issues, too.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 14:36:12

by Trond Myklebust

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, 2004-04-06 at 10:26, Olaf Kirch wrote:
> On Tue, Apr 06, 2004 at 10:11:29AM -0400, Trond Myklebust wrote:
> > Can't tcp_state_change() inject a few more wakeups when we call
> > xprt_close()?
>
> I don't think I understand. tcp_state_change can't call xprt_close.

No, but it does currently call xprt_disconnect().

I'm trying to work out whether or not the socket can change state again
after the server sends the FIN (or whatever it is it sends), but before
xprt_sock_connect()/keventd swaps out sk->state_change in its call to
xprt_close().

Cheers,
Trond

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 14:54:22

by Trond Myklebust

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, 2004-04-06 at 10:36, Trond Myklebust wrote:
> I'm trying to work out whether or not the socket can change state again
> after the server sends the FIN (or whatever it is it sends), but before
> xprt_sock_connect()/keventd swaps out sk->state_change in its call to
> xprt_close().

I don't think we have to care... Here's a slight variation on the old
patch that should take care of that scenario too:

The policy should be that once tcp_state_change() clears the "connected"
bit, and has woken up those requests that were waiting on a reply, it
should just stay out of the way. We do not want it to interfere with the
reconnection process (the task timeouts etc should suffice for error
handling). This patch should ensure that is the case.

Cheers,
Trond

Attachments:

linux-2.6.5-34-sunrpc_disconnect.dif (4.70 kB)

2004-04-06 15:23:06

by Olaf Kirch

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, Apr 06, 2004 at 10:54:15AM -0400, Trond Myklebust wrote:
> The policy should be that once tcp_state_change() clears the "connected"
> bit, and has woken up those requests that were waiting on a reply, it
> should just stay out of the way. We do not want it to interfere with the
> reconnection process (the task timeouts etc should suffice for error
> handling). This patch should ensure that is the case.

Are you sure this helps? Your patch doesn't remove the xprt_disconnect
call from xprt_close. This way, xprt_socket_connect still wakes
up all tasks with ENOTCONN, triggering the connect worker to be
scheduled a second time.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 15:27:59

by Brent M. Clements

[permalink] [raw]

Subject: Mounting any other os besides RHEW 3 doesn't work

We are stuck on a problem that no one seems to be able to answer, even
RedHat.

We noticed this problem when we tried deploying RHEW 3 Update 1 in a
production environment.

The problem can be reproduced in the following test environment:

1 HPUX 11.22 NFS Server
1 RHAW 2.1 IA64 System
2 RHEW 3 Update 1 IA64 Systems.

All systems are setup with no firewall settings.

Here are the scenarios:

Scenario 1:

The RHAW 2.1 system can properly mount and access the HPUX NFS system

Scenario 2:

The RHEW 3 system can properly mount the HPUX system but when you try to
do any access on the mounted nfs share, it hangs and the messages file on
the client state that the NFS server doesn't. We have noticed that the
client DOES get the nfs filehandle.

Scenario 3:

The RHEW 3 System can properly mount the RHAW 2.1(setup as an nfs server)
but when you try to do any access on the mounted nfs share, it hangs and
the messages file on the client states that the NFS server doesn't
respond. We have noticed that the client DOES get the nfs filehandle.

Scenario 4:
The RHEW 3 system can properly mount and access the other RHEW 3(setup as
an nfs server)

This is extremely wierd. it appears that the only thing that a RHEW 3
update 1 system can mount is another RHEW 3 update 1 system.

We have looked it over and every time we mount other types of nfs servers,
we receive a file handle but it looks like it fails in the data
connection.

Any help would be appreciated. We have submitted a bug report to RedHat
concerning this, but have not received any updates in a long time.

Thanks,

Brent Clements
Linux Technology Specialist
Information Technology
Rice University

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 16:02:42

by Trond Myklebust

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, 2004-04-06 at 11:23, Olaf Kirch wrote:

> Are you sure this helps? Your patch doesn't remove the xprt_disconnect
> call from xprt_close. This way, xprt_socket_connect still wakes
> up all tasks with ENOTCONN, triggering the connect worker to be
> scheduled a second time.

Oh, but I already have the xprt_disconnect() removal in a separate patch
which is applied *before* this one (see the patch I published in
response to yours last week).

Cheers,
Trond

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 16:17:39

by Olaf Kirch

[permalink] [raw]

Subject: Re: RE: Race condition in xprt_disconnect

On Tue, Apr 06, 2004 at 12:02:38PM -0400, Trond Myklebust wrote:
> Oh, but I already have the xprt_disconnect() removal in a separate patch
> which is applied *before* this one (see the patch I published in
> response to yours last week).

Ah, alright. I wasn't aware of that. Then you probably don't even
need to XPRT_CONNECTING bit at all. It makes things a little more
robust though.

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-04-06 20:06:07

by Steve Dickson

[permalink] [raw]

Subject: Re: Mounting any other os besides RHEW 3 doesn't work

Brent M. Clements wrote:

>We are stuck on a problem that no one seems to be able to answer, even
>RedHat.
>
This is the first I've heard of this.... what's the bugzilla number
associated
with these issues?

SteveD.

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs