Hi All,
Every once in a while, we see complaints that nfs mounts are failing due
to there being no more reserved ports available for outbound rpc
communication. This often happens when using TCP transports, because all
outbound connections that are closed go into a TIME_WAIT state.
Outbound RPC calls happen in both userspace and kernelspace. The
following patches only attempt to fix the problem for userspace.
For various reasons, avoiding the TIME_WAIT state using connect(fd,
{AF_UNSPEC}, ...) or doing SO_REUSEADDR may not be the safest way to
handle things.
An alternative solution that I was kicking around last week at
Connectathon is an RPC proxy server that multiplexes outbound
connections and caches them for a while. The following two patches
to util-linux implement this idea. (Note: this is against my apt-get
source util-linux tree).
The first patch implements two pieces: the client and the proxy itself.
The client looks to the usual caller like a regular CLIENT *, but is
created with the following call:
CLIENT *clntproxy_create(unsigned long domain, unsigned long type,
                         struct sockaddr *addr, unsigned long addrlen,
                         unsigned long prog, unsigned long vers);
The hope is that this one interface works well for IPv4/6 and TCP/UDP.
Like the usual clnt*_create calls, you can leave addr->sin_port == 0, and
the portmapper will be consulted on your behalf. Also, a best guess
attempt will be made if the specified version of the program cannot be
found, although this requires the portmapper to support PMAPPROC_DUMP.
The proxy is bundled as a daemon called 'rpcproxyd'. This daemon is
implicitly started by the first call to clntproxy_create(). It daemonizes
itself and listens on a unix socket at /var/run/rpcproxyd/socket. To
proxy, the client connects to this socket, and sends off a negotiation
packet outlining the endpoint's info, and waits for a connect error
message. If the error message is 0, then the client works just like a
clntunix_create call. I originally called clntunix_create from
clntproxy_create, but pulled in the clntunix code directly so that I could
override the CLSET_FD_CLOSE and CLSET_FD_NCLOSE clnt_control() flags.
rpcproxyd will create outbound connections and multiplex the transports
with any number of simultaneous clients. There is no support for
re-binding your transport once created. It will also cache outbound
server connections for 30 seconds after last use, which greatly helps keep
the number of ports used down in a mount-storm situation.
rpcproxyd is written as a single-threaded/no-signals/select-based daemon.
It can be run in debug mode by passing '-d' as a command line argument.
The second patch changes mount(8) and umount(8) to use the
clntproxy_create call when doing NFS stuff.
Please review and comment.
Thanks,
Mike Waychison
Hi Mike,
On Mon, Mar 14, 2005 at 12:34:58PM -0500, [email protected] wrote:
> > For instance, I can connect to your service, and fork off
> > some setuid root application, with stderr connected to that
> > socket. Any error message the application prints will arrive
> > with uid 0. If I manage to make that message appear valid to you,
> > your daemon will accept any future input unquestioned.
> >
>
> Interesting attack, although I doubt the setuid program would be attaching
> an SCM_CREDENTIALS to its stderr writes. I'll fix it up to check
> credentials on all packets nevertheless.
The application doesn't have to pass them explicitly. They'll be
attached automatically by the kernel.
> > If you make it less generic, and allow only mount calls, you'll
> > be much safer, because in the case of a bug, an attacker will
> > be able to send fake MOUNT packets, but nothing else.
> >
>
> Hmm. I like the idea of keeping it generic as it may very well solve
> someone else's problem as well. As for locking it down to MOUNT (and
> possibly PMAP/RPCB), how about some sort of config file that limits
> PROG/VERS tuples?
That works as well.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
> Hi Mike,
>
> On Mon, Mar 14, 2005 at 12:34:58PM -0500, [email protected] wrote:
>> > For instance, I can connect to your service, and fork off
>> > some setuid root application, with stderr connected to that
>> > socket. Any error message the application prints will arrive
>> > with uid 0. If I manage to make that message appear valid to you,
>> > your daemon will accept any future input unquestioned.
>> >
>>
>> Interesting attack, although I doubt the setuid program would be
>> attaching
>> an SCM_CREDENTIALS to its stderr writes. I'll fix it up to check
>> credentials on all packets nevertheless.
>
> The application doesn't have to pass them explicitly. They'll be
> attached automatically by the kernel.
Ah. Didn't know that. This makes verifying each request much more critical.
Mike Waychison
_______________________________________________
autofs mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/autofs
> hi mike-
>
>> Every once in a while, we see complaints that nfs mounts are
>> failing due
>> to there being no more reserved ports available for outbound rpc
>> communication. This often happens when using TCP transports,
>> because all
>> outbound connections that are closed go into a TIME_WAIT state.
>
> i know that steved has also been looking at this problem. he found that
> user-land uses TCP connections with abandon and has put it on a diet.
> and, the client side should take care to avoid using reserved ports
> unless it is absolutely necessary.
Yup. I tested out SteveD's stuff and it certainly cut down on reserved
port usage. However, I was still able to locally DoS my test client by
mounting several thousand mounts within a minute or so. The problem is
that talking to mountd still requires a reserved port :\
>
>> For various reasons, avoiding the TIME_WAIT state using connect(fd,
>> {AF_UNSPEC}, ...) or doing SO_REUSEADDR may not be the safest way to
>> handle things.
>
> i have a patch for the kernel RPC client to use AF_UNSPEC to reuse the
> port number on connections. before i head into the wilderness, what's
> unsafe about that?
It's less unsafe than SO_REUSEADDR, however it completely ignores
TIME_WAIT, which has two purposes (according to W.R. Stevens):
- If the client performs the active close, it allows the socket to
remember to resend the final ACK in response to the server's FIN.
Otherwise the client would respond with a RST.
- It ensures that no re-incarnations of the same four-tuple occur within 2
* MSL, ensuring that no packets that were lost on the network from the
first incarnation get used as part of the new incarnation.
Avoiding TIME_WAIT altogether keeps TCP from doing a proper full-duplex
close and also allows old packets to screw up TCP state.
>
>> rpcproxyd will create outbound connections and multiplex the
>> transports
>> with any number of simultaneous clients. There is no support for
>> re-binding your transport once created. It will also cache outbound
>> server connections for 30 seconds after last use, which
>> greatly helps keep the number of ports used down in a mount-storm
> situation.
>
> the typical RPC client-side connection timeout is 5 minutes. any reason
> not to use that value instead of 30 seconds?
The timeouts used currently in rpcproxyd were chosen arbitrarily.
Which timeout is 5 minutes? (sparing me the trouble of finding it myself).
I figured 30 seconds for caching unused TCP connections sounded like a
good number: if the TIME_WAIT period is 2 minutes, the most TIME_WAIT
connections that you can have to a given remote service at any time is 4.
A bit more thought could be had for timeouts (wishlist):
- TCP connect()s aren't timed out. (uses EINPROGRESS, but will wait
indefinitely, which I think is bounded by the kernel anyway).
- UDP retransmission occurs in the proxy itself, which is currently
hardcoded to retry 5 times, once every two seconds. I can trivially set
it up so that it gets the actual timeout values from the client and uses
those parameters.
>
> you will also need a unique connection for each program/version number
> combination to a given server; ie you can't share a single socket among
> different program/version numbers (even though some implementations try
> to do this, it is a bad practice, in my opinion).
>
So, I know I discussed this with you last week, but I was under the
impression that that was only needed when you support re-binding of the
transport. I'm not up to speed on who the users of such a thing are
(I'm assuming NFSv4).
Also, AFAICT, the glibc sunrpc stuff will actually bind you to a program
even if the version doesn't exist. This is used by rpcinfo [-u | -t] for
udpping / tcpping.
> to support IPv6 you will need support for rpcbind versions 3 and 4; but
> that may be an issue outside of rpcproxyd.
>
Okay. I'm not so familiar with RPCB, but it is just a PMAP on steroids,
right?
It wouldn't be needed for the basic common use of rpcproxyd, however it
would likely be required for cases where clntproxy_create has to do some
lookups (eg: port == 0, maybe even addr == NULL).
>> rpcproxyd is written as a single-threaded/no-signals/select-based
> daemon.
>
> will using select be a scalability issue? you might be better off with
> something like libevent or using epoll.
>
It may become a scalability issue, although I don't know what the typical
numbers are for live descriptors. It was written with select because
that's what I know ;)
In any event, using epoll would likely be a good thing to do.
Mike Waychison
Hi Mike,
I'm somewhat wary of this idea, because it creates a very attractive
target for crackers. Once an attacker manages to fool this proxy
daemon, he can talk to your NFS Server, concealing his identity or
even posing as root, etc.
I looked at the code in the light of this, and there are a few
soft spots.
- you retrieve credentials in the initial negotiation
only. But client credentials can change, and should be
checked with every packet.
For instance, I can connect to your service, and fork off
some setuid root application, with stderr connected to that
socket. Any error message the application prints will arrive
with uid 0. If I manage to make that message appear valid to you,
your daemon will accept any future input unquestioned.
So my recommendation would be to validate credentials on
every packet, and if they change from the original values
received during negotiation, drop the connection.
I'd also recommend changing the permissions on the Unix
socket to 700.
- This mechanism is really a special-case workaround for the
mount problem, but you're creating a general purpose framework.
That means if it's attacked successfully, it can be abused
to attack any RPC based service on any host.
If you make it less generic, and allow only mount calls, you'll
be much safer, because in the case of a bug, an attacker will
be able to send fake MOUNT packets, but nothing else.
- In several places, you keep packet offsets (pos, bufferpos)
in signed variables and compare them to unsigneds; that's
dangerous.
- Your code assumes sizeof(unsigned long) == 4, eg here
    unsigned long len;
    [...]
    memcpy(&len, conn->buffer, 4);
    len = ntohl(len);
    buf = &conn->buffer[4];
It's better to use uint32_t in such cases.
Regards,
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
> Hi Mike,
>
> I'm somewhat wary of this idea, because it creates a very attractive
> target for crackers. Once an attacker manages to fool this proxy
> daemon, he can talk to your NFS Server, concealing his identity or
> even posing as root, etc.
>
> I looked at the code in the light of this, and there are a few
> soft spots.
>
> - you retrieve credentials in the initial negotiation
> only. But client credentials can change, and should be
> checked with every packet.
>
> For instance, I can connect to your service, and fork off
> some setuid root application, with stderr connected to that
> socket. Any error message the application prints will arrive
> with uid 0. If I manage to make that message appear valid to you,
> your daemon will accept any future input unquestioned.
>
Interesting attack, although I doubt the setuid program would be attaching
an SCM_CREDENTIALS to its stderr writes. I'll fix it up to check
credentials on all packets nevertheless.
> So my recommendation would be to validate credentials on
> every packet, and if they change from the original values
> received during negotiation, drop the connection.
>
> I'd also recommend changing the permissions on the Unix
> socket to 700.
>
Yup. Sounds good to me.
> - This mechanism is really a special-case workaround for the
> mount problem, but you're creating a general purpose framework.
> That means if it's attacked successfully, it can be abused
> to attack any RPC based service on any host.
>
> If you make it less generic, and allow only mount calls, you'll
> be much safer, because in the case of a bug, an attacker will
> be able to send fake MOUNT packets, but nothing else.
>
Hmm. I like the idea of keeping it generic as it may very well solve
someone else's problem as well. As for locking it down to MOUNT (and
possibly PMAP/RPCB), how about some sort of config file that limits
PROG/VERS tuples?
> - In several places, you keep packet offsets (pos, bufferpos)
> in signed variables and compare them to unsigneds; that's
> dangerous.
Noted. I'll try and audit to resolve any signedness issues.
>
> - Your code assumes sizeof(unsigned long) == 4, eg here
>
>     unsigned long len;
>     [...]
>     memcpy(&len, conn->buffer, 4);
>     len = ntohl(len);
>     buf = &conn->buffer[4];
>
> It's better to use uint32_t in such cases.
Yup. Will fix.
Thanks for looking it over,
Mike Waychison