2008-07-30 22:13:09

by [email protected]

Subject: Re: Massive NFS problems on large cluster with large number of mounts

On Wed, Jul 30, 2008 at 03:33:38PM -0400, Chuck Lever wrote:
> On Wed, Jul 30, 2008 at 1:53 PM, J. Bruce Fields <[email protected]> wrote:
> > In any case, this all seems a bit orthogonal to the problem of what
> > ports the rpcbind client uses, right?
>
> No, this is exactly the original problem. The reason xprt_maxresvport
> is allowed to go larger than 1023 is to permit more NFS mounts. There
> really is no other reason for it I can think of.
>
> But it's broken (or at least inconsistent) behavior that max_resvport
> can go past 1023 in the first place. The name is "max_resvport" --
> Maximum Reserved Port. A port value greater than 1023 is not a
> reserved port. These sysctls are designed to restrict the range of
> ports used when a _reserved_ port is requested, not when _any_ source
> port is requested. Trond's suggestion is an "off label" use of this
> facility.
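A minimal Python model of the distinction drawn above, for illustration only (it is not kernel code). It treats ports 1..1023 as reserved and reports which ports in a configured [min_resvport, max_resvport] range are not actually reserved. The function names are hypothetical, and the range values in the usage lines are illustrative.

```python
FIRST_UNRESERVED_PORT = 1024  # ports 1..1023 are privileged on Unix-like systems


def is_reserved_port(port):
    """Return True if port lies in the privileged range 1..1023."""
    return 0 < port < FIRST_UNRESERVED_PORT


def nonreserved_ports_in_range(min_resvport, max_resvport):
    """List ports in [min_resvport, max_resvport] that are not actually reserved.

    A non-empty result means the "reserved port" range has been widened past
    1023, so a caller that asks for a reserved port may be handed one that
    is not privileged at all.
    """
    if min_resvport > max_resvport:
        raise ValueError("min_resvport must not exceed max_resvport")
    start = max(min_resvport, FIRST_UNRESERVED_PORT)
    return list(range(start, max_resvport + 1))


# Illustrative ranges: the first stays inside the privileged range,
# the second has been stretched past 1023.
print(nonreserved_ports_in_range(665, 1023))  # []
print(nonreserved_ports_in_range(665, 1026))  # [1024, 1025, 1026]
```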

We could do a better job of communicating what is and isn't a documented
usage, in that case.

Once people are already using an interface a certain way (and because we
told them to), discussions about whether it's really a correct use start
to seem a little academic.

> And rpcbind isn't the only kernel-level RPC service that requires a
> reserved port. The kernel-level NSM code that calls user space, for
> example, is one such service. In other words, rpcbind isn't the only
> service that could potentially hit this issue, so an rpcbind-only fix
> would be incomplete.
>
> We already have an appropriate interface for kernel RPC services to
> request a non-privileged port. The NFS client should use that
> interface.
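To make the argument concrete, here is a hypothetical Python model of source-port selection, not the sunrpc implementation. The nonpriv argument stands in for a client created with RPC_CLNT_CREATE_NONPRIVPORT; returning 0 models asking the network stack for an ephemeral port. Walking downward from max_resvport is an assumption made for the sketch.

```python
from typing import Optional, Set

EPHEMERAL = 0  # binding to port 0 lets the network stack pick an ephemeral port


def choose_source_port(nonpriv: bool, min_resvport: int, max_resvport: int,
                       in_use: Set[int]) -> Optional[int]:
    """Pick a source port for a new RPC transport (illustrative model only).

    A nonpriv client does not consume the reserved range at all. Otherwise,
    walk downward from max_resvport and return the first free port, or None
    if every port in the range is taken.
    """
    if nonpriv:
        return EPHEMERAL
    for port in range(max_resvport, min_resvport - 1, -1):
        if port not in in_use:
            return port
    return None


# Mounts that each hold a reserved port can starve a later caller
# (such as an rpcbind query) that genuinely needs one.
used = set()
for _ in range(3):
    used.add(choose_source_port(False, 1021, 1023, used))
print(choose_source_port(False, 1021, 1023, used))  # None: range exhausted
```

If the mounts were instead created with nonpriv=True, the reserved range would stay free for the services that really require it, which is the point of using the existing interface rather than widening the sysctl range.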

I admit that would be nicer.

--b.

> Now, we don't have to change both at the same time. We can introduce
> the mount option now; the default reserved port range is still good.
> And eventually folks using the sysctl will hit the rpcbind bug (or a
> lock recovery problem), trace it back to this issue, and change their
> mount options and reset their resvport sysctls.
>
> At some later point, though, the maximum should be restricted to 1023.
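One way to picture the phased change described above is a validation policy for max_resvport that tolerates values above 1023 during a transition period and rejects them later. This is a hypothetical Python sketch: the function, the strict switch, and the warning text are illustrative assumptions, not an existing kernel interface.

```python
import warnings

RESVPORT_CEILING = 1023  # highest port number that is actually privileged


def set_max_resvport(value: int, min_resvport: int, strict: bool) -> int:
    """Validate a new max_resvport value (hypothetical policy sketch).

    With strict=False (transition period), values above 1023 are accepted
    but flagged with a RuntimeWarning; with strict=True they are rejected.
    """
    if value < min_resvport:
        raise ValueError("max_resvport must be >= min_resvport")
    if value > 65535:
        raise ValueError("not a valid port number")
    if value > RESVPORT_CEILING:
        if strict:
            raise ValueError(f"max_resvport {value} is not a reserved port")
        warnings.warn(f"max_resvport {value} exceeds {RESVPORT_CEILING}; "
                      "consider a mount option that requests "
                      "non-privileged ports instead", RuntimeWarning)
    return value
```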
>
> >> Such an "insecure" mount option would then set
> >> RPC_CLNT_CREATE_NONPRIVPORT on rpc_clnt's created on behalf of the NFS
> >> client.
> >>
> >> I'm not married to the names of the options, or even using a mount
> >> option at all (although that seems like a natural place to put such a
> >> feature).
> >>
> >> Thoughts?
>
> --
> Chuck Lever


2008-07-31 16:36:20

by Chuck Lever III

Subject: Re: Massive NFS problems on large cluster with large number of mounts

On Jul 30, 2008, at 6:13 PM, J. Bruce Fields wrote:
> On Wed, Jul 30, 2008 at 03:33:38PM -0400, Chuck Lever wrote:
>> On Wed, Jul 30, 2008 at 1:53 PM, J. Bruce Fields <[email protected]> wrote:
>>> In any case, this all seems a bit orthogonal to the problem of what
>>> ports the rpcbind client uses, right?
>>
>> No, this is exactly the original problem. The reason xprt_maxresvport
>> is allowed to go larger than 1023 is to permit more NFS mounts. There
>> really is no other reason for it I can think of.
>>
>> But it's broken (or at least inconsistent) behavior that max_resvport
>> can go past 1023 in the first place. The name is "max_resvport" --
>> Maximum Reserved Port. A port value greater than 1023 is not a
>> reserved port. These sysctls are designed to restrict the range of
>> ports used when a _reserved_ port is requested, not when _any_ source
>> port is requested. Trond's suggestion is an "off label" use of this
>> facility.
>
> We could do a better job of communicating what is and isn't a documented
> usage, in that case.
>
> Once people are already using an interface a certain way (and because we
> told them to), discussions about whether it's really a correct use start
> to seem a little academic.

It's not at all academic.

We _must_ revisit interface design whenever a design results in a kernel
paging exception, a privilege escalation, or a denial of service, or when
it is simply confusing or uses standard terminology incorrectly. It is
always appropriate to talk about it.

What we need to be careful about when people are already using an
interface is how we go about changing it.

>> And rpcbind isn't the only kernel-level RPC service that requires a
>> reserved port. The kernel-level NSM code that calls user space, for
>> example, is one such service. In other words, rpcbind isn't the only
>> service that could potentially hit this issue, so an rpcbind-only fix
>> would be incomplete.
>>
>> We already have an appropriate interface for kernel RPC services to
>> request a non-privileged port. The NFS client should use that
>> interface.
>
> I admit that would be nicer.
>
> --b.
>
>> Now, we don't have to change both at the same time. We can introduce
>> the mount option now; the default reserved port range is still good.
>> And eventually folks using the sysctl will hit the rpcbind bug (or a
>> lock recovery problem), trace it back to this issue, and change their
>> mount options and reset their resvport sysctls.
>>
>> At some later point, though, the maximum should be restricted to 1023.
>>
>>>> Such an "insecure" mount option would then set
>>>> RPC_CLNT_CREATE_NONPRIVPORT on rpc_clnt's created on behalf of the NFS
>>>> client.
>>>>
>>>> I'm not married to the names of the options, or even using a mount
>>>> option at all (although that seems like a natural place to put such a
>>>> feature).
>>>>
>>>> Thoughts?
>>
>> --
>> Chuck Lever

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com