2008-03-04 05:36:01

by NeilBrown

[permalink] [raw]
Subject: [PATCH] Add more entropy to selection of port and xid for NFS client.

Hi Trond/Chunk

I wonder what you think of this patch.

I found we needed it for some customers who had machines with NFS root
who (for reasons I never fully understood) find they reboot their
machines fairly often (maybe it was a test setup or something). The
often got "inode number mismatch" errors because the reply came out of
the server's cache and was left over from before the client's reboot.

Thanks,
NeilBrown


### Comments for Changeset

Diskless clients that use NFS-root have very little chance
to gain much entropy in net_random before the NFS client must
choose a port number and an 'xid' to being talking to the server.
So add as much entropy as possible.

Without this, clients can and do suffer from collisions in the servers
reply cache, and get meaningless replies.

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./net/sunrpc/xprt.c | 5 ++++-
./net/sunrpc/xprtsock.c | 7 +++++--
2 files changed, 9 insertions(+), 3 deletions(-)

diff .prev/net/sunrpc/xprt.c ./net/sunrpc/xprt.c
--- .prev/net/sunrpc/xprt.c 2008-03-04 16:33:26.000000000 +1100
+++ ./net/sunrpc/xprt.c 2008-03-04 16:33:36.000000000 +1100
@@ -922,7 +922,10 @@ static inline __be32 xprt_alloc_xid(stru

static inline void xprt_init_xid(struct rpc_xprt *xprt)
{
- xprt->xid = net_random();
+ get_random_bytes(&xprt->xid, sizeof(xprt->xid));
+ /* And just in case random_state isn't initialised yet.. */
+ xprt->xid ^= net_random();
+ xprt->xid += (CURRENT_TIME.tv_sec << 10);
}

static void xprt_request_init(struct rpc_task *task, struct rpc_xprt *xprt)

diff .prev/net/sunrpc/xprtsock.c ./net/sunrpc/xprtsock.c
--- .prev/net/sunrpc/xprtsock.c 2008-03-04 16:33:26.000000000 +1100
+++ ./net/sunrpc/xprtsock.c 2008-03-04 16:28:06.000000000 +1100
@@ -1283,8 +1283,11 @@ static void xs_udp_timer(struct rpc_task
static unsigned short xs_get_random_port(void)
{
unsigned short range = xprt_max_resvport - xprt_min_resvport;
- unsigned short rand = (unsigned short) net_random() % range;
- return rand + xprt_min_resvport;
+ unsigned short rand;
+ get_random_bytes(&rand, sizeof(rand));
+ rand ^= (unsigned short)net_random();
+ rand ^= (CURRENT_TIME.tv_nsec >> 10);
+ return (rand % range) + xprt_min_resvport;
}

/**


2008-03-04 17:02:11

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] Add more entropy to selection of port and xid for NFS client.

Hi Neil-

On Mar 4, 2008, at 12:35 AM, NeilBrown wrote:
> Hi Trond/Chunk
>
> I wonder what you think of this patch.
>
> I found we needed it for some customers who had machines with NFS root
> who (for reasons I never fully understood) find they reboot their
> machines fairly often (maybe it was a test setup or something). The
> often got "inode number mismatch" errors because the reply came out of
> the server's cache and was left over from before the client's reboot.

If the problem is indeed lack of entropy at boot time, then random32
() should be fixed. It suggests that random32_init() isn't getting
invoked soon enough during boot to have any effect on the starting
XID selection. This seems to get broken periodically... perhaps
random32() should warn if it detects that random32_init() hasn't been
invoked yet.

Another alternative: continue to use net_random(), but the RPC client
could add entropy via net_srandom() calls during it's
initialization. Unfortunately net_srandom() works only on the local
CPU's entropy pool, so net_random() calls on other CPUs would still
be a problem.

More below...

> ### Comments for Changeset
>
> Diskless clients that use NFS-root have very little chance
> to gain much entropy in net_random before the NFS client must
> choose a port number and an 'xid' to being talking to the server.
> So add as much entropy as possible.
>
> Without this, clients can and do suffer from collisions in the servers
> reply cache, and get meaningless replies.
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./net/sunrpc/xprt.c | 5 ++++-
> ./net/sunrpc/xprtsock.c | 7 +++++--
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff .prev/net/sunrpc/xprt.c ./net/sunrpc/xprt.c
> --- .prev/net/sunrpc/xprt.c 2008-03-04 16:33:26.000000000 +1100
> +++ ./net/sunrpc/xprt.c 2008-03-04 16:33:36.000000000 +1100
> @@ -922,7 +922,10 @@ static inline __be32 xprt_alloc_xid(stru
>
> static inline void xprt_init_xid(struct rpc_xprt *xprt)
> {
> - xprt->xid = net_random();
> + get_random_bytes(&xprt->xid, sizeof(xprt->xid));

In fact I think we used to use get_random_bytes() for the initial
XID, and that wasn't initialized at boot time either. We hit exactly
the same problem.

> + /* And just in case random_state isn't initialised yet.. */
> + xprt->xid ^= net_random();
> + xprt->xid += (CURRENT_TIME.tv_sec << 10);
> }

I would use tv_nsec instead -- that will give a lot more randomness.
I see that the port selection logic does use tv_nsec.

Basically you are using get_random_bytes and net_random, both of
which are known not to work at boot time. :-)

> static void xprt_request_init(struct rpc_task *task, struct
> rpc_xprt *xprt)
>
> diff .prev/net/sunrpc/xprtsock.c ./net/sunrpc/xprtsock.c
> --- .prev/net/sunrpc/xprtsock.c 2008-03-04 16:33:26.000000000 +1100
> +++ ./net/sunrpc/xprtsock.c 2008-03-04 16:28:06.000000000 +1100
> @@ -1283,8 +1283,11 @@ static void xs_udp_timer(struct rpc_task
> static unsigned short xs_get_random_port(void)
> {
> unsigned short range = xprt_max_resvport - xprt_min_resvport;
> - unsigned short rand = (unsigned short) net_random() % range;
> - return rand + xprt_min_resvport;
> + unsigned short rand;
> + get_random_bytes(&rand, sizeof(rand));
> + rand ^= (unsigned short)net_random();
> + rand ^= (CURRENT_TIME.tv_nsec >> 10);
> + return (rand % range) + xprt_min_resvport;
> }
>
> /**


Maybe you should create a new shared function (perhaps in xprt.c)
that does the mashing up of the current time and some random values.

Port selection can take the middle 16 bits of a 32-bit random number.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-03-06 02:36:27

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] Add more entropy to selection of port and xid for NFS client.

On Tuesday March 4, [email protected] wrote:
> Hi Neil-
>
> On Mar 4, 2008, at 12:35 AM, NeilBrown wrote:
> > Hi Trond/Chunk
> >
> > I wonder what you think of this patch.
> >
> > I found we needed it for some customers who had machines with NFS root
> > who (for reasons I never fully understood) find they reboot their
> > machines fairly often (maybe it was a test setup or something). The
> > often got "inode number mismatch" errors because the reply came out of
> > the server's cache and was left over from before the client's reboot.
>
> If the problem is indeed lack of entropy at boot time, then random32
> () should be fixed. It suggests that random32_init() isn't getting
> invoked soon enough during boot to have any effect on the starting
> XID selection. This seems to get broken periodically... perhaps
> random32() should warn if it detects that random32_init() hasn't been
> invoked yet.

Ahhh.....

This wasn't with the most recent kernel (at all) and I see that the
net_random being used at the time is somewhat more simplistic than the
random32 that is now being used.

So maybe the problem has already been fixed at a deeper level.
Certainly random32 is being initialised with a bit more entropy
than net_random was at the time.

My approach was: these numbers really really need to not repeat, so
mix together as much potential entropy as possible, hoping that some
of it really will be different each reboot.

Maybe random32 is now sufficient in itself.

>
> Another alternative: continue to use net_random(), but the RPC client
> could add entropy via net_srandom() calls during it's
> initialization.

There is nothing random in the MOUNT reply, is there? I guess if TCP
were used, the servers sequence number would be random enough to
remove and XID collision problems. But that is probably fairly hard
to extract.

I think I might just assume that the new net_random (aka random32) is
good enough, and that the problem only exists in older kernels.

Thanks,
NeilBrown