2012-03-11 02:02:18

by Nikolaus Rath

Subject: NFS4 over VPN hangs when connecting > 2 clients

Hello,

I am experiencing system hangs when running NFSv4 over a tinc VPN. I
don't know if the problem is with NFS or tinc and would appreciate any
suggestions on how to narrow down the culprit. Unfortunately I cannot
simply run NFS directly over TCP -- the participating systems are
connected only over an open network.

The configuration is as follows: I have a main server that exports the
NFS shares and also acts as hub for the VPN. All other clients connect
to the main server to establish the VPN, and then mount the NFS shares
over the VPN.

I am using kernel 3.0.0, 64bit Ubuntu 10.04 LTS on both server and
clients.

On the server:
# cat /etc/exports
/srv/nfs4 -ro,no_subtree_check,fsid=root 192.168.1.1/24
/srv/nfs4/home -rw,async,no_subtree_check 192.168.1.1/24
/srv/nfs4/opt -rw,async,no_subtree_check 192.168.1.1/24
/srv/nfs4/hbt -rw,async,no_subtree_check 192.168.1.1/24

On the clients:
# cat /etc/hostconf/all/etc/fstab | grep -i nfs
spitzer:/opt /opt nfs4 bg 0 0
spitzer:/home /home nfs4 bg 0 0

The problem is that as soon as more than three clients are accessing the
NFS shares, any operations on the NFS mountpoints by the clients hang.
At the same time, CPU usage of the VPN processes becomes very high. If I
run the VPN in debug mode, all I can see is that it is busy forwarding
lots of packets. I also ran a packet sniffer which showed me that 90% of
the packets were NFS related, but I am not familiar enough with NFS to
be able to tell anything from the packets themselves. I can provide an
example of the dump if that helps.


Any suggestions how I could further debug this?


Best,


-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C


2012-03-19 19:08:56

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 03:00:57PM -0400, Chuck Lever wrote:
>
> On Mar 19, 2012, at 2:54 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 19, 2012 at 02:42:30PM -0400, Chuck Lever wrote:
> >>
> >> On Mar 19, 2012, at 2:39 PM, J. Bruce Fields wrote:
> >>
> >>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
> >>>>
> >>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
> >>>>> That's also not this case, sorry, this time with all the conditions:
> >>>>>
> >>>>> - if the nfs_client_id4 is the same, and
> >>>>> - if the flavor is auth_sys, and
> >>>>> - if the client IP address is different,
> >>>>> - then return NFS4ERR_INUSE.
> >>>>
> >>>> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.
> >>>
> >>> OK. So probably there's nothing we can do to help here.
> >>>
> >>> As a bandaid maybe a rate-limited log message ("clientid X now in use
> >>> from IP Y") might help debug these things....
> >>
> >> Hm, OK. That implies your server implementation assumes that a clientid4 maps to exactly one client IP address at a time.
> >
> > OK, agreed. So how about something like "state for client X previously
> > established from IP Y now cleared from IP Z" ??
> >
> > (Assuming it's only the I-just-rebooted setclientid case that's likely
> > to be the sign of a problem.)
>
> We would see that only in the case where the boot verifier and the client IP change at the same time. That can happen legitimately, too, if the client has a dynamically assigned IP address. Maybe this event is only interesting if it happens more than once during the same second.

"Warning: IP addresses Y and Z currently appear to be in a bitter
struggle over client id X"....

That may be getting complicated enough to not be worth it except as a
part of some more general statistics.
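
As a rough illustration of what such a check could look like, the sketch below flags a client id only when its source IP flips back and forth within a short window, which is the "more than once during the same second" condition suggested above. It is a toy in Python rather than server code; the class name, the one-second window, and the threshold of two switches are all assumptions made for the example.

```python
from collections import defaultdict, deque


class StruggleDetector:
    """Flag a client id whose source IP keeps changing within a short window."""

    def __init__(self, window=1.0, threshold=2):
        self.window = window        # seconds (assumed value)
        self.threshold = threshold  # IP switches inside the window (assumed)
        self.last_ip = {}
        self.switches = defaultdict(deque)  # client id -> times of IP changes

    def observe(self, client_id, ip, now):
        """Record one SETCLIENTID; return True if the id looks contested."""
        previous = self.last_ip.get(client_id)
        self.last_ip[client_id] = ip
        if previous is None or previous == ip:
            return False
        times = self.switches[client_id]
        times.append(now)
        # Forget switches that fell out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```

A single change of address, such as a client rebooting onto a new DHCP lease, never reaches the threshold here; only repeated flips between addresses do.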

--b.

2012-03-19 16:53:53

by Rick Macklem

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

J. Bruce Fields wrote:
> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> > On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> > > IMO, the server should do a comparison of the nfs_client_id4
> > > strings,
> > > and nothing else.
> >
> > We're supposed to return CLID_INUSE when we see a setclientid from a
> > "different" client using the same string, to keep clients from doing
> > mischief with other clients' state (either maliciously or, as in
> > this
> > case, accidentally).
> >
> > "Different" here is defined as "not having the same principal". I
> > know
> > what that means in the krb5 case, but I'm less certain in the
> > auth_sys
> > case.
>
> Cc'ing the ietf list. Is it reasonable for a server to expect
> setclientid's to come from the same client IP address at least in the
> auth_sys case, or could that break multi-homed clients?
>
I think that even a dhcp lease renewal might result in a different client
IP, if the client has been partitioned from the dhcp server for a while.

I'm not convinced that different client IP# implies different client.
(Even "same ip# implies same client" might not be true, if the dhcp
server assigned the IP# to another machine while the client was partitioned
from the dhcp server, I think? I haven't looked at current dhcp
implementations, but it seems conceivable to me.)

For AUTH_SYS, all the FreeBSD server does is expect the same uid#.

rick
> At least in the auth_sys case IP addresses are one of the only things
> we
> have left to go on when the client's identifier-generation is messed
> up
> (not that difficult).
>
> --b.
> _______________________________________________
> nfsv4 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nfsv4

2012-03-12 20:15:05

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 03:45:05PM -0400, Nikolaus Rath wrote:
> On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
> > On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
> >> Nikolaus Rath <[email protected]> writes:
> >>> The problem is that as soon as more than three clients are accessing the
> >>> NFS shares, any operations on the NFS mountpoints by the clients hang.
> >>> At the same time, CPU usage of the VPN processes becomes very high. If I
> >>> run the VPN in debug mode, all I can see is that it is busy forwarding
> >>> lots of packets. I also ran a packet sniffer which showed me that 90% of
> >>> the packets were NFS related, but I am not familiar enough with NFS to
> >>> be able to tell anything from the packets themselves. I can provide an
> >>> example of the dump if that helps.
> >>
> >> I have put a screenshot of the dump on
> >> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
> >> not sure which parts are important).
> >
> > Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
> > OPEN_CONFIRM repeatedly.
> >
> >> Any suggestions how I could further debug this?
> >
> > Could the clients be stepping on each others' state if they all think
> > they have the same IP address (because of something to do with the VPN
> > networking?)
>
> That sounds like a promising path of investigation. What determines the IP
> of a client as far as NFS is concerned?

I don't remember where it gets the ip it uses to construct clientid's
from.... But there is a mount option (clientaddr=) that will let you
change what it uses. So it *might* be worth checking whether using a
clientaddr= option on each client (giving it a different ipaddr on each
client) would change the behavior.

> > It'd be interesting to know the fields of the setclientid call, and the
> > errors that the server is responding with to these calls. If you look
> > at the packet details you'll probably see the same thing happening
> > over and over again.
> >
> > Filtering to look at traffic between server and one client at a time
> > might help to see the pattern.
>
> Hmm. I'm looking at the fields, but I just have no idea what any of
> those mean. Would you possibly be willing to take a look? I uploaded a
> pcap dump of a few packets to http://www.rath.org/res/sample.pcap.

Looking at the packet details, under the client id field, the clients
are all using:

"0.0.0.0/192.168.1.2 tcp UNIX 0"

And the server is returning STALE_CLIENTID to some SETCLIENTID_CONFIRMs
(I wonder if that's a server bug, that doesn't sound like the right
error--though this is a weird case), and NFS4ERR_EXPIRED to some OPENs
(I think that's correct server behavior if it thinks another SETCLIENTID
purged the state).
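
To make the collision concrete, here is a small Python sketch that builds strings in the shape seen in the capture. The field layout (client address, server address, protocol, auth flavor, uniquifier) is inferred from that one string and is an assumption, as is the third address 192.168.1.21; only .19 and .20 appear in the ifconfig output.

```python
def client_id_string(client_addr, server_addr, proto="tcp", flavor="UNIX",
                     uniquifier=0):
    """Build an id string in the shape seen in the trace (layout assumed)."""
    return f"{client_addr}/{server_addr} {proto} {flavor} {uniquifier}"


server = "192.168.1.2"

# What the capture shows: every client believes its own address is 0.0.0.0.
broken = {client_id_string("0.0.0.0", server) for _ in range(3)}

# What they would send if each one used its own VPN address.
working = {client_id_string(ip, server)
           for ip in ("192.168.1.19", "192.168.1.20", "192.168.1.21")}

print(len(broken), len(working))
```

Three clients collapse into one identifier in the first set and stay distinct in the second.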

--b.

2012-03-12 19:45:06

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
> On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
>> Nikolaus Rath <[email protected]> writes:
>>> The problem is that as soon as more than three clients are accessing the
>>> NFS shares, any operations on the NFS mountpoints by the clients hang.
>>> At the same time, CPU usage of the VPN processes becomes very high. If I
>>> run the VPN in debug mode, all I can see is that it is busy forwarding
>>> lots of packets. I also ran a packet sniffer which showed me that 90% of
>>> the packets were NFS related, but I am not familiar enough with NFS to
>>> be able to tell anything from the packets themselves. I can provide an
>>> example of the dump if that helps.
>>
>> I have put a screenshot of the dump on
>> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
>> not sure which parts are important).
>
> Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
> OPEN_CONFIRM repeatedly.
>
>> Any suggestions how I could further debug this?
>
> Could the clients be stepping on each others' state if they all think
> they have the same IP address (because of something to do with the VPN
> networking?)

That sounds like a promising path of investigation. What determines the IP
of a client as far as NFS is concerned? ifconfig on the clients reports
different IPs, e.g.

hbt Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.1.20 P-t-P:192.168.1.20 Mask:255.255.255.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:6285 errors:0 dropped:0 overruns:0 frame:0
TX packets:6851 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:2749422 (2.7 MB) TX bytes:1860654 (1.8 MB)


hbt Link encap:UNSPEC HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.1.19 P-t-P:192.168.1.19 Mask:255.255.255.0
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:2306277 errors:0 dropped:0 overruns:0 frame:0
TX packets:1761902 errors:0 dropped:0 overruns:33054 carrier:0
collisions:0 txqueuelen:500
RX bytes:2534582154 (2.5 GB) TX bytes:888058175 (888.0 MB)


> It'd be interesting to know the fields of the setclientid call, and the
> errors that the server is responding with to these calls. If you look
> at the packet details you'll probably see the same thing happening
> over and over again.
>
> Filtering to look at traffic between server and one client at a time
> might help to see the pattern.

Hmm. I'm looking at the fields, but I just have no idea what any of
those mean. Would you possibly be willing to take a look? I uploaded a
pcap dump of a few packets to http://www.rath.org/res/sample.pcap.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-19 18:43:50

by Nikolaus Rath

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>
>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>
>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>
>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>> Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
>>>>> case where we get setclientid's with the same client-provided id.
>>>>> There'd be no change of behavior in the case of multiple clients sharing
>>>>> an IP (which is fine, of course).
>>>>
>>>> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?
>>>
>>> That's also not this case, sorry, this time with all the conditions:
>>>
>>> - if the nfs_client_id4 is the same, and
>>> - if the flavor is auth_sys, and
>>> - if the client IP address is different,
>>> - then return NFS4ERR_INUSE.
>>
>> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.
>
> OK. So probably there's nothing we can do to help here.
>
> As a bandaid maybe a rate-limited log message ("clientid X now in use
> from IP Y") might help debug these things....

Since you guys keep Cc'ing me, I'll chime in with a rather naive
suggestion: if all that's required is a unique id for every client, why
not use the MAC of the first network interface, independent of it being
used for communication with the server?


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-20 14:36:34

by Nikolaus Rath

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On 03/20/2012 09:55 AM, Myklebust, Trond wrote:
> On Tue, 2012-03-20 at 09:29 -0400, Nikolaus Rath wrote:
>> On 03/19/2012 06:25 PM, Rick Macklem wrote:
>>> Nikolaus Rath wrote:
>>> ps: Also, although it's not very relevant, getting the MAC address of
>>> the first ethernet interface isn't easy in FreeBSD. I have no idea
>>> if the same is true of Linux. (I'd also be worried that "first"
>>> might not be fixed?)
>>
>> It doesn't need to be the same interface all the time, I just meant the
>> first as in not a specific one.
>
> Yes it does have to be the same interface all the time. Otherwise the
> server cannot tell that this is the same client booting up again.

The likelihood of the interface order changing is certainly much lower
than the likelihood of the IP address changing (which I understand is
currently used for the clientid), so the situation would still improve.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:14:49

by Chuck Lever

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients


On Mar 12, 2012, at 5:04 PM, J. Bruce Fields wrote:

> On Mon, Mar 12, 2012 at 04:49:29PM -0400, Chuck Lever wrote:
>>
>> On Mar 12, 2012, at 4:42 PM, J. Bruce Fields wrote:
>>
>>> On Mon, Mar 12, 2012 at 04:30:42PM -0400, Nikolaus Rath wrote:
>>>> On 03/12/2012 04:15 PM, J. Bruce Fields wrote:
>>>>> Looking at the packet details, under the client id field, the clients
>>>>> are all using:
>>>>>
>>>>> "0.0.0.0/192.168.1.2 tcp UNIX 0"
>>>>
>>>> Hmm. 192.168.1.2 is the server's address on the VPN. Is that supposed to
>>>> be there?
>>>
>>> Yes, and the first ip is usually the ip of the client, which does suggest
>>> the client is guessing its ip wrong; so the "clientaddr=" option will
>>> likely help.
>>
>> I thought 0.0.0.0 was a legal callback address, and means "don't send me CB requests".
>
> Yes, that part's fine, it's using it in the clientid that gets us into
> trouble here....
>
>> But if all the clients are using the same nfs_client_id4 string, then no, the server can't distinguish between them, and they will tromp on each other's state.
>
> Yeah.
>
>>
>> The question is why can't the clients tell what their own IP address is? mount.nfs is supposed to figure that out automatically. Could be a bug in mount.nfs.
>
> You know that code better than me.... Looks like it does basically
> gethostbyname(gethostname()) ?

Nope, it does a connect(2) on a UDP socket, and then getsockname(2) on that socket. See nfs_callback_address() in nfs-utils.
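
The same technique can be sketched in a few lines of Python. This is not the nfs-utils code, only an illustration of connect(2) followed by getsockname(2) on a UDP socket; the function name and the default port are assumptions for the example.

```python
import socket


def guess_local_address(server_ip, port=2049):
    """Return the source IP the kernel would choose to reach server_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel pick a route and bind a local address for this peer.
        sock.connect((server_ip, port))
        return sock.getsockname()[0]
```

Calling guess_local_address("192.168.1.2") on a client with the VPN up should give that client's VPN address. If no route to the server exists yet, connect() can fail with an OSError, which this sketch simply lets propagate.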

> An strace -f of the mount from Nikolaus might help explain what happened
> here.

Agree.

>
>>> Hm, perhaps the server should be rejecting these SETCLIENTID's with
>>> INUSE. It used to do that, and the client would likely recover from
>>> that more easily.
>>
>> INUSE means the client is using multiple authentication flavors when performing RENEW or SETCLIENTID. I can't think of a reason the server should reject these; it's not supposed to look at the contents of the nfs_client_id4 string.
>
> Well, from the trace the requests do appear (from the server's point of
> view) to be coming from different IP addresses. We used to use that
> fact to return INUSE in this sort of case, which I think would trigger
> the client to increment its uniquifier and work around the problem.
>
> In the commit where I changed that I said:
>
> The spec allows clients to change ip address, so we shouldn't be
> requiring that setclientid always come from the same address.
> For example, a client could reboot and get a new dhcpd address,
> but still present the same clientid to the server. In that case
> the server should revoke the client's previous state and allow
> it to continue, instead of (as it currently does) returning a
> CLID_INUSE error.
>
> But maybe I should have applied that reasoning only in the krb5 case--in
> the auth_sys case maybe the client ip address is really the only thing
> we have to distinguish two clients.

IMO, the server should do a comparison of the nfs_client_id4 strings, and nothing else. The client IP addresses are unreliable. Otherwise, why have an nfs_client_id4 string to begin with? And how could a multi-homed client ever work? Maybe I don't understand what you mean.

But, anyway, if the clients are all using the same nfs_client_id4 string, that's going to cause no end of trouble, since the boot verifier for each of these clients is bound to be different. When the server sees a boot verifier change, it will just drop all the client's state. Each client's SETCLIENTID will trash the state of anything that came before attached to that nfs_client_id4. That will result in the clients all constantly trying to recover state. I suppose the server could watch for a boot verifier replay (cel ducks)
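
The thrashing described here can be shown with a toy model in Python. It is not how any real server is implemented: the class, its fields, and the verifier values are invented for illustration, and it models only the one rule discussed above, that a changed boot verifier for a known nfs_client_id4 discards the existing state.

```python
class ToyServer:
    """Keeps one state record per nfs_client_id4 string (toy model)."""

    def __init__(self):
        self.clients = {}  # id string -> {"verifier": ..., "opens": set()}
        self.purges = 0

    def setclientid(self, id_string, boot_verifier):
        record = self.clients.get(id_string)
        if record is not None and record["verifier"] != boot_verifier:
            # A new verifier looks like a reboot: drop all state held so far.
            self.purges += 1
            record = None
        if record is None:
            self.clients[id_string] = {"verifier": boot_verifier,
                                       "opens": set()}

    def open(self, id_string, boot_verifier, path):
        record = self.clients.get(id_string)
        if record is None or record["verifier"] != boot_verifier:
            return "NFS4ERR_EXPIRED"
        record["opens"].add(path)
        return "NFS4_OK"


server = ToyServer()
shared_id = "0.0.0.0/192.168.1.2 tcp UNIX 0"

server.setclientid(shared_id, boot_verifier="boot-A")
first = server.open(shared_id, "boot-A", "/home/a")    # client A works
server.setclientid(shared_id, boot_verifier="boot-B")  # client B shows up
second = server.open(shared_id, "boot-A", "/home/a")   # A's state is gone
print(first, second, server.purges)
```

Client A would now redo its own SETCLIENTID, which in turn wipes client B, and so on for as long as both keep using the shared string.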

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-19 18:39:59

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>
> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
> >>
> >> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
> >>> Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
> >>> case where we get setclientid's with the same client-provided id.
> >>> There'd be no change of behavior in the case of multiple clients sharing
> >>> an IP (which is fine, of course).
> >>
> >> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?
> >
> > That's also not this case, sorry, this time with all the conditions:
> >
> > - if the nfs_client_id4 is the same, and
> > - if the flavor is auth_sys, and
> > - if the client IP address is different,
> > - then return NFS4ERR_INUSE.
>
> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.

OK. So probably there's nothing we can do to help here.

As a bandaid maybe a rate-limited log message ("clientid X now in use
from IP Y") might help debug these things....
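
For illustration, a rate-limited message of this kind could look roughly like the Python sketch below. A kernel server would use its own logging facilities instead; the class name, the 60-second interval, and the per-client-id keying are assumptions.

```python
import time


class RateLimitedLog:
    """Emit at most one message per key every min_interval seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock       # injectable so the behavior can be tested
        self.last_emit = {}      # key -> time of the last emitted message
        self.lines = []          # stands in for the real log sink

    def log(self, key, message):
        now = self.clock()
        last = self.last_emit.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # suppressed: too soon after the previous message
        self.last_emit[key] = now
        self.lines.append(message)
        return True
```

Keying on the client id means a storm of conflicting SETCLIENTIDs for one id yields one line per interval without hiding messages about other clients.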

--b.

2012-03-20 13:29:58

by Nikolaus Rath

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On 03/19/2012 06:25 PM, Rick Macklem wrote:
> Nikolaus Rath wrote:
>> On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
>>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>>>
>>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>>>
>>>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>>>
>>>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>>>> Well, sure, but all I'm proposing here is returning
>>>>>>> NFS4ERR_INUSE in the
>>>>>>> case where we get setclientid's with the same client-provided
>>>>>>> id.
>>>>>>> There'd be no change of behavior in the case of multiple clients
>>>>>>> sharing
>>>>>>> an IP (which is fine, of course).
>>>>>>
>>>>>> The migration draft proposes that clients use the same
>>>>>> nfs_client_id4 string for all of a server's IP addresses. Would a
>>>>>> server then be obliged to return NFS4ERR_CLID_IN_USE if a client
>>>>>> attempts a SETCLIENTID with the same boot verifier and
>>>>>> nfs_client_id4 on more than one IP address for the same server?
>>>>>
>>>>> That's also not this case, sorry, this time with all the
>>>>> conditions:
>>>>>
>>>>> - if the nfs_client_id4 is the same, and
>>>>> - if the flavor is auth_sys, and
>>>>> - if the client IP address is different,
>>>>> - then return NFS4ERR_INUSE.
>>>>
>>>> This still breaks for multi-homed servers and UCS clients. The
>>>> client IP address can be different depending on what server IP
>>>> address the client is accessing, but all the other parameters are
>>>> the same.
>>>
>>> OK. So probably there's nothing we can do to help here.
>>>
>>> As a bandaid maybe a rate-limited log message ("clientid X now in
>>> use
>>> from IP Y") might help debug these things....
>>
>> Since you guys keep Cc'ing me, I'll chime in with a rather naive
>> suggestion: if all that's required is a unique id for every client,
>> why
>> not use the MAC of the first network interface, independent of it
>> being
>> used for communication with the server?
>>
> I think this works fairly well for "real hardware", but I'm not so sure
> about clients running in VMs. (I don't really know how the VMs assign
> MAC addresses to their fake net interfaces and what uniqueness guarantees
> those have. I remember the old freebie VMware client for Windows just
> had a config file that assigned the MAC. I bet half the installations
> on the planet had the same MAC as the default config file:-)

But as I understand, the clientid doesn't have to globally unique, just
unique for the given NFS server. I think if you have two virtual
machines with the same MAC connecting to the same NFS server, you have
different problems anyway.


> ps: Also, although it's not very relevant, getting the MAC address of
> the first ethernet interface isn't easy in FreeBSD. I have no idea
> if the same is true of Linux. (I'd also be worried that "first"
> might not be fixed?)

It doesn't need to be the same interface all the time, I just meant the
first as in not a specific one.


Best,


-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:54:32

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On 03/12/2012 05:46 PM, Chuck Lever wrote:
>> Oh, so the clientaddr detection takes place only once at the beginning,
>> and is not repeated if the mount attempts are repeated?
>>
>> This would explain what's happening. The VPN is not yet up when the
>> system first attempts to mount the share.
>>
>> Is there a rationale behind this? It seems to me that if the mount is
>> retried, it would be reasonable to expect that the clientaddr detection
>> is retried as well.
>
> There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
>
> I suppose I should file a bug.

Thanks! Let me know if you need someone for testing.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-19 18:54:23

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 02:42:30PM -0400, Chuck Lever wrote:
>
> On Mar 19, 2012, at 2:39 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
> >>
> >> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
> >>> That's also not this case, sorry, this time with all the conditions:
> >>>
> >>> - if the nfs_client_id4 is the same, and
> >>> - if the flavor is auth_sys, and
> >>> - if the client IP address is different,
> >>> - then return NFS4ERR_INUSE.
> >>
> >> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.
> >
> > OK. So probably there's nothing we can do to help here.
> >
> > As a bandaid maybe a rate-limited log message ("clientid X now in use
> > from IP Y") might help debug these things....
>
> Hm, OK. That implies your server implementation assumes that a clientid4 maps to exactly one client IP address at a time.

OK, agreed. So how about something like "state for client X previously
established from IP Y now cleared from IP Z" ??

(Assuming it's only the I-just-rebooted setclientid case that's likely
to be the sign of a problem.)

--b.

2012-03-12 21:47:18

by Chuck Lever

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients


On Mar 12, 2012, at 5:38 PM, Nikolaus Rath wrote:

> On 03/12/2012 05:27 PM, Chuck Lever wrote:
>>
>> On Mar 12, 2012, at 5:24 PM, Nikolaus Rath wrote:
>>
>>> Alright, it seems that this was the problem. With correct clientaddr, I
>>> haven't been able to produce any freezes for the last 15 minutes
>>> (usually it happens in ~20 seconds).
>>>
>>> The weird thing is that I cannot reproduce the wrong clientaddr
>>> autodetection when I mount the NFS volumes from the command line. It
>>> seems to happen only when the mounting is done by mountall during the
>>> boot sequence.
>>>
>>> In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:
>>>
>>> spitzer:/opt /opt nfs4 bg 0 0
>>>
>>> While this one, followed by a "mount /opt" on the console as soon as I'm
>>> able to log in, works just fine (and has a correct clientaddr):
>>>
>>> spitzer:/opt /opt nfs4 noauto 0 0
>>
>> That's almost certainly because networking isn't up during boot. The "bg" option keeps trying the mount until it succeeds, but the system started the mount process before there was a source address on the system.
>
> Oh, so the clientaddr detection takes place only once at the beginning,
> and is not repeated if the mount attempts are repeated?
>
> This would explain what's happening. The VPN is not yet up when the
> system first attempts to mount the share.
>
> Is there a rationale behind this? It seems to me that if the mount is
> retried, it would be reasonable to expect that the clientaddr detection
> is retried as well.

There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
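
The difference between detecting the address once and detecting it on every retry can be simulated in Python. This does not reflect the structure of mount.nfs; the function names are hypothetical, and the simulation assumes the VPN comes up just before the third attempt.

```python
def mount_with_retries(detect_addr, try_mount, attempts, redetect):
    """Retry try_mount(addr); optionally re-run detect_addr before each try."""
    addr = detect_addr()
    for attempt in range(attempts):
        if redetect and attempt > 0:
            addr = detect_addr()
        if try_mount(addr):
            return addr
    return None


def simulate(redetect):
    state = {"tries": 0}

    def detect_addr():
        # Stand-in for autodetection: nothing usable until the VPN is up.
        return "192.168.1.20" if state["tries"] >= 2 else "0.0.0.0"

    def try_mount(addr):
        state["tries"] += 1
        return state["tries"] > 2  # the third attempt reaches the server

    return mount_with_retries(detect_addr, try_mount, attempts=5,
                              redetect=redetect)


print(simulate(redetect=False))  # address the mount ends up with
print(simulate(redetect=True))
```

With detection done once, the mount eventually succeeds but keeps the 0.0.0.0 it saw before the network existed; with detection repeated, it picks up the real address.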

I suppose I should file a bug.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-12 21:24:10

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

"J. Bruce Fields" <[email protected]> writes:
> On Mon, Mar 12, 2012 at 03:45:05PM -0400, Nikolaus Rath wrote:
>> On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
>> > On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
>> >> Nikolaus Rath <[email protected]> writes:
>> >>> The problem is that as soon as more than three clients are accessing the
>> >>> NFS shares, any operations on the NFS mountpoints by the clients hang.
>> >>> At the same time, CPU usage of the VPN processes becomes very high. If I
>> >>> run the VPN in debug mode, all I can see is that it is busy forwarding
>> >>> lots of packets. I also ran a packet sniffer which showed me that 90% of
>> >>> the packets were NFS related, but I am not familiar enough with NFS to
>> >>> be able to tell anything from the packets themselves. I can provide an
>> >>> example of the dump if that helps.
>> >>
>> >> I have put a screenshot of the dump on
>> >> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
>> >> not sure which parts are important).
>> >
>> > Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
>> > OPEN_CONFIRM repeatedly.
>> >
>> >> Any suggestions how I could further debug this?
>> >
>> > Could the clients be stepping on each others' state if they all think
>> > they have the same IP address (because of something to do with the VPN
>> > networking?)
>>
>> That sounds like a promising path of investigation. What determines the IP
>> of a client as far as NFS is concerned?
>
> I don't remember where it gets the ip it uses to construct clientid's
> from.... But there is a mount option (clientaddr=) that will let you
> change what it uses. So it *might* be worth checking whether using a
> clientaddr= option on each client (giving it a different ipaddr on each
> client) would change the behavior.

Alright, it seems that this was the problem. With correct clientaddr, I
haven't been able to produce any freezes for the last 15 minutes
(usually it happens in ~20 seconds).
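
For anyone wanting to apply the same workaround, the Python helper below prints an fstab entry with an explicit clientaddr= option. The helper itself is hypothetical, and 192.168.1.20 is just one of the client VPN addresses shown earlier in the thread; each client needs its own address.

```python
def nfs4_fstab_line(server, export, mountpoint, clientaddr,
                    extra_opts=("bg",)):
    """Return an fstab entry that pins the NFSv4 client address."""
    opts = ",".join([*extra_opts, f"clientaddr={clientaddr}"])
    return f"{server}:{export} {mountpoint} nfs4 {opts} 0 0"


print(nfs4_fstab_line("spitzer", "/opt", "/opt", "192.168.1.20"))
```

Because the address is given explicitly, the entry no longer depends on what autodetection finds at boot time.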

The weird thing is that I cannot reproduce the wrong clientaddr
autodetection when I mount the NFS volumes from the command line. It
seems to happen only when the mounting is done by mountall during the
boot sequence.

In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:

spitzer:/opt /opt nfs4 bg 0 0

While this one, followed by a "mount /opt" on the console as soon as I'm
able to log in, works just fine (and has a correct clientaddr):

spitzer:/opt /opt nfs4 noauto 0 0


I'd be happy to help debugging the failing autodetection, but apparently
it's not going to be as simple as "strace mount /opt".

Are there any Ubuntu experts here? I tried debugging mountall once, and
it was a very painful experience. I can't just strace it, because when
it's called there is no writable file system to write the logs into...


Thanks a lot for your help Bruce!


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-20 13:55:54

by Myklebust, Trond

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Tue, 2012-03-20 at 09:29 -0400, Nikolaus Rath wrote:
> On 03/19/2012 06:25 PM, Rick Macklem wrote:
> > Nikolaus Rath wrote:
> > ps: Also, although it's not very relevant, getting the MAC address of
> >     the first ethernet interface isn't easy in FreeBSD. I have no idea
> >     if the same is true of Linux. (I'd also be worried that "first"
> >     might not be fixed?)
>
> It doesn't need to be the same interface all the time, I just meant the
> first as in not a specific one.

Yes it does have to be the same interface all the time. Otherwise the
server cannot tell that this is the same client booting up again.

...and yes, this is a problem not only on FreeBSD, but for Linux too.
The fact that devices can come up (and go down too in the case of
hotplugging) in any order is the main reason why the ethernet device
naming scheme was changed recently.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-13 13:23:47

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

"Myklebust, Trond" <[email protected]> writes:
>> > Oh, so the clientaddr detection takes place only once at the beginning,
>> > and is not repeated if the mount attempts are repeated?
>> >
>> > This would explain what's happening. The VPN is not yet up when the
>> > system first attempts to mount the share.
>> >
>> > Is there a rationale behind this? It seems to me that if the mount is
>> > retried, it would be reasonable to expect that the clientaddr detection
>> > is retried as well.
>>
>> There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
>>
>> I suppose I should file a bug.
>
> Consider the whole 'bg' option to be a bug at this point.
>
> Now that we have working autofs support for direct mounts, there is no
> reason to keep the 'bg' mount option on life support any more.

I'm not sure that this would be a solution. What happens if some
program accesses the autofs mountpoint before the VPN is up? Wouldn't
autofs try to mount the NFS share right away and thus with a broken
clientaddr?


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:27:11

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> IMO, the server should do a comparison of the nfs_client_id4 strings,
> and nothing else.

We're supposed to return CLID_INUSE when we see a setclientid from a
"different" client using the same string, to keep clients from doing
mischief with other clients' state (either maliciously or, as in this
case, accidentally).

"Different" here is defined as "not having the same principal". I know
what that means in the krb5 case, but I'm less certain in the auth_sys
case.

> The client IP addresses are unreliable. Otherwise,
> why have an nfs_client_id4 string to begin with? And how could a
> multi-homed client ever work?

I don't know. Is it expected that such clients would do setclientid's
over different interfaces and expect it to work?

(I'm trying to remember now how we identify clients for the purposes of
NSM. In the auth_sys case maybe the goal should be to keep things
working more or less as they did with auth_sys under v2/v3.)

> Maybe I don't understand what you mean.
>
> But, anyway, if the clients are all using the same nfs_client_id4
> string, that's going to cause no end of trouble, since the boot
> verifier for each of these clients is bound to be different. When the
> server sees a boot verifier change, it will just drop all the client's
> state. Each client's SETCLIENTID will trash the state of anything
> that came before attached to that nfs_client_id4. That will result in
> the clients all constantly trying to recover state.

Yes, looks like something like that is happening.
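
To make the failure mode concrete, here is a toy model of what is described above. It is not the real server code: the ToyServer class, the verifier values and the file name are invented for illustration, and only the rule "a changed boot verifier discards the old state" is taken from the discussion.

```python
class ToyServer:
    """Keeps one record per nfs_client_id4 string, as the discussion assumes."""

    def __init__(self):
        # id string -> {"verifier": int, "state": set of open file names}
        self.clients = {}

    def setclientid(self, id_string, verifier):
        """Register a client; return how many pieces of old state were discarded."""
        record = self.clients.get(id_string)
        if record is not None and record["verifier"] == verifier:
            return 0  # same client instance, nothing to throw away
        discarded = len(record["state"]) if record else 0
        # A new boot verifier looks like a reboot: start from a clean slate.
        self.clients[id_string] = {"verifier": verifier, "state": set()}
        return discarded

    def open_file(self, id_string, verifier, filename):
        """Return True if the open is accepted, False if the client must recover."""
        record = self.clients.get(id_string)
        if record is None or record["verifier"] != verifier:
            return False
        record["state"].add(filename)
        return True


if __name__ == "__main__":
    server = ToyServer()
    shared_id = "0.0.0.0/192.168.1.2 tcp UNIX 0"  # the string seen in the trace
    server.setclientid(shared_id, verifier=111)       # client A
    server.open_file(shared_id, 111, "/opt/a")
    print(server.setclientid(shared_id, verifier=222))  # client B wipes A's state
    print(server.open_file(shared_id, 111, "/opt/a"))   # A now has to recover
```

With more clients sharing the string, each recovery by one client wipes the others again, which would be consistent with the constant traffic and the hangs that were reported.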

This is probably a case of a slightly exotic (and possibly broken in
some sense) client network setup--but those may turn out to be more
common than we'd like.

--b.

> I suppose the
> server could watch for a boot verifier replay (cel ducks)

2012-03-19 18:27:15

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>
> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
> >> I wrote:
> >>> J. Bruce Fields wrote:
> >>>> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> >>>>> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> >>>>>> IMO, the server should do a comparison of the nfs_client_id4
> >>>>>> strings,
> >>>>>> and nothing else.
> >>>>>
> >>>>> We're supposed to return CLID_INUSE when we see a setclientid from
> >>>>> a
> >>>>> "different" client using the same string, to keep clients from
> >>>>> doing
> >>>>> mischief with other clients' state (either maliciously or, as in
> >>>>> this
> >>>>> case, accidentally).
> >>>>>
> >>>>> "Different" here is defined as "not having the same principal". I
> >>>>> know
> >>>>> what that means in the krb5 case, but I'm less certain in the
> >>>>> auth_sys
> >>>>> case.
> >>>>
> >>>> Cc'ing the ietf list. Is it reasonable for a server to expect
> >>>> setclientid's to come from the same client IP address at least in
> >>>> the
> >>>> auth_sys case, or could that break multi-homed clients?
> >>>>
> >>> I think that even a dhcp lease renewal might result in a different
> >>> client
> >>> IP, if the client has been partitioned from the dhcp server for a
> >>> while.
> >
> > Yeah, but by that point the client's v4 lease is probably expired anyway
> > so the client's not likely to be bothered by the NFS4ERR_INUSE.
> >
> >>> I'm not convinced that different client IP# implies different client.
> >>> (Even "same ip# implies same client" might not be true, if the dhcp
> >>> server assigned the IP# to another machine while the client was
> >>> partitioned
> >>> from the dhcp server, I think? I haven't looked at current dhcp
> >>> implementations, but it seems conceivable to me.)
> >>>
> >> Oh, and what about the case of 2 clients that are sitting behind
> >> the same NAT gateway? (I think they'd both be seen as having the
> >> client host ip# of the gateway, but with different TCP connections
> >> on different client port#s.)
> >
> > Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
> > case where we get setclientid's with the same client-provided id.
> > There'd be no change of behavior in the case of multiple clients sharing
> > an IP (which is fine, of course).
>
> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?

That's also not this case, sorry, this time with all the conditions:

- if the nfs_client_id4 is the same, and
- if the flavor is auth_sys, and
- if the client IP address is different,
- then return NFS4ERR_INUSE.
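
Written out as code, the rule would look roughly like the sketch below. The Registration type and the function name are hypothetical, and the sketch assumes that the flavor being tested is the one on the incoming request.

```python
from collections import namedtuple

# Hypothetical record of what the server remembers about one SETCLIENTID.
Registration = namedtuple("Registration", ["client_id", "flavor", "ip"])


def should_return_inuse(known, incoming):
    """Apply the three conditions listed above to a known and an incoming request."""
    return (
        known.client_id == incoming.client_id
        and incoming.flavor == "auth_sys"
        and known.ip != incoming.ip
    )


if __name__ == "__main__":
    known = Registration("0.0.0.0/192.168.1.2 tcp UNIX 0", "auth_sys", "192.168.1.5")
    other_host = known._replace(ip="192.168.1.6")
    print(should_return_inuse(known, other_host))  # a second VPN client, same string
    print(should_return_inuse(known, known))       # the same client again
```

Note that the check cannot tell a second host apart from one host that legitimately reaches the server from a different address, so a multi-homed client would be rejected in exactly the same way.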

--b.

2012-03-19 17:06:48

by Rick Macklem

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

I wrote:
> J. Bruce Fields wrote:
> > On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> > > On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> > > > IMO, the server should do a comparison of the nfs_client_id4
> > > > strings,
> > > > and nothing else.
> > >
> > > We're supposed to return CLID_INUSE when we see a setclientid from
> > > a
> > > "different" client using the same string, to keep clients from
> > > doing
> > > mischief with other clients' state (either maliciously or, as in
> > > this
> > > case, accidentally).
> > >
> > > "Different" here is defined as "not having the same principal". I
> > > know
> > > what that means in the krb5 case, but I'm less certain in the
> > > auth_sys
> > > case.
> >
> > Cc'ing the ietf list. Is it reasonable for a server to expect
> > setclientid's to come from the same client IP address at least in
> > the
> > auth_sys case, or could that break multi-homed clients?
> >
> I think that even a dhcp lease renewal might result in a different
> client
> IP, if the client has been partitioned from the dhcp server for a
> while.
>
> I'm not convinced that different client IP# implies different client.
> (Even "same ip# implies same client" might not be true, if the dhcp
> server assigned the IP# to another machine while the client was
> partitioned
> from the dhcp server, I think? I haven't looked at current dhcp
> implementations, but it seems conceivable to me.)
>
Oh, and what about the case of 2 clients that are sitting behind
the same NAT gateway? (I think they'd both be seen as having the
client host ip# of the gateway, but with different TCP connections
on different client port#s.)

> For AUTH_SYS, all the FreeBSD server does is expect the same uid#.
>
> rick
> > At least in the auth_sys case IP addresses are one of the only
> > things
> > we
> > have left to go on when the client's identifier-generation is messed
> > up
> > (not that difficult).
> >
> > --b.
> > _______________________________________________
> > nfsv4 mailing list
> > [email protected]
> > https://www.ietf.org/mailman/listinfo/nfsv4

2012-03-13 14:50:47

by Myklebust, Trond

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Tue, 2012-03-13 at 09:23 -0400, Nikolaus Rath wrote:
> "Myklebust, Trond" <[email protected]> writes:
> >> > Oh, so the clientaddr detection takes place only once at the beginning,
> >> > and is not repeated if the mount attempts are repeated?
> >> >
> >> > This would explain what's happening. The VPN is not yet up when the
> >> > system first attempts to mount the share.
> >> >
> >> > Is there a rationale behind this? It seems to me that if the mount is
> >> > retried, it would be reasonable to expect that the clientaddr detection
> >> > is retried as well.
> >>
> >> There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
> >>
> >> I suppose I should file a bug.
> >
> > Consider the whole 'bg' option to be a bug at this point.
> >
> > Now that we have working autofs support for direct mounts, there is no
> > reason to keep the 'bg' mount option on life support any more.
>
> I'm not sure that this would be a solution. What happens if some
> program accesses the autofs mountpoint before the VPN is up? Wouldn't
> autofs try to mount the NFS share right away and thus with a broken
> clientaddr?

Unlike the 'bg' case, the mount should fail immediately, and your
program gets an ENOENT error. This is exactly what should happen if a
program tries to access a filesystem that isn't available.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-20 13:59:26

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mar 20, 2012, at 9:29 AM, Nikolaus Rath <[email protected]> wrote:

> On 03/19/2012 06:25 PM, Rick Macklem wrote:
>> Nikolaus Rath wrote:
>>> On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
>>>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>>>>
>>>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>>>>
>>>>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>>>>
>>>>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>>>>> Well, sure, but all I'm proposing here is returning
>>>>>>>> NFS4ERR_INUSE in the
>>>>>>>> case where we get setclientid's with the same client-provided
>>>>>>>> id.
>>>>>>>> There'd be no change of behavior in the case of multiple clients
>>>>>>>> sharing
>>>>>>>> an IP (which is fine, of course).
>>>>>>>
>>>>>>> The migration draft proposes that clients use the same
>>>>>>> nfs_client_id4 string for all of a server's IP addresses. Would a
>>>>>>> server then be obliged to return NFS4ERR_CLID_IN_USE if a client
>>>>>>> attempts a SETCLIENTID with the same boot verifier and
>>>>>>> nfs_client_id4 on more than one IP address for the same server?
>>>>>>
>>>>>> That's also not this case, sorry, this time with all the
>>>>>> conditions:
>>>>>>
>>>>>> - if the nfs_client_id4 is the same, and
>>>>>> - if the flavor is auth_sys, and
>>>>>> - if the client IP address is different,
>>>>>> - then return NFS4ERR_INUSE.
>>>>>
>>>>> This still breaks for multi-homed servers and UCS clients. The
>>>>> client IP address can be different depending on what server IP
>>>>> address the client is accessing, but all the other parameters are
>>>>> the same.
>>>>
>>>> OK. So probably there's nothing we can do to help here.
>>>>
>>>> As a bandaid maybe a rate-limited log message ("clientid X now in
>>>> use
>>>> from IP Y") might help debug these things....
>>>
>>> Since you guys keep Cc'ing me, I'll chime in with a rather naive
>>> suggestion: if all that's required is a unique id for every client,
>>> why
>>> not use the MAC of the first network interface, independent of it
>>> being
>>> used for communication with the server?
>>>
>> I think this works fairly well for "real hardware", but I'm not so sure
>> about clients running in VMs. (I don't really know how the VMs assign
>> MAC addresses to their fake net interfaces and what uniqueness guarantees
>> those have. I remember the old freebie VMware client for Windows just
>> had a config file that assigned the MAC. I bet half the installations
>> on the planet had the same MAC as the default config file:-)
>
> But as I understand, the clientid doesn't have to be globally unique, just
> unique for the given NFS server.

Global uniqueness is required to support Transparent State Migration.

> I think if you have two virtual
> machines with the same MAC connecting to the same NFS server, you have
> different problems anyway.
>
>
>> ps: Also, although it's not very relevant, getting the MAC address of
>> the first ethernet interface isn't easy in FreeBSD. I have no idea
>> if the same is true of Linux. (I'd also be worried that "first"
>> might not be fixed?)
>
> It doesn't need to be the same interface all the time, I just meant the
> first as in not a specific one.

The nfs_client_id4 string must not change across a reboot. Otherwise the server won't be able to figure out which open and lock state to discard when the client reboots.
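
One way to get an identifier that is both unique and stable across reboots, independent of any network interface, is to generate it once and keep it on local disk. The sketch below only illustrates that idea; it is not what the Linux client does, and the function name and file location are hypothetical.

```python
import uuid


def stable_client_id(path):
    """Return the identifier stored at `path`, creating it on first use.

    A random UUID is generated once and then reused, so the value survives
    reboots as long as the file does.
    """
    try:
        with open(path) as handle:
            value = handle.read().strip()
        if value:
            return value
    except FileNotFoundError:
        pass
    value = str(uuid.uuid4())
    with open(path, "w") as handle:
        handle.write(value + "\n")
    return value


if __name__ == "__main__":
    # Hypothetical location; any path on persistent local storage would do.
    print(stable_client_id("nfs-client-id.txt"))
```

The obvious weak spot is cloning: two machines installed from the same disk image would start out with the same file, which is the same kind of problem as the duplicated MAC addresses mentioned above.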

>
>
> Best,
>
>
> -Nikolaus
>
> --
> »Time flies like an arrow, fruit flies like a Banana.«
>
> PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:04:17

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 04:49:29PM -0400, Chuck Lever wrote:
>
> On Mar 12, 2012, at 4:42 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 12, 2012 at 04:30:42PM -0400, Nikolaus Rath wrote:
> >> On 03/12/2012 04:15 PM, J. Bruce Fields wrote:
> >>> Looking at the packet details, under the client id field, the clients
> >>> are all using:
> >>>
> >>> "0.0.0.0/192.168.1.2 tcp UNIX 0"
> >>
> >> Hmm. 192.168.1.2 is the server's address on the VPN. Is that supposed to
> >> be there?
> >
> > Yes, and the first ip is usually the ip of the client, which does suggest
> > the client is guessing its ip wrong; so the "clientaddr=" option will
> > likely help.
>
> I thought 0.0.0.0 was a legal callback address, and means "don't send me CB requests".

Yes, that part's fine, it's using it in the clientid that gets us into
trouble here....

> But if all the clients are using the same nfs_client_id4 string, then no, the server can't distinguish between them, and they will tromp on each other's state.

Yeah.

>
> The question is why can't the clients tell what their own IP address is? mount.nfs is supposed to figure that out automatically. Could be a bug in mount.nfs.

You know that code better than me.... Looks like it does basically
gethostbyname(gethostname()) ?
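
For reference, that kind of lookup amounts to something like the sketch below. It only illustrates the guess above and is not taken from mount.nfs; in particular, falling back to 0.0.0.0 when the lookup fails is an assumption chosen to mirror the symptom seen in the trace.

```python
import socket


def guess_own_address(get_hostname=socket.gethostname,
                      resolve=socket.gethostbyname):
    """Resolve the local hostname to an IPv4 address.

    The two callables are parameters only so the behaviour can be exercised
    without touching the real resolver.
    """
    try:
        return resolve(get_hostname())
    except OSError:
        # Assumed fallback: report the wildcard address when resolution fails,
        # for example because networking is not up yet.
        return "0.0.0.0"


if __name__ == "__main__":
    print(guess_own_address())
```

Whatever the real code does, the result depends on the state of the resolver and the network at the moment the call is made, so the same machine can give different answers early in boot and later from a login shell.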

An strace -f of the mount from Nikolaus might help explain what happened
here.

> > Hm, perhaps the server should be rejecting these SETCLIENTID's with
> > INUSE. It used to do that, and the client would likely recover from
> > that more easily.
>
> INUSE means the client is using multiple authentication flavors when performing RENEW or SETCLIENTID. I can't think of a reason the server should reject these; it's not supposed to look at the contents of the nfs_client_id4 string.

Well, from the trace the requests do appear (from the server's point of
view) to be coming from different IP addresses. We used to use that
fact to return INUSE in this sort of case, which I think would trigger
the client to increment its uniquifier and work around the problem.
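
A rough sketch of that interaction, under the assumption that the last field of the id string ("0" in the trace) is the uniquifier and that the old server behaviour compared the packet's source address. All function names here are hypothetical.

```python
def setclientid(owners, id_string, source_ip):
    """Toy version of the old check: an id string belongs to one source address."""
    owner = owners.setdefault(id_string, source_ip)
    return "OK" if owner == source_ip else "CLID_INUSE"


def register(owners, claimed_ip, server_ip, source_ip, max_tries=8):
    """Retry with an incremented uniquifier until the server accepts the string."""
    for uniquifier in range(max_tries):
        id_string = f"{claimed_ip}/{server_ip} tcp UNIX {uniquifier}"
        if setclientid(owners, id_string, source_ip) == "OK":
            return id_string
    raise RuntimeError("no free client id string found")


if __name__ == "__main__":
    owners = {}
    # Three clients that all wrongly believe their own address is 0.0.0.0.
    for source in ("192.168.1.5", "192.168.1.6", "192.168.1.7"):
        print(register(owners, "0.0.0.0", "192.168.1.2", source))
```

In this model each client ends up with its own string even though all of them start from the same wrong address, which is why returning INUSE would have hidden the problem.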

In the commit where I changed that I said:

The spec allows clients to change ip address, so we shouldn't be
requiring that setclientid always come from the same address.
For example, a client could reboot and get a new dhcpd address,
but still present the same clientid to the server. In that case
the server should revoke the client's previous state and allow
it to continue, instead of (as it currently does) returning a
CLID_INUSE error.

But maybe I should have applied that reasoning only in the krb5 case--in
the auth_sys case maybe the client ip address is really the only thing
we have to distinguish two clients.

--b.

2012-03-19 18:43:03

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients


On Mar 19, 2012, at 2:39 PM, J. Bruce Fields wrote:

> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>
>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>
>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>
>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>> Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
>>>>> case where we get setclientid's with the same client-provided id.
>>>>> There'd be no change of behavior in the case of multiple clients sharing
>>>>> an IP (which is fine, of course).
>>>>
>>>> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?
>>>
>>> That's also not this case, sorry, this time with all the conditions:
>>>
>>> - if the nfs_client_id4 is the same, and
>>> - if the flavor is auth_sys, and
>>> - if the client IP address is different,
>>> - then return NFS4ERR_INUSE.
>>
>> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.
>
> OK. So probably there's nothing we can do to help here.
>
> As a bandaid maybe a rate-limited log message ("clientid X now in use
> from IP Y") might help debug these things....

Hm, OK. That implies your server implementation assumes that a clientid4 maps to exactly one client IP address at a time.
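
For what it's worth, the log message idea could be sketched as below. This is only an illustration of rate limiting per clientid, not server code; the class name, the 60 second interval and the injectable clock are all assumptions.

```python
import time


class ClientAddrLog:
    """Remember the last address seen per clientid and log changes, rate-limited."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between messages per clientid
        self.clock = clock
        self.last_ip = {}
        self.last_logged = {}
        self.lines = []

    def seen(self, clientid, ip):
        """Record a request; return True if a log line was emitted for it."""
        previous = self.last_ip.get(clientid)
        self.last_ip[clientid] = ip
        if previous is None or previous == ip:
            return False  # first sighting or no change: nothing to report
        now = self.clock()
        last = self.last_logged.get(clientid)
        if last is not None and now - last < self.min_interval:
            return False  # address changed, but we logged too recently
        self.last_logged[clientid] = now
        self.lines.append(f"clientid {clientid} now in use from IP {ip}")
        return True


if __name__ == "__main__":
    log = ClientAddrLog()
    log.seen("X", "192.168.1.5")
    log.seen("X", "192.168.1.6")
    print(log.lines)
```

Because it tracks a single "last address" per clientid, the sketch makes the same one-address-at-a-time assumption that is pointed out above, so a multi-homed client would produce a message each time it alternates addresses, limited only by the interval.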

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-19 18:26:24

by Myklebust, Trond

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, 2012-03-19 at 12:28 -0400, J. Bruce Fields wrote:
> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> > On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> > > IMO, the server should do a comparison of the nfs_client_id4 strings,
> > > and nothing else.
> >
> > We're supposed to return CLID_INUSE when we see a setclientid from a
> > "different" client using the same string, to keep clients from doing
> > mischief with other clients' state (either maliciously or, as in this
> > case, accidentally).
> >
> > "Different" here is defined as "not having the same principal". I know
> > what that means in the krb5 case, but I'm less certain in the auth_sys
> > case.
>
> Cc'ing the ietf list. Is it reasonable for a server to expect
> setclientid's to come from the same client IP address at least in the
> auth_sys case, or could that break multi-homed clients?
>
> At least in the auth_sys case IP addresses are one of the only things we
> have left to go on when the client's identifier-generation is messed up
> (not that difficult).

Yes, but IP addresses can be reassigned dynamically. That's one of the
reasons for wanting a client id in the first place...

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-20 16:49:37

by Myklebust, Trond

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Tue, 2012-03-20 at 10:36 -0400, Nikolaus Rath wrote:
> On 03/20/2012 09:55 AM, Myklebust, Trond wrote:
> > On Tue, 2012-03-20 at 09:29 -0400, Nikolaus Rath wrote:
> >> On 03/19/2012 06:25 PM, Rick Macklem wrote:
> >>> Nikolaus Rath wrote:
> >>> ps: Also, although it's not very relevant, getting the MAC address of
> >>>     the first ethernet interface isn't easy in FreeBSD. I have no idea
> >>>     if the same is true of Linux. (I'd also be worried that "first"
> >>>     might not be fixed?)
> >>
> >> It doesn't need to be the same interface all the time, I just meant the
> >> first as in not a specific one.
> >
> > Yes it does have to be the same interface all the time. Otherwise the
> > server cannot tell that this is the same client booting up again.
>
> The likelihood of the interface order changing is certainly much lower
> than the likelihood of the IP address changing (which I understand is
> currently used for the clientid), so the situation would still improve.

It is still easy to find examples of systems for which this breaks:

What do you do for boxes that only have an infiniband connection, for
instance? AFAIK, a lot of IPoIB implementations don't have reboot-safe
MAC addresses.
Then there is the fact that some of the cheaper NICs don't have
permanent MAC addresses. The MAC address usually gets set as part of the
boot process.

Finally, there is the issue that the NIC is a globally shared hardware
component. If I am running virtual machines in containerised
environments (and yes, I am about to merge the NFS support for
containers into Linux-3.4) then having the NFS clients in all the
different containers identify themselves using the same MAC address will
lead to some serious brokenness.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-12 16:28:59

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

Nikolaus Rath <[email protected]> writes:
> The problem is that as soon as more than three clients are accessing the
> NFS shares, any operations on the NFS mountpoints by the clients hang.
> At the same time, CPU usage of the VPN processes becomes very high. If I
> run the VPN in debug mode, all I can see is that it is busy forwarding
> lots of packets. I also ran a packet sniffer which showed me that 90% of
> the packets were NFS related, but I am not familiar enough with NFS to
> be able to tell anything from the packets themselves. I can provide an
> example of the dump if that helps.

I have put a screenshot of the dump on
http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
not sure which parts are important).

Any suggestions how I could further debug this?

Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:54:46

by Chuck Lever

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients


On Mar 12, 2012, at 5:46 PM, Chuck Lever wrote:

>
> On Mar 12, 2012, at 5:38 PM, Nikolaus Rath wrote:
>
>> On 03/12/2012 05:27 PM, Chuck Lever wrote:
>>>
>>> On Mar 12, 2012, at 5:24 PM, Nikolaus Rath wrote:
>>>
>>>> Alright, it seems that this was the problem. With correct clientaddr, I
>>>> haven't been able to produce any freezes for the last 15 minutes
>>>> (usually it happens in ~20 seconds).
>>>>
>>>> The weird thing is that I cannot reproduce the wrong clientaddr
>>>> autodetection when I mount the NFS volumes from the command line. It
>>>> seems to happen only when the mounting is done by mountall during the
>>>> boot sequence.
>>>>
>>>> In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:
>>>>
>>>> spitzer:/opt /opt nfs4 bg 0 0
>>>>
>>>> While this one, followed by a "mount /opt" on the console as soon as I'm
>>>> able to log in, works just fine (and has a correct clientaddr):
>>>>
>>>> spitzer:/opt /opt nfs4 noauto 0 0
>>>
>>> That's almost certainly because networking isn't up during boot. The "bg" option keeps trying the mount until it succeeds, but the system started the mount process before there was a source address on the system.
>>
>> Oh, so the clientaddr detection takes place only once at the beginning,
>> and is not repeated if the mount attempts are repeated?
>>
>> This would explain what's happening. The VPN is not yet up when the
>> system first attempts to mount the share.
>>
>> Is there a rationale behind this? It seems to me that if the mount is
>> retried, it would be reasonable to expect that the clientaddr detection
>> is retried as well.
>
> There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
>
> I suppose I should file a bug.

See https://bugzilla.linux-nfs.org/show_bug.cgi?id=225

Nikolaus, since your clients' network environment is so volatile, you might consider using autofs for these mounts.
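
To illustrate the difference between detecting the address once and detecting it on every retry, here is a small simulation. It does not reflect the actual mount.nfs code; FakeBoot, the tick counter and the 192.168.1.5 address are invented for the example.

```python
class FakeBoot:
    """Pretend system whose network only comes up after `up_after` mount attempts."""

    def __init__(self, up_after):
        self.tick = 0
        self.up_after = up_after

    def detect_addr(self):
        # Before the network is up there is no usable source address.
        return "192.168.1.5" if self.tick >= self.up_after else "0.0.0.0"

    def try_mount(self, addr):
        self.tick += 1
        return self.tick > self.up_after  # succeeds once the network is up


def mount_with_retries(system, attempts, redetect):
    """Return the clientaddr the successful mount was made with, or None."""
    addr = system.detect_addr()  # detection before the first attempt
    for _ in range(attempts):
        if redetect:
            addr = system.detect_addr()  # refresh the address on every retry
        if system.try_mount(addr):
            return addr
    return None


if __name__ == "__main__":
    print(mount_with_retries(FakeBoot(up_after=2), attempts=5, redetect=False))
    print(mount_with_retries(FakeBoot(up_after=2), attempts=5, redetect=True))
```

In the first call the mount eventually succeeds but keeps the stale wildcard address; in the second the address is picked up once the network is there, which is the behaviour requested in the bug report.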

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-20 14:38:03

by Nikolaus Rath

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On 03/20/2012 10:01 AM, Chuck Lever wrote:
>
>
> Sent from my iPad
>
> On Mar 20, 2012, at 9:29 AM, Nikolaus Rath <[email protected]> wrote:
>
>> On 03/19/2012 06:25 PM, Rick Macklem wrote:
>>> Nikolaus Rath wrote:
>>>> On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
>>>>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>>>>>
>>>>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>>>>>
>>>>>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>>>>>
>>>>>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>>>>>> Well, sure, but all I'm proposing here is returning
>>>>>>>>> NFS4ERR_INUSE in the
>>>>>>>>> case where we get setclientid's with the same client-provided
>>>>>>>>> id.
>>>>>>>>> There'd be no change of behavior in the case of multiple clients
>>>>>>>>> sharing
>>>>>>>>> an IP (which is fine, of course).
>>>>>>>>
>>>>>>>> The migration draft proposes that clients use the same
>>>>>>>> nfs_client_id4 string for all of a server's IP addresses. Would a
>>>>>>>> server then be obliged to return NFS4ERR_CLID_IN_USE if a client
>>>>>>>> attempts a SETCLIENTID with the same boot verifier and
>>>>>>>> nfs_client_id4 on more than one IP address for the same server?
>>>>>>>
>>>>>>> That's also not this case, sorry, this time with all the
>>>>>>> conditions:
>>>>>>>
>>>>>>> - if the nfs_client_id4 is the same, and
>>>>>>> - if the flavor is auth_sys, and
>>>>>>> - if the client IP address is different,
>>>>>>> - then return NFS4ERR_INUSE.
>>>>>>
>>>>>> This still breaks for multi-homed servers and UCS clients. The
>>>>>> client IP address can be different depending on what server IP
>>>>>> address the client is accessing, but all the other parameters are
>>>>>> the same.
>>>>>
>>>>> OK. So probably there's nothing we can do to help here.
>>>>>
>>>>> As a bandaid maybe a rate-limited log message ("clientid X now in
>>>>> use
>>>>> from IP Y") might help debug these things....
>>>>
>>>> Since you guys keep Cc'ing me, I'll chime in with a rather naive
>>>> suggestion: if all that's required is a unique id for every client,
>>>> why
>>>> not use the MAC of the first network interface, independent of it
>>>> being
>>>> used for communication with the server?
>>>>
>>> I think this works fairly well for "real hardware", but I'm not so sure
>>> about clients running in VMs. (I don't really know how the VMs assign
>>> MAC addresses to their fake net interfaces and what uniqueness guarantees
>>> those have. I remember the old freebie VMware client for Windows just
>>> had a config file that assigned the MAC. I bet half the installations
>>> on the planet had the same MAC as the default config file:-)
>>
>> But as I understand, the clientid doesn't have to be globally unique, just
>> unique for the given NFS server.
>
> Global uniqueness is required to support Transparent State Migration.
>
>> I think if you have two virtual
>> machines with the same MAC connecting to the same NFS server, you have
>> different problems anyway.
>>
>>
>>> ps: Also, although it's not very relevant, getting the MAC address of
>>> the first ethernet interface isn't easy in FreeBSD. I have no idea
>>> if the same is true of Linux. (I'd also be worried that "first"
>>> might not be fixed?)
>>
>> It doesn't need to be the same interface all the time, I just meant the
>> first as in not a specific one.
>
> The nfs_client_id4 string must not change across a reboot. Otherwise the server won't be able to figure out which open and lock state to discard when the client reboots.

But I thought that at the moment one of the client's IPs is used for the
clientid? That certainly changes regularly for DHCP or multi-homed
clients.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-20 15:54:07

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients


On Mar 20, 2012, at 10:38 AM, Nikolaus Rath wrote:

> On 03/20/2012 10:01 AM, Chuck Lever wrote:
>>
>>
>> Sent from my iPad
>>
>> On Mar 20, 2012, at 9:29 AM, Nikolaus Rath <[email protected]> wrote:
>>
>>> On 03/19/2012 06:25 PM, Rick Macklem wrote:
>>>> Nikolaus Rath wrote:
>>>>> On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
>>>>>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>>>>>>
>>>>>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>>>>>>
>>>>>>>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>>>>>>>>
>>>>>>>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>>>>>>>>> Well, sure, but all I'm proposing here is returning
>>>>>>>>>> NFS4ERR_INUSE in the
>>>>>>>>>> case where we get setclientid's with the same client-provided
>>>>>>>>>> id.
>>>>>>>>>> There'd be no change of behavior in the case of multiple clients
>>>>>>>>>> sharing
>>>>>>>>>> an IP (which is fine, of course).
>>>>>>>>>
>>>>>>>>> The migration draft proposes that clients use the same
>>>>>>>>> nfs_client_id4 string for all of a server's IP addresses. Would a
>>>>>>>>> server then be obliged to return NFS4ERR_CLID_IN_USE if a client
>>>>>>>>> attempts a SETCLIENTID with the same boot verifier and
>>>>>>>>> nfs_client_id4 on more than one IP address for the same server?
>>>>>>>>
>>>>>>>> That's also not this case, sorry, this time with all the
>>>>>>>> conditions:
>>>>>>>>
>>>>>>>> - if the nfs_client_id4 is the same, and
>>>>>>>> - if the flavor is auth_sys, and
>>>>>>>> - if the client IP address is different,
>>>>>>>> - then return NFS4ERR_INUSE.
>>>>>>>
>>>>>>> This still breaks for multi-homed servers and UCS clients. The
>>>>>>> client IP address can be different depending on what server IP
>>>>>>> address the client is accessing, but all the other parameters are
>>>>>>> the same.
>>>>>>
>>>>>> OK. So probably there's nothing we can do to help here.
>>>>>>
>>>>>> As a bandaid maybe a rate-limited log message ("clientid X now in
>>>>>> use
>>>>>> from IP Y") might help debug these things....
>>>>>
>>>>> Since you guys keep Cc'ing me, I'll chime in with a rather naive
>>>>> suggestion: if all that's required is a unique id for every client,
>>>>> why
>>>>> not use the MAC of the first network interface, independent of it
>>>>> being
>>>>> used for communication with the server?
>>>>>
>>>> I think this works fairly well for "real hardware", but I'm not so sure
>>>> about clients running in VMs. (I don't really know how the VMs assign
>>>> MAC addresses to their fake net interfaces and what uniqueness guarantees
>>>> those have. I remember the old freebie VMware client for Windows just
>>>> had a config file that assigned the MAC. I bet half the installations
>>>> on the planet had the same MAC as the default config file:-)
>>>
>>> But as I understand it, the clientid doesn't have to be globally unique,
>>> just unique for the given NFS server.
>>
>> Global uniqueness is required to support Transparent State Migration.
>>
>>> I think if you have two virtual
>>> machines with the same MAC connecting to the same NFS server, you have
>>> different problems anyway.
>>>
>>>
>>>> ps: Also, although it's not very relevant, getting the MAC address of
>>>> the first ethernet interface isn't easy in FreeBSD. I have no idea
>>>> if the same is true of Linux. (I'd also be worried that "first"
>>>> might not be fixed?)
>>>
>>> It doesn't need to be the same interface all the time, I just meant the
>>> first as in not a specific one.
>>
>> The nfs_client_id4 string must not change across a reboot. Otherwise the server won't be able to figure out which open and lock state to discard when the client reboots.
>
> But I thought that at the moment one of the client's IPs is used for the
> clientid? That certainly changes regularly for DHCP or multi-homed
> clients.

You are correct that using an IP address in this string is not optimal. Obtaining some other unique identifier (say a MAC address) that won't ever change across a reboot is a difficult challenge. Currently most client implementations do insert both the client IP and server IP address in the nfs_client_id4 string. In fact, RFC 3530 recommends the use of IP addresses as part of this string.

However, I believe NFSv4.1 is driving the adoption of uniform client strings (that is, the nfs_client_id4 string is the same no matter what server the client talks to), and Transparent State Migration support in NFSv4.0 may well also require the use of a uniform client string.

In other words, I think the use of IP addresses in the client strings will be phased out over time. We believe that using a MAC address is not much better than an IP address. I've got a patch for Linux that allows administrators to insert a UUID into this string, set via a kernel boot parameter (much like the root= boot parameter). This is optional, and can remain fixed over the life of the client system. Aside from using the client's FQDN, a UUID is probably the best we can do.
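
To make the difference concrete, here is a minimal Python sketch contrasting an IP-based client string with a uniform, UUID-based one. The IP-based layout copies the string quoted from the packet capture in this thread ("0.0.0.0/192.168.1.2 tcp UNIX 0"). The function names and the uniform layout (hostname/UUID) are assumptions made for illustration only; the actual patch may format the string differently.

```python
import uuid


def ip_based_id(client_ip, server_ip, proto="tcp", flavor="UNIX", uniquifier=0):
    """Client string in the layout seen in this thread's packet capture."""
    return f"{client_ip}/{server_ip} {proto} {flavor} {uniquifier}"


def uniform_id(hostname, client_uuid):
    """Hypothetical uniform string: no IP address, identical for every server."""
    return f"{hostname}/{client_uuid}"


server_ip = "192.168.1.2"

# Two different machines whose address autodetection both produced 0.0.0.0.
id_a = ip_based_id("0.0.0.0", server_ip)
id_b = ip_based_id("0.0.0.0", server_ip)
ip_ids_collide = (id_a == id_b)

# Each machine would get its UUID once and keep it across reboots;
# uuid4() here only stands in for that stored value.
uuid_a, uuid_b = uuid.uuid4(), uuid.uuid4()
uniform_ids_collide = (uniform_id("node", uuid_a) == uniform_id("node", uuid_b))
```

Even with identical hostnames (as with clients cloned from one disk image), the UUID keeps the two strings apart, provided each clone receives its own UUID.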

> Best,
>
> -Nikolaus
>
> --
> »Time flies like an arrow, fruit flies like a Banana.«
>
> PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-19 16:28:55

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> > IMO, the server should do a comparison of the nfs_client_id4 strings,
> > and nothing else.
>
> We're supposed to return CLID_INUSE when we see a setclientid from a
> "different" client using the same string, to keep clients from doing
> mischief with other clients' state (either maliciously or, as in this
> case, accidentally).
>
> "Different" here is defined as "not having the same principal". I know
> what that means in the krb5 case, but I'm less certain in the auth_sys
> case.

Cc'ing the ietf list. Is it reasonable for a server to expect
setclientid's to come from the same client IP address at least in the
auth_sys case, or could that break multi-homed clients?

At least in the auth_sys case IP addresses are one of the only things we
have left to go on when the client's identifier-generation is messed up
(not that difficult).
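
The check being asked about can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `check_setclientid` and the `confirmed` table are hypothetical names, the status values are plain strings rather than protocol constants, and nothing here reflects how any real server stores client records.

```python
def check_setclientid(confirmed, id_string, flavor, client_ip):
    """Reject a SETCLIENTID that reuses a known id string from another IP.

    `confirmed` maps an nfs_client_id4 string to the flavor and IP address
    of the client that last established it (hypothetical bookkeeping).
    """
    prev = confirmed.get(id_string)
    if (prev is not None
            and flavor == "auth_sys"
            and prev["flavor"] == "auth_sys"
            and prev["ip"] != client_ip):
        return "NFS4ERR_CLID_INUSE"
    confirmed[id_string] = {"flavor": flavor, "ip": client_ip}
    return "NFS4_OK"


confirmed = {}
first = check_setclientid(confirmed, "client-A", "auth_sys", "10.0.0.5")

# A second machine that generated the same string is rejected, as intended.
impostor = check_setclientid(confirmed, "client-A", "auth_sys", "10.0.0.9")

# But so is the original machine when it arrives from its second interface.
multihomed = check_setclientid(confirmed, "client-A", "auth_sys", "172.16.0.5")
```

The last call shows the concern raised in the question: from the server's side, a multi-homed client is indistinguishable from a second machine with a broken identifier.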

--b.

2012-03-19 17:47:49

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients


On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:

> On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
>> I wrote:
>>> J. Bruce Fields wrote:
>>>> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
>>>>> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
>>>>>> IMO, the server should do a comparison of the nfs_client_id4
>>>>>> strings,
>>>>>> and nothing else.
>>>>>
>>>>> We're supposed to return CLID_INUSE when we see a setclientid from
>>>>> a
>>>>> "different" client using the same string, to keep clients from
>>>>> doing
>>>>> mischief with other clients' state (either maliciously or, as in
>>>>> this
>>>>> case, accidentally).
>>>>>
>>>>> "Different" here is defined as "not having the same principal". I
>>>>> know
>>>>> what that means in the krb5 case, but I'm less certain in the
>>>>> auth_sys
>>>>> case.
>>>>
>>>> Cc'ing the ietf list. Is it reasonable for a server to expect
>>>> setclientid's to come from the same client IP address at least in
>>>> the
>>>> auth_sys case, or could that break multi-homed clients?
>>>>
>>> I think that even a dhcp lease renewal might result in a different
>>> client
>>> IP, if the client has been partitioned from the dhcp server for a
>>> while.
>
> Yeah, but by that point the client's v4 lease is probably expired anyway
> so the client's not likely to be bothered by the NFS4ERR_INUSE.
>
>>> I'm not convinced that different client IP# implies different client.
>>> (Even "same ip# implies same client" might not be true, if the dhcp
>>> server assigned the IP# to another machine while the client was
>>> partitioned
>>> from the dhcp server, I think? I haven't looked at current dhcp
>>> implementations, but it seems conceivable to me.)
>>>
>> Oh, and what about the case of 2 clients that are sitting behind
>> the same NAT gateway? (I think they'd both be seen as having the
>> client host ip# of the gateway, but with different TCP connections
>> on different client port#s.)
>
> Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
> case where we get setclientid's with the same client-provided id.
> There'd be no change of behavior in the case of multiple clients sharing
> an IP (which is fine, of course).

The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?

IMO the server should not try to sort this situation out.

>>> For AUTH_SYS, all the FreeBSD server does is expect the same uid#.
>
> Yeah, but that's probably usually the same between clients.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-19 18:57:01

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 02:51:02PM -0400, Nikolaus Rath wrote:
> At least in the case that sparked this discussion, it would already be
> enough to return NFS4ERR_INUSE only if the client id is being reassigned
> *and* has a 0.0.0.0 (aka autodetection failed) value.

The clientid's supposed to be totally opaque to the server, so
recognizing that case (even knowing where in the clientid to look for
the "0.0.0.0") would require the server to behave in a way that was
extremely specific to particular versions of the Linux client--something
we definitely want to avoid doing.

--b.

2012-03-19 19:01:29

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients


On Mar 19, 2012, at 2:54 PM, J. Bruce Fields wrote:

> On Mon, Mar 19, 2012 at 02:42:30PM -0400, Chuck Lever wrote:
>>
>> On Mar 19, 2012, at 2:39 PM, J. Bruce Fields wrote:
>>
>>> On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
>>>>
>>>> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
>>>>> That's also not this case, sorry, this time with all the conditions:
>>>>>
>>>>> - if the nfs_client_id4 is the same, and
>>>>> - if the flavor is auth_sys, and
>>>>> - if the client IP address is different,
>>>>> - then return NFS4ERR_INUSE.
>>>>
>>>> This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.
>>>
>>> OK. So probably there's nothing we can do to help here.
>>>
>>> As a bandaid maybe a rate-limited log message ("clientid X now in use
>>> from IP Y") might help debug these things....
>>
>> Hm, OK. That implies your server implementation assumes that a clientid4 maps to exactly one client IP address at a time.
>
> OK, agreed. So how about something like "state for client X previously
> established from IP Y now cleared from IP Z" ??
>
> (Assuming it's only the I-just-rebooted setclientid case that's likely
> to be the sign of a problem.)

We would see that only in the case where the boot verifier and the client IP change at the same time. That can happen legitimately, too, if the client has a dynamically assigned IP address. Maybe this event is only interesting if it happens more than once during the same second.
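
As a rough illustration of that heuristic, the following Python sketch flags a client id string only when its boot verifier and client IP change together more than once within the same second. The class name, its fields, and the one-second bucketing via `int(now)` are assumptions for the sketch, not a description of any existing server code.

```python
class ClientIdFlapDetector:
    """Flags an id string whose verifier and IP both change repeatedly."""

    def __init__(self):
        self.last = {}     # id string -> (boot verifier, client IP)
        self.window = {}   # id string -> (second, changes seen in that second)

    def observe(self, id_string, verifier, ip, now):
        """Record one SETCLIENTID; return True if it looks suspicious."""
        prev = self.last.get(id_string)
        self.last[id_string] = (verifier, ip)
        # Only a simultaneous change of verifier and IP is interesting.
        if prev is None or prev[0] == verifier or prev[1] == ip:
            return False
        second = int(now)
        seen_second, count = self.window.get(id_string, (second, 0))
        count = count + 1 if seen_second == second else 1
        self.window[id_string] = (second, count)
        return count > 1


detector = ClientIdFlapDetector()
results = [
    detector.observe("X", 1, "10.0.0.1", 100.0),  # first sighting
    detector.observe("X", 2, "10.0.0.2", 100.2),  # one change: could be DHCP
    detector.observe("X", 3, "10.0.0.3", 100.7),  # second change, same second
    detector.observe("X", 4, "10.0.0.4", 250.0),  # isolated change later on
]
```

A single change stays silent, which covers the legitimate dynamically-assigned-address case; only the rapid back-and-forth would produce a log line.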

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-12 20:50:02

by Chuck Lever

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients


On Mar 12, 2012, at 4:42 PM, J. Bruce Fields wrote:

> On Mon, Mar 12, 2012 at 04:30:42PM -0400, Nikolaus Rath wrote:
>> On 03/12/2012 04:15 PM, J. Bruce Fields wrote:
>>> On Mon, Mar 12, 2012 at 03:45:05PM -0400, Nikolaus Rath wrote:
>>>> On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
>>>>> On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
>>>>>> Nikolaus Rath <[email protected]> writes:
>>>>>>> The problem is that as soon as more than three clients are accessing the
>>>>>>> NFS shares, any operations on the NFS mountpoints by the clients hang.
>>>>>>> At the same time, CPU usage of the VPN processes becomes very high. If I
>>>>>>> run the VPN in debug mode, all I can see is that it is busy forwarding
>>>>>>> lots of packets. I also ran a packet sniffer which showed me that 90% of
>>>>>>> the packets were NFS related, but I am not familiar enough with NFS to
>>>>>>> be able to tell anything from the packets themselves. I can provide an
>>>>>>> example of the dump if that helps.
>>>>>>
>>>>>> I have put a screenshot of the dump on
>>>>>> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
>>>>>> not sure which parts are important).
>>>>>
>>>>> Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
>>>>> OPEN_CONFIRM repeatedly.
>>>>>
>>>>>> Any suggestions how I could further debug this?
>>>>>
>>>>> Could the clients be stepping on each others' state if they all think
>>>>> they have the same IP address (because of something to do with the VPN
>>>>> networking?)
>>>>
>>>> That sounds like a promising path of investigation. What determines the IP
>>>> of a client as far as NFS is concerned?
>>>
>>> I don't remember where it gets the ip it uses to construct clientid's
>>> from.... But there is a mount option (clientaddr=) that will let you
>>> change what it uses. So it *might* be worth checking whether using a
>>> clientaddr= option on each client (giving it a different ipaddr on each
>>> client) would change the behavior.
>>
>> I'll try that.
>>
>> Since there seems to be some problem with client identity: all the
>> clients are generated using the same disk image. This image also
>> includes some stuff in /var/lib/nfs. I already tried emptying this on
>> every client and it did not help, but maybe there is another directory
>> with state data that could cause problems?
>
> The state in there is used by the v2/v3 client, and by the server with
> v4 as well, but not by the v4 client, so I wouldn't expect that to be an
> issue.
>
>>>>> It'd be interesting to know the fields of the setclientid call, and the
>>>>> errors that the server is responding with to these calls. If you look
>>>>> at the packet details you'll probably see the same thing happening
>>>>> over and over again.
>>>>>
>>>>> Filtering to look at traffic between server and one client at a time
>>>>> might help to see the pattern.
>>>>
>>>> Hmm. I'm looking at the fields, but I just have no idea what any of
>>>> those mean. Would you possibly be willing to take a look? I uploaded a
>>>> pcap dump of a few packets to http://www.rath.org/res/sample.pcap.
>>>
>>> Looking at the packet details, under the client id field, the clients
>>> are all using:
>>>
>>> "0.0.0.0/192.168.1.2 tcp UNIX 0"
>>
>> Hmm. 192.168.1.2 is the server's address on the VPN. Is that supposed to
>> be there?
>
> Yes, and the first ip is usually the ip of the client, which does suggest
> the client is guessing its ip wrong; so the "clientaddr=" option will
> likely help.

I thought 0.0.0.0 was a legal callback address, and means "don't send me CB requests".

But if all the clients are using the same nfs_client_id4 string, then no, the server can't distinguish between them, and they will tromp on each other's state.

The question is why can't the clients tell what their own IP address is? mount.nfs is supposed to figure that out automatically. Could be a bug in mount.nfs.

> Hm, perhaps the server should be rejecting these SETCLIENTID's with
> INUSE. It used to do that, and the client would likely recover from
> that more easily.

INUSE means the client is using multiple authentication flavors when performing RENEW or SETCLIENTID. I can't think of a reason the server should reject these; it's not supposed to look at the contents of the nfs_client_id4 string.
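
A toy model may help show why clients sharing one nfs_client_id4 string keep destroying each other's state, which would be consistent with the repeating SETCLIENTID / OPEN pattern in the capture. This Python sketch is a deliberately simplified assumption: `ToyServer` is hypothetical, it ignores SETCLIENTID_CONFIRM, leases and principals, and it is not the Linux nfsd logic.

```python
class ToyServer:
    """Client table keyed only by the client-supplied id string."""

    def __init__(self):
        self.clients = {}   # id string -> {"verifier": ..., "opens": set()}

    def setclientid(self, id_string, boot_verifier):
        entry = self.clients.get(id_string)
        if entry is None or entry["verifier"] != boot_verifier:
            # New client, or the "same" client appears to have rebooted:
            # discard any open state held under this id string.
            self.clients[id_string] = {"verifier": boot_verifier,
                                       "opens": set()}

    def open(self, id_string, path):
        self.clients[id_string]["opens"].add(path)

    def opens(self, id_string):
        return set(self.clients[id_string]["opens"])


server = ToyServer()
shared_id = "0.0.0.0/192.168.1.2 tcp UNIX 0"   # string seen in the capture

server.setclientid(shared_id, boot_verifier=1111)   # client A
server.open(shared_id, "/home/a")
state_before_b = server.opens(shared_id)

server.setclientid(shared_id, boot_verifier=2222)   # client B, same string
state_after_b = server.opens(shared_id)             # A's open is gone
```

Client A would then have to re-establish its client id and reopen its files, wiping out B's state in turn, and so on for every additional client.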

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2012-03-19 18:51:05

by Nikolaus Rath

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

"J. Bruce Fields" <[email protected]> writes:
> On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
>> I wrote:
>>> J. Bruce Fields wrote:
>>>> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
>>>>> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
>>>>>> IMO, the server should do a comparison of the nfs_client_id4
>>>>>> strings, and nothing else.
>>>>>
>>>>> We're supposed to return CLID_INUSE when we see a setclientid
>>>>> from a "different" client using the same string, to keep
>>>>> clients from doing mischief with other clients' state (either
>>>>> maliciously or, as in this case, accidentally).
>>>>>
>>>>> "Different" here is defined as "not having the same principal".
>>>>> I know what that means in the krb5 case, but I'm less certain
>>>>> in the auth_sys case.
>>>>
>>>> Cc'ing the ietf list. Is it reasonable for a server to expect
>>>> setclientid's to come from the same client IP address at least in
>>>> the auth_sys case, or could that break multi-homed clients?
>>>>
>>> I think that even a dhcp lease renewal might result in a different
>>> client IP, if the client has been partitioned from the dhcp server
>>> for a while.
>
> Yeah, but by that point the client's v4 lease is probably expired
> anyway so the client's not likely to be bothered by the NFS4ERR_INUSE.
>
>>> I'm not convinced that different client IP# implies different
>>> client. (Even "same ip# implies same client" might not be true, if
>>> the dhcp server assigned the IP# to another machine while the
>>> client was partitioned from the dhcp server, I think? I haven't
>>> looked at current dhcp implementations, but it seems conceivable to
>>> me.)
>>>
>> Oh, and what about the case of 2 clients that are sitting behind the
>> same NAT gateway? (I think they'd both be seen as having the client
>> host ip# of the gateway, but with different TCP connections on
>> different client port#s.)
>
> Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
> case where we get setclientid's with the same client-provided id.

At least in the case that sparked this discussion, it would already be
enough to return NFS4ERR_INUSE only if the client id is being reassigned
*and* has a 0.0.0.0 (aka autodetection failed) value.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-19 22:31:33

by Rick Macklem

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

Chuck Lever wrote:
> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>
> > On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
> >> I wrote:
> >>> J. Bruce Fields wrote:
> >>>> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> >>>>> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> >>>>>> IMO, the server should do a comparison of the nfs_client_id4
> >>>>>> strings,
> >>>>>> and nothing else.
> >>>>>
> >>>>> We're supposed to return CLID_INUSE when we see a setclientid
> >>>>> from
> >>>>> a
> >>>>> "different" client using the same string, to keep clients from
> >>>>> doing
> >>>>> mischief with other clients' state (either maliciously or, as in
> >>>>> this
> >>>>> case, accidentally).
> >>>>>
> >>>>> "Different" here is defined as "not having the same principal".
> >>>>> I
> >>>>> know
> >>>>> what that means in the krb5 case, but I'm less certain in the
> >>>>> auth_sys
> >>>>> case.
> >>>>
> >>>> Cc'ing the ietf list. Is it reasonable for a server to expect
> >>>> setclientid's to come from the same client IP address at least in
> >>>> the
> >>>> auth_sys case, or could that break multi-homed clients?
> >>>>
> >>> I think that even a dhcp lease renewal might result in a different
> >>> client
> >>> IP, if the client has been partitioned from the dhcp server for a
> >>> while.
> >
> > Yeah, but by that point the client's v4 lease is probably expired
> > anyway
> > so the client's not likely to be bothered by the NFS4ERR_INUSE.
> >
> >>> I'm not convinced that different client IP# implies different
> >>> client.
> >>> (Even "same ip# implies same client" might not be true, if the
> >>> dhcp
> >>> server assigned the IP# to another machine while the client was
> >>> partitioned
> >>> from the dhcp server, I think? I haven't looked at current dhcp
> >>> implementations, but it seems conceivable to me.)
> >>>
> >> Oh, and what about the case of 2 clients that are sitting behind
> >> the same NAT gateway? (I think they'd both be seen as having the
> >> client host ip# of the gateway, but with different TCP connections
> >> on different client port#s.)
> >
> > Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in
> > the
> > case where we get setclientid's with the same client-provided id.
> > There'd be no change of behavior in the case of multiple clients
> > sharing
> > an IP (which is fine, of course).
>
> The migration draft proposes that clients use the same nfs_client_id4
> string for all of a server's IP addresses. Would a server then be
> obliged to return NFS4ERR_CLID_IN_USE if a client attempts a
> SETCLIENTID with the same boot verifier and nfs_client_id4 on more
> than one IP address for the same server?
>
> IMO the server should not try to sort this situation out.
>
> >>> For AUTH_SYS, all the FreeBSD server does is expect the same uid#.
> >
> > Yeah, but that's probably usually the same between clients.
>
0 maybe;-) But that's all the RFC says, so I think all a server can
do is hope the clients do a good job of generating unique
nfs_client_id4 strings?

rick
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

2012-03-19 18:24:30

by Myklebust, Trond

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, 2012-03-19 at 13:47 -0400, Chuck Lever wrote:
> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_IN_USE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?

Only if the client uses the wrong principal.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-12 20:30:43

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On 03/12/2012 04:15 PM, J. Bruce Fields wrote:
> On Mon, Mar 12, 2012 at 03:45:05PM -0400, Nikolaus Rath wrote:
>> On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
>>> On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
>>>> Nikolaus Rath <[email protected]> writes:
>>>>> The problem is that as soon as more than three clients are accessing the
>>>>> NFS shares, any operations on the NFS mountpoints by the clients hang.
>>>>> At the same time, CPU usage of the VPN processes becomes very high. If I
>>>>> run the VPN in debug mode, all I can see is that it is busy forwarding
>>>>> lots of packets. I also ran a packet sniffer which showed me that 90% of
>>>>> the packets were NFS related, but I am not familiar enough with NFS to
>>>>> be able to tell anything from the packets themselves. I can provide an
>>>>> example of the dump if that helps.
>>>>
>>>> I have put a screenshot of the dump on
>>>> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
>>>> not sure which parts are important).
>>>
>>> Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
>>> OPEN_CONFIRM repeatedly.
>>>
>>>> Any suggestions how I could further debug this?
>>>
>>> Could the clients be stepping on each others' state if they all think
>>> they have the same IP address (because of something to do with the VPN
>>> networking?)
>>
>> That sounds like a promising path of investigation. What determines the IP
>> of a client as far as NFS is concerned?
>
> I don't remember where it gets the ip it uses to construct clientid's
> from.... But there is a mount option (clientaddr=) that will let you
> change what it uses. So it *might* be worth checking whether using a
> clientaddr= option on each client (giving it a different ipaddr on each
> client) would change the behavior.

I'll try that.

Since there seems to be some problem with client identity: all the
clients are generated using the same disk image. This image also
includes some stuff in /var/lib/nfs. I already tried emptying this on
every client and it did not help, but maybe there is another directory
with state data that could cause problems?

>>> It'd be interesting to know the fields of the setclientid call, and the
>>> errors that the server is responding with to these calls. If you look
>>> at the packet details you'll probably see the same thing happening
>>> over and over again.
>>>
>>> Filtering to look at traffic between server and one client at a time
>>> might help to see the pattern.
>>
>> Hmm. I'm looking at the fields, but I just have no idea what any of
>> those mean. Would you possibly be willing to take a look? I uploaded a
>> pcap dump of a few packets to http://www.rath.org/res/sample.pcap.
>
> Looking at the packet details, under the client id field, the clients
> are all using:
>
> "0.0.0.0/192.168.1.2 tcp UNIX 0"

Hmm. 192.168.1.2 is the server's address on the VPN. Is that supposed to
be there?
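
For reading such strings out of a capture, a small Python helper along these lines might be handy. It assumes the layout visible in the quoted string (client IP, a slash, server IP, then three space-separated fields). The field names `proto`, `flavor` and `uniquifier` are guesses made for this sketch, and a server is not expected to interpret the string this way; this is purely a debugging aid.

```python
def parse_client_id(id_string):
    """Split a client id string of the form seen in the capture."""
    addresses, proto, flavor, uniquifier = id_string.split(" ")
    client_ip, server_ip = addresses.split("/")
    return {
        "client_ip": client_ip,
        "server_ip": server_ip,
        "proto": proto,
        "flavor": flavor,
        "uniquifier": int(uniquifier),
    }


def autodetection_failed(id_string):
    """True if the client-address part is the 0.0.0.0 placeholder."""
    return parse_client_id(id_string)["client_ip"] == "0.0.0.0"


captured = "0.0.0.0/192.168.1.2 tcp UNIX 0"
fields = parse_client_id(captured)
suspicious = autodetection_failed(captured)
```

Run against the captured string, the second address comes out as 192.168.1.2 and the first as 0.0.0.0, so the part to look at is the first address rather than the second.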


Thanks for your help!

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-19 17:36:58

by J. Bruce Fields

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
> I wrote:
> > J. Bruce Fields wrote:
> > > On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
> > > > On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
> > > > > IMO, the server should do a comparison of the nfs_client_id4
> > > > > strings,
> > > > > and nothing else.
> > > >
> > > > We're supposed to return CLID_INUSE when we see a setclientid from
> > > > a
> > > > "different" client using the same string, to keep clients from
> > > > doing
> > > > mischief with other clients' state (either maliciously or, as in
> > > > this
> > > > case, accidentally).
> > > >
> > > > "Different" here is defined as "not having the same principal". I
> > > > know
> > > > what that means in the krb5 case, but I'm less certain in the
> > > > auth_sys
> > > > case.
> > >
> > > Cc'ing the ietf list. Is it reasonable for a server to expect
> > > setclientid's to come from the same client IP address at least in
> > > the
> > > auth_sys case, or could that break multi-homed clients?
> > >
> > I think that even a dhcp lease renewal might result in a different
> > client
> > IP, if the client has been partitioned from the dhcp server for a
> > while.

Yeah, but by that point the client's v4 lease is probably expired anyway
so the client's not likely to be bothered by the NFS4ERR_INUSE.

> > I'm not convinced that different client IP# implies different client.
> > (Even "same ip# implies same client" might not be true, if the dhcp
> > server assigned the IP# to another machine while the client was
> > partitioned
> > from the dhcp server, I think? I haven't looked at current dhcp
> > implementations, but it seems conceivable to me.)
> >
> Oh, and what about the case of 2 clients that are sitting behind
> the same NAT gateway? (I think they'd both be seen as having the
> client host ip# of the gateway, but with different TCP connections
> on different client port#s.)

Well, sure, but all I'm proposing here is returning NFS4ERR_INUSE in the
case where we get setclientid's with the same client-provided id.
There'd be no change of behavior in the case of multiple clients sharing
an IP (which is fine, of course).

> > For AUTH_SYS, all the FreeBSD server does is expect the same uid#.

Yeah, but that's probably usually the same between clients.

--b.

2012-03-12 21:58:00

by Myklebust, Trond

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, 2012-03-12 at 17:46 -0400, Chuck Lever wrote:
> On Mar 12, 2012, at 5:38 PM, Nikolaus Rath wrote:
>
> > On 03/12/2012 05:27 PM, Chuck Lever wrote:
> >>
> >> On Mar 12, 2012, at 5:24 PM, Nikolaus Rath wrote:
> >>
> >>> Alright, it seems that this was the problem. With correct clientaddr, I
> >>> haven't been able to produce any freezes for the last 15 minutes
> >>> (usually it happens in ~20 seconds).
> >>>
> >>> The weird thing is that I cannot reproduce the wrong clientaddr
> >>> autodetection when I mount the NFS volumes from the command line. It
> >>> seems to happen only when the mounting is done by mountall during the
> >>> boot sequence.
> >>>
> >>> In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:
> >>>
> >>> spitzer:/opt /opt        nfs4    bg 0  0
> >>>
> >>> While this one, followed by a "mount /opt" on the console as soon as I'm
> >>> able to log in, works just fine (and has a correct clientaddr):
> >>>
> >>> spitzer:/opt /opt        nfs4    noauto 0  0
> >>
> >> That's almost certainly because networking isn't up during boot. The "bg" option keeps trying the mount until it succeeds, but the system started the mount process before there was a source address on the system.
> >
> > Oh, so the clientaddr detection takes place only once at the beginning,
> > and is not repeated if the mount attempts are repeated?
> >
> > This would explain what's happening. The VPN is not yet up when the
> > system first attempts to mount the share.
> >
> > Is there a rationale behind this? It seems to me that if the mount is
> > retried, it would be reasonable to expect that the clientaddr detection
> > is retried as well.
>
> There's no reason I can recall requiring that this is done only once, other than it's the simplest implementation for mount.nfs. Historically, NFS is deployed on systems with static network configurations that are supposed to be set up before the NFS utilities come into play. As Bruce suggested, perhaps this design assumption needs to be revisited.
>
> I suppose I should file a bug.

Consider the whole 'bg' option to be a bug at this point.

Now that we have working autofs support for direct mounts, there is no
reason to keep the 'bg' mount option on life support any more.

Cheers
  Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-19 18:30:20

by Chuck Lever

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients


On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:

> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
>>
>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
>>
>>> On Mon, Mar 19, 2012 at 01:06:47PM -0400, Rick Macklem wrote:
>>>> I wrote:
>>>>> J. Bruce Fields wrote:
>>>>>> On Mon, Mar 12, 2012 at 05:27:08PM -0400, J. Bruce Fields wrote:
>>>>>>> On Mon, Mar 12, 2012 at 05:14:16PM -0400, Chuck Lever wrote:
>>>>>>>> IMO, the server should do a comparison of the nfs_client_id4
>>>>>>>> strings,
>>>>>>>> and nothing else.
>>>>>>>
>>>>>>> We're supposed to return CLID_INUSE when we see a setclientid from
>>>>>>> a
>>>>>>> "different" client using the same string, to keep clients from
>>>>>>> doing
>>>>>>> mischief with other clients' state (either maliciously or, as in
>>>>>>> this
>>>>>>> case, accidentally).
>>>>>>>
>>>>>>> "Different" here is defined as "not having the same principal". I
>>>>>>> know
>>>>>>> what that means in the krb5 case, but I'm less certain in the
>>>>>>> auth_sys
>>>>>>> case.
>>>>>>
>>>>>> Cc'ing the ietf list. Is it reasonable for a server to expect
>>>>>> setclientid's to come from the same client IP address at least in
>>>>>> the
>>>>>> auth_sys case, or could that break multi-homed clients?
>>>>>>
>>>>> I think that even a dhcp lease renewal might result in a different
>>>>> client
>>>>> IP, if the client has been partitioned from the dhcp server for a
>>>>> while.
>>>
>>> Yeah, but by that point the client's v4 lease is probably expired anyway
>>> so the client's not likely to be bothered by the NFS4ERR_CLID_INUSE.
>>>
>>>>> I'm not convinced that different client IP# implies different client.
>>>>> (Even "same ip# implies same client" might not be true, if the dhcp
>>>>> server assigned the IP# to another machine while the client was
>>>>> partitioned
>>>>> from the dhcp server, I think? I haven't looked at current dhcp
>>>>> implementations, but it seems conceivable to me.)
>>>>>
>>>> Oh, and what about the case of 2 clients that are sitting behind
>>>> the same NAT gateway? (I think they'd both be seen as having the
>>>> client host ip# of the gateway, but with different TCP connections
>>>> on different client port#s.)
>>>
>>> Well, sure, but all I'm proposing here is returning NFS4ERR_CLID_INUSE in the
>>> case where we get setclientid's with the same client-provided id.
>>> There'd be no change of behavior in the case of multiple clients sharing
>>> an IP (which is fine, of course).
>>
>> The migration draft proposes that clients use the same nfs_client_id4 string for all of a server's IP addresses. Would a server then be obliged to return NFS4ERR_CLID_INUSE if a client attempts a SETCLIENTID with the same boot verifier and nfs_client_id4 on more than one IP address for the same server?
>
> That's also not this case, sorry, this time with all the conditions:
>
> - if the nfs_client_id4 is the same, and
> - if the flavor is auth_sys, and
> - if the client IP address is different,
> - then return NFS4ERR_CLID_INUSE.

This still breaks for multi-homed servers and UCS clients. The client IP address can be different depending on what server IP address the client is accessing, but all the other parameters are the same.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
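
The rule proposed above, and the objection to it, can be sketched as a toy model. This is an illustration under stated assumptions, not server code: the function name, the table layout, the "client-A" id and the 192.0.2.x / 198.51.100.x / 192.168.1.1x addresses are all hypothetical, and the error is spelled NFS4ERR_CLID_INUSE as in RFC 3530.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_CLID_INUSE = "NFS4ERR_CLID_INUSE"


def setclientid(table, client_id, flavor, client_ip):
    """Toy model of the proposed check; `table` maps client_id -> (flavor, ip)."""
    known = table.get(client_id)
    if known is not None:
        _, known_ip = known
        # Same id string, auth_sys flavor, different source IP: reject.
        if flavor == "auth_sys" and client_ip != known_ip:
            return NFS4ERR_CLID_INUSE
    table[client_id] = (flavor, client_ip)
    return NFS4_OK


# Case from this thread: two machines that both built their id from 0.0.0.0.
vpn_table = {}
shared_id = "0.0.0.0/192.168.1.2 tcp UNIX 0"
r1 = setclientid(vpn_table, shared_id, "auth_sys", "192.168.1.10")
r2 = setclientid(vpn_table, shared_id, "auth_sys", "192.168.1.11")

# The objection: one client, one id string, two source addresses
# (for example because it reaches a multi-homed server over two paths).
mh_table = {}
r3 = setclientid(mh_table, "client-A", "auth_sys", "192.0.2.5")
r4 = setclientid(mh_table, "client-A", "auth_sys", "198.51.100.5")

print(r1, r2)  # second machine is rejected, which is the intended effect
print(r3, r4)  # the same client is rejected too, which is the false positive
```

In this model the rule cannot tell the two cases apart, because the only inputs it has are the id string, the flavor and the source address.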





2012-03-12 19:31:17

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
> Nikolaus Rath <[email protected]> writes:
> > The problem is that as soon as more than three clients are accessing the
> > NFS shares, any operations on the NFS mountpoints by the clients hang.
> > At the same time, CPU usage of the VPN processes becomes very high. If I
> > run the VPN in debug mode, all I can see is that it is busy forwarding
> > lots of packets. I also ran a packet sniffer which showed me that 90% of
> > the packets were NFS related, but I am not familiar enough with NFS to
> > be able to tell anything from the packets themselves. I can provide an
> > example of the dump if that helps.
>
> I have put a screenshot of the dump on
> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
> not sure which parts are important).

Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
OPEN_CONFIRM repeatedly.

> Any suggestions how I could further debug this?

Could the clients be stepping on each others' state if they all think
they have the same IP address (because of something to do with the VPN
networking?)

It'd be interesting to know the fields of the setclientid call, and the
errors that the server is responding with to these calls. If you look
at the packet details you'll probably see the same thing happening
over and over again.

Filtering to look at traffic between server and one client at a time
might help to see the pattern.

--b.

2012-03-12 21:38:09

by Nikolaus Rath

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On 03/12/2012 05:27 PM, Chuck Lever wrote:
>
> On Mar 12, 2012, at 5:24 PM, Nikolaus Rath wrote:
>
>> Alright, it seems that this was the problem. With correct clientaddr, I
>> haven't been able to produce any freezes for the last 15 minutes
>> (usually it happens in ~20 seconds).
>>
>> The weird thing is that I cannot reproduce the wrong clientaddr
>> autodetection when I mount the NFS volumes from the command line. It
>> seems to happen only when the mounting is done by mountall during the
>> boot sequence.
>>
>> In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:
>>
>> spitzer:/opt /opt nfs4 bg 0 0
>>
>> While this one, followed by a "mount /opt" on the console as soon as I'm
>> able to log in, works just fine (and has a correct clientaddr):
>>
>> spitzer:/opt /opt nfs4 noauto 0 0
>
> That's almost certainly because networking isn't up during boot. The "bg" option keeps trying the mount until it succeeds, but the system started the mount process before there was a source address on the system.

Oh, so the clientaddr detection takes place only once at the beginning,
and is not repeated if the mount attempts are repeated?

This would explain what's happening. The VPN is not yet up when the
system first attempts to mount the share.

Is there a rationale behind this? It seems to me that if the mount is
retried, it would be reasonable to expect that the clientaddr detection
is retried as well.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C
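
The behaviour asked about above can be pictured with a toy model. It is only a sketch of the hypothesis (address detected once, mount retried many times), not mount.nfs code; the function names, the retry count and the 192.168.1.10 address are made up for the example.

```python
def mount_with_retries(detect_addr, server_reachable, attempts, redetect):
    """Return the clientaddr used by the first successful attempt, or None."""
    addr = detect_addr(0)  # detected once, before the retry loop starts
    for attempt in range(attempts):
        if redetect:
            addr = detect_addr(attempt)  # alternative: detect on every retry
        if server_reachable(attempt):
            return addr
    return None


# Hypothetical boot timeline: the VPN and its address appear at attempt 3.
VPN_UP_AT = 3


def detect(attempt):
    return "192.168.1.10" if attempt >= VPN_UP_AT else "0.0.0.0"


def reachable(attempt):
    return attempt >= VPN_UP_AT


print(mount_with_retries(detect, reachable, attempts=5, redetect=False))  # 0.0.0.0
print(mount_with_retries(detect, reachable, attempts=5, redetect=True))   # 192.168.1.10
```

With detection done once, the mount that finally succeeds still carries the address seen before the VPN came up.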

2012-03-12 20:42:38

by J. Bruce Fields

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients

On Mon, Mar 12, 2012 at 04:30:42PM -0400, Nikolaus Rath wrote:
> On 03/12/2012 04:15 PM, J. Bruce Fields wrote:
> > On Mon, Mar 12, 2012 at 03:45:05PM -0400, Nikolaus Rath wrote:
> >> On 03/12/2012 03:31 PM, J. Bruce Fields wrote:
> >>> On Mon, Mar 12, 2012 at 12:20:17PM -0400, Nikolaus Rath wrote:
> >>>> Nikolaus Rath <[email protected]> writes:
> >>>>> The problem is that as soon as more than three clients are accessing the
> >>>>> NFS shares, any operations on the NFS mountpoints by the clients hang.
> >>>>> At the same time, CPU usage of the VPN processes becomes very high. If I
> >>>>> run the VPN in debug mode, all I can see is that it is busy forwarding
> >>>>> lots of packets. I also ran a packet sniffer which showed me that 90% of
> >>>>> the packets were NFS related, but I am not familiar enough with NFS to
> >>>>> be able to tell anything from the packets themselves. I can provide an
> >>>>> example of the dump if that helps.
> >>>>
> >>>> I have put a screenshot of the dump on
> >>>> http://www.rath.org/res/wireshark.png (the full dump is 18 MB, and I'm
> >>>> not sure which parts are important).
> >>>
> >>> Looks like they're doing SETCLIENTID, SETCLIENTID_CONFIRM, OPEN,
> >>> OPEN_CONFIRM repeatedly.
> >>>
> >>>> Any suggestions how I could further debug this?
> >>>
> >>> Could the clients be stepping on each others' state if they all think
> >>> they have the same IP address (because of something to do with the VPN
> >>> networking?)
> >>
> >> That sounds like a promising path of investigation. What determines the IP
> >> of a client as far as NFS is concerned?
> >
> > I don't remember where it gets the ip it uses to construct clientid's
> > from.... But there is a mount option (clientaddr=) that will let you
> > change what it uses. So it *might* be worth checking whether using a
> > clientaddr= option on each client (giving it a different ipaddr on each
> > client) would change the behavior.
>
> I'll try that.
>
> Since there seems to be some problem with client identity: all the
> clients are generated using the same disk image. This image also
> includes some stuff in /var/lib/nfs. I already tried emptying this on
> every client and it did not help, but maybe there is another directory with
> state data that could cause problems?

The state in there is used by the v2/v3 client, and by the server with
v4 as well, but not by the v4 client, so I wouldn't expect that to be an
issue.

> >>> It'd be interesting to know the fields of the setclientid call, and the
> >>> errors that the server is responding with to these calls. If you look
> >>> at the packet details you'll probably see the same thing happening
> >>> over and over again.
> >>>
> >>> Filtering to look at traffic between server and one client at a time
> >>> might help to see the pattern.
> >>
> >> Hmm. I'm looking at the fields, but I just have no idea what any of
> >> those mean. Would you possibly be willing to take a look? I uploaded a
> >> pcap dump of a few packets to http://www.rath.org/res/sample.pcap.
> >
> > Looking at the packet details, under the client id field, the clients
> > are all using:
> >
> > "0.0.0.0/192.168.1.2 tcp UNIX 0"
>
> Hmm. 192.168.1.2 is the server's address on the VPN. Is that supposed to
> be there?

Yes, and the first ip is usually the ip of the client, which does suggest
the client is guessing its ip wrong; so the "clientaddr=" option will
likely help.

Hm, perhaps the server should be rejecting these SETCLIENTID's with
INUSE. It used to do that, and the client would likely recover from
that more easily.

--b.
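
Why a wrong address guess makes the clients collide can be shown in a few lines. This is an illustration only: the string layout is inferred from the single id seen in the capture ("0.0.0.0/192.168.1.2 tcp UNIX 0"), the function name and the 192.168.1.1x client addresses are hypothetical, and none of this is the kernel's actual code.

```python
def client_id_string(client_addr, server_addr, proto="tcp", flavor="UNIX", uniquifier=0):
    """Build an id in the layout seen in the capture (the layout is an assumption)."""
    return f"{client_addr}/{server_addr} {proto} {flavor} {uniquifier}"


SERVER = "192.168.1.2"  # the server's VPN address, as stated in the thread

# Three clients whose address autodetection produced 0.0.0.0.
broken = {client_id_string("0.0.0.0", SERVER) for _ in range(3)}

# The same three clients with clientaddr= set explicitly (hypothetical addresses).
fixed = {
    client_id_string(addr, SERVER)
    for addr in ("192.168.1.10", "192.168.1.11", "192.168.1.12")
}

print(len(broken), len(fixed))  # prints: 1 3
```

Three machines presenting one id look like a single client to the server, so each SETCLIENTID replaces the state of the previous one; with distinct addresses the ids differ.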

2012-03-19 22:25:34

by Rick Macklem

Subject: Re: [nfsv4] NFS4 over VPN hangs when connecting > 2 clients

Nikolaus Rath wrote:
> On 03/19/2012 02:39 PM, J. Bruce Fields wrote:
> > On Mon, Mar 19, 2012 at 02:29:46PM -0400, Chuck Lever wrote:
> >>
> >> On Mar 19, 2012, at 2:27 PM, J. Bruce Fields wrote:
> >>
> >>> On Mon, Mar 19, 2012 at 01:47:14PM -0400, Chuck Lever wrote:
> >>>>
> >>>> On Mar 19, 2012, at 1:36 PM, J. Bruce Fields wrote:
> >>>>> Well, sure, but all I'm proposing here is returning
> >>>>> NFS4ERR_CLID_INUSE in the
> >>>>> case where we get setclientid's with the same client-provided
> >>>>> id.
> >>>>> There'd be no change of behavior in the case of multiple clients
> >>>>> sharing
> >>>>> an IP (which is fine, of course).
> >>>>
> >>>> The migration draft proposes that clients use the same
> >>>> nfs_client_id4 string for all of a server's IP addresses. Would a
> >>>> server then be obliged to return NFS4ERR_CLID_INUSE if a client
> >>>> attempts a SETCLIENTID with the same boot verifier and
> >>>> nfs_client_id4 on more than one IP address for the same server?
> >>>
> >>> That's also not this case, sorry, this time with all the
> >>> conditions:
> >>>
> >>> - if the nfs_client_id4 is the same, and
> >>> - if the flavor is auth_sys, and
> >>> - if the client IP address is different,
> >>> - then return NFS4ERR_CLID_INUSE.
> >>
> >> This still breaks for multi-homed servers and UCS clients. The
> >> client IP address can be different depending on what server IP
> >> address the client is accessing, but all the other parameters are
> >> the same.
> >
> > OK. So probably there's nothing we can do to help here.
> >
> > As a bandaid maybe a rate-limited log message ("clientid X now in
> > use
> > from IP Y") might help debug these things....
>
> Since you guys keep Cc'ing me, I'll chime in with a rather naive
> suggestion: if all that's required is a unique id for every client,
> why
> not use the MAC of the first network interface, independent of it
> being
> used for communication with the server?
>
I think this works fairly well for "real hardware", but I'm not so sure
about clients running in VMs. (I don't really know how the VMs assign
MAC addresses to their fake net interfaces and what uniqueness guarantees
those have. I remember the old freebie VMware client for Windows just
had a config file that assigned the MAC. I bet half the installations
on the planet had the same MAC as the default config file:-)

rick
ps: Also, although it's not very relevant, getting the MAC address of
the first ethernet interface isn't easy in FreeBSD. I have no idea
if the same is true of Linux. (I'd also be worried that "first"
might not be fixed?)

>
> Best,
>
> -Nikolaus
>
> --
> »Time flies like an arrow, fruit flies like a Banana.«
>
> PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

2012-03-12 21:28:03

by Chuck Lever

Subject: Re: NFS4 over VPN hangs when connecting > 2 clients


On Mar 12, 2012, at 5:24 PM, Nikolaus Rath wrote:

> Alright, it seems that this was the problem. With correct clientaddr, I
> haven't been able to produce any freezes for the last 15 minutes
> (usually it happens in ~20 seconds).
>
> The weird thing is that I cannot reproduce the wrong clientaddr
> autodetection when I mount the NFS volumes from the command line. It
> seems to happen only when the mounting is done by mountall during the
> boot sequence.
>
> In other words, this fstab entry results in freezes and a clientaddr of 0.0.0.0:
>
> spitzer:/opt /opt nfs4 bg 0 0
>
> While this one, followed by a "mount /opt" on the console as soon as I'm
> able to log in, works just fine (and has a correct clientaddr):
>
> spitzer:/opt /opt nfs4 noauto 0 0

That's almost certainly because networking isn't up during boot. The "bg" option keeps trying the mount until it succeeds, but the system started the mount process before there was a source address on the system.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com