2017-05-12 13:27:23

by Stefan Hajnoczi

Subject: EXCHANGE_ID with same network address but different server owner

Hi,
I've been working on NFS over the AF_VSOCK transport
(https://www.spinics.net/lists/linux-nfs/msg60292.html). AF_VSOCK
resets established network connections when the virtual machine is
migrated to a new host.

The NFS client expects file handles and other state to remain valid upon
reconnecting. This is not the case after VM live migration since the
new host does not have the NFS server state from the old host.

Volatile file handles have been suggested as a way to reflect that state
does not persist across reconnect, but the Linux NFS client does not
support volatile file handles.

I saw NFS 4.1 has a way for a new server running with the same network
address of an old server to communicate that it is indeed a new server
instance. If the server owner/scope in the EXCHANGE_ID response does
not match the previous server's values then the server is a new
instance.

The implications of encountering a new server owner/scope upon reconnect
aren't clear to me and I'm not sure to what extent the Linux
implementation handles this case. Can anyone explain what happens if
the NFS client finds a new server owner/scope after reconnecting?
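
To make my understanding concrete, the comparison I have in mind is
roughly the following (simplified stand-in structures for illustration
only, not the RFC 5661 XDR or the kernel's types):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct server_identity {
            uint64_t      minor_id;      /* eir_server_owner.so_minor_id */
            uint32_t      major_len;
            unsigned char major[1024];   /* eir_server_owner.so_major_id */
            uint32_t      scope_len;
            unsigned char scope[1024];   /* eir_server_scope */
    };

    /* Trusting state across a reconnect only makes sense if the server
     * scope and the major owner ID match what the old server reported. */
    static bool same_server_instance(const struct server_identity *old,
                                     const struct server_identity *new)
    {
            return old->scope_len == new->scope_len &&
                   memcmp(old->scope, new->scope, old->scope_len) == 0 &&
                   old->major_len == new->major_len &&
                   memcmp(old->major, new->major, old->major_len) == 0;
    }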

Thanks,
Stefan



2017-05-12 14:34:20

by J. Bruce Fields

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Fri, May 12, 2017 at 09:27:21AM -0400, Stefan Hajnoczi wrote:
> Hi,
> I've been working on NFS over the AF_VSOCK transport
> (https://www.spinics.net/lists/linux-nfs/msg60292.html). AF_VSOCK
> resets established network connections when the virtual machine is
> migrated to a new host.
>
> The NFS client expects file handles and other state to remain valid upon
> reconnecting. This is not the case after VM live migration since the
> new host does not have the NFS server state from the old host.
>
> Volatile file handles have been suggested as a way to reflect that state
> does not persist across reconnect, but the Linux NFS client does not
> support volatile file handles.

That's unlikely to change; the protocol allows the server to advertise
volatile filehandles, but doesn't really give any tools to implement
them reliably.

> I saw NFS 4.1 has a way for a new server running with the same network
> address of an old server to communicate that it is indeed a new server
> instance. If the server owner/scope in the EXCHANGE_ID response does
> not match the previous server's values then the server is a new
> instance.
>
> The implications of encountering a new server owner/scope upon reconnect
> aren't clear to me and I'm not sure to what extent the Linux
> implementation handles this case. Can anyone explain what happens if
> the NFS client finds a new server owner/scope after reconnecting?

I haven't tested it, but if it reconnects to the same IP address and
finds out it's no longer talking to the same server, I think the only
correct thing it could do would be to just fail all further access.

There's no easy solution.

To migrate between NFS servers you need some sort of clustered NFS
service with shared storage. We can't currently support concurrent
access to shared storage from multiple NFS servers, so all that's
possible is active/passive failover. Also, people that set that up
normally depend on a floating IP address--I'm not sure if there's an
equivalent for VSOCK.

--b.

2017-05-12 15:01:46

by Trond Myklebust

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Fri, 2017-05-12 at 10:34 -0400, J. Bruce Fields wrote:
> On Fri, May 12, 2017 at 09:27:21AM -0400, Stefan Hajnoczi wrote:
> > [...]
>
> [...]
>
> To migrate between NFS servers you need some sort of clustered NFS
> service with shared storage.  We can't currently support concurrent
> access to shared storage from multiple NFS servers, so all that's
> possible is active/passive failover.  Also, people that set that up
> normally depend on a floating IP address--I'm not sure if there's an
> equivalent for VSOCK.

Actually, this might be a use case for re-exporting NFS. If the host
could re-export a NFS mount to the guests, then you don't necessarily
need a clustered filesystem.

OTOH, this would not solve the problem of migrating locks, which is not
really easy to support in the current state model for NFSv4.x.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


2017-05-12 17:01:04

by Chuck Lever III

Subject: Re: EXCHANGE_ID with same network address but different server owner


> On May 12, 2017, at 11:01 AM, Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2017-05-12 at 10:34 -0400, J. Bruce Fields wrote:
>> On Fri, May 12, 2017 at 09:27:21AM -0400, Stefan Hajnoczi wrote:
>>> Hi,
>>> I've been working on NFS over the AF_VSOCK transport
>>> (https://www.spinics.net/lists/linux-nfs/msg60292.html). AF_VSOCK
>>> resets established network connections when the virtual machine is
>>> migrated to a new host.
>>>
>>> The NFS client expects file handles and other state to remain valid
>>> upon
>>> reconnecting. This is not the case after VM live migration since
>>> the
>>> new host does not have the NFS server state from the old host.
>>>
>>> Volatile file handles have been suggested as a way to reflect that
>>> state
>>> does not persist across reconnect, but the Linux NFS client does
>>> not
>>> support volatile file handles.
>>
>> That's unlikely to change; the protocol allows the server to
>> advertise
>> volatile filehandles, but doesn't really give any tools to implement
>> them reliably.
>>
>>> I saw NFS 4.1 has a way for a new server running with the same
>>> network
>>> address of an old server to communicate that it is indeed a new
>>> server
>>> instance. If the server owner/scope in the EXCHANGE_ID response
>>> does
>>> not match the previous server's values then the server is a new
>>> instance.
>>>
>>> The implications of encountering a new server owner/scope upon
>>> reconnect
>>> aren't clear to me and I'm not sure to what extent the Linux
>>> implementation handles this case. Can anyone explain what happens
>>> if
>>> the NFS client finds a new server owner/scope after reconnecting?
>>
>> I haven't tested it, but if it reconnects to the same IP address and
>> finds out it's no longer talking to the same server, I think the only
>> correct thing it could do would be to just fail all further access.
>>
>> There's no easy solution.
>>
>> To migrate between NFS servers you need some sort of clustered NFS
>> service with shared storage. We can't currently support concurrent
>> access to shared storage from multiple NFS servers, so all that's
>> possible is active/passive failover. Also, people that set that up
>> normally depend on a floating IP address--I'm not sure if there's an
>> equivalent for VSOCK.
>>
>
> Actually, this might be a use case for re-exporting NFS. If the host
> could re-export a NFS mount to the guests, then you don't necessarily
> need a clustered filesystem.
>
> OTOH, this would not solve the problem of migrating locks, which is not
> really easy to support in the current state model for NFSv4.x.

Some alternatives:

- Make the local NFS server's exports read-only, NFSv3
only, and do not support locking. Ensure that the
filehandles and namespace are the same on every NFS
server.

- As Trond suggested, all the local NFS servers accessed
via AF_VSOCK should re-export NFS filesystems that
are located elsewhere and are visible everywhere.

- Ensure there is an accompanying NFSv4 FS migration event
that moves the client's files (and possibly its open and
lock state) from the local NFS server to the destination
NFS server concurrent with the live migration.

If the client is aware of the FS migration, it will expect
the filehandles to be the same, but it can reconstruct
the open and lock state on the destination server (if that
server allows GRACEful recovery for that client).

This is possible in the protocol and implemented in the
Linux NFS client, but none of it is implemented in the
Linux NFS server.
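
For the first alternative in the list above, the exports might look
roughly like this on every server (illustrative values only; NFSv4
would also need to be disabled globally, e.g. in the nfsd
configuration):

    # /etc/exports -- same path and a pinned fsid on every server.  The
    # backing filesystem contents must also be identical so the inode
    # part of each filehandle matches.
    /export/guests    *(ro,sync,no_subtree_check,fsid=7007)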

--
Chuck Lever




2017-05-15 14:43:18

by Stefan Hajnoczi

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Fri, May 12, 2017 at 01:00:47PM -0400, Chuck Lever wrote:
>
> > On May 12, 2017, at 11:01 AM, Trond Myklebust <[email protected]> wrote:
> >
> > On Fri, 2017-05-12 at 10:34 -0400, J. Bruce Fields wrote:
> >> On Fri, May 12, 2017 at 09:27:21AM -0400, Stefan Hajnoczi wrote:
> >>> Hi,
> >>> I've been working on NFS over the AF_VSOCK transport
> >>> (https://www.spinics.net/lists/linux-nfs/msg60292.html). AF_VSOCK
> >>> resets established network connections when the virtual machine is
> >>> migrated to a new host.
> >>>
> >>> The NFS client expects file handles and other state to remain valid
> >>> upon
> >>> reconnecting. This is not the case after VM live migration since
> >>> the
> >>> new host does not have the NFS server state from the old host.
> >>>
> >>> Volatile file handles have been suggested as a way to reflect that
> >>> state
> >>> does not persist across reconnect, but the Linux NFS client does
> >>> not
> >>> support volatile file handles.
> >>
> >> That's unlikely to change; the protocol allows the server to
> >> advertise
> >> volatile filehandles, but doesn't really give any tools to implement
> >> them reliably.
> >>
> >>> I saw NFS 4.1 has a way for a new server running with the same
> >>> network
> >>> address of an old server to communicate that it is indeed a new
> >>> server
> >>> instance. If the server owner/scope in the EXCHANGE_ID response
> >>> does
> >>> not match the previous server's values then the server is a new
> >>> instance.
> >>>
> >>> The implications of encountering a new server owner/scope upon
> >>> reconnect
> >>> aren't clear to me and I'm not sure to what extent the Linux
> >>> implementation handles this case. Can anyone explain what happens
> >>> if
> >>> the NFS client finds a new server owner/scope after reconnecting?
> >>
> >> I haven't tested it, but if it reconnects to the same IP address and
> >> finds out it's no longer talking to the same server, I think the only
> >> correct thing it could do would be to just fail all further access.
> >>
> >> There's no easy solution.
> >>
> >> To migrate between NFS servers you need some sort of clustered NFS
> >> service with shared storage. We can't currently support concurrent
> >> access to shared storage from multiple NFS servers, so all that's
> >> possible is active/passive failover. Also, people that set that up
> >> normally depend on a floating IP address--I'm not sure if there's an
> >> equivalent for VSOCK.
> >>
> >
> > Actually, this might be a use case for re-exporting NFS. If the host
> > could re-export a NFS mount to the guests, then you don't necessarily
> > need a clustered filesystem.
> >
> > OTOH, this would not solve the problem of migrating locks, which is not
> > really easy to support in the current state model for NFSv4.x.
>
> Some alternatives:
>
> - Make the local NFS server's exports read-only, NFSv3
> only, and do not support locking. Ensure that the
> filehandles and namespace are the same on every NFS
> server.
>
> - As Trond suggested, all the local NFS servers accessed
> via AF_VSOCK should re-export NFS filesystems that
> are located elsewhere and are visible everywhere.
>
> - Ensure there is an accompanying NFSv4 FS migration event
> that moves the client's files (and possibly its open and
> lock state) from the local NFS server to the destination
> NFS server concurrent with the live migration.
>
> If the client is aware of the FS migration, it will expect
> the filehandles to be the same, but it can reconstruct
> the open and lock state on the destination server (if that
> server allows GRACEful recovery for that client).
>
> This is possible in the protocol and implemented in the
> Linux NFS client, but none of it is implemented in the
> Linux NFS server.

Great, thanks for the pointers everyone.

It's clear to me that AF_VSOCK won't get NFS migration for free.
Initially live migration will not be supported.

Re-exporting sounds interesting - perhaps the new host could re-export
the old host's file systems. I'll look into the spec and code.
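
If I understand the re-export idea correctly, on the new host it would
look roughly like this (untested sketch, hostnames and fsid made up;
an explicit fsid seems to be needed because nfsd cannot derive a
stable one from an NFS mount):

    mount -t nfs old-host:/export /srv/reexport
    # /etc/exports on the new host, followed by "exportfs -ra":
    /srv/reexport  *(rw,no_subtree_check,fsid=1234)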

Stefan



2017-05-15 16:02:54

by J. Bruce Fields

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Mon, May 15, 2017 at 03:43:06PM +0100, Stefan Hajnoczi wrote:
> On Fri, May 12, 2017 at 01:00:47PM -0400, Chuck Lever wrote:
> >
> > > On May 12, 2017, at 11:01 AM, Trond Myklebust <[email protected]> wrote:
> > > Actually, this might be a use case for re-exporting NFS. If the host
> > > could re-export a NFS mount to the guests, then you don't necessarily
> > > need a clustered filesystem.
> > >
> > > OTOH, this would not solve the problem of migrating locks, which is not
> > > really easy to support in the current state model for NFSv4.x.
> >
> > Some alternatives:
> >
> > - Make the local NFS server's exports read-only, NFSv3
> > only, and do not support locking. Ensure that the
> > filehandles and namespace are the same on every NFS
> > server.
> >
> > - As Trond suggested, all the local NFS servers accessed
> > via AF_VSOCK should re-export NFS filesystems that
> > are located elsewhere and are visible everywhere.
> >
> > - Ensure there is an accompanying NFSv4 FS migration event
> > that moves the client's files (and possibly its open and
> > lock state) from the local NFS server to the destination
> > NFS server concurrent with the live migration.
> >
> > If the client is aware of the FS migration, it will expect
> > the filehandles to be the same, but it can reconstruct
> > the open and lock state on the destination server (if that
> > server allows GRACEful recovery for that client).
> >
> > This is possible in the protocol and implemented in the
> > Linux NFS client, but none of it is implemented in the
> > Linux NFS server.
>
> Great, thanks for the pointers everyone.
>
> It's clear to me that AF_VSOCK won't get NFS migration for free.
> Initially live migration will not be supported.
>
> Re-exporting sounds interesting - perhaps the new host could re-export
> the old host's file systems. I'll look into the spec and code.

I've since forgotten the limitations of the nfs reexport series.

Locking (lock recovery, specifically) seems like the biggest problem to
solve to improve clustered nfs service; without that, it might actually
be easier than reexporting, I don't know. If there's a use case for
clustered nfs service that doesn't support file locking, maybe we should
look into it.

--b.

2017-05-16 13:11:43

by J. Bruce Fields

Subject: Re: EXCHANGE_ID with same network address but different server owner

I think you explained this before, perhaps you could just offer a
pointer: remind us what your requirements or use cases are especially
for VM migration?

--b.

2017-05-16 13:33:43

by Stefan Hajnoczi

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Mon, May 15, 2017 at 12:02:48PM -0400, J. Bruce Fields wrote:
> On Mon, May 15, 2017 at 03:43:06PM +0100, Stefan Hajnoczi wrote:
> > On Fri, May 12, 2017 at 01:00:47PM -0400, Chuck Lever wrote:
> > >
> > > > On May 12, 2017, at 11:01 AM, Trond Myklebust <[email protected]> wrote:
> > > > Actually, this might be a use case for re-exporting NFS. If the host
> > > > could re-export a NFS mount to the guests, then you don't necessarily
> > > > need a clustered filesystem.
> > > >
> > > > OTOH, this would not solve the problem of migrating locks, which is not
> > > > really easy to support in the current state model for NFSv4.x.
> > >
> > > Some alternatives:
> > >
> > > - Make the local NFS server's exports read-only, NFSv3
> > > only, and do not support locking. Ensure that the
> > > filehandles and namespace are the same on every NFS
> > > server.
> > >
> > > - As Trond suggested, all the local NFS servers accessed
> > > via AF_VSOCK should re-export NFS filesystems that
> > > are located elsewhere and are visible everywhere.
> > >
> > > - Ensure there is an accompanying NFSv4 FS migration event
> > > that moves the client's files (and possibly its open and
> > > lock state) from the local NFS server to the destination
> > > NFS server concurrent with the live migration.
> > >
> > > If the client is aware of the FS migration, it will expect
> > > the filehandles to be the same, but it can reconstruct
> > > the open and lock state on the destination server (if that
> > > server allows GRACEful recovery for that client).
> > >
> > > This is possible in the protocol and implemented in the
> > > Linux NFS client, but none of it is implemented in the
> > > Linux NFS server.
> >
> > Great, thanks for the pointers everyone.
> >
> > It's clear to me that AF_VSOCK won't get NFS migration for free.
> > Initially live migration will not be supported.
> >
> > Re-exporting sounds interesting - perhaps the new host could re-export
> > the old host's file systems. I'll look into the spec and code.
>
> I've since forgotten the limitations of the nfs reexport series.
>
> Locking (lock recovery, specifically) seems like the biggest problem to
> solve to improve clustered nfs service; without that, it might actually
> be easier than reexporting, I don't know. If there's a use case for
> clustered nfs service that doesn't support file locking, maybe we should
> look into it.

I suspect many guests will have a dedicated/private export. The guest
will be the only client accessing its export. This could simplify the
locking issues.

That said, it would be nice to support full clustered operation.

Stefan



2017-05-16 13:36:29

by J. Bruce Fields

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Tue, May 16, 2017 at 02:33:38PM +0100, Stefan Hajnoczi wrote:
> I suspect many guests will have a dedicated/private export. The guest
> will be the only client accessing its export. This could simplify the
> locking issues.

So why not migrate filesystem images instead of using NFS?

--b.

2017-05-17 14:33:17

by Stefan Hajnoczi

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Tue, May 16, 2017 at 09:36:03AM -0400, J. Bruce Fields wrote:
> On Tue, May 16, 2017 at 02:33:38PM +0100, Stefan Hajnoczi wrote:
> > I suspect many guests will have a dedicated/private export. The guest
> > will be the only client accessing its export. This could simplify the
> > locking issues.
>
> So why not migrate filesystem images instead of using NFS?

Some users consider disk image files inconvenient because they cannot be
inspected and manipulated with regular shell utilities.

Scenarios where many VMs are launched with VM-specific data files
especially benefit from using files directly instead of building disk
images.

I'm not saying all users just export per-VM directories, but it's a
common case and may be a good starting point if a general solution is
very hard.



2017-05-18 13:34:44

by Stefan Hajnoczi

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
> I think you explained this before, perhaps you could just offer a
> pointer: remind us what your requirements or use cases are especially
> for VM migration?

The NFS over AF_VSOCK configuration is:

A guest running on host mounts an NFS export from the host. The NFS
server may be kernel nfsd or an NFS frontend to a distributed storage
system like Ceph. A little more about these cases below.

Kernel nfsd is useful for sharing files. For example, the guest may
read some files from the host when it launches and/or it may write out
result files to the host when it shuts down. The user may also wish to
share their home directory between the guest and the host.

NFS frontends are a different use case. They hide distributed storage
systems from guests in cloud environments. This way guests don't see
the details of the Ceph, Gluster, etc nodes. Besides benefiting
security it also allows NFS-capable guests to run without installing
specific drivers for the distributed storage system. This use case is
"filesystem as a service".

The reason for using AF_VSOCK instead of TCP/IP is that traditional
networking configuration is fragile. Automatically adding a dedicated
NIC to the guest and choosing an IP subnet has a high chance of
conflicts (subnet collisions, network interface naming, firewall rules,
network management tools). AF_VSOCK is a zero-configuration
communications channel so it avoids these problems.
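
For comparison, an AF_VSOCK endpoint is just a (CID, port) pair. A
minimal userspace sketch (illustration only, not the actual NFS code;
2049 is simply the usual NFS port number):

    #include <sys/socket.h>
    #include <linux/vm_sockets.h>
    #include <unistd.h>

    /* Connect from a guest to "the host" without any IP configuration. */
    static int vsock_connect_to_host(unsigned int port)
    {
            struct sockaddr_vm sa = {
                    .svm_family = AF_VSOCK,
                    .svm_cid    = VMADDR_CID_HOST,  /* well-known CID 2 */
                    .svm_port   = port,             /* e.g. 2049 */
            };
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            if (fd < 0)
                    return -1;
            if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }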

On to migration. For the most part, guests can be live migrated between
hosts without significant downtime or manual steps. PCI passthrough is
an example of a feature that makes it very hard to live migrate. I hope
we can allow migration with NFS, although some limitations may be
necessary to make it feasible.

There are two NFS over AF_VSOCK migration scenarios:

1. The files live on host H1 and host H2 cannot access the files
directly. There is no way for an NFS server on H2 to access those
same files unless the directory is copied along with the guest or H2
proxies to the NFS server on H1.

2. The files are accessible from both host H1 and host H2 because they
are on shared storage or distributed storage system. Here the
problem is "just" migrating the state from H1's NFS server to H2 so
that file handles remain valid.

Stefan



2017-05-18 14:28:23

by Chuck Lever III

Subject: Re: EXCHANGE_ID with same network address but different server owner


> On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <[email protected]> wrote:
>
> On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
>> I think you explained this before, perhaps you could just offer a
>> pointer: remind us what your requirements or use cases are especially
>> for VM migration?
>
> The NFS over AF_VSOCK configuration is:
>
> A guest running on host mounts an NFS export from the host. The NFS
> server may be kernel nfsd or an NFS frontend to a distributed storage
> system like Ceph. A little more about these cases below.
>
> Kernel nfsd is useful for sharing files. For example, the guest may
> read some files from the host when it launches and/or it may write out
> result files to the host when it shuts down. The user may also wish to
> share their home directory between the guest and the host.
>
> NFS frontends are a different use case. They hide distributed storage
> systems from guests in cloud environments. This way guests don't see
> the details of the Ceph, Gluster, etc nodes. Besides benefiting
> security it also allows NFS-capable guests to run without installing
> specific drivers for the distributed storage system. This use case is
> "filesystem as a service".
>
> The reason for using AF_VSOCK instead of TCP/IP is that traditional
> networking configuration is fragile. Automatically adding a dedicated
> NIC to the guest and choosing an IP subnet has a high chance of
> conflicts (subnet collisions, network interface naming, firewall rules,
> network management tools). AF_VSOCK is a zero-configuration
> communications channel so it avoids these problems.
>
> On to migration. For the most part, guests can be live migrated between
> hosts without significant downtime or manual steps. PCI passthrough is
> an example of a feature that makes it very hard to live migrate. I hope
> we can allow migration with NFS, although some limitations may be
> necessary to make it feasible.
>
> There are two NFS over AF_VSOCK migration scenarios:
>
> 1. The files live on host H1 and host H2 cannot access the files
> directly. There is no way for an NFS server on H2 to access those
> same files unless the directory is copied along with the guest or H2
> proxies to the NFS server on H1.

Having managed (and shared) storage on the physical host is
awkward. I know some cloud providers might do this today by
copying guest disk images down to the host's local disk, but
generally it's not a flexible primary deployment choice.

There's no good way to expand or replicate this pool of
storage. A backup scheme would need to access all physical
hosts. And the files are visible only on specific hosts.

IMO you want to treat local storage on each physical host as
a cache tier rather than as a back-end tier.


> 2. The files are accessible from both host H1 and host H2 because they
> are on shared storage or distributed storage system. Here the
> problem is "just" migrating the state from H1's NFS server to H2 so
> that file handles remain valid.

Essentially this is the re-export case, and this makes a lot
more sense to me from a storage administration point of view.

The pool of administered storage is not local to the physical
hosts running the guests, which is how I think cloud providers
would prefer to operate.

User storage would be accessible via an NFS share, but managed
in a Ceph object (with redundancy, a common high throughput
backup facility, and secure central management of user
identities).

Each host's NFS server could be configured to expose only the
cloud storage resources for the tenants on that host. The
back-end storage (ie, Ceph) could operate on a private storage
area network for better security.

The only missing piece here is support in Linux-based NFS
servers for transparent state migration.


--
Chuck Lever




2017-05-18 15:04:58

by Trond Myklebust

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
> > On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <[email protected]> wrote:
> > [...]
>
> [...]
>
> Each host's NFS server could be configured to expose only the
> cloud storage resources for the tenants on that host. The
> back-end storage (ie, Ceph) could operate on a private storage
> area network for better security.
>
> The only missing piece here is support in Linux-based NFS
> servers for transparent state migration.

Not really. In a containerised world, we're going to see more and more
cases where just a single process/application gets migrated from one
NFS client to another (and yes, a re-exporter/proxy of NFS is just
another client as far as the original server is concerned).
IOW: I think we want to allow a client to migrate some parts of its
lock state to another client, without necessarily requiring every
process being migrated to have its own clientid.

I'm in the process of building up a laundry list of problems that I'd
like to see solved as part of the new IETF WG charter. This is one
issue that I think should be on that list.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


2017-05-18 15:08:54

by J. Bruce Fields

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
> > > On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <[email protected]>
> > > wrote:
> > >
> > > On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
> > > > I think you explained this before, perhaps you could just offer a
> > > > pointer: remind us what your requirements or use cases are
> > > > especially
> > > > for VM migration?
> > >
> > > The NFS over AF_VSOCK configuration is:
> > >
> > > A guest running on host mounts an NFS export from the host.  The
> > > NFS
> > > server may be kernel nfsd or an NFS frontend to a distributed
> > > storage
> > > system like Ceph.  A little more about these cases below.
> > >
> > > Kernel nfsd is useful for sharing files.  For example, the guest
> > > may
> > > read some files from the host when it launches and/or it may write
> > > out
> > > result files to the host when it shuts down.  The user may also
> > > wish to
> > > share their home directory between the guest and the host.
> > >
> > > NFS frontends are a different use case.  They hide distributed
> > > storage
> > > systems from guests in cloud environments.  This way guests don't
> > > see
> > > the details of the Ceph, Gluster, etc nodes.  Besides benefiting
> > > security it also allows NFS-capable guests to run without
> > > installing
> > > specific drivers for the distributed storage system.  This use case
> > > is
> > > "filesystem as a service".
> > >
> > > The reason for using AF_VSOCK instead of TCP/IP is that traditional
> > > networking configuration is fragile.  Automatically adding a
> > > dedicated
> > > NIC to the guest and choosing an IP subnet has a high chance of
> > > conflicts (subnet collisions, network interface naming, firewall
> > > rules,
> > > network management tools).  AF_VSOCK is a zero-configuration
> > > communications channel so it avoids these problems.
> > >
> > > On to migration.  For the most part, guests can be live migrated
> > > between
> > > hosts without significant downtime or manual steps.  PCI
> > > passthrough is
> > > an example of a feature that makes it very hard to live migrate.  I
> > > hope
> > > we can allow migration with NFS, although some limitations may be
> > > necessary to make it feasible.
> > >
> > > There are two NFS over AF_VSOCK migration scenarios:
> > >
> > > 1. The files live on host H1 and host H2 cannot access the files
> > >    directly.  There is no way for an NFS server on H2 to access
> > > those
> > >    same files unless the directory is copied along with the guest or
> > > H2
> > >    proxies to the NFS server on H1.
> >
> > Having managed (and shared) storage on the physical host is
> > awkward. I know some cloud providers might do this today by
> > copying guest disk images down to the host's local disk, but
> > generally it's not a flexible primary deployment choice.
> >
> > There's no good way to expand or replicate this pool of
> > storage. A backup scheme would need to access all physical
> > hosts. And the files are visible only on specific hosts.
> >
> > IMO you want to treat local storage on each physical host as
> > a cache tier rather than as a back-end tier.
> >
> >
> > > 2. The files are accessible from both host H1 and host H2 because
> > > they
> > >    are on shared storage or distributed storage system.  Here the
> > >    problem is "just" migrating the state from H1's NFS server to H2
> > > so
> > >    that file handles remain valid.
> >
> > Essentially this is the re-export case, and this makes a lot
> > more sense to me from a storage administration point of view.
> >
> > The pool of administered storage is not local to the physical
> > hosts running the guests, which is how I think cloud providers
> > would prefer to operate.
> >
> > User storage would be accessible via an NFS share, but managed
> > in a Ceph object (with redundancy, a common high throughput
> > backup facility, and secure central management of user
> > identities).
> >
> > Each host's NFS server could be configured to expose only the
> > cloud storage resources for the tenants on that host. The
> > back-end storage (ie, Ceph) could operate on a private storage
> > area network for better security.
> >
> > The only missing piece here is support in Linux-based NFS
> > servers for transparent state migration.
>
> Not really. In a containerised world, we're going to see more and more
> cases where just a single process/application gets migrated from one
> NFS client to another (and yes, a re-exporter/proxy of NFS is just
> another client as far as the original server is concerned).
> IOW: I think we want to allow a client to migrate some parts of its
> lock state to another client, without necessarily requiring every
> process being migrated to have its own clientid.

It wouldn't have to be every process, it'd be every container, right?
What's the disadvantage of per-container clientids? I guess you lose
the chance to share delegations and caches.

--b.

2017-05-18 15:15:24

by Chuck Lever III

Subject: Re: EXCHANGE_ID with same network address but different server owner


> On May 18, 2017, at 11:08 AM, J. Bruce Fields <[email protected]> wrote:
>
> On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
>> On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
>>>> On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <[email protected]>
>>>> wrote:
>>>>
>>>> On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
>>>>> I think you explained this before, perhaps you could just offer a
>>>>> pointer: remind us what your requirements or use cases are
>>>>> especially
>>>>> for VM migration?
>>>>
>>>> The NFS over AF_VSOCK configuration is:
>>>>
>>>> A guest running on host mounts an NFS export from the host. The
>>>> NFS
>>>> server may be kernel nfsd or an NFS frontend to a distributed
>>>> storage
>>>> system like Ceph. A little more about these cases below.
>>>>
>>>> Kernel nfsd is useful for sharing files. For example, the guest
>>>> may
>>>> read some files from the host when it launches and/or it may write
>>>> out
>>>> result files to the host when it shuts down. The user may also
>>>> wish to
>>>> share their home directory between the guest and the host.
>>>>
>>>> NFS frontends are a different use case. They hide distributed
>>>> storage
>>>> systems from guests in cloud environments. This way guests don't
>>>> see
>>>> the details of the Ceph, Gluster, etc nodes. Besides benefiting
>>>> security it also allows NFS-capable guests to run without
>>>> installing
>>>> specific drivers for the distributed storage system. This use case
>>>> is
>>>> "filesystem as a service".
>>>>
>>>> The reason for using AF_VSOCK instead of TCP/IP is that traditional
>>>> networking configuration is fragile. Automatically adding a
>>>> dedicated
>>>> NIC to the guest and choosing an IP subnet has a high chance of
>>>> conflicts (subnet collisions, network interface naming, firewall
>>>> rules,
>>>> network management tools). AF_VSOCK is a zero-configuration
>>>> communications channel so it avoids these problems.
>>>>
>>>> On to migration. For the most part, guests can be live migrated
>>>> between
>>>> hosts without significant downtime or manual steps. PCI
>>>> passthrough is
>>>> an example of a feature that makes it very hard to live migrate. I
>>>> hope
>>>> we can allow migration with NFS, although some limitations may be
>>>> necessary to make it feasible.
>>>>
>>>> There are two NFS over AF_VSOCK migration scenarios:
>>>>
>>>> 1. The files live on host H1 and host H2 cannot access the files
>>>> directly. There is no way for an NFS server on H2 to access
>>>> those
>>>> same files unless the directory is copied along with the guest or
>>>> H2
>>>> proxies to the NFS server on H1.
>>>
>>> Having managed (and shared) storage on the physical host is
>>> awkward. I know some cloud providers might do this today by
>>> copying guest disk images down to the host's local disk, but
>>> generally it's not a flexible primary deployment choice.
>>>
>>> There's no good way to expand or replicate this pool of
>>> storage. A backup scheme would need to access all physical
>>> hosts. And the files are visible only on specific hosts.
>>>
>>> IMO you want to treat local storage on each physical host as
>>> a cache tier rather than as a back-end tier.
>>>
>>>
>>>> 2. The files are accessible from both host H1 and host H2 because
>>>> they
>>>> are on shared storage or distributed storage system. Here the
>>>> problem is "just" migrating the state from H1's NFS server to H2
>>>> so
>>>> that file handles remain valid.
>>>
>>> Essentially this is the re-export case, and this makes a lot
>>> more sense to me from a storage administration point of view.
>>>
>>> The pool of administered storage is not local to the physical
>>> hosts running the guests, which is how I think cloud providers
>>> would prefer to operate.
>>>
>>> User storage would be accessible via an NFS share, but managed
>>> in a Ceph object (with redundancy, a common high throughput
>>> backup facility, and secure central management of user
>>> identities).
>>>
>>> Each host's NFS server could be configured to expose only the
>>> cloud storage resources for the tenants on that host. The
>>> back-end storage (ie, Ceph) could operate on a private storage
>>> area network for better security.
>>>
>>> The only missing piece here is support in Linux-based NFS
>>> servers for transparent state migration.
>>
>> Not really. In a containerised world, we're going to see more and more
>> cases where just a single process/application gets migrated from one
>> NFS client to another (and yes, a re-exporter/proxy of NFS is just
>> another client as far as the original server is concerned).
>> IOW: I think we want to allow a client to migrate some parts of its
>> lock state to another client, without necessarily requiring every
>> process being migrated to have its own clientid.
>
> It wouldn't have to be every process, it'd be every container, right?
> What's the disadvantage of per-container clientids? I guess you lose
> the chance to share delegations and caches.

Can't each container have its own net namespace, and each net
namespace have its own client ID?
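
For example (names made up; as far as I know the Linux client already
keeps a separate nfs_client, and thus a separate client ID, per
network namespace):

    ip netns add tenant-a
    ip netns exec tenant-a mount -t nfs4 fileserver:/export/tenant-a /mnt/a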

(I agree, btw, this class of problems should be considered in
the new nfsv4 WG charter. Thanks for doing that, Trond).


--
Chuck Lever




2017-05-18 15:17:23

by Trond Myklebust

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 11:08 -0400, J. Bruce Fields wrote:
> On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
> > [...]
> > IOW: I think we want to allow a client to migrate some parts of its
> > lock state to another client, without necessarily requiring every
> > process being migrated to have its own clientid.
>
> It wouldn't have to be every process, it'd be every container, right?
> What's the disadvantage of per-container clientids?  I guess you lose
> the chance to share delegations and caches.

For the case that Stefan is discussing (kvm) it would literally be a
single process that is being migrated. For lxc and docker/kubernetes-
style containers, it would be a collection of processes.

The mountpoints used by these containers are often owned by the host;
they are typically set up before starting the containerised processes.
Furthermore, there is typically no "start container" system call that
we can use to identify which set of processes (or cgroups) are
containerised, and should share a clientid.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


2017-05-18 15:18:23

by Trond Myklebust

Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 11:15 -0400, Chuck Lever wrote:
> > On May 18, 2017, at 11:08 AM, J. Bruce Fields <bfields@redhat.com>
> > wrote:
> > 
> > On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
> > > On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
> > > > > On May 18, 2017, at 9:34 AM, Stefan Hajnoczi <stefanha@redhat
> > > > > .com>
> > > > > wrote:
> > > > > 
> > > > > On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields
> > > > > wrote:
> > > > > > I think you explained this before, perhaps you could just
> > > > > > offer a
> > > > > > pointer: remind us what your requirements or use cases are
> > > > > > especially
> > > > > > for VM migration?
> > > > > 
> > > > > The NFS over AF_VSOCK configuration is:
> > > > > 
> > > > > A guest running on host mounts an NFS export from the
> > > > > host.  The
> > > > > NFS
> > > > > server may be kernel nfsd or an NFS frontend to a distributed
> > > > > storage
> > > > > system like Ceph.  A little more about these cases below.
> > > > > 
> > > > > Kernel nfsd is useful for sharing files.  For example, the
> > > > > guest
> > > > > may
> > > > > read some files from the host when it launches and/or it may
> > > > > write
> > > > > out
> > > > > result files to the host when it shuts down.  The user may
> > > > > also
> > > > > wish to
> > > > > share their home directory between the guest and the host.
> > > > > 
> > > > > NFS frontends are a different use case.  They hide
> > > > > distributed
> > > > > storage
> > > > > systems from guests in cloud environments.  This way guests
> > > > > don't
> > > > > see
> > > > > the details of the Ceph, Gluster, etc nodes.  Besides
> > > > > benefiting
> > > > > security it also allows NFS-capable guests to run without
> > > > > installing
> > > > > specific drivers for the distributed storage system.  This
> > > > > use case
> > > > > is
> > > > > "filesystem as a service".
> > > > > 
> > > > > The reason for using AF_VSOCK instead of TCP/IP is that
> > > > > traditional
> > > > > networking configuration is fragile.  Automatically adding a
> > > > > dedicated
> > > > > NIC to the guest and choosing an IP subnet has a high chance
> > > > > of
> > > > > conflicts (subnet collisions, network interface naming,
> > > > > firewall
> > > > > rules,
> > > > > network management tools).  AF_VSOCK is a zero-configuration
> > > > > communications channel so it avoids these problems.
> > > > > 
> > > > > On to migration.  For the most part, guests can be live
> > > > > migrated
> > > > > between
> > > > > hosts without significant downtime or manual steps.  PCI
> > > > > passthrough is
> > > > > an example of a feature that makes it very hard to live
> > > > > migrate.  I
> > > > > hope
> > > > > we can allow migration with NFS, although some limitations
> > > > > may be
> > > > > necessary to make it feasible.
> > > > > 
> > > > > There are two NFS over AF_VSOCK migration scenarios:
> > > > > 
> > > > > 1. The files live on host H1 and host H2 cannot access the
> > > > > files
> > > > >    directly.  There is no way for an NFS server on H2 to
> > > > > access
> > > > > those
> > > > >    same files unless the directory is copied along with the
> > > > > guest or
> > > > > H2
> > > > >    proxies to the NFS server on H1.
> > > > 
> > > > Having managed (and shared) storage on the physical host is
> > > > awkward. I know some cloud providers might do this today by
> > > > copying guest disk images down to the host's local disk, but
> > > > generally it's not a flexible primary deployment choice.
> > > > 
> > > > There's no good way to expand or replicate this pool of
> > > > storage. A backup scheme would need to access all physical
> > > > hosts. And the files are visible only on specific hosts.
> > > > 
> > > > IMO you want to treat local storage on each physical host as
> > > > a cache tier rather than as a back-end tier.
> > > > 
> > > > 
> > > > > 2. The files are accessible from both host H1 and host H2
> > > > > because
> > > > > they
> > > > >    are on shared storage or distributed storage system.  Here
> > > > > the
> > > > >    problem is "just" migrating the state from H1's NFS server
> > > > > to H2
> > > > > so
> > > > >    that file handles remain valid.
> > > > 
> > > > Essentially this is the re-export case, and this makes a lot
> > > > more sense to me from a storage administration point of view.
> > > > 
> > > > The pool of administered storage is not local to the physical
> > > > hosts running the guests, which is how I think cloud providers
> > > > would prefer to operate.
> > > > 
> > > > User storage would be accessible via an NFS share, but managed
> > > > in a Ceph object (with redundancy, a common high throughput
> > > > backup facility, and secure central management of user
> > > > identities).
> > > > 
> > > > Each host's NFS server could be configured to expose only the
> > > > the cloud storage resources for the tenants on that host. The
> > > > back-end storage (ie, Ceph) could operate on a private storage
> > > > area network for better security.
> > > > 
> > > > The only missing piece here is support in Linux-based NFS
> > > > servers for transparent state migration.
> > > 
> > > Not really. In a containerised world, we're going to see more and
> > > more
> > > cases where just a single process/application gets migrated from
> > > one
> > > NFS client to another (and yes, a re-exporter/proxy of NFS is
> > > just
> > > another client as far as the original server is concerned).
> > > IOW: I think we want to allow a client to migrate some parts of
> > > its
> > > lock state to another client, without necessarily requiring every
> > > process being migrated to have its own clientid.
> > 
> > It wouldn't have to be every process, it'd be every container,
> > right?
> > What's the disadvantage of per-container clientids?  I guess you
> > lose
> > the chance to share delegations and caches.
> 
> Can't each container have it's own net namespace, and each net
> namespace have its own client ID?

Possibly, but that wouldn't cover Stefan's case of a single kvm
process. ☺

> (I agree, btw, this class of problems should be considered in
> the new nfsv4 WG charter. Thanks for doing that, Trond).
> 
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
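
Stefan's "zero-configuration" argument is easiest to see from the AF_VSOCK
addressing model itself: a guest reaches its host by a well-known context ID
and a port number, with no NIC, subnet, routing or firewall setup involved.
A minimal client-side sketch in C (illustrative only: the port value and the
error handling are mine, and the real NFS client would build this into its
sunrpc transport rather than open-coding it):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    int main(void)
    {
            /* An AF_VSOCK address is just (context ID, port) -- nothing to
             * configure on either side.  VMADDR_CID_HOST always means
             * "the host this guest runs on". */
            struct sockaddr_vm addr = {
                    .svm_family = AF_VSOCK,
                    .svm_cid    = VMADDR_CID_HOST,
                    .svm_port   = 2049,  /* the usual NFS port, reused here for illustration */
            };
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                    perror("vsock connect");
                    return 1;
            }
            /* ... RPC traffic would flow over fd here ... */
            close(fd);
            return 0;
    }

Because the address says nothing about the network topology, the same mount
configuration keeps working wherever the guest lands, which is exactly why a
live-migrated guest reconnects to a different server instance behind the same
address.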


2017-05-18 15:28:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> For the case that Stefan is discussing (kvm) it would literally be a
> single process that is being migrated. For lxc and docker/kubernetes-
> style containers, it would be a collection of processes.
>
> The mountpoints used by these containers are often owned by the host;
> they are typically set up before starting the containerised processes.
> Furthermore, there is typically no "start container" system call that
> we can use to identify which set of processes (or cgroups) are
> containerised, and should share a clientid.

Is that such a hard problem?

In any case, from the protocol point of view these all sound like client
implementation details.

The only problem I see with multiple client ID's is that you'd like to
keep their delegations from conflicting with each other so they can
share cache.

But, maybe I'm missing something else.

--b.

2017-05-18 16:09:17

by Trond Myklebust

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 11:28 -0400, bfields@fieldses.org wrote:
> On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> > For the case that Stefan is discussing (kvm) it would literally be
> > a
> > single process that is being migrated. For lxc and
> > docker/kubernetes-
> > style containers, it would be a collection of processes.
> > 
> > The mountpoints used by these containers are often owned by the
> > host;
> > they are typically set up before starting the containerised
> > processes.
> > Furthermore, there is typically no "start container" system call
> > that
> > we can use to identify which set of processes (or cgroups) are
> > containerised, and should share a clientid.
> 
> Is that such a hard problem?
> 

Err, yes... isn't it? How do I identify a container and know where to
set the lease boundary?

Bear in mind that the definition of "container" is non-existent beyond
the obvious "a loose collection of processes". It varies from the
docker/lxc/virtuozzo style container, which uses namespaces to bound
the processes, to the Google type of "container" that is actually just
a set of cgroups and to the kvm/qemu single process.

> In any case, from the protocol point of view these all sound like
> client
> implementation details.

If you are seeing an obvious architecture for the client, then please
share...

> The only problem I see with multiple client ID's is that you'd like
> to
> keep their delegations from conflicting with each other so they can
> share cache.
> 
> But, maybe I'm missing something else.

Having to an EXCHANGE_ID + CREATE_SESSION on every call to
fork()/clone() and a DESTROY_SESSION/DESTROY_EXCHANGEID in each process
destructor? Lease renewal pings from 1000 processes running on 1000
clients?

This is what I mean about container boundaries. If they aren't well
defined, then we're down to doing precisely the above.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
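
To make the cost Trond is describing concrete: every distinct client ID
carries the full NFSv4.1 state-establishment and lease-maintenance cycle
below. The operation names are the real ones from RFC 5661 (DESTROY_CLIENTID
is the operation Trond refers to as DESTROY_EXCHANGEID); the surrounding
control flow is only illustrative pseudocode:

    /* Sketch of the lifetime of one NFSv4.1 client ID (pseudocode) */
    clientid = EXCHANGE_ID(co_ownerid);   /* server owner/scope is returned and checked here */
    session  = CREATE_SESSION(clientid);  /* required before OPEN/LOCK/etc. can be sent      */

    while (state_in_use)
            SEQUENCE(session);            /* periodic ping, within the lease time, to keep
                                             the lease and all state under it alive          */

    DESTROY_SESSION(session);
    DESTROY_CLIENTID(clientid);           /* tear the client ID down again */

Multiplying that cycle by one client ID per migratable process is the
renewal-traffic and setup/teardown blow-up being objected to here.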


2017-05-18 16:32:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, May 18, 2017 at 04:09:10PM +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 11:28 -0400, [email protected] wrote:
> > On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> > > For the case that Stefan is discussing (kvm) it would literally be
> > > a
> > > single process that is being migrated. For lxc and
> > > docker/kubernetes-
> > > style containers, it would be a collection of processes.
> > >
> > > The mountpoints used by these containers are often owned by the
> > > host;
> > > they are typically set up before starting the containerised
> > > processes.
> > > Furthermore, there is typically no "start container" system call
> > > that
> > > we can use to identify which set of processes (or cgroups) are
> > > containerised, and should share a clientid.
> >
> > Is that such a hard problem?
> >
>
> Err, yes... isn't it? How do I identify a container and know where to
> set the lease boundary?
>
> Bear in mind that the definition of "container" is non-existent beyond
> the obvious "a loose collection of processes". It varies from the
> docker/lxc/virtuozzo style container, which uses namespaces to bound
> the processes, to the Google type of "container" that is actually just
> a set of cgroups and to the kvm/qemu single process.

Sure, but, can't we pick *something* to use as the boundary (network
namespace?), document it, and let userspace use that to tell us what it
wants?

> > In any case, from the protocol point of view these all sound like
> > client
> > implementation details.
>
> If you are seeing an obvious architecture for the client, then please
> share...

Make clientids per-network-namespace and store them in nfs_net? (Maybe
that's what's already done, I can't tell.)

> > The only problem I see with multiple client ID's is that you'd like
> > to
> > keep their delegations from conflicting with each other so they can
> > share cache.
> >
> > But, maybe I'm missing something else.
>
> Having to an EXCHANGE_ID + CREATE_SESSION on every call to
> fork()/clone() and a DESTROY_SESSION/DESTROY_EXCHANGEID in each process
> destructor? Lease renewal pings from 1000 processes running on 1000
> clients?
>
> This is what I mean about container boundaries. If they aren't well
> defined, then we're down to doing precisely the above.

Again this sounds like a complaint about the kernel api rather than
about the protocol. If the container management system knows what it
wants and we give it a way to explain it to us, then we avoid most of
that, right?

--b.
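
For reference, the per-network-namespace plumbing Bruce is pointing at already
exists as a general kernel mechanism (pernet_operations plus net_generic()),
and nfs_net is the name of the NFS client's per-namespace structure. A
stripped-down sketch of the idea, with illustrative field and function names
rather than the actual fs/nfs/netns.h contents:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <net/net_namespace.h>
    #include <net/netns/generic.h>

    /* Illustrative only: one blob of NFS client state per net namespace. */
    struct nfs_net_sketch {
            struct list_head nfs_client_list;   /* client IDs owned by this netns */
            spinlock_t       nfs_client_lock;
    };

    static unsigned int nfs_net_id_sketch;

    static int __net_init nfs_net_init_sketch(struct net *net)
    {
            struct nfs_net_sketch *nn = net_generic(net, nfs_net_id_sketch);

            INIT_LIST_HEAD(&nn->nfs_client_list);
            spin_lock_init(&nn->nfs_client_lock);
            return 0;
    }

    static void __net_exit nfs_net_exit_sketch(struct net *net)
    {
            /* destroy any client IDs that were created in this namespace */
    }

    static struct pernet_operations nfs_net_ops_sketch = {
            .init = nfs_net_init_sketch,
            .exit = nfs_net_exit_sketch,
            .id   = &nfs_net_id_sketch,
            .size = sizeof(struct nfs_net_sketch),
    };

    /* register_pernet_subsys(&nfs_net_ops_sketch) would run at init time;
     * mounts made inside a given net namespace would then find or create
     * their client ID on that namespace's list, which is the "per-netns
     * clientid" boundary being proposed above. */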

2017-05-18 17:13:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 12:32 -0400, J. Bruce Fields wrote:
> On Thu, May 18, 2017 at 04:09:10PM +0000, Trond Myklebust wrote:
> > On Thu, 2017-05-18 at 11:28 -0400, bfields@fieldses.org wrote:
> > > On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> > > > For the case that Stefan is discussing (kvm) it would literally
> > > > be
> > > > a
> > > > single process that is being migrated. For lxc and
> > > > docker/kubernetes-
> > > > style containers, it would be a collection of processes.
> > > > 
> > > > The mountpoints used by these containers are often owned by the
> > > > host;
> > > > they are typically set up before starting the containerised
> > > > processes.
> > > > Furthermore, there is typically no "start container" system
> > > > call
> > > > that
> > > > we can use to identify which set of processes (or cgroups) are
> > > > containerised, and should share a clientid.
> > > 
> > > Is that such a hard problem?
> > > 
> > 
> > Err, yes... isn't it? How do I identify a container and know where
> > to
> > set the lease boundary?
> > 
> > Bear in mind that the definition of "container" is non-existent
> > beyond
> > the obvious "a loose collection of processes". It varies from the
> > docker/lxc/virtuozzo style container, which uses namespaces to
> > bound
> > the processes, to the Google type of "container" that is actually
> > just
> > a set of cgroups and to the kvm/qemu single process.
> 
> Sure, but, can't we pick *something* to use as the boundary (network
> namespace?), document it, and let userspace use that to tell us what
> it
> wants?
> 
> > > In any case, from the protocol point of view these all sound like
> > > client
> > > implementation details.
> > 
> > If you are seeing an obvious architecture for the client, then
> > please
> > share...
> 
> Make clientids per-network-namespace and store them in
> nfs_net?  (Maybe
> that's what's already done, I can't tell.)
> 
> > > The only problem I see with multiple client ID's is that you'd
> > > like
> > > to
> > > keep their delegations from conflicting with each other so they
> > > can
> > > share cache.
> > > 
> > > But, maybe I'm missing something else.
> > 
> > Having to an EXCHANGE_ID + CREATE_SESSION on every call to
> > fork()/clone() and a DESTROY_SESSION/DESTROY_EXCHANGEID in each
> > process
> > destructor? Lease renewal pings from 1000 processes running on 1000
> > clients?
> > 
> > This is what I mean about container boundaries. If they aren't well
> > defined, then we're down to doing precisely the above.
> 
> Again this sounds like a complaint about the kernel api rather than
> about the protocol.  If the container management system knows what it
> wants and we give it a way to explain it to us, then we avoid most of
> that, right?
> 

OK, so consider the use case that inspired this conversation: namely
using nfsd on the server to proxy for a client running in kvm and using
the vsock interface.

How do I architect knfsd so that it handles that use case? Are you
saying that I need to set up a container of knfsd threads just to serve
this one kvm instance? Otherwise, the locks created by knfsd for that
kvm process will have the same clientid as all the other locks created
by knfsd?

To me, it seems more flexible to allow a utility like criu (https://cri
u.org/Main_Page) to specify "I'd like to mark these specific locks as
being part of this checkpoint/restore context please" (https://criu.org
/File_locks), and allow them to be attempted restored with the process
that was migrated.
Note that criu also works at the level of the application, not a
container, even though it was developed by the container virtualisation
community.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
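
The "specific locks" Trond wants to be able to tag are, from the
application's point of view, ordinary POSIX byte-range locks; the missing
piece is a way to say that a given lock belongs to a checkpoint/restore
context. A minimal illustration of the state in question (hypothetical path,
error handling trimmed):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/mnt/nfs/data", O_RDWR);   /* hypothetical NFS-backed file */
            struct flock fl = {
                    .l_type   = F_WRLCK,
                    .l_whence = SEEK_SET,
                    .l_start  = 0,
                    .l_len    = 0,                    /* zero length means "whole file" */
            };

            if (fd < 0 || fcntl(fd, F_SETLKW, &fl) < 0) {
                    perror("lock");
                    return 1;
            }

            /* On an NFSv4.x mount this lock is server-side state held under
             * the mount's client ID.  Migrating just this one process (a kvm
             * guest, or a criu checkpoint) means moving this lock record to
             * another client's lease, which is the capability asked for
             * above. */
            pause();
            return 0;
    }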


2017-05-22 12:45:27

by Stefan Hajnoczi

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, May 18, 2017 at 05:13:48PM +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 12:32 -0400, J. Bruce Fields wrote:
> > On Thu, May 18, 2017 at 04:09:10PM +0000, Trond Myklebust wrote:
> > > On Thu, 2017-05-18 at 11:28 -0400, [email protected] wrote:
> > > > On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> > > > > For the case that Stefan is discussing (kvm) it would literally
> > > > > be
> > > > > a
> > > > > single process that is being migrated. For lxc and
> > > > > docker/kubernetes-
> > > > > style containers, it would be a collection of processes.
> > > > >
> > > > > The mountpoints used by these containers are often owned by the
> > > > > host;
> > > > > they are typically set up before starting the containerised
> > > > > processes.
> > > > > Furthermore, there is typically no "start container" system
> > > > > call
> > > > > that
> > > > > we can use to identify which set of processes (or cgroups) are
> > > > > containerised, and should share a clientid.
> > > >
> > > > Is that such a hard problem?
> > > >
> > >
> > > Err, yes... isn't it? How do I identify a container and know where
> > > to
> > > set the lease boundary?
> > >
> > > Bear in mind that the definition of "container" is non-existent
> > > beyond
> > > the obvious "a loose collection of processes". It varies from the
> > > docker/lxc/virtuozzo style container, which uses namespaces to
> > > bound
> > > the processes, to the Google type of "container" that is actually
> > > just
> > > a set of cgroups and to the kvm/qemu single process.
> >
> > Sure, but, can't we pick *something* to use as the boundary (network
> > namespace?), document it, and let userspace use that to tell us what
> > it
> > wants?
> >
> > > > In any case, from the protocol point of view these all sound like
> > > > client
> > > > implementation details.
> > >
> > > If you are seeing an obvious architecture for the client, then
> > > please
> > > share...
> >
> > Make clientids per-network-namespace and store them in
> > nfs_net?  (Maybe
> > that's what's already done, I can't tell.)
> >
> > > > The only problem I see with multiple client ID's is that you'd
> > > > like
> > > > to
> > > > keep their delegations from conflicting with each other so they
> > > > can
> > > > share cache.
> > > >
> > > > But, maybe I'm missing something else.
> > >
> > > Having to an EXCHANGE_ID + CREATE_SESSION on every call to
> > > fork()/clone() and a DESTROY_SESSION/DESTROY_EXCHANGEID in each
> > > process
> > > destructor? Lease renewal pings from 1000 processes running on 1000
> > > clients?
> > >
> > > This is what I mean about container boundaries. If they aren't well
> > > defined, then we're down to doing precisely the above.
> >
> > Again this sounds like a complaint about the kernel api rather than
> > about the protocol.  If the container management system knows what it
> > wants and we give it a way to explain it to us, then we avoid most of
> > that, right?
> >
>
> OK, so consider the use case that inspired this conversation: namely
> using nfsd on the server to proxy for a client running in kvm and using
> the vsock interface.
>
> How do I architect knfsd so that it handles that use case? Are you
> saying that I need to set up a container of knfsd threads just to serve
> this one kvm instance? Otherwise, the locks created by knfsd for that
> kvm process will have the same clientid as all the other locks created
> by knfsd?

Another issue with Linux namespaces is that the granularity of the "net"
namespace isn't always what you want. The application may need its own
NFS client but that requires isolating it from all other services in the
network namespace (like the physical network interfaces :)).

Stefan
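
The granularity problem Stefan mentions is visible from what a fresh network
namespace actually contains: after unshare(CLONE_NEWNET) a process sees only
an unconfigured loopback device, so giving an application "its own NFS
client" this way also strips it of every other network resource it was using.
A small sketch (illustrative; requires CAP_SYS_ADMIN):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            /* Move this process into a brand-new network namespace: no eth0,
             * no routes, no existing sockets -- only a downed loopback. */
            if (unshare(CLONE_NEWNET) < 0) {
                    perror("unshare(CLONE_NEWNET)");
                    return 1;
            }

            /* An NFS mount made from inside this namespace would get its own
             * per-namespace client state, but the process has also lost the
             * host's network interfaces -- the isolation cost described
             * above. */
            return system("ip addr") == 0 ? 0 : 1;   /* typically shows only 'lo' */
    }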



2017-05-22 14:25:31

by Jeff Layton

[permalink] [raw]
Subject: Re: EXCHANGE_ID with same network address but different server owner

On Thu, 2017-05-18 at 16:09 +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 11:28 -0400, [email protected] wrote:
> > On Thu, May 18, 2017 at 03:17:11PM +0000, Trond Myklebust wrote:
> > > For the case that Stefan is discussing (kvm) it would literally be
> > > a
> > > single process that is being migrated. For lxc and
> > > docker/kubernetes-
> > > style containers, it would be a collection of processes.
> > >
> > > The mountpoints used by these containers are often owned by the
> > > host;
> > > they are typically set up before starting the containerised
> > > processes.
> > > Furthermore, there is typically no "start container" system call
> > > that
> > > we can use to identify which set of processes (or cgroups) are
> > > containerised, and should share a clientid.
> >
> > Is that such a hard problem?
> >
>
> Err, yes... isn't it? How do I identify a container and know where to
> set the lease boundary?
>
> Bear in mind that the definition of "container" is non-existent beyond
> the obvious "a loose collection of processes". It varies from the
> docker/lxc/virtuozzo style container, which uses namespaces to bound
> the processes, to the Google type of "container" that is actually just
> a set of cgroups and to the kvm/qemu single process.
>
> > In any case, from the protocol point of view these all sound like
> > client
> > implementation details.
>
> If you are seeing an obvious architecture for the client, then please
> share...
>
> > The only problem I see with multiple client ID's is that you'd like
> > to
> > keep their delegations from conflicting with each other so they can
> > share cache.
> >
> > But, maybe I'm missing something else.
>
> Having to an EXCHANGE_ID + CREATE_SESSION on every call to
> fork()/clone() and a DESTROY_SESSION/DESTROY_EXCHANGEID in each process
> destructor? Lease renewal pings from 1000 processes running on 1000
> clients?
>
> This is what I mean about container boundaries. If they aren't well
> defined, then we're down to doing precisely the above.
>

This is the crux of the problem with containers in general.

We've been pretending for a long time that the kernel doesn't really
need to understand them and can just worry about namespaces, but that
really hasn't worked out well so far.

I think we need to consider making a "container" a first-class object in
the kernel. Note that that would also help solve the long-standing
problem of how to handle usermode helper upcalls in containers.

I do happen to know of one kernel developer (cc'ed here) who has been
working on something along those lines...
--
Jeff Layton <[email protected]>