Hey Trond-
To support the ability to unload the underlying RDMA device's kernel
driver while NFS mounts are active, xprtrdma needs the ability to
suspend RPC sends temporarily while the transport hands HW resources
back to the driver. Once the device driver is unloaded, the RDMA
transport is left disconnected, and RPCs will be suspended normally
until a connection is possible again (eg, a new device is made
available).
A DEVICE_REMOVAL event is an upcall to xprtrdma that may sleep. Upon
its return, the device driver unloads itself. Currently my prototype
frees all HW resources during the upcall, but that doesn't block
new RPCs from trying to use those resources at the same time.
Seems like the most natural way to temporarily block sends would be
to grab the transport's write lock, just like "connect" does, while
the transport is dealing with DEVICE_REMOVAL, then release it once
all HW resources have been freed.
Unfortunately an RPC task is needed to acquire the write lock. But
disconnect is just an asynchronous event, there is no RPC task
associated with it, and thus no context that the RPC scheduler
can put to sleep if there happens to be another RPC sending at the
moment a device removal event occurs.
I was looking at xprt_lock_connect, but that doesn't appear to do
quite what I need.
Another thought was to have the DEVICE_REMOVAL upcall mark the
transport disconnected, send an asynchronous NULL RPC, then wait
on a kernel waitqueue.
The NULL RPC would grab the write lock and kick the transport's
connect worker. The connect worker would free HW resources, then
awaken the waiter. Then the upcall could return to the driver.
The problem with this scheme is the same as it was for the
keepalive work: there's no task or rpc_clnt available to the
DEVICE_REMOVAL upcall. Sleeping until the write lock is available
would require a task, and sending a NULL RPC would require an
rpc_clnt.
Any advice/thoughts about this?
--
Chuck Lever
T24gV2VkLCAyMDE3LTAyLTIyIGF0IDE2OjMxIC0wNTAwLCBDaHVjayBMZXZlciB3cm90ZToNCj4g
SGV5IFRyb25kLQ0KPiANCj4gVG8gc3VwcG9ydCB0aGUgYWJpbGl0eSB0byB1bmxvYWQgdGhlIHVu
ZGVybHlpbmcgUkRNQSBkZXZpY2UncyBrZXJuZWwNCj4gZHJpdmVyIHdoaWxlIE5GUyBtb3VudHMg
YXJlIGFjdGl2ZSwgeHBydHJkbWEgbmVlZHMgdGhlIGFiaWxpdHkgdG8NCj4gc3VzcGVuZCBSUEMg
c2VuZHMgdGVtcG9yYXJpbHkgd2hpbGUgdGhlIHRyYW5zcG9ydCBoYW5kcyBIVyByZXNvdXJjZXMN
Cj4gYmFjayB0byB0aGUgZHJpdmVyLiBPbmNlIHRoZSBkZXZpY2UgZHJpdmVyIGlzIHVubG9hZGVk
LCB0aGUgUkRNQQ0KPiB0cmFuc3BvcnQgaXMgbGVmdCBkaXNjb25uZWN0ZWQsIGFuZCBSUENzIHdp
bGwgYmUgc3VzcGVuZGVkIG5vcm1hbGx5DQo+IHVudGlsIGEgY29ubmVjdGlvbiBpcyBwb3NzaWJs
ZSBhZ2FpbiAoZWcsIGEgbmV3IGRldmljZSBpcyBtYWRlDQo+IGF2YWlsYWJsZSkuDQo+IA0KPiBB
IERFVklDRV9SRU1PVkFMIGV2ZW50IGlzIGFuIHVwY2FsbCB0byB4cHJ0cmRtYSB0aGF0IG1heSBz
bGVlcC4gVXBvbg0KPiBpdHMgcmV0dXJuLCB0aGUgZGV2aWNlIGRyaXZlciB1bmxvYWRzIGl0c2Vs
Zi4gQ3VycmVudGx5IG15IHByb3RvdHlwZQ0KPiBmcmVlcyBhbGwgSFcgcmVzb3VyY2VzIGR1cmlu
ZyB0aGUgdXBjYWxsLCBidXQgdGhhdCBkb2Vzbid0IGJsb2NrDQo+IG5ldyBSUENzIGZyb20gdHJ5
aW5nIHRvIHVzZSB0aG9zZSByZXNvdXJjZXMgYXQgdGhlIHNhbWUgdGltZS4NCj4gDQo+IFNlZW1z
IGxpa2UgdGhlIG1vc3QgbmF0dXJhbCB3YXkgdG8gdGVtcG9yYXJpbHkgYmxvY2sgc2VuZHMgd291
bGQgYmUNCj4gdG8gZ3JhYiB0aGUgdHJhbnNwb3J0J3Mgd3JpdGUgbG9jaywganVzdCBsaWtlICJj
b25uZWN0IiBkb2VzLCB3aGlsZQ0KPiB0aGUgdHJhbnNwb3J0IGlzIGRlYWxpbmcgd2l0aCBERVZJ
Q0VfUkVNT1ZBTCwgdGhlbiByZWxlYXNlIGl0IG9uY2UNCj4gYWxsIEhXIHJlc291cmNlcyBoYXZl
IGJlZW4gZnJlZWQuDQo+IA0KPiBVbmZvcnR1bmF0ZWx5IGFuIFJQQyB0YXNrIGlzIG5lZWRlZCB0
byBhY3F1aXJlIHRoZSB3cml0ZSBsb2NrLiBCdXQNCj4gZGlzY29ubmVjdCBpcyBqdXN0IGFuIGFz
eW5jaHJvbm91cyBldmVudCwgdGhlcmUgaXMgbm8gUlBDIHRhc2sNCj4gYXNzb2NpYXRlZCB3aXRo
IGl0LCBhbmQgdGh1cyBubyBjb250ZXh0IHRoYXQgdGhlIFJQQyBzY2hlZHVsZXINCj4gY2FuIHB1
dCB0byBzbGVlcCBpZiB0aGVyZSBoYXBwZW5zIHRvIGJlIGFub3RoZXIgUlBDIHNlbmRpbmcgYXQg
dGhlDQo+IG1vbWVudCBhIGRldmljZSByZW1vdmFsIGV2ZW50IG9jY3Vycy4NCj4gDQo+IEkgd2Fz
IGxvb2tpbmcgYXQgeHBydF9sb2NrX2Nvbm5lY3QsIGJ1dCB0aGF0IGRvZXNuJ3QgYXBwZWFyIHRv
IGRvDQo+IHF1aXRlIHdoYXQgSSBuZWVkLg0KPiANCj4gQW5vdGhlciB0aG91Z2h0IHdhcyB0byBo
YXZlIHRoZSBERVZJQ0VfUkVNT1ZBTCB1cGNhbGwgbWFyayB0aGUNCj4gdHJhbnNwb3J0IGRpc2Nv
bm5lY3RlZCwgc2VuZCBhbiBhc3luY2hyb25vdXMgTlVMTCBSUEMsIHRoZW4gd2FpdA0KPiBvbiBh
IGtlcm5lbCB3YWl0cXVldWUuDQo+IA0KPiBUaGUgTlVMTCBSUEMgd291bGQgZ3JhYiB0aGUgd3Jp
dGUgbG9jayBhbmQga2ljayB0aGUgdHJhbnNwb3J0J3MNCj4gY29ubmVjdCB3b3JrZXIuIFRoZSBj
b25uZWN0IHdvcmtlciB3b3VsZCBmcmVlIEhXIHJlc291cmNlcywgdGhlbg0KPiBhd2FrZW4gdGhl
IHdhaXRlci4gVGhlbiB0aGUgdXBjYWxsIGNvdWxkIHJldHVybiB0byB0aGUgZHJpdmVyLg0KPiAN
Cj4gVGhlIHByb2JsZW0gd2l0aCB0aGlzIHNjaGVtZSBpcyB0aGUgc2FtZSBhcyBpdCB3YXMgZm9y
IHRoZQ0KPiBrZWVwYWxpdmUgd29yazogdGhlcmUncyBubyB0YXNrIG9yIHJwY19jbG50IGF2YWls
YWJsZSB0byB0aGUNCj4gREVWSUNFX1JFTU9WQUwgdXBjYWxsLiBTbGVlcGluZyB1bnRpbCB0aGUg
d3JpdGUgbG9jayBpcyBhdmFpbGFibGUNCj4gd291bGQgcmVxdWlyZSBhIHRhc2ssIGFuZCBzZW5k
aW5nIGEgTlVMTCBSUEMgd291bGQgcmVxdWlyZSBhbg0KPiBycGNfY2xudC4NCj4gDQo+IEFueSBh
ZHZpY2UvdGhvdWdodHMgYWJvdXQgdGhpcz8NCj4gDQoNCkNhbiB5b3UgcGVyaGFwcyB1c2UgWFBS
VF9GT1JDRV9ESVNDT05ORUNUPyBUaGF0IGRvZXMgZW5kIHVwIGNhbGxpbmcgdGhlDQp4cHJ0LT5v
cHMtPmNsb3NlKCkgY2FsbGJhY2sgYXMgc29vbiBhcyB0aGUgWFBSVF9MT0NLIHN0YXRlIGhhcyBi
ZWVuDQpmcmVlZC4gWW91IHN0aWxsIHdvbid0IGhhdmUgYSBjbGllbnQsIGJ1dCB5b3Ugd2lsbCBi
ZSBndWFyYW50ZWVkDQpleGNsdXNpdmUgYWNjZXNzIHRvIHRoZSB0cmFuc3BvcnQsIGFuZCB5b3Ug
Y2FuIGRvIHRoaW5ncyBsaWtlIHdha2luZyB1cA0KYW55IHNsZWVwaW5nIHRhc2tzIG9uIHRoZSB0
cmFuc21pdCBhbmQgcmVjZWl2ZSBxdWV1ZSB0byBoZWxwIHlvdS4NCkhvd2V2ZXIgeW91IGFsc28g
aGF2ZSB0byBkZWFsIHdpdGggdGhlIGNhc2Ugd2hlcmUgdGhlIHRyYW5zcG9ydCB3YXMNCmlkbGUg
dG8gc3RhcnQgd2l0aC4NCg0KVGhlIGJpZyBwcm9ibGVtIHRoYXQgeW91IGhhdmUgaGVyZSBpcyB1
bHRpbWF0ZWx5IHRoYXQgdGhlIGxvdyBsZXZlbA0KY29udHJvbCBjaGFubmVsIGZvciB0aGUgdHJh
bnNwb3J0IGFwcGVhcnMgdG8gd2FudCB0byB1c2UgdGhlIFJQQyB1cHBlcg0KbGF5ZXIgZnVuY3Rp
b25hbGl0eSBmb3IgaXRzIGNvbW11bmljYXRpb24gbWVjaGFuaXNtLiBBRkFJQ1MgeW91IHdpbGwN
CmtlZXAgaGl0dGluZyBpc3N1ZXMgYXMgdGhlIGNvbnRyb2wgY2hhbm5lbCBuZWVkcyB0byBjaXJj
dW12ZW50IGFsbCB0aGUNCnF1ZXVlaW5nIGV0YyB0aGF0IHRoZXNlIHVwcGVyIGxheWVycyBhcmUg
ZGVzaWduZWQgdG8gZW5mb3JjZS4NCkdpdmVuIHRoYXQgdGhlc2UgbWVzc2FnZXMgeW91J3JlIHNl
bmRpbmcgYXJlIGp1c3QgbnVsbCBwaW5ncyB3aXRoIG5vDQpwYXlsb2FkIGFuZCBubyBzcGVjaWFs
IGF1dGhlbnRpY2F0aW9uIG5lZWRzIG9yIGFueXRoaW5nIGVsc2UsIG1pZ2h0IGl0DQptYWtlIHNl
bnNlIHRvIGp1c3QgZ2VuZXJhdGUgdGhlbSBpbiB0aGUgUkRNQSBsYXllciBpdHNlbGY/DQoNCi0t
IA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXIsIFByaW1hcnlE
YXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo=
T24gVGh1LCAyMDE3LTAyLTIzIGF0IDAzOjI1ICswMDAwLCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6
DQo+IE9uIFdlZCwgMjAxNy0wMi0yMiBhdCAxNjozMSAtMDUwMCwgQ2h1Y2sgTGV2ZXIgd3JvdGU6
DQo+ID4gSGV5IFRyb25kLQ0KPiA+IA0KPiA+IFRvIHN1cHBvcnQgdGhlIGFiaWxpdHkgdG8gdW5s
b2FkIHRoZSB1bmRlcmx5aW5nIFJETUEgZGV2aWNlJ3MNCj4gPiBrZXJuZWwNCj4gPiBkcml2ZXIg
d2hpbGUgTkZTIG1vdW50cyBhcmUgYWN0aXZlLCB4cHJ0cmRtYSBuZWVkcyB0aGUgYWJpbGl0eSB0
bw0KPiA+IHN1c3BlbmQgUlBDIHNlbmRzIHRlbXBvcmFyaWx5IHdoaWxlIHRoZSB0cmFuc3BvcnQg
aGFuZHMgSFcNCj4gPiByZXNvdXJjZXMNCj4gPiBiYWNrIHRvIHRoZSBkcml2ZXIuIE9uY2UgdGhl
IGRldmljZSBkcml2ZXIgaXMgdW5sb2FkZWQsIHRoZSBSRE1BDQo+ID4gdHJhbnNwb3J0IGlzIGxl
ZnQgZGlzY29ubmVjdGVkLCBhbmQgUlBDcyB3aWxsIGJlIHN1c3BlbmRlZCBub3JtYWxseQ0KPiA+
IHVudGlsIGEgY29ubmVjdGlvbiBpcyBwb3NzaWJsZSBhZ2FpbiAoZWcsIGEgbmV3IGRldmljZSBp
cyBtYWRlDQo+ID4gYXZhaWxhYmxlKS4NCj4gPiANCj4gPiBBIERFVklDRV9SRU1PVkFMIGV2ZW50
IGlzIGFuIHVwY2FsbCB0byB4cHJ0cmRtYSB0aGF0IG1heSBzbGVlcC4NCj4gPiBVcG9uDQo+ID4g
aXRzIHJldHVybiwgdGhlIGRldmljZSBkcml2ZXIgdW5sb2FkcyBpdHNlbGYuIEN1cnJlbnRseSBt
eQ0KPiA+IHByb3RvdHlwZQ0KPiA+IGZyZWVzIGFsbCBIVyByZXNvdXJjZXMgZHVyaW5nIHRoZSB1
cGNhbGwsIGJ1dCB0aGF0IGRvZXNuJ3QgYmxvY2sNCj4gPiBuZXcgUlBDcyBmcm9tIHRyeWluZyB0
byB1c2UgdGhvc2UgcmVzb3VyY2VzIGF0IHRoZSBzYW1lIHRpbWUuDQo+ID4gDQo+ID4gU2VlbXMg
bGlrZSB0aGUgbW9zdCBuYXR1cmFsIHdheSB0byB0ZW1wb3JhcmlseSBibG9jayBzZW5kcyB3b3Vs
ZCBiZQ0KPiA+IHRvIGdyYWIgdGhlIHRyYW5zcG9ydCdzIHdyaXRlIGxvY2ssIGp1c3QgbGlrZSAi
Y29ubmVjdCIgZG9lcywgd2hpbGUNCj4gPiB0aGUgdHJhbnNwb3J0IGlzIGRlYWxpbmcgd2l0aCBE
RVZJQ0VfUkVNT1ZBTCwgdGhlbiByZWxlYXNlIGl0IG9uY2UNCj4gPiBhbGwgSFcgcmVzb3VyY2Vz
IGhhdmUgYmVlbiBmcmVlZC4NCj4gPiANCj4gPiBVbmZvcnR1bmF0ZWx5IGFuIFJQQyB0YXNrIGlz
IG5lZWRlZCB0byBhY3F1aXJlIHRoZSB3cml0ZSBsb2NrLiBCdXQNCj4gPiBkaXNjb25uZWN0IGlz
IGp1c3QgYW4gYXN5bmNocm9ub3VzIGV2ZW50LCB0aGVyZSBpcyBubyBSUEMgdGFzaw0KPiA+IGFz
c29jaWF0ZWQgd2l0aCBpdCwgYW5kIHRodXMgbm8gY29udGV4dCB0aGF0IHRoZSBSUEMgc2NoZWR1
bGVyDQo+ID4gY2FuIHB1dCB0byBzbGVlcCBpZiB0aGVyZSBoYXBwZW5zIHRvIGJlIGFub3RoZXIg
UlBDIHNlbmRpbmcgYXQgdGhlDQo+ID4gbW9tZW50IGEgZGV2aWNlIHJlbW92YWwgZXZlbnQgb2Nj
dXJzLg0KPiA+IA0KPiA+IEkgd2FzIGxvb2tpbmcgYXQgeHBydF9sb2NrX2Nvbm5lY3QsIGJ1dCB0
aGF0IGRvZXNuJ3QgYXBwZWFyIHRvIGRvDQo+ID4gcXVpdGUgd2hhdCBJIG5lZWQuDQo+ID4gDQo+
ID4gQW5vdGhlciB0aG91Z2h0IHdhcyB0byBoYXZlIHRoZSBERVZJQ0VfUkVNT1ZBTCB1cGNhbGwg
bWFyayB0aGUNCj4gPiB0cmFuc3BvcnQgZGlzY29ubmVjdGVkLCBzZW5kIGFuIGFzeW5jaHJvbm91
cyBOVUxMIFJQQywgdGhlbiB3YWl0DQo+ID4gb24gYSBrZXJuZWwgd2FpdHF1ZXVlLg0KPiA+IA0K
PiA+IFRoZSBOVUxMIFJQQyB3b3VsZCBncmFiIHRoZSB3cml0ZSBsb2NrIGFuZCBraWNrIHRoZSB0
cmFuc3BvcnQncw0KPiA+IGNvbm5lY3Qgd29ya2VyLiBUaGUgY29ubmVjdCB3b3JrZXIgd291bGQg
ZnJlZSBIVyByZXNvdXJjZXMsIHRoZW4NCj4gPiBhd2FrZW4gdGhlIHdhaXRlci4gVGhlbiB0aGUg
dXBjYWxsIGNvdWxkIHJldHVybiB0byB0aGUgZHJpdmVyLg0KPiA+IA0KPiA+IFRoZSBwcm9ibGVt
IHdpdGggdGhpcyBzY2hlbWUgaXMgdGhlIHNhbWUgYXMgaXQgd2FzIGZvciB0aGUNCj4gPiBrZWVw
YWxpdmUgd29yazogdGhlcmUncyBubyB0YXNrIG9yIHJwY19jbG50IGF2YWlsYWJsZSB0byB0aGUN
Cj4gPiBERVZJQ0VfUkVNT1ZBTCB1cGNhbGwuIFNsZWVwaW5nIHVudGlsIHRoZSB3cml0ZSBsb2Nr
IGlzIGF2YWlsYWJsZQ0KPiA+IHdvdWxkIHJlcXVpcmUgYSB0YXNrLCBhbmQgc2VuZGluZyBhIE5V
TEwgUlBDIHdvdWxkIHJlcXVpcmUgYW4NCj4gPiBycGNfY2xudC4NCj4gPiANCj4gPiBBbnkgYWR2
aWNlL3Rob3VnaHRzIGFib3V0IHRoaXM/DQo+ID4gDQo+IA0KPiBDYW4geW91IHBlcmhhcHMgdXNl
IFhQUlRfRk9SQ0VfRElTQ09OTkVDVD8gVGhhdCBkb2VzIGVuZCB1cCBjYWxsaW5nIA0KDQpTb3Jy
eS4gRHVubm8gaG93IHRoYXQgZW5kZWQgdXAgYWxsLWNhcHMuIEkgZGlkIG1lYW4NCnhwcnRfZm9y
Y2VfZGlzY29ubmVjdCgpLg0KDQo+IHRoZQ0KPiB4cHJ0LT5vcHMtPmNsb3NlKCkgY2FsbGJhY2sg
YXMgc29vbiBhcyB0aGUgWFBSVF9MT0NLIHN0YXRlIGhhcyBiZWVuDQo+IGZyZWVkLiBZb3Ugc3Rp
bGwgd29uJ3QgaGF2ZSBhIGNsaWVudCwgYnV0IHlvdSB3aWxsIGJlIGd1YXJhbnRlZWQNCj4gZXhj
bHVzaXZlIGFjY2VzcyB0byB0aGUgdHJhbnNwb3J0LCBhbmQgeW91IGNhbiBkbyB0aGluZ3MgbGlr
ZSB3YWtpbmcNCj4gdXANCj4gYW55IHNsZWVwaW5nIHRhc2tzIG9uIHRoZSB0cmFuc21pdCBhbmQg
cmVjZWl2ZSBxdWV1ZSB0byBoZWxwIHlvdS4NCj4gSG93ZXZlciB5b3UgYWxzbyBoYXZlIHRvIGRl
YWwgd2l0aCB0aGUgY2FzZSB3aGVyZSB0aGUgdHJhbnNwb3J0IHdhcw0KPiBpZGxlIHRvIHN0YXJ0
IHdpdGguDQo+IA0KPiBUaGUgYmlnIHByb2JsZW0gdGhhdCB5b3UgaGF2ZSBoZXJlIGlzIHVsdGlt
YXRlbHkgdGhhdCB0aGUgbG93IGxldmVsDQo+IGNvbnRyb2wgY2hhbm5lbCBmb3IgdGhlIHRyYW5z
cG9ydCBhcHBlYXJzIHRvIHdhbnQgdG8gdXNlIHRoZSBSUEMNCj4gdXBwZXINCj4gbGF5ZXIgZnVu
Y3Rpb25hbGl0eSBmb3IgaXRzIGNvbW11bmljYXRpb24gbWVjaGFuaXNtLiBBRkFJQ1MgeW91IHdp
bGwNCj4ga2VlcCBoaXR0aW5nIGlzc3VlcyBhcyB0aGUgY29udHJvbCBjaGFubmVsIG5lZWRzIHRv
IGNpcmN1bXZlbnQgYWxsDQo+IHRoZQ0KPiBxdWV1ZWluZyBldGMgdGhhdCB0aGVzZSB1cHBlciBs
YXllcnMgYXJlIGRlc2lnbmVkIHRvIGVuZm9yY2UuDQo+IEdpdmVuIHRoYXQgdGhlc2UgbWVzc2Fn
ZXMgeW91J3JlIHNlbmRpbmcgYXJlIGp1c3QgbnVsbCBwaW5ncyB3aXRoIG5vDQo+IHBheWxvYWQg
YW5kIG5vIHNwZWNpYWwgYXV0aGVudGljYXRpb24gbmVlZHMgb3IgYW55dGhpbmcgZWxzZSwgbWln
aHQNCj4gaXQNCj4gbWFrZSBzZW5zZSB0byBqdXN0IGdlbmVyYXRlIHRoZW0gaW4gdGhlIFJETUEg
bGF5ZXIgaXRzZWxmPw0KPiANCj4gLS3CoA0KPiBUcm9uZCBNeWtsZWJ1c3QNCj4gTGludXggTkZT
IGNsaWVudCBtYWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KPiB0cm9uZC5teWtsZWJ1c3RAcHJpbWFy
eWRhdGEuY29tDQo+IE7vv73vv73vv73vv73vv71y77+977+9ee+/ve+/ve+/vWLvv71Y77+977+9
x6d277+9Xu+/vSneunsubu+/vSvvv73vv73vv73vv71777+977+977+9Iu+/ve+/vV5u77+9cu+/
ve+/ve+/vXrvv70a77+977+9aO+/ve+/ve+/ve+/vSbvv73vv70e77+9R++/ve+/ve+/vWjvv70N
Cj4gAyjvv73pmo7vv73domoi77+977+9Gu+/vRtt77+977+977+977+977+9eu+/vd6W77+977+9
77+9Zu+/ve+/ve+/vWjvv73vv73vv71+77+9be+/vQ0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxp
bnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lciwgUHJpbWFyeURhdGENCnRyb25kLm15a2xlYnVzdEBw
cmltYXJ5ZGF0YS5jb20NCg==
> On Feb 22, 2017, at 10:25 PM, Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2017-02-22 at 16:31 -0500, Chuck Lever wrote:
>> Hey Trond-
>>
>> To support the ability to unload the underlying RDMA device's kernel
>> driver while NFS mounts are active, xprtrdma needs the ability to
>> suspend RPC sends temporarily while the transport hands HW resources
>> back to the driver. Once the device driver is unloaded, the RDMA
>> transport is left disconnected, and RPCs will be suspended normally
>> until a connection is possible again (eg, a new device is made
>> available).
>>
>> A DEVICE_REMOVAL event is an upcall to xprtrdma that may sleep. Upon
>> its return, the device driver unloads itself. Currently my prototype
>> frees all HW resources during the upcall, but that doesn't block
>> new RPCs from trying to use those resources at the same time.
>>
>> Seems like the most natural way to temporarily block sends would be
>> to grab the transport's write lock, just like "connect" does, while
>> the transport is dealing with DEVICE_REMOVAL, then release it once
>> all HW resources have been freed.
>>
>> Unfortunately an RPC task is needed to acquire the write lock. But
>> disconnect is just an asynchronous event, there is no RPC task
>> associated with it, and thus no context that the RPC scheduler
>> can put to sleep if there happens to be another RPC sending at the
>> moment a device removal event occurs.
>>
>> I was looking at xprt_lock_connect, but that doesn't appear to do
>> quite what I need.
>>
>> Another thought was to have the DEVICE_REMOVAL upcall mark the
>> transport disconnected, send an asynchronous NULL RPC, then wait
>> on a kernel waitqueue.
>>
>> The NULL RPC would grab the write lock and kick the transport's
>> connect worker. The connect worker would free HW resources, then
>> awaken the waiter. Then the upcall could return to the driver.
>>
>> The problem with this scheme is the same as it was for the
>> keepalive work: there's no task or rpc_clnt available to the
>> DEVICE_REMOVAL upcall. Sleeping until the write lock is available
>> would require a task, and sending a NULL RPC would require an
>> rpc_clnt.
>>
>> Any advice/thoughts about this?
>>
>
> Can you perhaps use XPRT_FORCE_DISCONNECT? That does end up calling the
> xprt->ops->close() callback as soon as the XPRT_LOCK state has been
> freed. You still won't have a client, but you will be guaranteed
> exclusive access to the transport, and you can do things like waking up
> any sleeping tasks on the transmit and receive queue to help you.
> However you also have to deal with the case where the transport was
> idle to start with.
The case where the transport is idle is indeed the most bothersome.
I hadn't realized that ->close guaranteed exclusive access to
the transport. If ->close guarantees access to the transport, then
I don't need an RPC task or the NULL RPC because we don't ever have
to wait for write access.
The hardware resources can be torn down immediately in
xprt_rdma_close.
> The big problem that you have here is ultimately that the low level
> control channel for the transport appears to want to use the RPC upper
> layer functionality for its communication mechanism. AFAICS you will
> keep hitting issues as the control channel needs to circumvent all the
> queueing etc that these upper layers are designed to enforce.
> Given that these messages you're sending are just null pings with no
> payload and no special authentication needs or anything else, might it
> make sense to just generate them in the RDMA layer itself?
(I don't really need a NULL RPC for the DEVICE_REMOVAL case. I
just need an RPC task).
There seems to be a strong community belief that code duplication
is bad. Constructing even a NULL RPC without using the existing
infrastructure would duplicate a large amount of code.
Eventually RPC-over-RDMA should have its own ping mechanism,
probably. But Version One doesn't.
An alternative to ping here would be to simply force the
connection to disconnect if an RPC times out. That can be done
trivially in the timer call out, for example.
--
Chuck Lever
T24gVGh1LCAyMDE3LTAyLTIzIGF0IDA5OjU0IC0wNTAwLCBDaHVjayBMZXZlciB3cm90ZToNCj4g
PiBPbiBGZWIgMjIsIDIwMTcsIGF0IDEwOjI1IFBNLCBUcm9uZCBNeWtsZWJ1c3QgPHRyb25kbXlA
cHJpbWFyeWRhdGEuDQo+ID4gY29tPiB3cm90ZToNCj4gPiANCj4gPiBPbiBXZWQsIDIwMTctMDIt
MjIgYXQgMTY6MzEgLTA1MDAsIENodWNrIExldmVyIHdyb3RlOg0KPiA+ID4gSGV5IFRyb25kLQ0K
PiA+ID4gDQo+ID4gPiBUbyBzdXBwb3J0IHRoZSBhYmlsaXR5IHRvIHVubG9hZCB0aGUgdW5kZXJs
eWluZyBSRE1BIGRldmljZSdzDQo+ID4gPiBrZXJuZWwNCj4gPiA+IGRyaXZlciB3aGlsZSBORlMg
bW91bnRzIGFyZSBhY3RpdmUsIHhwcnRyZG1hIG5lZWRzIHRoZSBhYmlsaXR5IHRvDQo+ID4gPiBz
dXNwZW5kIFJQQyBzZW5kcyB0ZW1wb3JhcmlseSB3aGlsZSB0aGUgdHJhbnNwb3J0IGhhbmRzIEhX
DQo+ID4gPiByZXNvdXJjZXMNCj4gPiA+IGJhY2sgdG8gdGhlIGRyaXZlci4gT25jZSB0aGUgZGV2
aWNlIGRyaXZlciBpcyB1bmxvYWRlZCwgdGhlIFJETUENCj4gPiA+IHRyYW5zcG9ydCBpcyBsZWZ0
IGRpc2Nvbm5lY3RlZCwgYW5kIFJQQ3Mgd2lsbCBiZSBzdXNwZW5kZWQNCj4gPiA+IG5vcm1hbGx5
DQo+ID4gPiB1bnRpbCBhIGNvbm5lY3Rpb24gaXMgcG9zc2libGUgYWdhaW4gKGVnLCBhIG5ldyBk
ZXZpY2UgaXMgbWFkZQ0KPiA+ID4gYXZhaWxhYmxlKS4NCj4gPiA+IA0KPiA+ID4gQSBERVZJQ0Vf
UkVNT1ZBTCBldmVudCBpcyBhbiB1cGNhbGwgdG8geHBydHJkbWEgdGhhdCBtYXkgc2xlZXAuDQo+
ID4gPiBVcG9uDQo+ID4gPiBpdHMgcmV0dXJuLCB0aGUgZGV2aWNlIGRyaXZlciB1bmxvYWRzIGl0
c2VsZi4gQ3VycmVudGx5IG15DQo+ID4gPiBwcm90b3R5cGUNCj4gPiA+IGZyZWVzIGFsbCBIVyBy
ZXNvdXJjZXMgZHVyaW5nIHRoZSB1cGNhbGwsIGJ1dCB0aGF0IGRvZXNuJ3QgYmxvY2sNCj4gPiA+
IG5ldyBSUENzIGZyb20gdHJ5aW5nIHRvIHVzZSB0aG9zZSByZXNvdXJjZXMgYXQgdGhlIHNhbWUg
dGltZS4NCj4gPiA+IA0KPiA+ID4gU2VlbXMgbGlrZSB0aGUgbW9zdCBuYXR1cmFsIHdheSB0byB0
ZW1wb3JhcmlseSBibG9jayBzZW5kcyB3b3VsZA0KPiA+ID4gYmUNCj4gPiA+IHRvIGdyYWIgdGhl
IHRyYW5zcG9ydCdzIHdyaXRlIGxvY2ssIGp1c3QgbGlrZSAiY29ubmVjdCIgZG9lcywNCj4gPiA+
IHdoaWxlDQo+ID4gPiB0aGUgdHJhbnNwb3J0IGlzIGRlYWxpbmcgd2l0aCBERVZJQ0VfUkVNT1ZB
TCwgdGhlbiByZWxlYXNlIGl0DQo+ID4gPiBvbmNlDQo+ID4gPiBhbGwgSFcgcmVzb3VyY2VzIGhh
dmUgYmVlbiBmcmVlZC4NCj4gPiA+IA0KPiA+ID4gVW5mb3J0dW5hdGVseSBhbiBSUEMgdGFzayBp
cyBuZWVkZWQgdG8gYWNxdWlyZSB0aGUgd3JpdGUgbG9jay4NCj4gPiA+IEJ1dA0KPiA+ID4gZGlz
Y29ubmVjdCBpcyBqdXN0IGFuIGFzeW5jaHJvbm91cyBldmVudCwgdGhlcmUgaXMgbm8gUlBDIHRh
c2sNCj4gPiA+IGFzc29jaWF0ZWQgd2l0aCBpdCwgYW5kIHRodXMgbm8gY29udGV4dCB0aGF0IHRo
ZSBSUEMgc2NoZWR1bGVyDQo+ID4gPiBjYW4gcHV0IHRvIHNsZWVwIGlmIHRoZXJlIGhhcHBlbnMg
dG8gYmUgYW5vdGhlciBSUEMgc2VuZGluZyBhdA0KPiA+ID4gdGhlDQo+ID4gPiBtb21lbnQgYSBk
ZXZpY2UgcmVtb3ZhbCBldmVudCBvY2N1cnMuDQo+ID4gPiANCj4gPiA+IEkgd2FzIGxvb2tpbmcg
YXQgeHBydF9sb2NrX2Nvbm5lY3QsIGJ1dCB0aGF0IGRvZXNuJ3QgYXBwZWFyIHRvIGRvDQo+ID4g
PiBxdWl0ZSB3aGF0IEkgbmVlZC4NCj4gPiA+IA0KPiA+ID4gQW5vdGhlciB0aG91Z2h0IHdhcyB0
byBoYXZlIHRoZSBERVZJQ0VfUkVNT1ZBTCB1cGNhbGwgbWFyayB0aGUNCj4gPiA+IHRyYW5zcG9y
dCBkaXNjb25uZWN0ZWQsIHNlbmQgYW4gYXN5bmNocm9ub3VzIE5VTEwgUlBDLCB0aGVuIHdhaXQN
Cj4gPiA+IG9uIGEga2VybmVsIHdhaXRxdWV1ZS4NCj4gPiA+IA0KPiA+ID4gVGhlIE5VTEwgUlBD
IHdvdWxkIGdyYWIgdGhlIHdyaXRlIGxvY2sgYW5kIGtpY2sgdGhlIHRyYW5zcG9ydCdzDQo+ID4g
PiBjb25uZWN0IHdvcmtlci4gVGhlIGNvbm5lY3Qgd29ya2VyIHdvdWxkIGZyZWUgSFcgcmVzb3Vy
Y2VzLCB0aGVuDQo+ID4gPiBhd2FrZW4gdGhlIHdhaXRlci4gVGhlbiB0aGUgdXBjYWxsIGNvdWxk
IHJldHVybiB0byB0aGUgZHJpdmVyLg0KPiA+ID4gDQo+ID4gPiBUaGUgcHJvYmxlbSB3aXRoIHRo
aXMgc2NoZW1lIGlzIHRoZSBzYW1lIGFzIGl0IHdhcyBmb3IgdGhlDQo+ID4gPiBrZWVwYWxpdmUg
d29yazogdGhlcmUncyBubyB0YXNrIG9yIHJwY19jbG50IGF2YWlsYWJsZSB0byB0aGUNCj4gPiA+
IERFVklDRV9SRU1PVkFMIHVwY2FsbC4gU2xlZXBpbmcgdW50aWwgdGhlIHdyaXRlIGxvY2sgaXMg
YXZhaWxhYmxlDQo+ID4gPiB3b3VsZCByZXF1aXJlIGEgdGFzaywgYW5kIHNlbmRpbmcgYSBOVUxM
IFJQQyB3b3VsZCByZXF1aXJlIGFuDQo+ID4gPiBycGNfY2xudC4NCj4gPiA+IA0KPiA+ID4gQW55
IGFkdmljZS90aG91Z2h0cyBhYm91dCB0aGlzPw0KPiA+ID4gDQo+ID4gDQo+ID4gQ2FuIHlvdSBw
ZXJoYXBzIHVzZSBYUFJUX0ZPUkNFX0RJU0NPTk5FQ1Q/IFRoYXQgZG9lcyBlbmQgdXAgY2FsbGlu
Zw0KPiA+IHRoZQ0KPiA+IHhwcnQtPm9wcy0+Y2xvc2UoKSBjYWxsYmFjayBhcyBzb29uIGFzIHRo
ZSBYUFJUX0xPQ0sgc3RhdGUgaGFzIGJlZW4NCj4gPiBmcmVlZC4gWW91IHN0aWxsIHdvbid0IGhh
dmUgYSBjbGllbnQsIGJ1dCB5b3Ugd2lsbCBiZSBndWFyYW50ZWVkDQo+ID4gZXhjbHVzaXZlIGFj
Y2VzcyB0byB0aGUgdHJhbnNwb3J0LCBhbmQgeW91IGNhbiBkbyB0aGluZ3MgbGlrZQ0KPiA+IHdh
a2luZyB1cA0KPiA+IGFueSBzbGVlcGluZyB0YXNrcyBvbiB0aGUgdHJhbnNtaXQgYW5kIHJlY2Vp
dmUgcXVldWUgdG8gaGVscCB5b3UuDQo+ID4gSG93ZXZlciB5b3UgYWxzbyBoYXZlIHRvIGRlYWwg
d2l0aCB0aGUgY2FzZSB3aGVyZSB0aGUgdHJhbnNwb3J0IHdhcw0KPiA+IGlkbGUgdG8gc3RhcnQg
d2l0aC4NCj4gDQo+IFRoZSBjYXNlIHdoZXJlIHRoZSB0cmFuc3BvcnQgaXMgaWRsZSBpcyBpbmRl
ZWQgdGhlIG1vc3QgYm90aGVyc29tZS4NCj4gDQo+IEkgaGFkbid0IHJlYWxpemVkIHRoYXQgLT5j
bG9zZSBndWFyYW50ZWVkIGV4Y2x1c2l2ZSBhY2Nlc3MgdG8NCj4gdGhlIHRyYW5zcG9ydC4gSWYg
LT5jbG9zZSBndWFyYW50ZWVzIGFjY2VzcyB0byB0aGUgdHJhbnNwb3J0LCB0aGVuDQo+IEkgZG9u
J3QgbmVlZCBhbiBSUEMgdGFzayBvciB0aGUgTlVMTCBSUEMgYmVjYXVzZSB3ZSBkb24ndCBldmVy
IGhhdmUNCj4gdG8gd2FpdCBmb3Igd3JpdGUgYWNjZXNzLg0KPiANCg0KWWVzLCAtPmNsb3NlKCkg
bXVzdCBhbHdheXMgYmUgdGFrZW4gd2l0aCBYUFJUX0xPQ0sgc2V0LCBhbmQgaXQgaXMNCmd1YXJh
bnRlZWQgdG8gYmUgY2FsbGVkIHRoZSBpbnN0YW50IHRoYXQgdGhlIGN1cnJlbnQgaG9sZGVyIG9m
DQpYUFJUX0xPQ0sgcmVsaW5xdWlzaGVzIGl0LiBUaGlzIGlzIHdoeSB3ZSBoYXZlIHRoZSB0ZXN0
IGZvcg0KWFBSVF9DTE9TRV9XQUlUIGluIHhwcnRfY2xlYXJfbG9ja2VkKCk7IGl0IHNob3J0LWNp
cmN1aXRzIHRoZSB3YWl0DQpxdWV1ZXMgYW5kIGltbWVkaWF0ZWx5IHN0YXJ0cyBxdWV1ZV93b3Jr
KHhwcnRpb2Rfd29ya3F1ZXVlLCAmeHBydC0NCj50YXNrX2NsZWFudXApLg0KDQo+IFRoZSBoYXJk
d2FyZSByZXNvdXJjZXMgY2FuIGJlIHRvcm4gZG93biBpbW1lZGlhdGVseSBpbg0KPiB4cHJ0X3Jk
bWFfY2xvc2UuDQoNClNvdW5kcyBsaWtlIHRoZSByaWdodCB0aGluZyB0byBkbyB0aGVuLg0KDQo+
ID4gVGhlIGJpZyBwcm9ibGVtIHRoYXQgeW91IGhhdmUgaGVyZSBpcyB1bHRpbWF0ZWx5IHRoYXQg
dGhlIGxvdyBsZXZlbA0KPiA+IGNvbnRyb2wgY2hhbm5lbCBmb3IgdGhlIHRyYW5zcG9ydCBhcHBl
YXJzIHRvIHdhbnQgdG8gdXNlIHRoZSBSUEMNCj4gPiB1cHBlcg0KPiA+IGxheWVyIGZ1bmN0aW9u
YWxpdHkgZm9yIGl0cyBjb21tdW5pY2F0aW9uIG1lY2hhbmlzbS4gQUZBSUNTIHlvdQ0KPiA+IHdp
bGwNCj4gPiBrZWVwIGhpdHRpbmcgaXNzdWVzIGFzIHRoZSBjb250cm9sIGNoYW5uZWwgbmVlZHMg
dG8gY2lyY3VtdmVudCBhbGwNCj4gPiB0aGUNCj4gPiBxdWV1ZWluZyBldGMgdGhhdCB0aGVzZSB1
cHBlciBsYXllcnMgYXJlIGRlc2lnbmVkIHRvIGVuZm9yY2UuDQo+ID4gR2l2ZW4gdGhhdCB0aGVz
ZSBtZXNzYWdlcyB5b3UncmUgc2VuZGluZyBhcmUganVzdCBudWxsIHBpbmdzIHdpdGgNCj4gPiBu
bw0KPiA+IHBheWxvYWQgYW5kIG5vIHNwZWNpYWwgYXV0aGVudGljYXRpb24gbmVlZHMgb3IgYW55
dGhpbmcgZWxzZSwgbWlnaHQNCj4gPiBpdA0KPiA+IG1ha2Ugc2Vuc2UgdG8ganVzdCBnZW5lcmF0
ZSB0aGVtIGluIHRoZSBSRE1BIGxheWVyIGl0c2VsZj8NCj4gDQo+IChJIGRvbid0IHJlYWxseSBu
ZWVkIGEgTlVMTCBSUEMgZm9yIHRoZSBERVZJQ0VfUkVNT1ZBTCBjYXNlLiBJDQo+IGp1c3QgbmVl
ZCBhbiBSUEMgdGFzaykuDQo+IA0KPiBUaGVyZSBzZWVtcyB0byBiZSBhIHN0cm9uZyBjb21tdW5p
dHkgYmVsaWVmIHRoYXQgY29kZSBkdXBsaWNhdGlvbg0KPiBpcyBiYWQuIENvbnN0cnVjdGluZyBl
dmVuIGEgTlVMTCBSUEMgd2l0aG91dCB1c2luZyB0aGUgZXhpc3RpbmcNCj4gaW5mcmFzdHJ1Y3R1
cmUgd291bGQgZHVwbGljYXRlIGEgbGFyZ2UgYW1vdW50IG9mIGNvZGUuDQo+DQo+IEV2ZW50dWFs
bHkgUlBDLW92ZXItUkRNQSBzaG91bGQgaGF2ZSBpdHMgb3duIHBpbmcgbWVjaGFuaXNtLA0KPiBw
cm9iYWJseS4gQnV0IFZlcnNpb24gT25lIGRvZXNuJ3QuDQo+IA0KPiBBbiBhbHRlcm5hdGl2ZSB0
byBwaW5nIGhlcmUgd291bGQgYmUgdG8gc2ltcGx5IGZvcmNlIHRoZQ0KPiBjb25uZWN0aW9uIHRv
IGRpc2Nvbm5lY3QgaWYgYW4gUlBDIHRpbWVzIG91dC4gVGhhdCBjYW4gYmUgZG9uZQ0KPiB0cml2
aWFsbHkgaW4gdGhlIHRpbWVyIGNhbGwgb3V0LCBmb3IgZXhhbXBsZS4NCg0KUmlnaHQuIEkgYWdy
ZWUgdGhhdCB0aGUgLT50aW1lcigpIGNhbGxiYWNrIHdpbGwgd29yayBqdXN0IGZpbmUgZm9yDQp0
aGF0Lg0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVy
LCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbQ0K
> On Feb 23, 2017, at 7:30 AM, Trond Myklebust <[email protected]> wrote:
>
> On Thu, 2017-02-23 at 09:54 -0500, Chuck Lever wrote:
>>> On Feb 22, 2017, at 10:25 PM, Trond Myklebust <trondmy@primarydata.
>>> com> wrote:
>>>
>>> On Wed, 2017-02-22 at 16:31 -0500, Chuck Lever wrote:
>>>> Hey Trond-
>>>>
>>>> To support the ability to unload the underlying RDMA device's
>>>> kernel
>>>> driver while NFS mounts are active, xprtrdma needs the ability to
>>>> suspend RPC sends temporarily while the transport hands HW
>>>> resources
>>>> back to the driver. Once the device driver is unloaded, the RDMA
>>>> transport is left disconnected, and RPCs will be suspended
>>>> normally
>>>> until a connection is possible again (eg, a new device is made
>>>> available).
>>>>
>>>> A DEVICE_REMOVAL event is an upcall to xprtrdma that may sleep.
>>>> Upon
>>>> its return, the device driver unloads itself. Currently my
>>>> prototype
>>>> frees all HW resources during the upcall, but that doesn't block
>>>> new RPCs from trying to use those resources at the same time.
>>>>
>>>> Seems like the most natural way to temporarily block sends would
>>>> be
>>>> to grab the transport's write lock, just like "connect" does,
>>>> while
>>>> the transport is dealing with DEVICE_REMOVAL, then release it
>>>> once
>>>> all HW resources have been freed.
>>>>
>>>> Unfortunately an RPC task is needed to acquire the write lock.
>>>> But
>>>> disconnect is just an asynchronous event, there is no RPC task
>>>> associated with it, and thus no context that the RPC scheduler
>>>> can put to sleep if there happens to be another RPC sending at
>>>> the
>>>> moment a device removal event occurs.
>>>>
>>>> I was looking at xprt_lock_connect, but that doesn't appear to do
>>>> quite what I need.
>>>>
>>>> Another thought was to have the DEVICE_REMOVAL upcall mark the
>>>> transport disconnected, send an asynchronous NULL RPC, then wait
>>>> on a kernel waitqueue.
>>>>
>>>> The NULL RPC would grab the write lock and kick the transport's
>>>> connect worker. The connect worker would free HW resources, then
>>>> awaken the waiter. Then the upcall could return to the driver.
>>>>
>>>> The problem with this scheme is the same as it was for the
>>>> keepalive work: there's no task or rpc_clnt available to the
>>>> DEVICE_REMOVAL upcall. Sleeping until the write lock is available
>>>> would require a task, and sending a NULL RPC would require an
>>>> rpc_clnt.
>>>>
>>>> Any advice/thoughts about this?
>>>>
>>>
>>> Can you perhaps use XPRT_FORCE_DISCONNECT? That does end up calling
>>> the
>>> xprt->ops->close() callback as soon as the XPRT_LOCK state has been
>>> freed. You still won't have a client, but you will be guaranteed
>>> exclusive access to the transport, and you can do things like
>>> waking up
>>> any sleeping tasks on the transmit and receive queue to help you.
>>> However you also have to deal with the case where the transport was
>>> idle to start with.
>>
>> The case where the transport is idle is indeed the most bothersome.
>>
>> I hadn't realized that ->close guaranteed exclusive access to
>> the transport. If ->close guarantees access to the transport, then
>> I don't need an RPC task or the NULL RPC because we don't ever have
>> to wait for write access.
>>
>
> Yes, ->close() must always be taken with XPRT_LOCK set, and it is
> guaranteed to be called the instant that the current holder of
> XPRT_LOCK relinquishes it. This is why we have the test for
> XPRT_CLOSE_WAIT in xprt_clear_locked(); it short-circuits the wait
> queues and immediately starts queue_work(xprtiod_workqueue, &xprt-
>> task_cleanup).
>
>> The hardware resources can be torn down immediately in
>> xprt_rdma_close.
>
> Sounds like the right thing to do then.
This approach seems to be working well, except for one
unexpected behavior.
->send_request seems to be called even though
XPRT_CONNECTED is clear. I expected that after a transport
has been disconnected, ->send_request would never be
invoked when that bit is clear.
Occasionally, ->send_request is invoked with
XPRT_CONNECTING set.
If those are OK, then I will have something to post for
review next week.
--
Chuck Lever