Dear NFS fellows,
we have noticed a behavior of the nfs client when iterating over a big
directory. The client re-requests entries that have already been received. For
example, a client issues READDIR on a directory with 1k files. Initial cookie
is 0, maxcount 32768.
c -> s cookie 0
s -> c last cookie 159
c -> s cookie 105
s -> c last cookie 259
c -> s cookie 207
...
and so on. The interesting thing is, if I mount with rsize 8192 (maxcount 8192), then the first couple
of requests ask for the correct cookies - 0, 43, 81, 105. Again 105, as with maxcount 32768. To
me it looks like there is some kind of internal page (actually NFS_MAX_READDIR_PAGES) alignment,
and entries which do not fit into the initially allocated PAGE_SIZE * NFS_MAX_READDIR_PAGES memory
just get dropped.
As 30% of each reply is thrown away, listing large directories may produce many more requests
than required.
Is this expected behavior?
Thanks in advance,
Tigran.
On Sat, 2018-10-20 at 00:11 +0200, Mkrtchyan, Tigran wrote:
> [...]
> Is this expected behavior?

Yes.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 10/19/2018 05:11 PM, Mkrtchyan, Tigran wrote:
>
> Dear NFS fellows,
>
> we have noticed a behavior of the nfs client when iterating over a big
> directory. The client re-requests entries that have already been received. For
> example, a client issues READDIR on a directory with 1k files. Initial cookie
> is 0, maxcount 32768.
>
> c -> s cookie 0
> s -> c last cookie 159
> c -> s cookie 105
> s -> c last cookie 259
> c -> s cookie 207
>
> ...
>
> and so on. The interesting thing is, if I mount with rsize 8192 (maxcount 8192), then the first couple
> of requests ask for the correct cookies - 0, 43, 81, 105. Again 105, as with maxcount 32768. To
> me it looks like there is some kind of internal page (actually NFS_MAX_READDIR_PAGES) alignment,
> and entries which do not fit into the initially allocated PAGE_SIZE * NFS_MAX_READDIR_PAGES memory
> just get dropped.
>
> As 30% of each reply is thrown away, listing large directories may produce many more requests
> than required.
>
> Is this expected behavior?
Expected, based on how readdir entries are handled on the client, though
you are probably correct that there is room for improvement (though this may
not be the largest opportunity in the readdir code).
The number of excess directory entries retrieved from the server will
vary based on a number of factors, including kernel version, filename
length, rsize, etc.
On the client, each page of the directory inode's address space contains
an nfs_cache_array:
struct nfs_cache_array {
    unsigned int size;
    int eof_index;
    u64 last_cookie;
    struct nfs_cache_array_entry array[];
};
(size: 16)
with the array of nfs_cache_array_entry structs extending to the end of
the page:
struct nfs_cache_array_entry {
    u64 cookie;
    u64 ino;
    struct qstr string;
    unsigned char d_type;
};
(size: 40)
This means that each page can hold 102 entries:
$ echo "(4096-16)/40"|bc
102
Actual behavior depends on a number of factors, including the client's
kernel version, but given the cookie sequence you mention, I suspect the
following is occurring:
* client
READDIR call with cookie 0
* server
READDIR reply with 156 entries, cookies numbered ?-159 (4?)
* client
keeps 102 entries (numbered 4-105)
READDIR call with cookie 105
* server
READDIR reply with 154 entries, cookies numbered 106-259
* client
keeps 102 entries (numbered 106-207)
READDIR call with cookie 207
* server
  READDIR reply with ??? entries, cookies numbered 208-???
etc.
Frank
--
Frank Sorenson
[email protected]
Senior Software Maintenance Engineer
Global Support Services - filesystems
Red Hat