2017-07-17 12:08:11

by Eryu Guan

Subject: [4.13-rc1 regression] fstests generic/013 crashed nfsd on ppc64 host

Hi all,

I hit an nfsd crash when running fstests generic/013 on a 4.13-rc1 kernel
with NFS versions 4.0, 4.1 and 4.2; v3 passes the test. It only happens on
ppc64/ppc64le hosts for me. git bisect points to this as the first bad
commit:

commit 1c5876ddbdb401f814ef717394826e7dfb6704d4
Author: Christoph Hellwig <hch@lst.de>
Date: Mon May 8 23:27:10 2017 +0200

sunrpc: move p_count out of struct rpc_procinfo

p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.

Signed-off-by: Christoph Hellwig <hch@lst.de>

I was testing with a locally mounted NFS share, but I can also reproduce
it by running generic/013 from a remote NFS client. If you need more
information, please let me know.

Thanks,
Eryu

[ 992.581712] run fstests generic/013 at 2017-07-16 07:30:42
[ 993.895088] Unable to handle kernel paging request for data at address 0x2f7362696e2f6e76
[ 993.895113] Faulting instruction address: 0xd000000006660428
[ 993.895121] Oops: Kernel access of bad area, sig: 11 [#1]
[ 993.895126] SMP NR_CPUS=2048
[ 993.895127] NUMA
[ 993.895130] pSeries
[ 993.895137] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ext4 mbcache jbd2 nx_crypto sg pseries_rng nfsd auth_rpcgss nfs_acl lockd sunrpc grace ip_tables xfs libcrc32c sd_mod ibmvscsi scsi_transport_srp ibmveth
[ 993.895168] CPU: 11 PID: 335 Comm: kworker/11:1 Not tainted 4.13.0-rc1 #1
[ 993.895197] Workqueue: rpciod .rpc_async_schedule [sunrpc]
[ 993.895203] task: c0000001f94cf780 task.stack: c0000001f952c000
[ 993.895208] NIP: d000000006660428 LR: d0000000066748d4 CTR: d0000000066603d0
[ 993.895214] REGS: c0000001f952f7e0 TRAP: 0380 Not tainted (4.13.0-rc1)
[ 993.895219] MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
[ 993.895225] CR: 22004024 XER: 00000001
[ 993.895233] CFAR: d0000000066748d0 SOFTE: 1
[ 993.895233] GPR00: d0000000066748d4 c0000001f952fa60 d0000000066b5d78 c0000001bcee7d00
[ 993.895233] GPR04: c0000000fefc19e8 c0000001bcee7d48 002d1e7473db58e8 0000000000000001
[ 993.895233] GPR08: d0000000079dd588 2f7362696e2f6e66 0000000000000008 d0000000079d45f8
[ 993.895233] GPR12: d000000006660010 c00000000e986e00 c000000000110ab0 c0000001f81d0040
[ 993.895233] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 993.895233] GPR20: 0000000000000000 fffffffffffffe00 0000000000000000 0000000000000001
[ 993.895233] GPR24: d0000000066b9f34 c0000001bcee7d30 0000000000000000 d0000000066aac68
[ 993.895233] GPR28: c0000001bc79cc00 0000000000000001 c0000001bc79cc00 c0000001bcee7d00
[ 993.895313] NIP [d000000006660428] .call_start+0x58/0x120 [sunrpc]
[ 993.895337] LR [d0000000066748d4] .__rpc_execute+0xc4/0x540 [sunrpc]
[ 993.895342] Call Trace:
[ 993.895346] [c0000001f952fa60] [0000000000000001] 0x1 (unreliable)
[ 993.895370] [c0000001f952faf0] [d0000000066748d4] .__rpc_execute+0xc4/0x540 [sunrpc]
[ 993.895379] [c0000001f952fbe0] [c000000000108e74] .process_one_work+0x194/0x480
[ 993.895387] [c0000001f952fc90] [c0000000001091e8] .worker_thread+0x88/0x510
[ 993.895393] [c0000001f952fd70] [c000000000110c0c] .kthread+0x15c/0x1a0
[ 993.895401] [c0000001f952fe30] [c00000000000b520] .ret_from_kernel_thread+0x58/0xb8
[ 993.895407] Instruction dump:
[ 993.895411] e9430078 ebc300a8 7928ffe3 ebaa0026 40c2006c e95e0180 80fe0044 e90a0010
[ 993.895421] 78ea1f24 7d28502a 2fa90000 419e0018 <e9490010> 7ba91764 7d0a482e 39080001
[ 993.895433] ---[ end trace aeee2c84dc1574c0 ]---

And gdb shows:

(gdb) l *(call_start+0x60)
0x4b0 is in call_start (net/sunrpc/clnt.c:1529).
1524                            rpc_proc_name(task),
1525                            (RPC_IS_ASYNC(task) ? "async" : "sync"));
1526
1527            /* Increment call count (version might not be valid for ping) */
1528            if (clnt->cl_program->version[clnt->cl_vers])
1529                    clnt->cl_program->version[clnt->cl_vers]->counts[idx]++;
1530            clnt->cl_stats->rpccnt++;
1531            task->tk_action = call_reserve;
1532    }
1533
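
A side note on reading the oops above: the faulting address
0x2f7362696e2f6e76 and the value in GPR09 (0x2f7362696e2f6e66) both look
like ASCII text rather than kernel pointers, and they differ by exactly
0x10. That would be consistent with the non-NULL test at clnt.c:1528
passing on a slot that holds string data, followed by a load at offset 16
from that bogus pointer. The small userspace helper below is only a sketch
for checking that reading; the function names are made up for this note,
and whether offset 16 corresponds to the counts member depends on the
struct rpc_version layout, which is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Render the 8 bytes of a 64-bit value, most significant byte first,
 * as text; non-printable bytes become '.'.
 * (Hypothetical helper, not part of the kernel.) */
static void reg_to_ascii(uint64_t reg, char out[9])
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++) {
		unsigned char c = (unsigned char)(reg >> (56 - 8 * i));
		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[8] = '\0';
}

/* Distance between the faulting address and a candidate base register.
 * (Hypothetical helper, not part of the kernel.) */
static uint64_t fault_offset(uint64_t fault_addr, uint64_t base)
{
	return fault_addr - base;
}
```

Calling reg_to_ascii() on the GPR09 value yields the text "/sbin/nf", and
fault_offset() on the faulting address and GPR09 yields 0x10.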


2017-07-17 16:00:38

by Trond Myklebust

Subject: Re: [4.13-rc1 regression] fstests generic/013 crashed nfsd on ppc64 host

On Mon, 2017-07-17 at 20:08 +0800, Eryu Guan wrote:
> Hi all,
>
> I hit a nfsd crash in fstests generic/013 run with 4.13-rc1 kernel, NFS
> version 4.0/4.1/4.2, v3 passed the test, and it only happens on
> ppc64/ppc64le hosts for me. git bisect pointed first bad to
>
> [...]

Please see the patch that I posted yesterday in response to Dave Jones'
report of the same issue.

Bruce, do you want me to resend?

Thanks
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


2017-07-17 17:31:20

by J. Bruce Fields

Subject: Re: [4.13-rc1 regression] fstests generic/013 crashed nfsd on ppc64 host

On Mon, Jul 17, 2017 at 04:00:10PM +0000, Trond Myklebust wrote:
> Please see the patch that I posted yesterday in response to Dave Jones'
> report of the same issue.
>
> Bruce, do you want me to resend?

I've got it, thanks!

(Had to grep through the code to figure out why it doesn't change the
on-the-wire version number. OK, I see, that's in rpc_version->number,
which is unchanged.)

--b.
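
For readers following along, the layout discussed in this thread can be
modelled in userspace roughly as below: read-only per-procedure data, a
separate writable counts array indexed by p_statidx, and a version table
in which the slot index and the version number sent on the wire are two
different things. This is a simplified sketch, not the kernel code. The
field names number, counts, version and p_statidx come from the thread;
the struct names, the helper names and the bounds checks in count_call()
are assumptions added for illustration, and nothing here is meant to
describe what the posted patch actually changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read-only per-procedure data (only the fields needed for this model). */
struct procinfo {
	uint32_t p_proc;        /* procedure number */
	unsigned int p_statidx; /* index into the counts array */
};

struct version {
	uint32_t number;              /* version number sent on the wire */
	unsigned int nrprocs;         /* assumed here: counts has nrprocs entries */
	const struct procinfo *procs; /* const-ified procedure table */
	unsigned int *counts;         /* writable counters, kept out of procs */
};

struct program {
	unsigned int nrvers;            /* number of slots in version[] */
	const struct version **version; /* individual slots may be NULL */
};

/* Bump the call counter for one procedure.
 * Returns 0 on success, -1 if nothing was counted.
 * The range checks are illustrative additions, not quoted kernel code. */
static int count_call(const struct program *prog, unsigned int vers_slot,
		      const struct procinfo *proc)
{
	assert(prog != NULL && proc != NULL);
	if (vers_slot >= prog->nrvers)
		return -1; /* slot lies outside the table */
	const struct version *v = prog->version[vers_slot];
	if (v == NULL || proc->p_statidx >= v->nrprocs)
		return -1;
	v->counts[proc->p_statidx]++;
	return 0;
}

/* The on-the-wire version comes from the number field, not the slot. */
static uint32_t wire_version(const struct version *v)
{
	return v->number;
}
```

In this model, moving a version to a different slot in the table changes
which index count_call() must be given, but wire_version() keeps returning
the same number.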