2013-11-12 15:31:37

by Weston Andros Adamson

Subject: Thread overran stack, or stack corrupted BUG on mount

I got this oops yesterday running the "test_sec_options.sh" script I recently posted as a patch to Anna's nfs-ordeal repo (tons of mount/umount).

At this point GSSD had died (I was tracking down a fd leak). I haven't been able to reproduce this yet.

Any idea if I should trust the stack trace? Could this be related to the issue Jeff just posted?

-dros

BUG: unable to handle kernel paging request at ffff88017a604030
IP: [<ffffffff81063089>] __wake_up+0x22/0x4d
PGD 2651067 PUD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfsv4 cts rpcsec_gss_krb5 nfsv3 nfs fscache crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ppdev ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 nfsd parport_pc parport shpchp auth_rpcgss oid_registry exportfs nfs_acl lockd floppy freq_table sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic
CPU: 0 PID: 10547 Comm: mount.nfs Not tainted 3.12.0-rc3-branch-dros_testing+ #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
task: ffff8800798f2100 ti: ffff88007a604000 task.ti: ffff88007a604000
RIP: 0010:[<ffffffff81063089>] [<ffffffff81063089>] __wake_up+0x22/0x4d
RSP: 0018:ffff88007a604028 EFLAGS: 00010092
RAX: 0000000000000296 RBX: ffffffffa006a980 RCX: 000000009a519a50
RDX: 000000009a509a50 RSI: 000000000000038a RDI: ffffffffa006a980
RBP: ffff88017a604058 R08: 0000000000000003 R09: 0000000000000001
R10: ffff88006d41d7c0 R11: ffff88007f20b000 R12: ffff88007a6058e0
R13: ffff8800645d8018 R14: ffff88007a6058f8 R15: ffff8800798f2100
FS: 00007fb2765b3880(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88017a604030 CR3: 000000007a6ba000 CR4: 00000000001407f0
Stack:
ffff88007a604038 0000000000000000 ffff880000000001 0000000000000003
ffff88006453a0d0 ffff88007a6058e0 ffff88007a604078 ffffffffa0045de2
ffff88006453dbe0 ffff88006453dbe0 ffff88007a604098 ffffffffa0045d27
Call Trace:
[<ffffffffa0045de2>] ? rpc_release_client+0x4a/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
[<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
[<ffffffffa0045f39>] ? rpc_shutdown_client+0x107/0x116 [sunrpc]
[<ffffffffa02a6456>] ? __fscache_cookie_put+0x43/0x4f [fscache]
[<ffffffffa02a65ca>] ? __fscache_relinquish_cookie+0x168/0x16d [fscache]
[<ffffffffa02bdc2b>] ? nfs_free_client+0x4c/0xaf [nfs]
[<ffffffffa0340e4a>] ? nfs4_free_client+0x97/0x9b [nfsv4]
[<ffffffffa02bcfd9>] ? nfs_put_client+0xe8/0xed [nfs]
[<ffffffffa0341126>] ? nfs4_init_client+0x22e/0x29d [nfsv4]
[<ffffffffa02bc95f>] ? nfs_probe_fsinfo+0x2c7/0x2c7 [nfs]
[<ffffffffa02bd1af>] ? nfs_get_client+0x8a/0x2bf [nfs]
[<ffffffffa02bd37f>] ? nfs_get_client+0x25a/0x2bf [nfs]
[<ffffffffa034059c>] ? nfs4_set_client+0x9f/0xf1 [nfsv4]
[<ffffffffa004e917>] ? __rpc_init_priority_wait_queue+0x98/0xcf [sunrpc]
[<ffffffffa0341999>] ? nfs4_create_server+0xfe/0x264 [nfsv4]
[<ffffffffa033ac59>] ? nfs4_remote_mount+0x2f/0x57 [nfsv4]
[<ffffffff8112a846>] ? mount_fs+0x69/0x157
[<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
[<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
[<ffffffffa033ac02>] ? nfs_do_root_mount+0x8c/0xb4 [nfsv4]
[<ffffffffa033aea9>] ? nfs4_try_mount+0x60/0xbb [nfsv4]
[<ffffffffa02c80eb>] ? nfs_fs_mount+0x88f/0x97a [nfs]
[<ffffffffa02c8620>] ? nfs_clone_super+0x6b/0x6b [nfs]
[<ffffffffa02c59ce>] ? nfs_set_super+0x53/0x53 [nfs]
[<ffffffff8112a846>] ? mount_fs+0x69/0x157
[<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
[<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
[<ffffffff81141fa6>] ? do_mount+0x6ce/0x871
[<ffffffff81141833>] ? copy_mount_options+0xc2/0x12f
[<ffffffff811421ce>] ? SyS_mount+0x85/0xbe
[<ffffffff814a4292>] ? system_call_fastpath+0x16/0x1b
Code: 89 e5 e8 98 ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 20 89 55 e0 89 75 e8 48 89 4d d8 e8 0c 99 43 00 <4c> 8b 45 d8 48 89 df 8b 55 e0 49 89 c4 31 c9 8b 75 e8 e8 b4 d0
RIP [<ffffffff81063089>] __wake_up+0x22/0x4d
RSP <ffff88007a604028>
CR2: ffff88017a604030
---[ end trace 1122f3f8cf98e4c2 ]---
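
One way to sanity-check the register dump above, as a rough sketch rather than a definitive analysis. The constants are copied from the oops. The helper names are hypothetical, the 0x28 displacement is read from the faulting instruction bytes "4c 8b 45 d8" (a load from -0x28(%rbp)), and the "expected" RBP value is an assumption based on the 0x20 spacing of the saved frame pointers in the Stack: dump.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops. */
#define OOPS_TI   0xffff88007a604000ULL  /* "ti:" field, lowest address of the stack area */
#define OOPS_RSP  0xffff88007a604028ULL
#define OOPS_RBP  0xffff88017a604058ULL
#define OOPS_CR2  0xffff88017a604030ULL

/* Assumption: the frame pointer value that would lie inside this task's stack. */
#define ASSUMED_IN_STACK_RBP 0xffff88007a604058ULL

/* Bytes left between the stack pointer and the bottom of the stack area. */
static uint64_t stack_headroom(uint64_t rsp, uint64_t ti)
{
    assert(rsp >= ti);
    return rsp - ti;
}

/* Address touched by a load from -0x28(%rbp). */
static uint64_t faulting_address(uint64_t rbp)
{
    return rbp - 0x28;
}

/* Bits that differ between two addresses. */
static uint64_t differing_bits(uint64_t a, uint64_t b)
{
    return a ^ b;
}
```

With these values, stack_headroom(OOPS_RSP, OOPS_TI) is 0x28, faulting_address(OOPS_RBP) equals OOPS_CR2, and differing_bits(OOPS_RBP, ASSUMED_IN_STACK_RBP) is the single bit 0x100000000. That is consistent with the stack having been used down to its very bottom and with RBP no longer pointing into it, so the "Thread overran stack" message looks plausible.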



2013-11-12 17:50:26

by Myklebust, Trond

Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, 2013-11-12 at 11:57 -0500, J. Bruce Fields wrote:
> On Tue, Nov 12, 2013 at 11:20:21AM -0500, Jeff Layton wrote:
> > On Tue, 12 Nov 2013 10:55:39 -0500
> > Jeff Layton <jlayton@redhat.com> wrote:
> >
> > > On Tue, 12 Nov 2013 15:31:34 +0000
> > > Weston Andros Adamson <dros@netapp.com> wrote:
> >
> > How that ends up smashing the stack, I'm not sure though.
>
> rpc_free_client(clnt)
> rpc_release_client(clnt->cl_parent)
> rpc_free_auth(clnt)
> free_free_client(clnt)
>
> So freeing a client with N ancestors can take N times the stack as
> freeing a single client.
>
> (Are there any other cases that can create arbitrarily long cl_parent
> chains?)

Ewww.... At this point, that would be pretty much anything that calls
rpc_clone_client_set_auth() in response to a NFS4ERR_WRONG_SEC.
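
To illustrate the stack-depth point, here is a minimal user-space sketch. It is not the sunrpc code and not a proposed fix. The struct and the toy_* functions are hypothetical stand-ins that model only a parent pointer and a reference count. The recursive release nests one call per ancestor, while the iterative one walks the same chain in a single frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RPC client: just a parent link and a refcount. */
struct toy_clnt {
    struct toy_clnt *parent;
    int refcount;
};

/* Build a chain of `depth` clients; each child holds the only reference
 * on its parent. Returns the youngest client (NULL if depth is 0). */
static struct toy_clnt *toy_clone_chain(size_t depth)
{
    struct toy_clnt *parent = NULL;

    for (size_t i = 0; i < depth; i++) {
        struct toy_clnt *c = calloc(1, sizeof(*c));
        assert(c != NULL);
        c->parent = parent;
        c->refcount = 1;
        parent = c;
    }
    return parent;
}

/* Deepest nesting level reached by toy_release_recursive(). */
static size_t toy_max_depth;

/* Recursive release: dropping the last reference releases the parent
 * from inside the same call, so N ancestors mean N nested frames. */
static void toy_release_recursive(struct toy_clnt *c, size_t depth)
{
    if (c == NULL || --c->refcount > 0)
        return;
    if (depth > toy_max_depth)
        toy_max_depth = depth;
    toy_release_recursive(c->parent, depth + 1);
    free(c);
}

/* Iterative release: same chain walk, constant stack usage.
 * Returns the number of clients freed. */
static size_t toy_release_iterative(struct toy_clnt *c)
{
    size_t freed = 0;

    while (c != NULL && --c->refcount == 0) {
        struct toy_clnt *parent = c->parent;
        free(c);
        freed++;
        c = parent;
    }
    return freed;
}
```

Calling toy_release_recursive(toy_clone_chain(100), 1) leaves toy_max_depth at 100, whereas toy_release_iterative(toy_clone_chain(100)) frees the same 100 clients without nesting. On a small fixed-size kernel stack, only the first pattern grows with the length of the cl_parent chain.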

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2013-11-12 15:55:42

by Jeff Layton

Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, 12 Nov 2013 15:31:34 +0000
Weston Andros Adamson <[email protected]> wrote:

> I got this oops yesterday running the "test_sec_options.sh" script I recently posted as a patch to Anna's nfs-ordeal repo (tons of mount/umount).
>
> At this point GSSD had died (I was tracking down a fd leak). I haven't been able to reproduce this yet.
>
> Any idea if I should trust the stack trace? Could this be related to the issue Jeff just posted?
>
> -dros
>
> BUG: unable to handle kernel paging request at ffff88017a604030
> IP: [<ffffffff81063089>] __wake_up+0x22/0x4d
> PGD 2651067 PUD 0
> Thread overran stack, or stack corrupted
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: nfsv4 cts rpcsec_gss_krb5 nfsv3 nfs fscache crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ppdev ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 nfsd parport_pc parport shpchp auth_rpcgss oid_registry exportfs nfs_acl lockd floppy freq_table sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic
> CPU: 0 PID: 10547 Comm: mount.nfs Not tainted 3.12.0-rc3-branch-dros_testing+ #1
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> task: ffff8800798f2100 ti: ffff88007a604000 task.ti: ffff88007a604000
> RIP: 0010:[<ffffffff81063089>] [<ffffffff81063089>] __wake_up+0x22/0x4d
> RSP: 0018:ffff88007a604028 EFLAGS: 00010092
> RAX: 0000000000000296 RBX: ffffffffa006a980 RCX: 000000009a519a50
> RDX: 000000009a509a50 RSI: 000000000000038a RDI: ffffffffa006a980
> RBP: ffff88017a604058 R08: 0000000000000003 R09: 0000000000000001
> R10: ffff88006d41d7c0 R11: ffff88007f20b000 R12: ffff88007a6058e0
> R13: ffff8800645d8018 R14: ffff88007a6058f8 R15: ffff8800798f2100
> FS: 00007fb2765b3880(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88017a604030 CR3: 000000007a6ba000 CR4: 00000000001407f0
> Stack:
> ffff88007a604038 0000000000000000 ffff880000000001 0000000000000003
> ffff88006453a0d0 ffff88007a6058e0 ffff88007a604078 ffffffffa0045de2
> ffff88006453dbe0 ffff88006453dbe0 ffff88007a604098 ffffffffa0045d27
> Call Trace:
> [<ffffffffa0045de2>] ? rpc_release_client+0x4a/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> [<ffffffffa0045f39>] ? rpc_shutdown_client+0x107/0x116 [sunrpc]
> [<ffffffffa02a6456>] ? __fscache_cookie_put+0x43/0x4f [fscache]
> [<ffffffffa02a65ca>] ? __fscache_relinquish_cookie+0x168/0x16d [fscache]
> [<ffffffffa02bdc2b>] ? nfs_free_client+0x4c/0xaf [nfs]
> [<ffffffffa0340e4a>] ? nfs4_free_client+0x97/0x9b [nfsv4]
> [<ffffffffa02bcfd9>] ? nfs_put_client+0xe8/0xed [nfs]
> [<ffffffffa0341126>] ? nfs4_init_client+0x22e/0x29d [nfsv4]
> [<ffffffffa02bc95f>] ? nfs_probe_fsinfo+0x2c7/0x2c7 [nfs]
> [<ffffffffa02bd1af>] ? nfs_get_client+0x8a/0x2bf [nfs]
> [<ffffffffa02bd37f>] ? nfs_get_client+0x25a/0x2bf [nfs]
> [<ffffffffa034059c>] ? nfs4_set_client+0x9f/0xf1 [nfsv4]
> [<ffffffffa004e917>] ? __rpc_init_priority_wait_queue+0x98/0xcf [sunrpc]
> [<ffffffffa0341999>] ? nfs4_create_server+0xfe/0x264 [nfsv4]
> [<ffffffffa033ac59>] ? nfs4_remote_mount+0x2f/0x57 [nfsv4]
> [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> [<ffffffffa033ac02>] ? nfs_do_root_mount+0x8c/0xb4 [nfsv4]
> [<ffffffffa033aea9>] ? nfs4_try_mount+0x60/0xbb [nfsv4]
> [<ffffffffa02c80eb>] ? nfs_fs_mount+0x88f/0x97a [nfs]
> [<ffffffffa02c8620>] ? nfs_clone_super+0x6b/0x6b [nfs]
> [<ffffffffa02c59ce>] ? nfs_set_super+0x53/0x53 [nfs]
> [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> [<ffffffff81141fa6>] ? do_mount+0x6ce/0x871
> [<ffffffff81141833>] ? copy_mount_options+0xc2/0x12f
> [<ffffffff811421ce>] ? SyS_mount+0x85/0xbe
> [<ffffffff814a4292>] ? system_call_fastpath+0x16/0x1b
> Code: 89 e5 e8 98 ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 20 89 55 e0 89 75 e8 48 89 4d d8 e8 0c 99 43 00 <4c> 8b 45 d8 48 89 df 8b 55 e0 49 89 c4 31 c9 8b 75 e8 e8 b4 d0
> RIP [<ffffffff81063089>] __wake_up+0x22/0x4d
> RSP <ffff88007a604028>
> CR2: ffff88017a604030
> ---[ end trace 1122f3f8cf98e4c2 ]---
>

Yep, I think this is the same problem I reported earlier. I ran the
reproducer with rpc_debug turned up and ended up seeing a very similar
stack trace. Basically, the server is returning NFS4ERR_CLID_INUSE, but
the client keeps retrying the call over and over.

I suspect that leads to some sort of recursion, but I haven't quite
spotted it yet.

--
Jeff Layton <[email protected]>

2013-11-12 16:20:28

by Jeff Layton

[permalink] [raw]
Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, 12 Nov 2013 10:55:39 -0500
Jeff Layton <[email protected]> wrote:

> On Tue, 12 Nov 2013 15:31:34 +0000
> Weston Andros Adamson <[email protected]> wrote:
>
> > I got this oops yesterday running the "test_sec_options.sh" script I recently posted as a patch to Anna's nfs-ordeal repo (tons of mount/umount).
> >
> > At this point GSSD had died (I was tracking down a fd leak). I haven't been able to reproduce this yet.
> >
> > Any idea if I should trust the stack trace? Could this be related to the issue Jeff just posted?
> >
> > -dros
> >
> > BUG: unable to handle kernel paging request at ffff88017a604030
> > IP: [<ffffffff81063089>] __wake_up+0x22/0x4d
> > PGD 2651067 PUD 0
> > Thread overran stack, or stack corrupted
> > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > Modules linked in: nfsv4 cts rpcsec_gss_krb5 nfsv3 nfs fscache crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ppdev ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 nfsd parport_pc parport shpchp auth_rpcgss oid_registry exportfs nfs_acl lockd floppy freq_table sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic
> > CPU: 0 PID: 10547 Comm: mount.nfs Not tainted 3.12.0-rc3-branch-dros_testing+ #1
> > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> > task: ffff8800798f2100 ti: ffff88007a604000 task.ti: ffff88007a604000
> > RIP: 0010:[<ffffffff81063089>] [<ffffffff81063089>] __wake_up+0x22/0x4d
> > RSP: 0018:ffff88007a604028 EFLAGS: 00010092
> > RAX: 0000000000000296 RBX: ffffffffa006a980 RCX: 000000009a519a50
> > RDX: 000000009a509a50 RSI: 000000000000038a RDI: ffffffffa006a980
> > RBP: ffff88017a604058 R08: 0000000000000003 R09: 0000000000000001
> > R10: ffff88006d41d7c0 R11: ffff88007f20b000 R12: ffff88007a6058e0
> > R13: ffff8800645d8018 R14: ffff88007a6058f8 R15: ffff8800798f2100
> > FS: 00007fb2765b3880(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffff88017a604030 CR3: 000000007a6ba000 CR4: 00000000001407f0
> > Stack:
> > ffff88007a604038 0000000000000000 ffff880000000001 0000000000000003
> > ffff88006453a0d0 ffff88007a6058e0 ffff88007a604078 ffffffffa0045de2
> > ffff88006453dbe0 ffff88006453dbe0 ffff88007a604098 ffffffffa0045d27
> > Call Trace:
> > [<ffffffffa0045de2>] ? rpc_release_client+0x4a/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > [<ffffffffa0045f39>] ? rpc_shutdown_client+0x107/0x116 [sunrpc]
> > [<ffffffffa02a6456>] ? __fscache_cookie_put+0x43/0x4f [fscache]
> > [<ffffffffa02a65ca>] ? __fscache_relinquish_cookie+0x168/0x16d [fscache]
> > [<ffffffffa02bdc2b>] ? nfs_free_client+0x4c/0xaf [nfs]
> > [<ffffffffa0340e4a>] ? nfs4_free_client+0x97/0x9b [nfsv4]
> > [<ffffffffa02bcfd9>] ? nfs_put_client+0xe8/0xed [nfs]
> > [<ffffffffa0341126>] ? nfs4_init_client+0x22e/0x29d [nfsv4]
> > [<ffffffffa02bc95f>] ? nfs_probe_fsinfo+0x2c7/0x2c7 [nfs]
> > [<ffffffffa02bd1af>] ? nfs_get_client+0x8a/0x2bf [nfs]
> > [<ffffffffa02bd37f>] ? nfs_get_client+0x25a/0x2bf [nfs]
> > [<ffffffffa034059c>] ? nfs4_set_client+0x9f/0xf1 [nfsv4]
> > [<ffffffffa004e917>] ? __rpc_init_priority_wait_queue+0x98/0xcf [sunrpc]
> > [<ffffffffa0341999>] ? nfs4_create_server+0xfe/0x264 [nfsv4]
> > [<ffffffffa033ac59>] ? nfs4_remote_mount+0x2f/0x57 [nfsv4]
> > [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> > [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> > [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> > [<ffffffffa033ac02>] ? nfs_do_root_mount+0x8c/0xb4 [nfsv4]
> > [<ffffffffa033aea9>] ? nfs4_try_mount+0x60/0xbb [nfsv4]
> > [<ffffffffa02c80eb>] ? nfs_fs_mount+0x88f/0x97a [nfs]
> > [<ffffffffa02c8620>] ? nfs_clone_super+0x6b/0x6b [nfs]
> > [<ffffffffa02c59ce>] ? nfs_set_super+0x53/0x53 [nfs]
> > [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> > [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> > [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> > [<ffffffff81141fa6>] ? do_mount+0x6ce/0x871
> > [<ffffffff81141833>] ? copy_mount_options+0xc2/0x12f
> > [<ffffffff811421ce>] ? SyS_mount+0x85/0xbe
> > [<ffffffff814a4292>] ? system_call_fastpath+0x16/0x1b
> > Code: 89 e5 e8 98 ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 20 89 55 e0 89 75 e8 48 89 4d d8 e8 0c 99 43 00 <4c> 8b 45 d8 48 89 df 8b 55 e0 49 89 c4 31 c9 8b 75 e8 e8 b4 d0
> > RIP [<ffffffff81063089>] __wake_up+0x22/0x4d
> > RSP <ffff88007a604028>
> > CR2: ffff88017a604030
> > ---[ end trace 1122f3f8cf98e4c2 ]---
> >
>
> Yep, I think this is the same problem I reported earlier. I ran the
> reproducer with rpc_debug turned up and ended up seeing a very similar
> stack trace. Basically, the server is returning NFS4ERR_CLID_INUSE, but
> the client keeps retrying the call over and over.
>
> I suspect that leads to some sort of recursion, but I haven't quite
> spotted it yet.
>

(cc'ing Chuck since I think the problem is in the new detect_trunking code)

Ok, I think I see the problem. The looping comes from this block in
nfs4_discover_server_trunking:

-----------------[snip]-----------------
	case -NFS4ERR_CLID_INUSE:
	case -NFS4ERR_WRONGSEC:
		clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
		if (IS_ERR(clnt)) {
			status = PTR_ERR(clnt);
			break;
		}
		/* Note: this is safe because we haven't yet marked the
		 * client as ready, so we are the only user of
		 * clp->cl_rpcclient
		 */
		clnt = xchg(&clp->cl_rpcclient, clnt);
		rpc_shutdown_client(clnt);
		clnt = clp->cl_rpcclient;
		goto again;
-----------------[snip]-----------------

...so in the case of the reproducer, we get back -NFS4ERR_CLID_INUSE.
At that point we call rpc_clone_client_set_auth(), which creates a new
rpc_clnt, but it's created as a child of the original.

When rpc_shutdown_client is called, the original clnt is not destroyed
because the child still holds a reference to it. So, we go and try the
call again and it fails with the same error over and over again, and we
end up with a long chain of rpc_clnt's.
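
To make the chain concrete, here is a small userspace toy model in C. It
is only a sketch: struct toy_clnt, toy_create(), toy_put() and
toy_retry() are made-up names, not the sunrpc API, and the model assumes
nothing more than "a clone holds one reference on its parent, and the
last put on a client also drops that parent reference".

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct rpc_clnt: only a refcount and a parent pointer. */
struct toy_clnt {
	int refcount;
	struct toy_clnt *parent;
};

static int live_clients;	/* number of toy clients not yet freed */

/* Create a client; if it has a parent, the child pins it with a reference. */
static struct toy_clnt *toy_create(struct toy_clnt *parent)
{
	struct toy_clnt *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->refcount = 1;		/* the caller's reference */
	c->parent = parent;
	if (parent)
		parent->refcount++;
	live_clients++;
	return c;
}

/* Drop one reference; the last put frees the client and releases its parent. */
static void toy_put(struct toy_clnt *c)
{
	struct toy_clnt *parent;

	assert(c->refcount > 0);
	if (--c->refcount > 0)
		return;
	parent = c->parent;
	free(c);
	live_clients--;
	if (parent)
		toy_put(parent);	/* recursion: one level per ancestor */
}

/*
 * Mimic the retry loop: clone the current client, swap the clone in and
 * "shut down" the old one. Returns the newest client.
 */
static struct toy_clnt *toy_retry(struct toy_clnt *cur, int retries)
{
	while (retries-- > 0) {
		struct toy_clnt *clone = toy_create(cur);

		if (!clone)
			break;
		toy_put(cur);	/* old client survives: the clone holds a ref */
		cur = clone;
	}
	return cur;
}
```

In this model every retry leaves one more client alive, and the final
toy_put() on the newest client walks the whole chain recursively, one
nested call per ancestor.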

How that ends up smashing the stack, I'm not sure though. I'm also not
sure of the remedy. It seems like we ought to have some upper bound on
the number of SETCLIENTID attempts?
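
As a rough illustration of what such a bound could look like, here is a
self-contained C sketch. Everything in it is hypothetical: the toy_*
names, the placeholder error value, the limit of two attempts and the
choice of -EPERM as the final error are assumptions for the example, not
the actual nfs4_discover_server_trunking() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_flavor { TOY_AUTH_GSS, TOY_AUTH_UNIX };

#define TOY_CLID_INUSE   1	/* placeholder standing in for NFS4ERR_CLID_INUSE */
#define TOY_MAX_ATTEMPTS 2	/* hypothetical bound: first try plus one retry */

/* Hypothetical probe: returns 0 on success or a negative error. */
typedef int (*toy_probe_fn)(enum toy_flavor flavor, void *arg);

/* Try the probe, falling back to TOY_AUTH_UNIX at most once. */
static int toy_discover(toy_probe_fn probe, void *arg, enum toy_flavor flavor)
{
	int attempt, status;

	assert(probe != NULL);
	for (attempt = 0; attempt < TOY_MAX_ATTEMPTS; attempt++) {
		status = probe(flavor, arg);
		if (status != -TOY_CLID_INUSE)
			return status;		/* success or an unrelated error */
		if (flavor == TOY_AUTH_UNIX)
			break;			/* nothing weaker to fall back to */
		flavor = TOY_AUTH_UNIX;		/* retry once with the weaker flavor */
	}
	return -EPERM;				/* give up instead of looping */
}

/* Example probe that always fails and counts how often it was called. */
static int toy_always_inuse(enum toy_flavor flavor, void *arg)
{
	(void)flavor;
	++*(int *)arg;
	return -TOY_CLID_INUSE;
}
```

With the always-failing probe, the loop stops after two calls when it
starts from TOY_AUTH_GSS and after one call when it is already using
TOY_AUTH_UNIX, so no unbounded chain of clones can build up.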

--
Jeff Layton <[email protected]>

2013-11-12 17:41:41

by Jeff Layton

[permalink] [raw]
Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, 12 Nov 2013 12:33:28 -0500
Chuck Lever <[email protected]> wrote:

>
> On Nov 12, 2013, at 12:30 PM, "Myklebust, Trond" <[email protected]> wrote:
>
> > On Tue, 2013-11-12 at 11:23 -0500, Chuck Lever wrote:
> >> On Nov 12, 2013, at 11:20 AM, Jeff Layton <[email protected]> wrote:
> >>> Ok, I think I see the problem. The looping comes from this block in
> >>> nfs4_discover_server_trunking:
> >>>
> >>> -----------------[snip]-----------------
> >>> 	case -NFS4ERR_CLID_INUSE:
> >>> 	case -NFS4ERR_WRONGSEC:
> >>> 		clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
> >>> 		if (IS_ERR(clnt)) {
> >>> 			status = PTR_ERR(clnt);
> >>> 			break;
> >>> 		}
> >>> 		/* Note: this is safe because we haven't yet marked the
> >>> 		 * client as ready, so we are the only user of
> >>> 		 * clp->cl_rpcclient
> >>> 		 */
> >>> 		clnt = xchg(&clp->cl_rpcclient, clnt);
> >>> 		rpc_shutdown_client(clnt);
> >>> 		clnt = clp->cl_rpcclient;
> >>> 		goto again;
> >>> -----------------[snip]-----------------
> >>>
> >>> ...so in the case of the reproducer, we get back -NFS4ERR_CLID_INUSE.
> >>> At that point we call rpc_clone_client_set_auth(), which creates a new
> >>> rpc_clnt, but it's created as a child of the original.
> >>>
> >>> When rpc_shutdown_client is called, the original clnt is not destroyed
> >>> because the child still holds a reference to it. So, we go and try the
> >>> call again and it fails with the same error over and over again, and we
> >>> end up with a long chain of rpc_clnt's.
> >>>
> >>> How that ends up smashing the stack, I'm not sure though. I'm also not
> >>> sure of the remedy. It seems like we might ought to have some upper
> >>> bound on the number of SETCLIENTID attempts?
> >>
> >> CLID_INUSE is supposed to be a permanent error now. I think one retry, if any, is all that is appropriate.
> >
> > Right. If we hit CLID_INUSE in nfs4_discover_server_trunking then
> >
> > a) we know this is a server that we've already mounted
> > b) we know that either nfs4_init_client set us up with RPC_AUTH_UNIX to
> > begin with, or that rpc.gssd was started only after we'd already sent a
> > SETCLIENTID/EXCHANGE_ID using RPC_AUTH_UNIX to this server
> >
> > so the correct thing to do is to retry once if we know that we're not
> > already using AUTH_SYS, and then to EPERM.
>
> Agree. Sorry I didn't spell that out.
>
>
> > Now that said, I agree that this should not be able to trigger a stack
> > overflow. Is this NFSv4 or NFSv4.1/NFSv4.2? Have either of you (Jeff and
> > Dros) tried enabling DEBUG_STACKOVERFLOW?
> >

My kernel says it's on -- but the comments on stack_overflow_check
aren't encouraging for finding this sort of thing:

/*
 * Probabilistic stack overflow check:
 *
 * Only check the stack in process context, because everything else
 * runs on the big interrupt stacks. Checking reliably is too expensive,
 * so we just check from interrupts.
 */


...as to Bruce's earlier question, the recursion in how this stuff is
freed does seem a bit spooky...

Perhaps we could try doing this iteratively somehow, so that it
doesn't recurse.

...and/or maybe we should BUG() or WARN() if you create a chain of
clients more than 10-20 deep?
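
Both ideas can be sketched in a userspace toy model. This is not the
sunrpc code: struct chain_clnt, chain_clone(), chain_put() and the
threshold of 16 are invented for illustration, and the fprintf() merely
stands in for a kernel WARN(). The sketch assumes each clone holds one
reference on its parent.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHAIN_MAX_DEPTH 16	/* hypothetical threshold in the 10-20 range */

struct chain_clnt {
	int refcount;
	int depth;			/* number of ancestors above this client */
	struct chain_clnt *parent;
};

static int chain_warnings;		/* how many times the depth check fired */

/* Clone a client; the clone pins its parent and records its own depth. */
static struct chain_clnt *chain_clone(struct chain_clnt *parent)
{
	struct chain_clnt *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->refcount = 1;
	c->parent = parent;
	if (parent) {
		parent->refcount++;
		c->depth = parent->depth + 1;
	}
	if (c->depth > CHAIN_MAX_DEPTH) {
		chain_warnings++;
		fprintf(stderr, "chain_clnt: chain is %d deep\n", c->depth);
	}
	return c;
}

/*
 * Drop one reference without recursing: walk up the parent chain in a
 * loop, freeing each client whose last reference just went away.
 * Returns the number of clients freed.
 */
static int chain_put(struct chain_clnt *c)
{
	int freed = 0;

	while (c) {
		struct chain_clnt *parent;

		assert(c->refcount > 0);
		if (--c->refcount > 0)
			break;		/* someone else still uses this client */
		parent = c->parent;
		free(c);
		freed++;
		c = parent;		/* now drop the reference we held on it */
	}
	return freed;
}
```

The loop uses a constant amount of stack no matter how long the chain
is, and the depth check makes a runaway chain visible at creation time
rather than at teardown.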

--
Jeff Layton <[email protected]>

2013-11-12 16:23:29

by Chuck Lever III

[permalink] [raw]
Subject: Re: Thread overran stack, or stack corrupted BUG on mount


On Nov 12, 2013, at 11:20 AM, Jeff Layton <[email protected]> wrote:

> On Tue, 12 Nov 2013 10:55:39 -0500
> Jeff Layton <[email protected]> wrote:
>
>> On Tue, 12 Nov 2013 15:31:34 +0000
>> Weston Andros Adamson <[email protected]> wrote:
>>
>>> I got this oops yesterday running the "test_sec_options.sh" script I recently posted as a patch to Anna's nfs-ordeal repo (tons of mount/umount).
>>>
>>> At this point GSSD had died (I was tracking down a fd leak). I haven't been able to reproduce this yet.
>>>
>>> Any idea if I should trust the stack trace? Could this be related to the issue Jeff just posted?
>>>
>>> -dros
>>>
>>> BUG: unable to handle kernel paging request at ffff88017a604030
>>> IP: [<ffffffff81063089>] __wake_up+0x22/0x4d
>>> PGD 2651067 PUD 0
>>> Thread overran stack, or stack corrupted
>>> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
>>> Modules linked in: nfsv4 cts rpcsec_gss_krb5 nfsv3 nfs fscache crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ppdev ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 nfsd parport_pc parport shpchp auth_rpcgss oid_registry exportfs nfs_acl lockd floppy freq_table sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic
>>> CPU: 0 PID: 10547 Comm: mount.nfs Not tainted 3.12.0-rc3-branch-dros_testing+ #1
>>> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
>>> task: ffff8800798f2100 ti: ffff88007a604000 task.ti: ffff88007a604000
>>> RIP: 0010:[<ffffffff81063089>] [<ffffffff81063089>] __wake_up+0x22/0x4d
>>> RSP: 0018:ffff88007a604028 EFLAGS: 00010092
>>> RAX: 0000000000000296 RBX: ffffffffa006a980 RCX: 000000009a519a50
>>> RDX: 000000009a509a50 RSI: 000000000000038a RDI: ffffffffa006a980
>>> RBP: ffff88017a604058 R08: 0000000000000003 R09: 0000000000000001
>>> R10: ffff88006d41d7c0 R11: ffff88007f20b000 R12: ffff88007a6058e0
>>> R13: ffff8800645d8018 R14: ffff88007a6058f8 R15: ffff8800798f2100
>>> FS: 00007fb2765b3880(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: ffff88017a604030 CR3: 000000007a6ba000 CR4: 00000000001407f0
>>> Stack:
>>> ffff88007a604038 0000000000000000 ffff880000000001 0000000000000003
>>> ffff88006453a0d0 ffff88007a6058e0 ffff88007a604078 ffffffffa0045de2
>>> ffff88006453dbe0 ffff88006453dbe0 ffff88007a604098 ffffffffa0045d27
>>> Call Trace:
>>> [<ffffffffa0045de2>] ? rpc_release_client+0x4a/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
>>> [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
>>> [<ffffffffa0045f39>] ? rpc_shutdown_client+0x107/0x116 [sunrpc]
>>> [<ffffffffa02a6456>] ? __fscache_cookie_put+0x43/0x4f [fscache]
>>> [<ffffffffa02a65ca>] ? __fscache_relinquish_cookie+0x168/0x16d [fscache]
>>> [<ffffffffa02bdc2b>] ? nfs_free_client+0x4c/0xaf [nfs]
>>> [<ffffffffa0340e4a>] ? nfs4_free_client+0x97/0x9b [nfsv4]
>>> [<ffffffffa02bcfd9>] ? nfs_put_client+0xe8/0xed [nfs]
>>> [<ffffffffa0341126>] ? nfs4_init_client+0x22e/0x29d [nfsv4]
>>> [<ffffffffa02bc95f>] ? nfs_probe_fsinfo+0x2c7/0x2c7 [nfs]
>>> [<ffffffffa02bd1af>] ? nfs_get_client+0x8a/0x2bf [nfs]
>>> [<ffffffffa02bd37f>] ? nfs_get_client+0x25a/0x2bf [nfs]
>>> [<ffffffffa034059c>] ? nfs4_set_client+0x9f/0xf1 [nfsv4]
>>> [<ffffffffa004e917>] ? __rpc_init_priority_wait_queue+0x98/0xcf [sunrpc]
>>> [<ffffffffa0341999>] ? nfs4_create_server+0xfe/0x264 [nfsv4]
>>> [<ffffffffa033ac59>] ? nfs4_remote_mount+0x2f/0x57 [nfsv4]
>>> [<ffffffff8112a846>] ? mount_fs+0x69/0x157
>>> [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
>>> [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
>>> [<ffffffffa033ac02>] ? nfs_do_root_mount+0x8c/0xb4 [nfsv4]
>>> [<ffffffffa033aea9>] ? nfs4_try_mount+0x60/0xbb [nfsv4]
>>> [<ffffffffa02c80eb>] ? nfs_fs_mount+0x88f/0x97a [nfs]
>>> [<ffffffffa02c8620>] ? nfs_clone_super+0x6b/0x6b [nfs]
>>> [<ffffffffa02c59ce>] ? nfs_set_super+0x53/0x53 [nfs]
>>> [<ffffffff8112a846>] ? mount_fs+0x69/0x157
>>> [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
>>> [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
>>> [<ffffffff81141fa6>] ? do_mount+0x6ce/0x871
>>> [<ffffffff81141833>] ? copy_mount_options+0xc2/0x12f
>>> [<ffffffff811421ce>] ? SyS_mount+0x85/0xbe
>>> [<ffffffff814a4292>] ? system_call_fastpath+0x16/0x1b
>>> Code: 89 e5 e8 98 ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 20 89 55 e0 89 75 e8 48 89 4d d8 e8 0c 99 43 00 <4c> 8b 45 d8 48 89 df 8b 55 e0 49 89 c4 31 c9 8b 75 e8 e8 b4 d0
>>> RIP [<ffffffff81063089>] __wake_up+0x22/0x4d
>>> RSP <ffff88007a604028>
>>> CR2: ffff88017a604030
>>> ---[ end trace 1122f3f8cf98e4c2 ]---
>>>
>>
>> Yep, I think this is the same problem I reported earlier. I ran the
>> reproducer with rpc_debug turned up and ended up seeing a very similar
>> stack trace. Basically the server is returning NFS4ERR_CLID_INUSE but
>> the client keeps retrying the call over and over.
>>
>> I suspect that leads to some sort of recursion, but I haven't quite
>> spotted it yet.
>>
>
> (cc'ing Chuck since I think the problem is in the new detect_trunking code)
>
> Ok, I think I see the problem. The looping comes from this block in
> nfs4_discover_server_trunking:
>
> -----------------[snip]-----------------
>     case -NFS4ERR_CLID_INUSE:
>     case -NFS4ERR_WRONGSEC:
>         clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>         if (IS_ERR(clnt)) {
>             status = PTR_ERR(clnt);
>             break;
>         }
>         /* Note: this is safe because we haven't yet marked the
>          * client as ready, so we are the only user of
>          * clp->cl_rpcclient
>          */
>         clnt = xchg(&clp->cl_rpcclient, clnt);
>         rpc_shutdown_client(clnt);
>         clnt = clp->cl_rpcclient;
>         goto again;
> -----------------[snip]-----------------
>
> ...so in the case of the reproducer, we get back -NFS4ERR_CLID_INUSE,
> at that point we call rpc_clone_client_set_auth(), which creates a new
> rpc_clnt, but it's created as a child of the original.
>
> When rpc_shutdown_client is called, the original clnt is not destroyed
> because the child still holds a reference to it. So, we go and try the
> call again and it fails with the same error over and over again, and we
> end up with a long chain of rpc_clnt's.
>
> How that ends up smashing the stack, I'm not sure though. I'm also not
> sure of the remedy. It seems like we might ought to have some upper
> bound on the number of SETCLIENTID attempts?

CLID_INUSE is supposed to be a permanent error now. I think one retry, if any, is all that is appropriate.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2013-11-12 16:57:35

by J. Bruce Fields

Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, Nov 12, 2013 at 11:20:21AM -0500, Jeff Layton wrote:
> On Tue, 12 Nov 2013 10:55:39 -0500
> Jeff Layton <[email protected]> wrote:
>
> > On Tue, 12 Nov 2013 15:31:34 +0000
> > Weston Andros Adamson <[email protected]> wrote:
> >
> > > I got this oops yesterday running the “test_sec_options.sh” script I recently posted as a patch to Anna’s nfs-ordeal repo (tons of mount/umount).
> > >
> > > At this point GSSD had died (I was tracking down a fd leak). I haven’t been able to reproduce this yet.
> > >
> > > Any idea if I should trust the stack trace? Could this be related to the issue Jeff just posted?
> > >
> > > -dros
> > >
> > > BUG: unable to handle kernel paging request at ffff88017a604030
> > > IP: [<ffffffff81063089>] __wake_up+0x22/0x4d
> > > PGD 2651067 PUD 0
> > > Thread overran stack, or stack corrupted
> > > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > > Modules linked in: nfsv4 cts rpcsec_gss_krb5 nfsv3 nfs fscache crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ppdev ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 nfsd parport_pc parport shpchp auth_rpcgss oid_registry exportfs nfs_acl lockd floppy freq_table sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic
> > > CPU: 0 PID: 10547 Comm: mount.nfs Not tainted 3.12.0-rc3-branch-dros_testing+ #1
> > > Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> > > task: ffff8800798f2100 ti: ffff88007a604000 task.ti: ffff88007a604000
> > > RIP: 0010:[<ffffffff81063089>] [<ffffffff81063089>] __wake_up+0x22/0x4d
> > > RSP: 0018:ffff88007a604028 EFLAGS: 00010092
> > > RAX: 0000000000000296 RBX: ffffffffa006a980 RCX: 000000009a519a50
> > > RDX: 000000009a509a50 RSI: 000000000000038a RDI: ffffffffa006a980
> > > RBP: ffff88017a604058 R08: 0000000000000003 R09: 0000000000000001
> > > R10: ffff88006d41d7c0 R11: ffff88007f20b000 R12: ffff88007a6058e0
> > > R13: ffff8800645d8018 R14: ffff88007a6058f8 R15: ffff8800798f2100
> > > FS: 00007fb2765b3880(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: ffff88017a604030 CR3: 000000007a6ba000 CR4: 00000000001407f0
> > > Stack:
> > > ffff88007a604038 0000000000000000 ffff880000000001 0000000000000003
> > > ffff88006453a0d0 ffff88007a6058e0 ffff88007a604078 ffffffffa0045de2
> > > ffff88006453dbe0 ffff88006453dbe0 ffff88007a604098 ffffffffa0045d27
> > > Call Trace:
> > > [<ffffffffa0045de2>] ? rpc_release_client+0x4a/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045d27>] ? rpc_free_client+0x56/0xc7 [sunrpc]
> > > [<ffffffffa0045e00>] ? rpc_release_client+0x68/0x9a [sunrpc]
> > > [<ffffffffa0045f39>] ? rpc_shutdown_client+0x107/0x116 [sunrpc]
> > > [<ffffffffa02a6456>] ? __fscache_cookie_put+0x43/0x4f [fscache]
> > > [<ffffffffa02a65ca>] ? __fscache_relinquish_cookie+0x168/0x16d [fscache]
> > > [<ffffffffa02bdc2b>] ? nfs_free_client+0x4c/0xaf [nfs]
> > > [<ffffffffa0340e4a>] ? nfs4_free_client+0x97/0x9b [nfsv4]
> > > [<ffffffffa02bcfd9>] ? nfs_put_client+0xe8/0xed [nfs]
> > > [<ffffffffa0341126>] ? nfs4_init_client+0x22e/0x29d [nfsv4]
> > > [<ffffffffa02bc95f>] ? nfs_probe_fsinfo+0x2c7/0x2c7 [nfs]
> > > [<ffffffffa02bd1af>] ? nfs_get_client+0x8a/0x2bf [nfs]
> > > [<ffffffffa02bd37f>] ? nfs_get_client+0x25a/0x2bf [nfs]
> > > [<ffffffffa034059c>] ? nfs4_set_client+0x9f/0xf1 [nfsv4]
> > > [<ffffffffa004e917>] ? __rpc_init_priority_wait_queue+0x98/0xcf [sunrpc]
> > > [<ffffffffa0341999>] ? nfs4_create_server+0xfe/0x264 [nfsv4]
> > > [<ffffffffa033ac59>] ? nfs4_remote_mount+0x2f/0x57 [nfsv4]
> > > [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> > > [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> > > [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> > > [<ffffffffa033ac02>] ? nfs_do_root_mount+0x8c/0xb4 [nfsv4]
> > > [<ffffffffa033aea9>] ? nfs4_try_mount+0x60/0xbb [nfsv4]
> > > [<ffffffffa02c80eb>] ? nfs_fs_mount+0x88f/0x97a [nfs]
> > > [<ffffffffa02c8620>] ? nfs_clone_super+0x6b/0x6b [nfs]
> > > [<ffffffffa02c59ce>] ? nfs_set_super+0x53/0x53 [nfs]
> > > [<ffffffff8112a846>] ? mount_fs+0x69/0x157
> > > [<ffffffff810fb79b>] ? __alloc_percpu+0x10/0x12
> > > [<ffffffff8113fcbd>] ? vfs_kern_mount+0x62/0xd9
> > > [<ffffffff81141fa6>] ? do_mount+0x6ce/0x871
> > > [<ffffffff81141833>] ? copy_mount_options+0xc2/0x12f
> > > [<ffffffff811421ce>] ? SyS_mount+0x85/0xbe
> > > [<ffffffff814a4292>] ? system_call_fastpath+0x16/0x1b
> > > Code: 89 e5 e8 98 ff ff ff 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 83 ec 20 89 55 e0 89 75 e8 48 89 4d d8 e8 0c 99 43 00 <4c> 8b 45 d8 48 89 df 8b 55 e0 49 89 c4 31 c9 8b 75 e8 e8 b4 d0
> > > RIP [<ffffffff81063089>] __wake_up+0x22/0x4d
> > > RSP <ffff88007a604028>
> > > CR2: ffff88017a604030
> > > ---[ end trace 1122f3f8cf98e4c2 ]---
> > >
> >
> > Yep, I think this is the same problem I reported earlier. I ran the
> > reproducer with rpc_debug turned up and ended up seeing a very similar
> > stack trace. Basically the server is returning NFS4ERR_CLID_INUSE but
> > the client keeps retrying the call over and over.
> >
> > I suspect that leads to some sort of recursion, but I haven't quite
> > spotted it yet.
> >
>
> (cc'ing Chuck since I think the problem is in the new detect_trunking code)
>
> Ok, I think I see the problem. The looping comes from this block in
> nfs4_discover_server_trunking:
>
> -----------------[snip]-----------------
>     case -NFS4ERR_CLID_INUSE:
>     case -NFS4ERR_WRONGSEC:
>         clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>         if (IS_ERR(clnt)) {
>             status = PTR_ERR(clnt);
>             break;
>         }
>         /* Note: this is safe because we haven't yet marked the
>          * client as ready, so we are the only user of
>          * clp->cl_rpcclient
>          */
>         clnt = xchg(&clp->cl_rpcclient, clnt);
>         rpc_shutdown_client(clnt);
>         clnt = clp->cl_rpcclient;
>         goto again;
> -----------------[snip]-----------------
>
> ...so in the case of the reproducer, we get back -NFS4ERR_CLID_INUSE,
> at that point we call rpc_clone_client_set_auth(), which creates a new
> rpc_clnt, but it's created as a child of the original.
>
> When rpc_shutdown_client is called, the original clnt is not destroyed
> because the child still holds a reference to it. So, we go and try the
> call again and it fails with the same error over and over again, and we
> end up with a long chain of rpc_clnt's.
>
> How that ends up smashing the stack, I'm not sure though.

rpc_free_client(clnt)
    rpc_release_client(clnt->cl_parent)
        rpc_free_auth(clnt)
            rpc_free_client(clnt)

So freeing a client with N ancestors can take N times the stack as
freeing a single client.

(Are there any other cases that can create arbitrarily long cl_parent
chains?)

--b.

> I'm also not
> sure of the remedy. It seems like we might ought to have some upper
> bound on the number of SETCLIENTID attempts?
>
> --
> Jeff Layton <[email protected]>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-11-12 17:52:46

by Weston Andros Adamson

Subject: Re: Thread overran stack, or stack corrupted BUG on mount


On Nov 12, 2013, at 12:30 PM, Myklebust, Trond <[email protected]> wrote:

> On Tue, 2013-11-12 at 11:23 -0500, Chuck Lever wrote:
>> On Nov 12, 2013, at 11:20 AM, Jeff Layton <[email protected]> wrote:
>>> Ok, I think I see the problem. The looping comes from this block in
>>> nfs4_discover_server_trunking:
>>>
>>> -----------------[snip]-----------------
>>>     case -NFS4ERR_CLID_INUSE:
>>>     case -NFS4ERR_WRONGSEC:
>>>         clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>>>         if (IS_ERR(clnt)) {
>>>             status = PTR_ERR(clnt);
>>>             break;
>>>         }
>>>         /* Note: this is safe because we haven't yet marked the
>>>          * client as ready, so we are the only user of
>>>          * clp->cl_rpcclient
>>>          */
>>>         clnt = xchg(&clp->cl_rpcclient, clnt);
>>>         rpc_shutdown_client(clnt);
>>>         clnt = clp->cl_rpcclient;
>>>         goto again;
>>> -----------------[snip]-----------------
>>>
>>> ...so in the case of the reproducer, we get back -NFS4ERR_CLID_INUSE,
>>> at that point we call rpc_clone_client_set_auth(), which creates a new
>>> rpc_clnt, but it's created as a child of the original.
>>>
>>> When rpc_shutdown_client is called, the original clnt is not destroyed
>>> because the child still holds a reference to it. So, we go and try the
>>> call again and it fails with the same error over and over again, and we
>>> end up with a long chain of rpc_clnt's.
>>>
>>> How that ends up smashing the stack, I'm not sure though. I'm also not
>>> sure of the remedy. It seems like we might ought to have some upper
>>> bound on the number of SETCLIENTID attempts?
>>
>> CLID_INUSE is supposed to be a permanent error now. I think one retry, if any, is all that is appropriate.
>
> Right. If we hit CLID_INUSE in nfs4_discover_server_trunking then
>
> a) we know this is a server that we've already mounted
> b) we know that either nfs4_init_client set us up with RPC_AUTH_UNIX to
> begin with, or that rpc.gssd was started only after we'd already sent a
> SETCLIENTID/EXCHANGE_ID using RPC_AUTH_UNIX to this server
>
> so the correct thing to do is to retry once if we know that we're not
> already using AUTH_SYS, and then to EPERM.
>
>
> Now that said, I agree that this should not be able to trigger a stack
> overflow. Is this NFSv4 or NFSv4.1/NFSv4.2? Have either of you (Jeff and
> Dros) tried enabling DEBUG_STACKOVERFLOW?

IIRC it was a v4.0 mount when I hit this. Yes, I have CONFIG_DEBUG_STACKOVERFLOW=y.

-dros

>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> http://www.netapp.com


2013-11-12 17:31:14

by Myklebust, Trond

Subject: Re: Thread overran stack, or stack corrupted BUG on mount

On Tue, 2013-11-12 at 11:23 -0500, Chuck Lever wrote:
> On Nov 12, 2013, at 11:20 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > Ok, I think I see the problem. The looping comes from this block in
> > nfs4_discover_server_trunking:
> >
> > -----------------[snip]-----------------
> > case -NFS4ERR_CLID_INUSE:
> > case -NFS4ERR_WRONGSEC:
> > clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
> > if (IS_ERR(clnt)) {
> > status = PTR_ERR(clnt);
> > break;
> > }
> > /* Note: this is safe because we haven't yet marked the
> > * client as ready, so we are the only user of
> > * clp->cl_rpcclient
> > */
> > clnt = xchg(&clp->cl_rpcclient, clnt);
> > rpc_shutdown_client(clnt);
> > clnt = clp->cl_rpcclient;
> > goto again;
> > -----------------[snip]-----------------
> >
> > ...so in the case of the reproducer, we get back -NFS4ERR_CLID_IN_USE,
> > at that point we call rpc_clone_client_set_auth(), which creates a new
> > rpc_clnt, but it's created as a child of the original.
> >
> > When rpc_shutdown_client is called, the original clnt is not destroyed
> > because the child still holds a reference to it. So, we go and try the
> > call again and it fails with the same error over and over again, and we
> > end up with a long chain of rpc_clnt's.
> >
> > How that ends up smashing the stack, I'm not sure though. I'm also not
> > sure of the remedy. It seems like we might ought to have some upper
> > bound on the number of SETCLIENTID attempts?
>
> CLID_INUSE is supposed to be a permanent error now. I think one retry, if any, is all that is appropriate.

Right. If we hit CLID_INUSE in nfs4_discover_server_trunking then

a) we know this is a server that we've already mounted
b) we know that either nfs4_init_client set us up with RPC_AUTH_UNIX to
begin with, or that rpc.gssd was started only after we'd already sent a
SETCLIENTID/EXCHANGE_ID using RPC_AUTH_UNIX to this server

so the correct thing to do is to retry once if we know that we're not
already using AUTH_SYS, and then to EPERM.
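
The retry policy described above can be sketched in userspace C. This is only a model under stated assumptions, not the kernel change itself: enum flavor, ERR_CLID_INUSE, probe_fn, the two model servers, and discover_once_retry are all hypothetical names made up for illustration. The point it shows is that the loop can run at most twice: one attempt with the starting flavor, one fallback to AUTH_SYS, then -EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum flavor { FLAVOR_SYS, FLAVOR_GSS };
#define ERR_CLID_INUSE 10017 /* illustrative error code */

/* A probe models one SETCLIENTID-like request; returns 0 or a negative error. */
typedef int (*probe_fn)(enum flavor flavor, int *calls);

/* Model server that rejects every attempt, as in the reproducer. */
static int always_inuse(enum flavor flavor, int *calls)
{
    (void)flavor;
    ++*calls;
    return -ERR_CLID_INUSE;
}

/* Model server that accepts the AUTH_SYS-like flavor only. */
static int sys_only(enum flavor flavor, int *calls)
{
    ++*calls;
    return flavor == FLAVOR_SYS ? 0 : -ERR_CLID_INUSE;
}

/*
 * Retry once with FLAVOR_SYS if the first attempt used another flavor,
 * otherwise give up with -EPERM. Because the fallback sets the flavor to
 * FLAVOR_SYS, a second CLID_INUSE always terminates the loop.
 */
static int discover_once_retry(enum flavor flavor, probe_fn probe, int *calls)
{
    assert(probe != NULL && calls != NULL);
    for (;;) {
        int status = probe(flavor, calls);

        if (status != -ERR_CLID_INUSE)
            return status;
        if (flavor == FLAVOR_SYS)
            return -EPERM;      /* already tried the fallback flavor */
        flavor = FLAVOR_SYS;    /* the single permitted retry */
    }
}
```

Against the always_inuse model, starting from FLAVOR_GSS costs two probes before -EPERM, and starting from FLAVOR_SYS costs one. Against sys_only, the fallback attempt succeeds.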


Now that said, I agree that this should not be able to trigger a stack
overflow. Is this NFSv4 or NFSv4.1/NFSv4.2? Have either of you (Jeff and
Dros) tried enabling DEBUG_STACKOVERFLOW?
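
For reference, the chain growth Jeff describes in the quoted text can be reproduced with a small userspace model. It is a sketch only: struct model_clnt and the model_* helpers are hypothetical and are not the sunrpc implementation. The one thing they copy from the description is that a clone holds a reference on its parent. The model does not try to explain the stack overrun itself, which the thread leaves open.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a client handle; not the kernel's struct rpc_clnt. */
struct model_clnt {
    struct model_clnt *parent; /* a clone keeps its parent pinned */
    int refs;
};

static int live_clients; /* number of model clients not yet freed */

/* Create a client; a non-NULL parent gains one reference owned by the child. */
static struct model_clnt *model_create(struct model_clnt *parent)
{
    struct model_clnt *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->parent = parent;
    c->refs = 1;            /* reference owned by the caller */
    if (parent)
        parent->refs++;     /* reference owned by the child */
    live_clients++;
    return c;
}

/* Drop one reference; walk up the chain iteratively while counts hit zero. */
static void model_release(struct model_clnt *c)
{
    while (c && --c->refs == 0) {
        struct model_clnt *parent = c->parent;

        free(c);
        live_clients--;
        c = parent;
    }
}

/* Repeat the clone / swap / shutdown step from the quoted block n times. */
static struct model_clnt *model_retry_loop(struct model_clnt *clnt, int n)
{
    for (int i = 0; i < n; i++) {
        struct model_clnt *clone = model_create(clnt);

        model_release(clnt); /* "shutdown": parent survives, the clone pins it */
        clnt = clone;
    }
    return clnt;
}
```

In this model, n failed attempts leave n + 1 clients alive, and none of them is freed until the newest clone is finally released.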

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
http://www.netapp.com

2013-11-12 17:33:35

by Chuck Lever III

Subject: Re: Thread overran stack, or stack corrupted BUG on mount


On Nov 12, 2013, at 12:30 PM, "Myklebust, Trond" <[email protected]> wrote:

> On Tue, 2013-11-12 at 11:23 -0500, Chuck Lever wrote:
>> On Nov 12, 2013, at 11:20 AM, Jeff Layton <jlayton@redhat.com> wrote:
>>> Ok, I think I see the problem. The looping comes from this block in
>>> nfs4_discover_server_trunking:
>>>
>>> -----------------[snip]-----------------
>>> case -NFS4ERR_CLID_INUSE:
>>> case -NFS4ERR_WRONGSEC:
>>> clnt = rpc_clone_client_set_auth(clnt, RPC_AUTH_UNIX);
>>> if (IS_ERR(clnt)) {
>>> status = PTR_ERR(clnt);
>>> break;
>>> }
>>> /* Note: this is safe because we haven't yet marked the
>>> * client as ready, so we are the only user of
>>> * clp->cl_rpcclient
>>> */
>>> clnt = xchg(&clp->cl_rpcclient, clnt);
>>> rpc_shutdown_client(clnt);
>>> clnt = clp->cl_rpcclient;
>>> goto again;
>>> -----------------[snip]-----------------
>>>
>>> ...so in the case of the reproducer, we get back -NFS4ERR_CLID_IN_USE,
>>> at that point we call rpc_clone_client_set_auth(), which creates a new
>>> rpc_clnt, but it's created as a child of the original.
>>>
>>> When rpc_shutdown_client is called, the original clnt is not destroyed
>>> because the child still holds a reference to it. So, we go and try the
>>> call again and it fails with the same error over and over again, and we
>>> end up with a long chain of rpc_clnt's.
>>>
>>> How that ends up smashing the stack, I'm not sure though. I'm also not
>>> sure of the remedy. It seems like we might ought to have some upper
>>> bound on the number of SETCLIENTID attempts?
>>
>> CLID_INUSE is supposed to be a permanent error now. I think one retry, if any, is all that is appropriate.
>
> Right. If we hit CLID_INUSE in nfs4_discover_server_trunking then
>
> a) we know this is a server that we've already mounted
> b) we know that either nfs4_init_client set us up with RPC_AUTH_UNIX to
> begin with, or that rpc.gssd was started only after we'd already sent a
> SETCLIENTID/EXCHANGE_ID using RPC_AUTH_UNIX to this server
>
> so the correct thing to do is to retry once if we know that we're not
> already using AUTH_SYS, and then to EPERM.

Agree. Sorry I didn't spell that out.


> Now that said, I agree that this should not be able to trigger a stack
> overflow. Is this NFSv4 or NFSv4.1/NFSv4.2? Have either of you (Jeff and
> Dros) tried enabling DEBUG_STACKOVERFLOW?
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> http://www.netapp.com

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com