Hi,
I'm getting a trace on nfs_idmap_legacy_upcall with a 3.4.7 amd64 kernel.
This nfs patch was also applied:
a427b9e (trond tree) NFS: Fix a number of bugs in the idmapper
Any thoughts?
CPU 0
Pid: 32093, comm: mount.nfs4 Not tainted 3.4.7
RIP: 0010:[<ffffffff81185451>] [<ffffffff81185451>]
nfs_idmap_legacy_upcall+0x1e1/0x1f0
RSP: 0018:ffff88056eef9548 EFLAGS: 00010282
RAX: 000000000000000e RBX: ffff88061b8d1200 RCX: 0000000000000000
RDX: 0000000000000006 RSI: ffff8805804354d2 RDI: ffff88061b8d1210
RBP: ffff8805b8725140 R08: 000000000000003a R09: ffffffff811852ed
R10: 00000000f661ded1 R11: 00000000677e45e9 R12: ffff880c0512c5b0
R13: ffff8805b8725740 R14: ffff88061ccab140 R15: 000000000000000e
FS: 0000036dfcf5d700(0000) GS:ffff880627c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000040001dc1000 CR3: 000000000148e000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount.nfs4 (pid: 32093, threadinfo ffff88056ec3f950, task
ffff88056ec3f500)
Stack:
ffff8805804354c4 ffff8805804354d2 ffffffff81938600 ffff880c0512c5b0
ffff88061ccaae40 ffff88061ccab140 ffffffff816b02b9 ffffffff81938618
0000000000000000 ffffffff811a56b4 0000000000000000 0000000000000000
Call Trace:
[<ffffffff811a56b4>] ? request_key_and_link+0x394/0x460
[<ffffffff811a57d5>] ? request_key_with_auxdata+0x15/0x60
[<ffffffff81185543>] ? nfs_idmap_request_key+0xe3/0x1c0
[<ffffffff811856d0>] ? nfs_idmap_get_key+0xb0/0xd0
[<ffffffff811859af>] ? nfs_idmap_lookup_id+0x2f/0x80
[<ffffffff81185a39>] ? nfs_map_group_to_gid+0x39/0x90
[<ffffffff8117e4f5>] ? decode_getfattr_attrs+0x915/0xa10
[<ffffffff8117fdc6>] ? T.1607+0x96/0xe0
[<ffffffff811802a9>] ? nfs4_xdr_dec_lookup_root+0x69/0x70
[<ffffffff814481f0>] ? rpc_queue_empty+0x40/0x40
[<ffffffff814481f0>] ? rpc_queue_empty+0x40/0x40
[<ffffffff81180240>] ? nfs4_xdr_dec_remove+0x90/0x90
[<ffffffff81449419>] ? rpcauth_unwrap_resp+0x79/0x80
[<ffffffff81180240>] ? nfs4_xdr_dec_remove+0x90/0x90
[<ffffffff81441313>] ? call_decode+0x2a3/0x400
[<ffffffff81448806>] ? __rpc_execute+0x46/0x1b0
[<ffffffff81055368>] ? wake_up_bit+0x18/0x40
[<ffffffff81441cc9>] ? rpc_run_task+0x79/0xa0
[<ffffffff81441deb>] ? rpc_call_sync+0x3b/0x70
[<ffffffff8117624a>] ? _nfs4_lookup_root+0xaa/0xd0
[<ffffffff81458052>] ? T.993+0x92/0xd0
[<ffffffff811762be>] ? nfs4_lookup_root+0x4e/0x90
[<ffffffff81176330>] ? nfs4_proc_get_root+0x30/0xd0
[<ffffffff811616c3>] ? nfs4_get_rootfh+0x43/0xf0
[<ffffffff81440ac9>] ? rpc_net_ns+0x19/0x20
[<ffffffff814418a6>] ? rpc_register_client+0x56/0xa0
[<ffffffff8115b553>] ? nfs4_server_common_setup+0x63/0xf0
[<ffffffff8115c8ca>] ? nfs4_create_server+0x1ca/0x2c0
[<ffffffff81166432>] ? nfs4_remote_mount+0x62/0x210
[<ffffffff810f5045>] ? mount_fs+0x45/0x1c0
[<ffffffff8111288d>] ? vfs_kern_mount+0x6d/0x100
[<ffffffff81166840>] ? nfs_do_root_mount+0x90/0xf0
[<ffffffff81166b08>] ? nfs4_try_mount+0x38/0x60
[<ffffffff81166d15>] ? nfs4_mount+0x1e5/0x490
[<ffffffff810df1a4>] ? __kmalloc_track_caller+0x34/0x120
[<ffffffff810f5045>] ? mount_fs+0x45/0x1c0
[<ffffffff8111288d>] ? vfs_kern_mount+0x6d/0x100
[<ffffffff81112e33>] ? do_kern_mount+0x53/0x110
[<ffffffff8111440e>] ? do_mount+0x22e/0x870
[<ffffffff811f6cb5>] ? __get_user_1+0x25/0x30
[<ffffffff81114aea>] ? sys_mount+0x9a/0x100
[<ffffffff8147e2de>] ? system_call_fastpath+0x18/0x1d
Code: 01 48 8d 7b 02 ba 80 00 00 00 48 89 e6 e8 58 36 07 00 41 89 c7
eb 9c 0f 1f 00 c7 03 00 00 00 00 48 8d 7b 04 b2 88 e9 b9 fe ff ff <0f>
0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec 58 31 c0
RIP [<ffffffff81185451>] nfs_idmap_legacy_upcall+0x1e1/0x1f0
RSP <ffff88056eef9548>
---[ end trace ab4ae3b499d533dc ]---
Regards,
--
William
On 08/04/2012 06:26 AM, William Dauchy wrote:
> On Fri, Aug 3, 2012 at 2:08 PM, William Dauchy <[email protected]> wrote:
>> 4697bd5:NFSv4: Fix a race in the net namespace mount notification
>
> this one didn't help. still getting a wait_for_key_construction oops.
> I opened a second thread about it, first thinking this was two
> different issues
> will dig it a bit more
>
What are you doing to reproduce this? I'm having bash run mount / unmount in a loop, but so far I haven't seen the oops. Also, are you using kerberos?
- Bryan
Hi David,
Thank you for your reply.
On Fri, Aug 3, 2012 at 11:47 AM, David Howells <[email protected]> wrote:
> I'm afraid I'm not going to be able to do a lot with this until the 13th as
> I'm about to go on holiday. If it is still a problem when I get back, I'll
> look at it then.
Thanks. I started to dig it and went through the huge backlog between
3.4 and 3.5. It seems like a race during mount since it's only
happening sometimes and can't reproduce it easily.
I first thought there was still an issue in this fix:
b102743: NFS: Force the legacy idmapper to be single threaded
but then found:
4697bd5:NFSv4: Fix a race in the net namespace mount notification
I guess this last one might be related to my issues since idmaper is
using struct nfs_client. I started new tests with it.
Regards,
--
William
William Dauchy <[email protected]> wrote:
> I'm getting a trace on nfs_idmap_legacy_upcall with a 3.4.7 amd64 kernel.
> This nfs patch was also applied:
> a427b9e (trond tree) NFS: Fix a number of bugs in the idmapper
> Any thoughts?
I'm afraid I'm not going to be able to do a lot with this until the 13th as
I'm about to go on holiday. If it is still a problem when I get back, I'll
look at it then.
David
Hi Bryan,
On Sat, Aug 4, 2012 at 5:09 PM, Bryan Schumaker <[email protected]> wrote:
> What are you doing to reproduce this? I'm having bash run mount / unmount in a loop, but so far I haven't seen the oops. Also, are you using kerberos?
That's what I'm trying to understand because I'm note able to
reproduce it when I want (wait_for_key_construction oops).
About 300 nfs mount, each one doing I/O, after a while some
mount/umount produces the oops; but I don't know yet exactly when.
I'm not using kerberos.
I was thinking about a remaining race on the old idmaper. I'm putting
the null deference trace back in this thread as a reminder.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
IP: [<ffffffff811a4a18>] wait_for_key_construction+0x28/0x70
PGD 2bf8de000
Oops: 0000 [#1] PREEMPT SMP
CPU 23
Pid: 25735, comm: kworker/23:2 Tainted: G W 3.4.4 #1 Dell
Inc. C6100 /0D61XP
RIP: 0010:[<ffffffff811a4a18>] [<ffffffff811a4a18>]
wait_for_key_construction+0x28/0x70
RSP: 0018:ffff880b99a15a80 EFLAGS: 00010246
RAX: ffffffff811a4a70 RBX: 0000000000000000 RCX: 0000000000000002
RDX: ffffffff811a4a60 RSI: 0000000000000000 RDI: 0000000000000070
RBP: ffff8803c915e420 R08: ffff88061c877bc1 R09: 0000000000000000
R10: 0000000050159fb7 R11: 0000000000000000 R12: ffffffff816b5e49
R13: ffff880ba944ca48 R14: 0000000000000013 R15: ffff8803c915e423
FS: 0000000000000000(0000) GS:ffff880c3fd60000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000070 CR3: 00000000014aa000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/23:2 (pid: 25735, threadinfo ffff880b154a0ba0, task
ffff880b154a0750)
Stack:
0000000000000000 ffffffff811a512f 0000000000000000 ffffffff810df09d
0000000000000000 000000000000000e ffff880841b54cc0 ffffffff816c88f8
ffff8803c915e420 ffffffff81184fbb 0000000000000013 ffffffff81935c40
Call Trace:
[<ffffffff811a512f>] ? request_key+0x5f/0xa0
[<ffffffff810df09d>] ? __kmalloc+0x2d/0x120
[<ffffffff81184fbb>] ? nfs_idmap_request_key+0x1ab/0x1c0
[<ffffffff81185027>] ? nfs_idmap_get_key+0x57/0xd0
[<ffffffff811852ae>] ? nfs_map_string_to_numeric+0x3e/0xc0
[<ffffffff8118535f>] ? nfs_idmap_lookup_id+0x2f/0x80
[<ffffffff81185479>] ? nfs_map_name_to_uid+0x39/0x90
[<ffffffff8117dedb>] ? decode_getfattr_attrs+0x94b/0xa10
[<ffffffff8117f776>] ? T.1607+0x96/0xe0
[<ffffffff8117f852>] ? nfs4_xdr_dec_delegreturn+0x72/0x80
[<ffffffff8117f7e0>] ? decode_getfattr+0x20/0x20
[<ffffffff8144db39>] ? rpcauth_unwrap_resp+0x79/0x80
[<ffffffff8117f7e0>] ? decode_getfattr+0x20/0x20
[<ffffffff81445a33>] ? call_decode+0x2a3/0x400
[<ffffffff8144cf26>] ? __rpc_execute+0x46/0x1b0
[<ffffffff81050338>] ? process_one_work+0x108/0x3a0
[<ffffffff8144d0c0>] ? rpc_execute+0x30/0x30
[<ffffffff81050a21>] ? worker_thread+0x151/0x420
[<ffffffff810508d0>] ? rescuer_thread+0x300/0x300
[<ffffffff810508d0>] ? rescuer_thread+0x300/0x300
[<ffffffff81054e3e>] ? kthread+0x9e/0xb0
[<ffffffff81484774>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81482a38>] ? retint_restore_args+0x6/0x6
[<ffffffff81054da0>] ? kthread_freezable_should_stop+0x60/0x60
[<ffffffff81484770>] ? gs_change+0xb/0xb
Code: 00 00 00 40 80 fe 01 53 19 c9 48 89 fb 48 c7 c0 70 4a 1a 81 f7
d1 48 c7 c2 60 4a 1a 81 83 c1 02 48 8d 7f 70 40 84 f6 48 0f 45 d0 <48>
8b 43 70 a8 10 75 20 48 8b 43 70 a8 20 74 08 8b 83 80 00 00
RIP [<ffffffff811a4a18>] wait_for_key_construction+0x28/0x70
RSP <ffff880b99a15a80>
CR2: 0000000000000070
--
William
On Fri, Aug 3, 2012 at 2:08 PM, William Dauchy <[email protected]> wrote:
> 4697bd5:NFSv4: Fix a race in the net namespace mount notification
this one didn't help. still getting a wait_for_key_construction oops.
I opened a second thread about it, first thinking this was two
different issues
will dig it a bit more
--
William