2013-03-05 18:54:59

by Ben Greear

Subject: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

In doing some CIFS testing (utilizing its feature to bind to local
address..but not sure that matters), we saw this error when trying
to un-mount.

Our kernel is patched (nfs, some networking related patches), but there
are no out-of-kernel patches to CIFS, so I don't *think* this is anything
we could have caused.

This problem appears to be easily reproducible, so we will be happy
to test patches if anyone has any suggestions.

BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at /home/greearb/git/linux-3.7.dev.y/fs/dcache.c:967!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: nls_utf8 cifs 8021q garp stp llc iptable_raw xt_CT veth nf_nat_ipv4 nf_nat fuse macvlan wanlink(O) pktgen nfsv3 nfs_acl nfsv4 auth_rpcgss nfs
fscache lockd sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core
w83793 iscsi_tcp w83627hf libiscsi_tcp hwmon_vid libiscsi scsi_transport_iscsi coretemp mperf kvm_intel kvm i5k_amb uinput i5000_edac gpio_ich edac_core
iTCO_wdt e1000e iTCO_vendor_support lpc_ich i2c_i801 pcspkr ioatdma dca microcode shpchp ipv6 floppy radeon i2c_algo_bit hwmon drm_kms_helper ttm drm i2c_core
[last unloaded: iptable_nat]
CPU 6
Pid: 6610, comm: umount Tainted: G C O 3.7.10+ #74 Supermicro X7DBU/X7DBU
RIP: 0010:[<ffffffff811591c9>] [<ffffffff811591c9>] shrink_dcache_for_umount_subtree+0x84/0x194
RSP: 0018:ffff8800c0085dc8 EFLAGS: 00010296
RAX: 0000000000000062 RBX: ffff8800c07e43c0 RCX: 0000000000000059
RDX: ffffffff81bc25a8 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff8800c0085de8 R08: 0000000000000001 R09: ffff8800c0085cc8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800c050e9c0
R13: ffff880128ee8000 R14: 0000000000000000 R15: ffff8800c0085f28
FS: 00007f6084847840(0000) GS:ffff88012fd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f608442c3a0 CR3: 00000000c7c2d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 6610, threadinfo ffff8800c0084000, task ffff88012a6a0000)
Stack:
ffff880128ee8310 00000000000128c0 ffff880128ee8000 ffffffffa06b96b0
ffff8800c0085e08 ffffffff81159310 ffff8800c0084000 ffff880128ee8000
ffff8800c0085e38 ffffffff81149afe ffff8800c0085e38 0000000000000021
Call Trace:
[<ffffffff81159310>] shrink_dcache_for_umount+0x37/0x49
[<ffffffff81149afe>] generic_shutdown_super+0x20/0xd2
[<ffffffff81149c25>] kill_anon_super+0x11/0x1c
[<ffffffffa068b1ea>] cifs_kill_sb+0x15/0x21 [cifs]
[<ffffffff81149e48>] deactivate_locked_super+0x32/0x5e
[<ffffffff8114a942>] deactivate_super+0x40/0x46
[<ffffffff8115fdb3>] mntput_no_expire+0x12d/0x136
[<ffffffff81160b59>] sys_umount+0x321/0x34c
[<ffffffff8114f846>] ? path_put+0x1d/0x21
[<ffffffff81525229>] system_call_fastpath+0x16/0x1b
Code: 50 28 4c 8b 0a 31 d2 48 85 f6 74 04 48 8b 56 40 48 05 10 03 00 00 48 89 de 48 c7 c7 9d d0 7b 81 48 89 04 24 31 c0 e8 b9 4e 3c 00 <0f> 0b eb fe 4c 8b 63 18
4c 39 e3 75 3c 48 8b 93 90 00 00 00 48
RIP [<ffffffff811591c9>] shrink_dcache_for_umount_subtree+0x84/0x194
RSP <ffff8800c0085dc8>
---[ end trace 9b2978a89532c292 ]---
--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2013-03-05 19:08:52

by Jeff Layton

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On Tue, 05 Mar 2013 10:54:56 -0800
Ben Greear <[email protected]> wrote:

> In doing some CIFS testing (utilizing its feature to bind to local
> address..but not sure that matters), we saw this error when trying
> to un-mount.
>
> Our kernel is patched (nfs, some networking related patches), but there
> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
> we could have caused.
>
> This problem appears to be easily reproducible, so we will be happy
> to test patches if anyone has any suggestions.
>
> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]

Hmmm...dentry leak. Are there any jobs queued to the cifsiod workqueue
when the box oopses?

--
Jeff Layton <[email protected]>

2013-03-05 19:22:24

by Jeff Layton

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On Tue, 5 Mar 2013 14:08:49 -0500
Jeff Layton <[email protected]> wrote:

> On Tue, 05 Mar 2013 10:54:56 -0800
> Ben Greear <[email protected]> wrote:
>
> > In doing some CIFS testing (utilizing its feature to bind to local
> > address..but not sure that matters), we saw this error when trying
> > to un-mount.
> >
> > Our kernel is patched (nfs, some networking related patches), but there
> > are no out-of-kernel patches to CIFS, so I don't *think* this is anything
> > we could have caused.
> >
> > This problem appears to be easily reproducible, so we will be happy
> > to test patches if anyone has any suggestions.
> >
> > BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
>
> Hmmm...dentry leak. Are there any jobs queued to the cifsiod workqueue
> when the box oopses?
>

In fact...

It's just a guess, but does this patch help at all? Note that it builds
but is otherwise untested ;). If it works we might want to go with
something a bit less invasive but this may tell us if we're on the
right track.

------------------[snip]------------------

[PATCH] cifs: flush cifsiod workqueue before unmounting

It's possible that we'll end up with some writeback activity going on
while the fs is unmounted. If that happens we'll queue some work to a
workqueue to handle the last bit of cleanup. At that point though,
there's nothing pinning down the superblock, so it's possible to race
with a umount.

There will still be an active cifsFileInfo reference, and that will
hold a reference to a dentry. That, in turn, can cause a "Dentry still
in use" error. Flush the cifsiod workqueue before allowing a umount
to proceed.

Reported-by: Ben Greear <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
 fs/cifs/connect.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 991c63c..7840f3f 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3815,6 +3815,7 @@ cifs_umount(struct cifs_sb_info *cifs_sb)
 	struct tcon_link *tlink;
 
 	cancel_delayed_work_sync(&cifs_sb->prune_tlinks);
+	flush_workqueue(cifsiod_wq);
 
 	spin_lock(&cifs_sb->tlink_tree_lock);
 	while ((node = rb_first(root))) {
--
1.7.11.7
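
For readers following along, the race hypothesized in the patch description can be modeled in plain userspace C. This is only a sketch under stated assumptions: every name below (model_dentry, model_workqueue, model_umount and so on) is hypothetical and none of it is kernel code. It shows a deferred work item that still holds a dentry reference, and how running the pending work before the unmount-time check changes the outcome.

```c
/* Userspace model only: all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

struct model_dentry {
	int refcount;			/* references still held on the dentry */
};

struct model_work {
	void (*fn)(struct model_dentry *);
	struct model_dentry *arg;
};

#define MODEL_QUEUE_MAX 8

struct model_workqueue {
	struct model_work items[MODEL_QUEUE_MAX];
	size_t count;			/* number of pending work items */
};

/* Deferred cleanup: drops the reference that in-flight I/O was holding. */
void model_put_dentry(struct model_dentry *d)
{
	d->refcount--;
}

/* Queue the cleanup for later, as an I/O completion path might do. */
void model_queue_put(struct model_workqueue *wq, struct model_dentry *d)
{
	assert(wq->count < MODEL_QUEUE_MAX);
	wq->items[wq->count].fn = model_put_dentry;
	wq->items[wq->count].arg = d;
	wq->count++;
}

/* Run everything that is pending; stands in for flush_workqueue(). */
void model_flush(struct model_workqueue *wq)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->items[i].fn(wq->items[i].arg);
	wq->count = 0;
}

/* Returns 0 if the dentry is idle at unmount, -1 if it is "still in use". */
int model_umount(struct model_workqueue *wq, struct model_dentry *d,
		 int flush_first)
{
	if (flush_first)
		model_flush(wq);
	return d->refcount == 0 ? 0 : -1;
}
```

With one reference held and its release still queued, model_umount(&wq, &d, 0) reports the dentry as busy, while model_umount(&wq, &d, 1) drains the queue first and finds it idle. The model is single-threaded, so it only illustrates the ordering problem, not real concurrency.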

2013-03-05 19:42:59

by Ben Greear

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On 03/05/2013 11:22 AM, Jeff Layton wrote:
> On Tue, 5 Mar 2013 14:08:49 -0500
> Jeff Layton <[email protected]> wrote:
>
>> On Tue, 05 Mar 2013 10:54:56 -0800
>> Ben Greear <[email protected]> wrote:
>>
>>> In doing some CIFS testing (utilizing its feature to bind to local
>>> address..but not sure that matters), we saw this error when trying
>>> to un-mount.
>>>
>>> Our kernel is patched (nfs, some networking related patches), but there
>>> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
>>> we could have caused.
>>>
>>> This problem appears to be easily reproducible, so we will be happy
>>> to test patches if anyone has any suggestions.
>>>
>>> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
>>
>> Hmmm...dentry leak. Are there any jobs queued to the cifsiod workqueue
>> when the box oopses?
>>
>
> In fact...
>
> It's just a guess, but does this patch help at all? Note that it builds
> but is otherwise untested ;). If it works we might want to go with
> something a bit less invasive but this may tell us if we're on the
> right track.

This does not fix the problem, though possibly it is still
a correct fix for some other bug. Some more details on this test case:

We create 8 writer processes (which do one mount per thread) and write some files.

Then, stop those, and un-mount.

Then, start 8 reader processes, which will create 8 mounts and then start
reading data.

Finally, stop these readers, which will stop the read IO calls and immediately
try to un-mount the 8 mounts. These unmount attempts cause the bug.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2013-03-05 21:09:27

by Jeff Layton

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On Tue, 05 Mar 2013 11:42:46 -0800
Ben Greear <[email protected]> wrote:

> On 03/05/2013 11:22 AM, Jeff Layton wrote:
> > On Tue, 5 Mar 2013 14:08:49 -0500
> > Jeff Layton <[email protected]> wrote:
> >
> >> On Tue, 05 Mar 2013 10:54:56 -0800
> >> Ben Greear <[email protected]> wrote:
> >>
> >>> In doing some CIFS testing (utilizing its feature to bind to local
> >>> address..but not sure that matters), we saw this error when trying
> >>> to un-mount.
> >>>
> >>> Our kernel is patched (nfs, some networking related patches), but there
> >>> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
> >>> we could have caused.
> >>>
> >>> This problem appears to be easily reproducible, so we will be happy
> >>> to test patches if anyone has any suggestions.
> >>>
> >>> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
> >>
> >> Hmmm...dentry leak. Are there any jobs queued to the cifsiod workqueue
> >> when the box oopses?
> >>
> >
> > In fact...
> >
> > It's just a guess, but does this patch help at all? Note that it builds
> > but is otherwise untested ;). If it works we might want to go with
> > something a bit less invasive but this may tell us if we're on the
> > right track.
>
> This does not fix the problem, though possibly it is still
> a correct fix for some other bug. Some more details on this test case:
>
> We create 8 writer processes (which do one mount per thread) and write some files.
>
> Then, stop those, and un-mount.
>
> Then, start 8 reader processes, which will create 8 mounts and then start
> reading data.
>
> Finally, stop these readers, which will stop the read IO calls and immediately
> try to un-mount the 8 mounts. These unmount attempts cause the bug.
>
> Thanks,
> Ben
>
>

Ok, thanks...it was worth a shot. I guess we'll have to track this down
the hard way then and try to figure out where the dentry leak is coming
from.

My guess would be that it's coming from the async read code path
somewhere. When you stop the processes doing the reading, how do you do
it? Are they sent a signal?

--
Jeff Layton <[email protected]>

2013-03-05 21:26:08

by Ben Greear

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On 03/05/2013 01:09 PM, Jeff Layton wrote:
> On Tue, 05 Mar 2013 11:42:46 -0800
> Ben Greear <[email protected]> wrote:
>
>> On 03/05/2013 11:22 AM, Jeff Layton wrote:
>>> On Tue, 5 Mar 2013 14:08:49 -0500
>>> Jeff Layton <[email protected]> wrote:
>>>
>>>> On Tue, 05 Mar 2013 10:54:56 -0800
>>>> Ben Greear <[email protected]> wrote:
>>>>
>>>>> In doing some CIFS testing (utilizing its feature to bind to local
>>>>> address..but not sure that matters), we saw this error when trying
>>>>> to un-mount.
>>>>>
>>>>> Our kernel is patched (nfs, some networking related patches), but there
>>>>> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
>>>>> we could have caused.
>>>>>
>>>>> This problem appears to be easily reproducible, so we will be happy
>>>>> to test patches if anyone has any suggestions.
>>>>>
>>>>> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
>>>>
>>>> Hmmm...dentry leak. Are there any jobs queued to the cifsiod workqueue
>>>> when the box oopses?
>>>>
>>>
>>> In fact...
>>>
>>> It's just a guess, but does this patch help at all? Note that it builds
>>> but is otherwise untested ;). If it works we might want to go with
>>> something a bit less invasive but this may tell us if we're on the
>>> right track.
>>
>> This does not fix the problem, though possibly it is still
>> a correct fix for some other bug. Some more details on this test case:
>>
>> We create 8 writer processes (which do one mount per thread) and write some files.
>>
>> Then, stop those, and un-mount.
>>
>> Then, start 8 reader processes, which will create 8 mounts and then start
>> reading data.
>>
>> Finally, stop these readers, which will stop the read IO calls and immediately
>> try to un-mount the 8 mounts. These unmount attempts cause the bug.
>>
>> Thanks,
>> Ben
>>
>>
>
> Ok, thanks...it was worth a shot. I guess we'll have to track this down
> the hard way then and try to figure out where the dentry leak is coming
> from.
>
> My guess would be that it's coming from the async read code path
> somewhere. When you stop the processes doing the reading, how do you do
> it? Are they sent a signal?

It should be a clean stop: the process receives a command over a TCP socket
asking it to stop, closes its sockets, calls a script to unmount, and exits.

We can work on trying to reproduce this with a less complicated
framework, but it might be tomorrow before we can get to that.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2013-03-07 19:19:22

by Mateusz Guzik

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On Tue, Mar 05, 2013 at 10:54:56AM -0800, Ben Greear wrote:
> In doing some CIFS testing (utilizing its feature to bind to local
> address..but not sure that matters), we saw this error when trying
> to un-mount.
>
> Our kernel is patched (nfs, some networking related patches), but there
> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
> we could have caused.
>
> This problem appears to be easily reproducible, so we will be happy
> to test patches if anyone has any suggestions.
>
> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]

We encountered a similar panic, but it was related to writes. In our case
the problem was that some cifsInodeInfo objects holding dentry references
were still around (in the slow-work queue) during superblock destruction
at unmount.

I reproduced the problem on a 3.8 kernel (sorry, no 3.7 handy) with reads
as well, which should match your scenario.

I attached a patch that deals with this problem by grabbing a reference to
the cifs superblock on cifsInodeInfo creation. This delays sb destruction
until all cifsInodeInfos are gone. I didn't test it on a 3.7.10 kernel, but
it should work fine.
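
The idea described above (pin the superblock while any per-file object exists) can be sketched as a small userspace model using C11 atomics. This is not the attached patch: the struct, its fields, and all function names below are hypothetical, chosen only to illustrate the pattern in which the first per-file object takes one superblock reference and the last one to go away drops it.

```c
/* Userspace model only: all names here are hypothetical, not kernel APIs. */
#include <stdatomic.h>
#include <stdbool.h>

struct model_sb {
	atomic_int s_active;	/* references keeping the superblock alive */
	atomic_int file_active;	/* live per-file objects on this superblock */
	bool destroyed;		/* set once the last reference is dropped */
};

/* A freshly mounted superblock starts with the mount's own reference. */
void model_sb_init(struct model_sb *sb)
{
	atomic_init(&sb->s_active, 1);
	atomic_init(&sb->file_active, 0);
	sb->destroyed = false;
}

/* Drop one superblock reference; tear down on the last one. */
void model_sb_put(struct model_sb *sb)
{
	if (atomic_fetch_sub(&sb->s_active, 1) == 1)
		sb->destroyed = true;
}

/* Per-file object created: the first one pins the superblock. */
void model_file_info_get(struct model_sb *sb)
{
	if (atomic_fetch_add(&sb->file_active, 1) == 0)
		atomic_fetch_add(&sb->s_active, 1);
}

/* Per-file object released: the last one unpins the superblock. */
void model_file_info_put(struct model_sb *sb)
{
	if (atomic_fetch_sub(&sb->file_active, 1) == 1)
		model_sb_put(sb);
}
```

In this model, an unmount that calls model_sb_put() while per-file objects are still outstanding does not destroy the superblock; destruction is deferred until the final model_file_info_put(). The sketch does not address whether the 0-to-1 transition can race with teardown, which a real implementation would have to consider.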

--
Mateusz Guzik


Attachments:
(No filename) (1.18 kB)
cifs-umount-dentry-panic-upstream.diff (2.30 kB)
cifs-umount-dentry-panic-3.7.10.diff (2.32 kB)

2013-03-07 22:18:22

by Ben Greear

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On 03/07/2013 11:19 AM, Mateusz Guzik wrote:
> On Tue, Mar 05, 2013 at 10:54:56AM -0800, Ben Greear wrote:
>> In doing some CIFS testing (utilizing its feature to bind to local
>> address..but not sure that matters), we saw this error when trying
>> to un-mount.
>>
>> Our kernel is patched (nfs, some networking related patches), but there
>> are no out-of-kernel patches to CIFS, so I don't *think* this is anything
>> we could have caused.
>>
>> This problem appears to be easily reproducible, so we will be happy
>> to test patches if anyone has any suggestions.
>>
>> BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
>
> We encountered a similar panic, but it was related to writes. In our case
> the problem was that some cifsInodeInfo objects holding dentry references
> were still around (in the slow-work queue) during superblock destruction
> at unmount.
>
> I reproduced the problem on a 3.8 kernel (sorry, no 3.7 handy) with reads
> as well, which should match your scenario.
>
> I attached a patch that deals with this problem by grabbing a reference to
> the cifs superblock on cifsInodeInfo creation. This delays sb destruction
> until all cifsInodeInfos are gone. I didn't test it on a 3.7.10 kernel, but
> it should work fine.

The 3.7.10 patch applied cleanly and appears to fix our problem.

Thanks so much for the patch!

Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2013-03-07 23:37:14

by Jeff Layton

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

On Thu, 7 Mar 2013 20:19:15 +0100
Mateusz Guzik <[email protected]> wrote:

> On Tue, Mar 05, 2013 at 10:54:56AM -0800, Ben Greear wrote:
> > In doing some CIFS testing (utilizing its feature to bind to local
> > address..but not sure that matters), we saw this error when trying
> > to un-mount.
> >
> > Our kernel is patched (nfs, some networking related patches), but there
> > are no out-of-kernel patches to CIFS, so I don't *think* this is anything
> > we could have caused.
> >
> > This problem appears to be easily reproducible, so we will be happy
> > to test patches if anyone has any suggestions.
> >
> > BUG: Dentry ffff8800c07e43c0{i=45762,n=cifs2-01.7.lf-data} still in use (1) [unmount of cifs cifs]
>
> We encountered similar panic, but it was related to writes. In our case
> the problem was that some cifsInodeInfo holding dentry references were
> still around (in slow-work queue) during superblock destruction in unmount.
>
> I reproduced the problem on 3.8 kernel (sorry, no 3.7 handy) with reads
> as well, which should match your scenario.
>
> I attached a patch that deals with this problem by grabbing refcounts to
> cifs superblock on cifsInodeInfo creation. This delays sb destruction
> until all cifsInodeInfos are gone. I didn't test it on 3.7.10 kernel but
> it should work fine.
>

I think you mean cifsFileInfo instead of cifsInodeInfo above, but that
otherwise makes sense.

Mateusz, for future reference it's best to inline the patches in an
email instead of attaching them like this, since it makes it easier to
comment on them. You should also send them with a proper description at
the head and a Signed-off-by line.

One minor nit inline below...

> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index 1a052c0..345e76b 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -91,6 +91,24 @@ struct workqueue_struct *cifsiod_wq;
> __u8 cifs_client_guid[SMB2_CLIENT_GUID_SIZE];
> #endif
>
> +void
> +cifs_sb_active(struct super_block *sb)
> +{
> +	struct cifs_sb_info *server = CIFS_SB(sb);
> +
> +	if (atomic_inc_return(&server->active) == 1)
> +		atomic_inc(&sb->s_active);
> +}
> +


The thing we have to be careful about here is whether we could ever
race with the atomic_dec_and_test in deactivate_locked_super. We should
only be creating a new cifsFileInfo in the context of an open-type
syscall, so bumping the s_active count unconditionally there should be
safe, since we'll implicitly hold a reference to the vfsmount.

It might be worth a comment on cifs_sb_active to that effect though to
make sure no one makes that mistake in the future.
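
As a rough illustration of the counting scheme (not kernel code), the
sketch below models it in userspace with C11 atomics. The names
struct model_sb, fs_active and the model_* helpers are hypothetical
stand-ins for struct super_block, the per-mount active counter and
cifs_sb_active/cifs_sb_deactive/deactivate_super. The comment on
model_sb_active is only one possible wording for such a note, and it
assumes the only callers are open-type paths.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two counters involved. */
struct model_sb {
	atomic_int s_active;	/* models sb->s_active */
	atomic_int fs_active;	/* models the per-mount "active" counter */
	bool destroyed;		/* set when the last s_active ref is dropped */
};

/* Models deactivate_super(): the last active reference tears down the sb. */
static void model_deactivate_super(struct model_sb *sb)
{
	if (atomic_fetch_sub(&sb->s_active, 1) == 1)
		sb->destroyed = true;
}

/*
 * Models cifs_sb_active().
 *
 * Assumption: the caller already holds an active reference to the
 * superblock (for example an open-type syscall, which pins the vfsmount).
 * Without that, the unconditional increment of s_active could race with
 * the final decrement in the unmount path.
 */
static void model_sb_active(struct model_sb *sb)
{
	/* fetch_add returns the old value: 0 means this is the first user. */
	if (atomic_fetch_add(&sb->fs_active, 1) == 0)
		atomic_fetch_add(&sb->s_active, 1);
}

/* Models cifs_sb_deactive(): the last file info drops the extra sb ref. */
static void model_sb_deactive(struct model_sb *sb)
{
	if (atomic_fetch_sub(&sb->fs_active, 1) == 1)
		model_deactivate_super(sb);
}

/*
 * Scenario: a file info is still outstanding when umount runs.
 * Returns true if the sb survives umount and is destroyed by the last put.
 */
static bool model_umount_with_open_file(void)
{
	struct model_sb sb = { .destroyed = false };
	bool survived;

	atomic_init(&sb.s_active, 1);	/* the mount holds one reference */
	atomic_init(&sb.fs_active, 0);

	model_sb_active(&sb);		/* file info created at open time */
	model_deactivate_super(&sb);	/* umount drops the mount's reference */
	survived = !sb.destroyed;
	model_sb_deactive(&sb);		/* deferred put of the file info */

	return survived && sb.destroyed;
}
```

Calling model_umount_with_open_file() from a main() should return true:
the modelled superblock outlives the umount and is torn down only by the
final put.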

> +void
> +cifs_sb_deactive(struct super_block *sb)
> +{
> +	struct cifs_sb_info *server = CIFS_SB(sb);
> +
> +	if (atomic_dec_and_test(&server->active))
> +		deactivate_super(sb);
> +}
> +
> static int
> cifs_read_super(struct super_block *sb)
> {
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index 7163419..0e32c34 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -41,6 +41,10 @@ extern struct file_system_type cifs_fs_type;
> extern const struct address_space_operations cifs_addr_ops;
> extern const struct address_space_operations cifs_addr_ops_smallbuf;
>
> +/* Functions related to super block operations */
> +extern void cifs_sb_active(struct super_block *sb);
> +extern void cifs_sb_deactive(struct super_block *sb);
> +
> /* Functions related to inodes */
> extern const struct inode_operations cifs_dir_inode_ops;
> extern struct inode *cifs_root_iget(struct super_block *);
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 8c0d855..7a0dd99 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -300,6 +300,8 @@ cifs_new_fileinfo(struct cifs_fid *fid, struct file *file,
>  	INIT_WORK(&cfile->oplock_break, cifs_oplock_break);
>  	mutex_init(&cfile->fh_mutex);
>
> +	cifs_sb_active(inode->i_sb);
> +
>  	/*
>  	 * If the server returned a read oplock and we have mandatory brlocks,
>  	 * set oplock level to None.
> @@ -349,7 +351,8 @@ void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
>  	struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink);
>  	struct TCP_Server_Info *server = tcon->ses->server;
>  	struct cifsInodeInfo *cifsi = CIFS_I(inode);
> -	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
> +	struct super_block *sb = inode->i_sb;
> +	struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
>  	struct cifsLockInfo *li, *tmp;
>  	struct cifs_fid fid;
>  	struct cifs_pending_open open;
> @@ -414,6 +417,7 @@ void cifsFileInfo_put(struct cifsFileInfo *cifs_file)
>
>  	cifs_put_tlink(cifs_file->tlink);
>  	dput(cifs_file->dentry);
> +	cifs_sb_deactive(sb);
>  	kfree(cifs_file);
> }
>

Other than the need for a comment there, this looks good to me. Nice
work. When you respin and resend, you can add:

Reviewed-by: Jeff Layton <[email protected]>

2017-03-16 07:18:42

by mayaoyao

Subject: Re: 3.7.10+: BUG Dentry still in use [unmount of cifs cifs]

Could you tell us how to reproduce this panic? We sometimes hit this bug on
our OS, which uses kernel 2.6.32, and do not know how to reproduce it. Could
you explain? Thanks very much.


