2017-06-30 19:11:40

by Andrei Vagin

Subject: Re: [CRIU] BUG: Dentry ffff9f795a08fe60{i=af565f, n=lo} still in use (1) [unmount of proc proc]

On Thu, Jun 29, 2017 at 08:42:23PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <[email protected]> writes:
>
> > On Thu, Jun 29, 2017 at 12:06 PM, Eric W. Biederman
> > <[email protected]> wrote:
> >> Andrei Vagin <[email protected]> writes:
> >>
> >>> Hello,
> >>>
> >>> We run CRIU tests on Linus' tree, and today we found this issue.
> >>>
> >>> The CRIU tests are a set of small programs that check checkpoint/restore
> >>> of different primitives (files, sockets, signals, pipes, etc.):
> >>> https://github.com/xemul/criu/tree/master/test
> >>>
> >>> Each test is executed three times: without namespaces, in a set of all
> >>> namespaces except userns, and in a set of all namespaces. When a test
> >>> passes the preparation stage, it sends a signal to the executor, which
> >>> then dumps and restores the test processes and sends a signal back to
> >>> the test to check that everything was restored correctly.
> >>
> >> I am not certain what you are saying, and you seem to have Cc'd
> >> every list except the netdev and netfilter lists that are needed
> >> to deal with this.
> >>
> >> Are you saying that the change from Liping Zhang is needed? Or are you
> >> saying that the change introduces the problem below?
> >
> > Hi Eric,
> >
> > Here I tried to explain our usecase. I don't know which changes in the
> > kernel affect this issue.
> >
> > Actually, I reported a similar problem a few months ago on linux-next:
> > https://lkml.org/lkml/2017/3/10/1586
> >
> > So I don't think that the change from Liping Zhang affects this issue
> > somehow. I mentioned it just to describe what kernel we used.
> >
> > And I don't know how to reproduce the issue. You can see from the
> > kernel log that the kernel worked for more than 6 hours in our case.
> > During this period we ran all our tests a few times, so I think there
> > is some kind of race.
> >
> >>
> >> I could not find the mentioned commits. Are they in Linus's tree or
> >> in someone's next tree that feeds into linux-next?
> >
> > Here is the patch from Liping Zhang
> > https://patchwork.ozlabs.org/patch/770887/
> >
> > The second mentioned commit is HEAD of the master branch in Linus' tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6474924e2b5ddb0030c355558966adcbe3b49022
>
> Apologies, I somehow thought that the 'g' in the kernel version you
> mentioned was part of the commit id, and thus I could not find it. Sigh.
>
> Ok so with Linus's tree and that one extra patch from Liping Zhang you
> have kernel problems sometimes.
>
> The warning and the oops combined are quite suggestive of what is going
> on. It does sound like while the pid namespace is being unregistered
> something under /proc/sys/net/ipv4/conf/<if>/... is being accessed
> and keeping the inode busy.
>
> Which then leads to an oops when the network namespace is being cleaned
> up later, as it tries to purge all of the inodes.
>
> Which raises the question how in the world can the count of the
> superblock drop to zero with an inode in use.
>
> As devinet is where things go strange this does seem completely
> independent of what Liping Zhang was looking at.
>
> This does smell like a bug in the generic code. Hmm.
>
> Is this consistently reproducible when you run your tests...

I'm not sure about that. I'm going to do some experiments to understand
how often it is reproduced on our test system, and then I will try to
revert the patch from Konstantin.

Thanks!

>
> I think the following change where it uses igrab has the possibility
> of increasing the inode count while a pid namespace is being shut down.
> So it is worth a good hard look and fixes and possibly a revert if we
> can prove that this is the issue.
>
> commit d6cffbbe9a7e51eb705182965a189457c17ba8a3
> Author: Konstantin Khlebnikov <[email protected]>
> Date: Fri Feb 10 10:35:02 2017 +0300
>
> proc/sysctl: prune stale dentries during unregistering
>
> Eric
>
>
> > Thanks,
> > Andrei
> >
> >>
> >> Eric
> >>
> >>> [root@zdtm linux]# git describe HEAD~1
> >>> v4.12-rc7-25-g6474924
> >>>
> >>> And there is one more patch from the netfilter tree:
> >>> commit b216759c0cb5d37d1eec3cd5b67ba38bace94fd8
> >>> Author: Liping Zhang
> >>> Date: Sun Jun 4 19:17:34 2017 +0800
> >>>
> >>> netfilter: nf_ct_dccp/sctp: fix memory leak after netns cleanup
> >>>
> >>> [root@zdtm linux]# uname -a
> >>> Linux zdtm.openvz.org 4.12.0-rc7+ #9 SMP Thu Jun 29 08:28:18 CEST 2017
> >>> x86_64 x86_64 x86_64 GNU/Linux
> >>>
> >>> A kernel config is attached.
> >>>
> >>>
> >>> [22458.504137] BUG: Dentry ffff9f795a08fe60{i=af565f,n=lo} still in
> >>> use (1) [unmount of proc proc]
> >>> [22458.505117] ------------[ cut here ]------------
> >>> [22458.505299] WARNING: CPU: 0 PID: 15036 at fs/dcache.c:1445
> >>> umount_check+0x66/0x80
> >>> [22458.505564] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
> >>> sunrpc macvlan tun veth nf_conntrack_netlink xt_mark udp_diag tcp_diag
> >>> inet_diag netlink_diag af_packet_diag unix_diag binfmt_misc
> >>> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink
> >>> ebtable_broute bridge stp llc ebtable_nat ip6table_security
> >>> ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6
> >>> nf_defrag_ipv6 nf_nat_ipv6 iptable_security iptable_mangle iptable_raw
> >>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> >>> nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables btrfs
> >>> xor raid6_pq loop ppdev crct10dif_pclmul crc32_pclmul
> >>> ghash_clmulni_intel lpc_ich sbs virtio_balloon shpchp sbshc parport_pc
> >>> parport tpm_tis tpm_tis_core tpm xfs libcrc32c crc32c_intel
> >>> ata_generic serio_raw pata_acpi
> >>> [22458.507586] virtio_pci virtio virtio_ring e1000
> >>> [22458.507771] CPU: 0 PID: 15036 Comm: kworker/0:2 Not tainted 4.12.0-rc7+ #9
> >>> [22458.507830] systemd-journald[561]: Compressed data object 807 ->
> >>> 605 using LZ4
> >>> [22458.508184] Hardware name: Parallels Software International Inc.
> >>> Parallels Virtual Platform/Parallels Virtual Platform, BIOS
> >>> 6.12.26068.1232434 02/27/2017
> >>> [22458.508641] Workqueue: events proc_cleanup_work
> >>> [22458.508848] task: ffff9f797be80000 task.stack: ffffb2b8825f0000
> >>> [22458.509033] RIP: 0010:umount_check+0x66/0x80
> >>> [22458.509172] RSP: 0018:ffffb2b8825f3c68 EFLAGS: 00010286
> >>> [22458.509363] RAX: 0000000000000054 RBX: ffff9f798492afa0 RCX: 0000000000000000
> >>> [22458.509589] RDX: 0000000000000000 RSI: ffff9f79bce0e388 RDI: ffff9f79bce0e388
> >>> [22458.509823] RBP: ffffb2b8825f3c70 R08: 0000000000000000 R09: 0000000000000000
> >>> [22458.510022] R10: ffffb2b8825f3bf0 R11: 0000000000000000 R12: ffff9f795a08fe60
> >>> [22458.510217] R13: ffff9f795a08ff50 R14: ffff9f795a08fee0 R15: ffff9f798492afa0
> >>> [22458.510456] FS: 0000000000000000(0000) GS:ffff9f79bce00000(0000)
> >>> knlGS:0000000000000000
> >>> [22458.510726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [22458.510953] CR2: 00007f834657a6b0 CR3: 00000000602a3000 CR4: 00000000000006f0
> >>> [22458.511163] Call Trace:
> >>> [22458.511289] d_walk+0x144/0x510
> >>> [22458.511442] ? detach_and_collect+0x30/0x30
> >>> [22458.511620] do_one_tree+0x26/0x40
> >>> [22458.511793] shrink_dcache_for_umount+0x2d/0x90
> >>> [22458.511954] generic_shutdown_super+0x1f/0x120
> >>> [22458.512105] kill_anon_super+0x12/0x20
> >>> [22458.512243] proc_kill_sb+0x40/0x50
> >>> [22458.512406] deactivate_locked_super+0x43/0x70
> >>> [22458.512576] deactivate_super+0x4e/0x60
> >>> [22458.512731] cleanup_mnt+0x3f/0x80
> >>> [22458.512890] mntput_no_expire+0x2fc/0x3b0
> >>> [22458.513028] kern_unmount+0x3e/0x50
> >>> [22458.513153] pid_ns_release_proc+0x15/0x20
> >>> [22458.513309] proc_cleanup_work+0x15/0x20
> >>> [22458.513469] process_one_work+0x205/0x5d0
> >>> [22458.513635] worker_thread+0x4e/0x3a0
> >>> [22458.513809] kthread+0x109/0x140
> >>> [22458.513929] ? process_one_work+0x5d0/0x5d0
> >>> [22458.514067] ? kthread_create_on_node+0x70/0x70
> >>> [22458.514217] ? prepare_exit_to_usermode+0x7e/0x80
> >>> [22458.514400] ret_from_fork+0x2a/0x40
> >>> [22458.514550] Code: 48 8d 88 50 06 00 00 48 8b 40 28 4c 8b 08 48 8b
> >>> 46 58 48 85 c0 74 1b 48 8b 50 40 51 48 c7 c7 60 f9 c9 95 48 89 f1 e8
> >>> 99 39 f3 ff <0f> ff 58 31 c0 c9 c3 31 d2 eb e5 0f 1f 44 00 00 66 2e 0f
> >>> 1f 84
> >>> [22458.515141] ---[ end trace b37db95b00f941ab ]---
> >>> [22458.519368] VFS: Busy inodes after unmount of proc. Self-destruct
> >>> in 5 seconds. Have a nice day...
> >>> [22458.813846] BUG: unable to handle kernel NULL pointer dereference
> >>> at 0000000000000018
> >>> [22458.814081] IP: __lock_acquire+0x127/0x12f0
> >>> [22458.814213] PGD 5ce4e067
> >>> [22458.814213] P4D 5ce4e067
> >>> [22458.814340] PUD 5ce4f067
> >>> [22458.814450] PMD 0
> >>>
> >>> [22458.814724] Oops: 0000 [#1] SMP
> >>> [22458.814844] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace
> >>> sunrpc macvlan tun veth nf_conntrack_netlink xt_mark udp_diag tcp_diag
> >>> inet_diag netlink_diag af_packet_diag unix_diag binfmt_misc
> >>> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink
> >>> ebtable_broute bridge stp llc ebtable_nat ip6table_security
> >>> ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6
> >>> nf_defrag_ipv6 nf_nat_ipv6 iptable_security iptable_mangle iptable_raw
> >>> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> >>> nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables btrfs
> >>> xor raid6_pq loop ppdev crct10dif_pclmul crc32_pclmul
> >>> ghash_clmulni_intel lpc_ich sbs virtio_balloon shpchp sbshc parport_pc
> >>> parport tpm_tis tpm_tis_core tpm xfs libcrc32c crc32c_intel
> >>> ata_generic serio_raw pata_acpi
> >>> [22458.816586] virtio_pci virtio virtio_ring e1000
> >>> [22458.816741] CPU: 0 PID: 31415 Comm: kworker/u64:3 Tainted: G
> >>> W 4.12.0-rc7+ #9
> >>> [22458.816972] Hardware name: Parallels Software International Inc.
> >>> Parallels Virtual Platform/Parallels Virtual Platform, BIOS
> >>> 6.12.26068.1232434 02/27/2017
> >>> [22458.817355] Workqueue: netns cleanup_net
> >>> [22458.817500] task: ffff9f799f278000 task.stack: ffffb2b882934000
> >>> [22458.817676] RIP: 0010:__lock_acquire+0x127/0x12f0
> >>> [22458.817826] RSP: 0018:ffffb2b882937950 EFLAGS: 00010002
> >>> [22458.817995] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
> >>> [22458.818192] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000018
> >>> [22458.818422] RBP: ffffb2b882937a10 R08: ffffffff95237bca R09: 0000000000000000
> >>> [22458.818637] R10: 0000000000000000 R11: ffff9f799f278000 R12: 0000000000000001
> >>> [22458.818846] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000018
> >>> [22458.819046] FS: 0000000000000000(0000) GS:ffff9f79bce00000(0000)
> >>> knlGS:0000000000000000
> >>> [22458.819264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [22458.819468] CR2: 0000000000000018 CR3: 000000003f0cd000 CR4: 00000000000006f0
> >>> [22458.819669] Call Trace:
> >>> [22458.819784] ? __lock_acquire+0x39e/0x12f0
> >>> [22458.819919] lock_acquire+0xd5/0x1c0
> >>> [22458.820043] ? lock_acquire+0xd5/0x1c0
> >>> [22458.820174] ? list_lru_del+0x5a/0x170
> >>> [22458.820331] _raw_spin_lock+0x28/0x40
> >>> [22458.820475] ? list_lru_del+0x5a/0x170
> >>> [22458.820611] list_lru_del+0x5a/0x170
> >>> [22458.820748] d_lru_del+0x44/0x60
> >>> [22458.820870] __dentry_kill+0x149/0x180
> >>> [22458.820998] d_prune_aliases+0x5e/0x200
> >>> [22458.821130] drop_sysctl_table+0x135/0x310
> >>> [22458.821270] ? trace_hardirqs_on_caller+0xf5/0x190
> >>> [22458.821454] ? trace_hardirqs_on+0xd/0x10
> >>> [22458.821597] drop_sysctl_table+0x212/0x310
> >>> [22458.821741] ? trace_hardirqs_on_caller+0xf5/0x190
> >>> [22458.821898] ? trace_hardirqs_on+0xd/0x10
> >>> [22458.822034] drop_sysctl_table+0x212/0x310
> >>> [22458.822172] ? unregister_sysctl_table+0x36/0x80
> >>> [22458.822343] unregister_sysctl_table+0x3e/0x80
> >>> [22458.822507] unregister_net_sysctl_table+0xe/0x10
> >>> [22458.822664] __devinet_sysctl_unregister.isra.25+0x2b/0x50
> >>> [22458.822847] devinet_exit_net+0x47/0x70
> >>> [22458.822976] ops_exit_list.isra.4+0x38/0x60
> >>> [22458.823116] cleanup_net+0x1ee/0x2f0
> >>> [22458.823243] process_one_work+0x205/0x5d0
> >>> [22458.823405] worker_thread+0x4e/0x3a0
> >>> [22458.823549] kthread+0x109/0x140
> >>> [22458.823667] ? process_one_work+0x5d0/0x5d0
> >>> [22458.823819] ? kthread_create_on_node+0x70/0x70
> >>> [22458.823976] ret_from_fork+0x2a/0x40
> >>> [22458.824101] Code: c8 65 48 33 3c 25 28 00 00 00 44 89 f0 0f 85 2b
> >>> 0d 00 00 48 81 c4 90 00 00 00 5b 41 5a 41 5c 41 5d 41 5e 41 5f 5d 49
> >>> 8d 62 f8 c3 <49> 81 3f e0 38 44 96 41 bc 00 00 00 00 44 0f 45 e2 83 fe
> >>> 01 0f
> >>> [22458.824634] RIP: __lock_acquire+0x127/0x12f0 RSP: ffffb2b882937950
> >>> [22458.824821] CR2: 0000000000000018
> >>> [22458.824943] ---[ end trace b37db95b00f941ac ]---
> _______________________________________________
> CRIU mailing list
> [email protected]
> https://lists.openvz.org/mailman/listinfo/criu


2017-07-03 16:36:58

by Andrei Vagin

Subject: Re: [CRIU] BUG: Dentry ffff9f795a08fe60{i=af565f, n=lo} still in use (1) [unmount of proc proc]

On Fri, Jun 30, 2017 at 12:11:07PM -0700, Andrei Vagin wrote:
> On Thu, Jun 29, 2017 at 08:42:23PM -0500, Eric W. Biederman wrote:
> > Andrei Vagin <[email protected]> writes:
> >
> > > On Thu, Jun 29, 2017 at 12:06 PM, Eric W. Biederman
> > > <[email protected]> wrote:
> > >> Andrei Vagin <[email protected]> writes:
> > >>
> > >>> Hello,
> > >>>
> > >>> We run CRIU tests on Linus' tree, and today we found this issue.
> > >>>
> > >>> The CRIU tests are a set of small programs that check checkpoint/restore
> > >>> of different primitives (files, sockets, signals, pipes, etc.):
> > >>> https://github.com/xemul/criu/tree/master/test
> > >>>
> > >>> Each test is executed three times: without namespaces, in a set of all
> > >>> namespaces except userns, and in a set of all namespaces. When a test
> > >>> passes the preparation stage, it sends a signal to the executor, which
> > >>> then dumps and restores the test processes and sends a signal back to
> > >>> the test to check that everything was restored correctly.
> > >>
> > >> I am not certain what you are saying, and you seem to have Cc'd
> > >> every list except the netdev and netfilter lists that are needed
> > >> to deal with this.
> > >>
> > >> Are you saying that the change from Liping Zhang is needed? Or are you
> > >> saying that the change introduces the problem below?
> > >
> > > Hi Eric,
> > >
> > > Here I tried to explain our usecase. I don't know which changes in the
> > > kernel affect this issue.
> > >
> > > Actually, I reported a similar problem a few months ago on linux-next:
> > > https://lkml.org/lkml/2017/3/10/1586
> > >
> > > So I don't think that the change from Liping Zhang affects this issue
> > > somehow. I mentioned it just to describe what kernel we used.
> > >
> > > And I don't know how to reproduce the issue. You can see from the
> > > kernel log that the kernel worked for more than 6 hours in our case.
> > > During this period we ran all our tests a few times, so I think there
> > > is some kind of race.
> > >
> > >>
> > >> I could not find the mentioned commits. Are they in Linus's tree or
> > >> in someone's next tree that feeds into linux-next?
> > >
> > > Here is the patch from Liping Zhang
> > > https://patchwork.ozlabs.org/patch/770887/
> > >
> > > The second mentioned commit is HEAD of the master branch in Linus' tree:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6474924e2b5ddb0030c355558966adcbe3b49022
> >
> > Apologies, I somehow thought that the 'g' in the kernel version you
> > mentioned was part of the commit id, and thus I could not find it. Sigh.
> >
> > Ok so with Linus's tree and that one extra patch from Liping Zhang you
> > have kernel problems sometimes.
> >
> > The warning and the oops combined are quite suggestive of what is going
> > on. It does sound like while the pid namespace is being unregistered
> > something under /proc/sys/net/ipv4/conf/<if>/... is being accessed
> > and keeping the inode busy.
> >
> > Which then leads to an oops when the network namespace is being cleaned
> > up later, as it tries to purge all of the inodes.
> >
> > Which raises the question how in the world can the count of the
> > superblock drop to zero with an inode in use.
> >
> > As devinet is where things go strange this does seem completely
> > independent of what Liping Zhang was looking at.
> >
> > This does smell like a bug in the generic code. Hmm.
> >
> > Is this consistently reproducible when you run your tests...
>
> I'm not sure about that. I'm going to do some experiments to understand
> how often it is reproduced on our test system, and then I will try to
> revert the patch from Konstantin.

I did a few experiments and found that the bug reproduces within 6-12
hours on our test server. Then I reverted two patches, and the server
has been working normally for more than 24 hours already, so the bug is
probably in one of these patches.

commit e3d0065ab8535cbeee69a4c46a59f4d7360803ae
Author: Andrei Vagin <[email protected]>
Date: Sun Jul 2 07:41:25 2017 +0200

Revert "proc/sysctl: prune stale dentries during unregistering"

This reverts commit d6cffbbe9a7e51eb705182965a189457c17ba8a3.

commit 2d3c50dac81011c1da4d2f7a63b84bd75287e320
Author: Andrei Vagin <[email protected]>
Date: Sun Jul 2 07:40:08 2017 +0200

Revert "proc/sysctl: Don't grab i_lock under sysctl_lock."

This reverts commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb.


FYI: This bug has been reproduced on 4.11.7
[192885.875105] BUG: Dentry ffff895a3dd01240{i=4e7c09a,n=lo} still in use (1) [unmount of proc proc]
[192885.875313] ------------[ cut here ]------------
[192885.875328] WARNING: CPU: 1 PID: 13588 at fs/dcache.c:1445 umount_check+0x6e/0x80
[192885.875331] Modules linked in: veth tun macvlan nf_conntrack_netlink xt_mark udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_security iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep sunrpc edac_mce_amd edac_core kvm_amd btusb kvm btrtl btbcm btintel bluetooth snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel iwlwifi snd_hda_codec cfg80211 irqbypass crct10dif_pclmul
[192885.875441] snd_hda_core crc32_pclmul snd_pcsp ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device snd_pcm sp5100_tco k10temp fam15h_power i2c_piix4 snd_timer snd rfkill ccp soundcore shpchp acpi_cpufreq tpm_tis video tpm_tis_core tpm amdkfd amd_iommu_v2 radeon crc32c_intel drm_kms_helper sdhci_pci sdhci ttm mmc_core igb drm ptp pps_core dca i2c_algo_bit
[192885.875512] CPU: 1 PID: 13588 Comm: kworker/1:1 Not tainted 4.11.7-200.fc25.x86_64 #1
[192885.875514] Hardware name: CompuLab sbc-flt1/fitlet, BIOS SBCFLT_0.08.04 06/27/2015
[192885.875528] Workqueue: events proc_cleanup_work
[192885.875533] Call Trace:
[192885.875552] dump_stack+0x63/0x86
[192885.875558] __warn+0xcb/0xf0
[192885.875562] warn_slowpath_null+0x1d/0x20
[192885.875566] umount_check+0x6e/0x80
[192885.875570] d_walk+0xc6/0x270
[192885.875574] ? dentry_free+0x80/0x80
[192885.875578] do_one_tree+0x26/0x40
[192885.875582] shrink_dcache_for_umount+0x2d/0x90
[192885.875588] generic_shutdown_super+0x1f/0xf0
[192885.875592] kill_anon_super+0x12/0x20
[192885.875599] proc_kill_sb+0x40/0x50
[192885.875604] deactivate_locked_super+0x43/0x70
[192885.875609] deactivate_super+0x5a/0x60
[192885.875613] cleanup_mnt+0x3f/0x90
[192885.875617] mntput_no_expire+0x13b/0x190
[192885.875621] kern_unmount+0x3e/0x50
[192885.875626] pid_ns_release_proc+0x15/0x20
[192885.875630] proc_cleanup_work+0x15/0x20
[192885.875636] process_one_work+0x197/0x450
[192885.875641] worker_thread+0x4e/0x4a0
[192885.875648] kthread+0x109/0x140
[192885.875652] ? process_one_work+0x450/0x450
[192885.875657] ? kthread_park+0x90/0x90
[192885.875665] ret_from_fork+0x2c/0x40
[192885.875670] ---[ end trace e1c109611e5d0b41 ]---
[192885.875688] VFS: Busy inodes after unmount of proc. Self-destruct in 5 seconds. Have a nice day...
[192885.875755] BUG: unable to handle kernel NULL pointer dereference at (null)
[192885.875959] IP: _raw_spin_lock+0xc/0x30
[192885.876033] PGD 0



2017-07-06 13:48:57

by Eric W. Biederman

Subject: Re: [CRIU] BUG: Dentry ffff9f795a08fe60{i=af565f, n=lo} still in use (1) [unmount of proc proc]

Andrei Vagin <[email protected]> writes:

> I did a few experiments and found that the bug reproduces within 6-12
> hours on our test server. Then I reverted two patches, and the server
> has been working normally for more than 24 hours already, so the bug is
> probably in one of these patches.
>
> commit e3d0065ab8535cbeee69a4c46a59f4d7360803ae
> Author: Andrei Vagin <[email protected]>
> Date: Sun Jul 2 07:41:25 2017 +0200
>
> Revert "proc/sysctl: prune stale dentries during unregistering"
>
> This reverts commit d6cffbbe9a7e51eb705182965a189457c17ba8a3.
>
> commit 2d3c50dac81011c1da4d2f7a63b84bd75287e320
> Author: Andrei Vagin <[email protected]>
> Date: Sun Jul 2 07:40:08 2017 +0200
>
> Revert "proc/sysctl: Don't grab i_lock under sysctl_lock."
>
> This reverts commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb.
>
>
> FYI: This bug has been reproduced on 4.11.7

Instead of the revert, could you test the patch below?

It should fix the issue by grabbing an s_active reference
to the proc superblock for every inode we flush.

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index c5ae09b6c726..18694598bebf 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -67,7 +67,7 @@ struct proc_inode {
 	struct proc_dir_entry *pde;
 	struct ctl_table_header *sysctl;
 	struct ctl_table *sysctl_entry;
-	struct list_head sysctl_inodes;
+	struct hlist_node sysctl_inodes;
 	const struct proc_ns_operations *ns_ops;
 	struct inode vfs_inode;
 };
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 67985a7233c2..9bf06e2b1284 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -191,7 +191,7 @@ static void init_header(struct ctl_table_header *head,
 	head->set = set;
 	head->parent = NULL;
 	head->node = node;
-	INIT_LIST_HEAD(&head->inodes);
+	INIT_HLIST_HEAD(&head->inodes);
 	if (node) {
 		struct ctl_table *entry;
 		for (entry = table; entry->procname; entry++, node++)
@@ -261,25 +261,42 @@ static void unuse_table(struct ctl_table_header *p)
 		complete(p->unregistering);
 }
 
-/* called under sysctl_lock */
 static void proc_sys_prune_dcache(struct ctl_table_header *head)
 {
-	struct inode *inode, *prev = NULL;
+	struct inode *inode;
 	struct proc_inode *ei;
+	struct hlist_node *node;
+	struct super_block *sb;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
-		inode = igrab(&ei->vfs_inode);
-		if (inode) {
-			rcu_read_unlock();
-			iput(prev);
-			prev = inode;
-			d_prune_aliases(inode);
+	for (;;) {
+		node = hlist_first_rcu(&head->inodes);
+		if (!node)
+			break;
+		ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
+		spin_lock(&sysctl_lock);
+		hlist_del_init_rcu(&ei->sysctl_inodes);
+		spin_unlock(&sysctl_lock);
+
+		inode = &ei->vfs_inode;
+		sb = inode->i_sb;
+		if (!atomic_inc_not_zero(&sb->s_active))
+			continue;
+		inode = igrab(inode);
+		rcu_read_unlock();
+		if (unlikely(!inode)) {
+			deactivate_super(sb);
 			rcu_read_lock();
+			continue;
 		}
+
+		d_prune_aliases(inode);
+		iput(inode);
+		deactivate_super(sb);
+
+		rcu_read_lock();
 	}
 	rcu_read_unlock();
-	iput(prev);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -461,7 +478,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	}
 	ei->sysctl = head;
 	ei->sysctl_entry = table;
-	list_add_rcu(&ei->sysctl_inodes, &head->inodes);
+	hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
 	head->count++;
 	spin_unlock(&sysctl_lock);
 
@@ -489,7 +506,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
-	list_del_rcu(&PROC_I(inode)->sysctl_inodes);
+	hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
 	if (!--head->count)
 		kfree_rcu(head, rcu);
 	spin_unlock(&sysctl_lock);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 80d07816def0..1c04a26bfd2f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -143,7 +143,7 @@ struct ctl_table_header
 	struct ctl_table_set *set;
 	struct ctl_dir *parent;
 	struct ctl_node *node;
-	struct list_head inodes; /* head for proc_inode->sysctl_inodes */
+	struct hlist_head inodes; /* head for proc_inode->sysctl_inodes */
 };
 
 struct ctl_dir {

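The rewritten proc_sys_prune_dcache() in the patch above no longer walks the list in place; it drains it by repeatedly detaching the first entry under sysctl_lock and then doing the heavy work (igrab, d_prune_aliases, iput) with the lock dropped. The same detach-then-process shape, reduced to a plain singly linked list in userspace — all names here are hypothetical, and a bare pointer swap stands in for the kernel's locked RCU hlist removal:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace analog of the drain loop: detach the head node
 * while "locked", then process it with the lock dropped, and repeat
 * until the list is empty. */
struct node {
        struct node *next;
        int pruned;             /* stands in for the pruning work */
};

static struct node *pop_first(struct node **head)
{
        /* In the kernel this detach runs under spin_lock(&sysctl_lock). */
        struct node *n = *head;
        if (n)
                *head = n->next;
        return n;
}

static int drain(struct node **head)
{
        struct node *n;
        int count = 0;

        while ((n = pop_first(head)) != NULL) {
                /* "Lock" is dropped here: the node is off the list, so
                 * no other drainer can see it, and we may block freely. */
                n->pruned = 1;
                count++;
        }
        return count;
}
```

With three nodes chained in, drain() visits each exactly once and leaves the head NULL. This is also why the new code terminates even though entries may vanish concurrently: each iteration either removes the first node or finds the list empty, so no iterator state can be invalidated under it.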
2017-07-07 20:34:16

by Andrei Vagin

Subject: Re: [CRIU] BUG: Dentry ffff9f795a08fe60{i=af565f, n=lo} still in use (1) [unmount of proc proc]

On Thu, Jul 06, 2017 at 08:41:00AM -0500, Eric W. Biederman wrote:
> Andrei Vagin <[email protected]> writes:
>
> > I did a few experiments and found that the bug reproduces within 6-12
> > hours on our test server. Then I reverted two patches, and the server
> > has been working normally for more than 24 hours already, so the bug is
> > probably in one of those patches.
> >
> > commit e3d0065ab8535cbeee69a4c46a59f4d7360803ae
> > Author: Andrei Vagin <[email protected]>
> > Date: Sun Jul 2 07:41:25 2017 +0200
> >
> > Revert "proc/sysctl: prune stale dentries during unregistering"
> >
> > This reverts commit d6cffbbe9a7e51eb705182965a189457c17ba8a3.
> >
> > commit 2d3c50dac81011c1da4d2f7a63b84bd75287e320
> > Author: Andrei Vagin <[email protected]>
> > Date: Sun Jul 2 07:40:08 2017 +0200
> >
> > Revert "proc/sysctl: Don't grab i_lock under sysctl_lock."
> >
> > This reverts commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb.
> >
> >
> > FYI: This bug has been reproduced on 4.11.7
>
> Instead of the revert, could you test the patch below?

Our CI server has been running with this patch for more than a day
without any problems.

Tested-by: Andrei Vagin <[email protected]>

Thanks,
Andrei
>
> This should fix the issue by grabbing an s_active reference
> to the proc super block for every inode we flush.
>
> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> index c5ae09b6c726..18694598bebf 100644
> --- a/fs/proc/internal.h
> +++ b/fs/proc/internal.h
> @@ -67,7 +67,7 @@ struct proc_inode {
>  	struct proc_dir_entry *pde;
>  	struct ctl_table_header *sysctl;
>  	struct ctl_table *sysctl_entry;
> -	struct list_head sysctl_inodes;
> +	struct hlist_node sysctl_inodes;
>  	const struct proc_ns_operations *ns_ops;
>  	struct inode vfs_inode;
>  };
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 67985a7233c2..9bf06e2b1284 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -191,7 +191,7 @@ static void init_header(struct ctl_table_header *head,
>  	head->set = set;
>  	head->parent = NULL;
>  	head->node = node;
> -	INIT_LIST_HEAD(&head->inodes);
> +	INIT_HLIST_HEAD(&head->inodes);
>  	if (node) {
>  		struct ctl_table *entry;
>  		for (entry = table; entry->procname; entry++, node++)
> @@ -261,25 +261,42 @@ static void unuse_table(struct ctl_table_header *p)
>  		complete(p->unregistering);
>  }
> 
> -/* called under sysctl_lock */
>  static void proc_sys_prune_dcache(struct ctl_table_header *head)
>  {
> -	struct inode *inode, *prev = NULL;
> +	struct inode *inode;
>  	struct proc_inode *ei;
> +	struct hlist_node *node;
> +	struct super_block *sb;
> 
>  	rcu_read_lock();
> -	list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
> -		inode = igrab(&ei->vfs_inode);
> -		if (inode) {
> -			rcu_read_unlock();
> -			iput(prev);
> -			prev = inode;
> -			d_prune_aliases(inode);
> +	for (;;) {
> +		node = hlist_first_rcu(&head->inodes);
> +		if (!node)
> +			break;
> +		ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
> +		spin_lock(&sysctl_lock);
> +		hlist_del_init_rcu(&ei->sysctl_inodes);
> +		spin_unlock(&sysctl_lock);
> +
> +		inode = &ei->vfs_inode;
> +		sb = inode->i_sb;
> +		if (!atomic_inc_not_zero(&sb->s_active))
> +			continue;
> +		inode = igrab(inode);
> +		rcu_read_unlock();
> +		if (unlikely(!inode)) {
> +			deactivate_super(sb);
>  			rcu_read_lock();
> +			continue;
>  		}
> +
> +		d_prune_aliases(inode);
> +		iput(inode);
> +		deactivate_super(sb);
> +
> +		rcu_read_lock();
>  	}
>  	rcu_read_unlock();
> -	iput(prev);
>  }
> 
>  /* called under sysctl_lock, will reacquire if has to wait */
> @@ -461,7 +478,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>  	}
>  	ei->sysctl = head;
>  	ei->sysctl_entry = table;
> -	list_add_rcu(&ei->sysctl_inodes, &head->inodes);
> +	hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
>  	head->count++;
>  	spin_unlock(&sysctl_lock);
> 
> @@ -489,7 +506,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>  void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
>  {
>  	spin_lock(&sysctl_lock);
> -	list_del_rcu(&PROC_I(inode)->sysctl_inodes);
> +	hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
>  	if (!--head->count)
>  		kfree_rcu(head, rcu);
>  	spin_unlock(&sysctl_lock);
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 80d07816def0..1c04a26bfd2f 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -143,7 +143,7 @@ struct ctl_table_header
>  	struct ctl_table_set *set;
>  	struct ctl_dir *parent;
>  	struct ctl_node *node;
> -	struct list_head inodes; /* head for proc_inode->sysctl_inodes */
> +	struct hlist_head inodes; /* head for proc_inode->sysctl_inodes */
>  };
> 
>  struct ctl_dir {
>