From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrei Vagin
Cc: Andrei Vagin, Konstantin Khlebnikov, linux-fsdevel, Linux Kernel Network Developers, Linux Containers, LKML, "criu@openvz.org"
Subject: Re: [CRIU] BUG: Dentry ffff9f795a08fe60{i=af565f, n=lo} still in use (1) [unmount of proc proc]
Date: Thu, 06 Jul 2017 08:41:00 -0500
Message-ID: <87van54r9v.fsf@xmission.com>
In-Reply-To: <20170703163646.GA9346@gmail.com> (Andrei Vagin's message of "Mon, 3 Jul 2017 09:36:48 -0700")
References: <8737ai4ns6.fsf@xmission.com> <87vane1cao.fsf@xmission.com> <20170630191106.GB8899@outlook.office365.com> <20170703163646.GA9346@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Andrei Vagin writes:

> I did a few experiments and found that the bug is reproduced for 6-12
> hours on our test server. Then I reverted two patches and the server
> is working normally for more than 24 hours already, so the bug is
> probably in one of these patches.
>
> commit e3d0065ab8535cbeee69a4c46a59f4d7360803ae
> Author: Andrei Vagin
> Date:   Sun Jul 2 07:41:25 2017 +0200
>
>     Revert "proc/sysctl: prune stale dentries during unregistering"
>
>     This reverts commit d6cffbbe9a7e51eb705182965a189457c17ba8a3.
>
> commit 2d3c50dac81011c1da4d2f7a63b84bd75287e320
> Author: Andrei Vagin
> Date:   Sun Jul 2 07:40:08 2017 +0200
>
>     Revert "proc/sysctl: Don't grab i_lock under sysctl_lock."
>
>     This reverts commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb.
>
>
> FYI: This bug has been reproduced on 4.11.7

Instead of the revert could you test the patch below?  This should fix
the issue by grabbing a s_active reference to the proc super block for
every inode we flush.


diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index c5ae09b6c726..18694598bebf 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -67,7 +67,7 @@ struct proc_inode {
 	struct proc_dir_entry *pde;
 	struct ctl_table_header *sysctl;
 	struct ctl_table *sysctl_entry;
-	struct list_head sysctl_inodes;
+	struct hlist_node sysctl_inodes;
 	const struct proc_ns_operations *ns_ops;
 	struct inode vfs_inode;
 };
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 67985a7233c2..9bf06e2b1284 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -191,7 +191,7 @@ static void init_header(struct ctl_table_header *head,
 	head->set = set;
 	head->parent = NULL;
 	head->node = node;
-	INIT_LIST_HEAD(&head->inodes);
+	INIT_HLIST_HEAD(&head->inodes);
 	if (node) {
 		struct ctl_table *entry;
 		for (entry = table; entry->procname; entry++, node++)
@@ -261,25 +261,42 @@ static void unuse_table(struct ctl_table_header *p)
 		complete(p->unregistering);
 }
 
-/* called under sysctl_lock */
 static void proc_sys_prune_dcache(struct ctl_table_header *head)
 {
-	struct inode *inode, *prev = NULL;
+	struct inode *inode;
 	struct proc_inode *ei;
+	struct hlist_node *node;
+	struct super_block *sb;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) {
-		inode = igrab(&ei->vfs_inode);
-		if (inode) {
-			rcu_read_unlock();
-			iput(prev);
-			prev = inode;
-			d_prune_aliases(inode);
+	for (;;) {
+		node = hlist_first_rcu(&head->inodes);
+		if (!node)
+			break;
+		ei = hlist_entry(node, struct proc_inode, sysctl_inodes);
+		spin_lock(&sysctl_lock);
+		hlist_del_init_rcu(&ei->sysctl_inodes);
+		spin_unlock(&sysctl_lock);
+
+		inode = &ei->vfs_inode;
+		sb = inode->i_sb;
+		if (!atomic_inc_not_zero(&sb->s_active))
+			continue;
+		inode = igrab(inode);
+		rcu_read_unlock();
+		if (unlikely(!inode)) {
+			deactivate_super(sb);
 			rcu_read_lock();
+			continue;
 		}
+
+		d_prune_aliases(inode);
+		iput(inode);
+		deactivate_super(sb);
+
+		rcu_read_lock();
 	}
 	rcu_read_unlock();
-	iput(prev);
 }
 
 /* called under sysctl_lock, will reacquire if has to wait */
@@ -461,7 +478,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 	}
 	ei->sysctl = head;
 	ei->sysctl_entry = table;
-	list_add_rcu(&ei->sysctl_inodes, &head->inodes);
+	hlist_add_head_rcu(&ei->sysctl_inodes, &head->inodes);
 	head->count++;
 	spin_unlock(&sysctl_lock);
 
@@ -489,7 +506,7 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
 void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head)
 {
 	spin_lock(&sysctl_lock);
-	list_del_rcu(&PROC_I(inode)->sysctl_inodes);
+	hlist_del_init_rcu(&PROC_I(inode)->sysctl_inodes);
 	if (!--head->count)
 		kfree_rcu(head, rcu);
 	spin_unlock(&sysctl_lock);
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 80d07816def0..1c04a26bfd2f 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -143,7 +143,7 @@ struct ctl_table_header
 	struct ctl_table_set *set;
 	struct ctl_dir *parent;
 	struct ctl_node *node;
-	struct list_head inodes; /* head for proc_inode->sysctl_inodes */
+	struct hlist_head inodes; /* head for proc_inode->sysctl_inodes */
 };
 
 struct ctl_dir {
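

To illustrate the s_active handling in the patch above, here is a rough
userspace model of the "take a reference only if the count is still
non-zero" step. It is a sketch under stated assumptions, not kernel code:
fake_sb, inc_not_zero(), put_ref() and prune_one() are hypothetical names
standing in for struct super_block, atomic_inc_not_zero(),
deactivate_super() and one iteration of the loop in
proc_sys_prune_dcache(). RCU, igrab() and the dcache itself are not
modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct super_block; only the counter matters. */
struct fake_sb {
	atomic_int s_active;
};

/* Model of atomic_inc_not_zero(): take a reference unless the count is 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of deactivate_super(): drop a reference, report if it was the last. */
static bool put_ref(atomic_int *v)
{
	int old = atomic_fetch_sub(v, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Model of one loop iteration: skip the inode if the super block is already
 * being torn down, otherwise pin it for the duration of the prune.
 */
static bool prune_one(struct fake_sb *sb)
{
	if (!inc_not_zero(&sb->s_active))
		return false;	/* unmount in progress: leave the dentry alone */
	/* d_prune_aliases() and iput() would run here in the real code */
	put_ref(&sb->s_active);
	return true;
}
```

In this model a super block whose count has already reached zero is never
revived, so the prune is skipped for it, while a live super block stays
pinned until the reference is dropped again.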