Received: by 2002:a4a:311b:0:0:0:0:0 with SMTP id k27-v6csp4800512ooa; Tue, 14 Aug 2018 10:43:50 -0700 (PDT) X-Google-Smtp-Source: AA+uWPxzSk1B37A9BmOsQvWuMhlbjYgLfLebrHW5CIwYSJrNkA8OV/Xv942lRmOy23xSxLe8jZGU X-Received: by 2002:a62:a649:: with SMTP id t70-v6mr24562991pfe.149.1534268630841; Tue, 14 Aug 2018 10:43:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1534268630; cv=none; d=google.com; s=arc-20160816; b=FJDomu53wi42VShopp9i1NsQc5y9DdwQ5tDZRtc6dm7z278l9REY9R5H4dax1m67CI j3Gvmu6sVUq2vDpIXC8ZqKiVu6LSQ4H0jKHzCCBra3ImM6fd1rv6tXAn00PqUXc21IG9 9NhAufcmwD0WytrkYusiGj3zIhxQXuLmHlmwgP8sOBzfn10aV+ACNv0XknDNkdlaj/EN aADKB0d6CMS5zhr0n7X8nuHzUN3ocz18yKa5EcrErbmEjFHttCWmWxGl7dPg4mRT47si vzuFo3QMcKLf3d0HZdlqcDayPr0+1Gb5AC/SuXqkBAWRoQjjnIAu15HgC1BrjWwROgSO l5KA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=ck3qSLzCFI27ktfMCIkzmT16IiQnsBJ+L+nf7s5nLiY=; b=qXKtlRVSegYWd/oFIv8JFzvMjqUcE3iSKLufiwXuXe4idDqcw2uCIHWfR2ggf8bHMX i7KBFtx7ZUMK3mTVHAV1hXUPYY6pUgBYgpEI5stGAd6ClHUOFZtLTy+ApveFwbThvnPb 7OG455NlTzRlI7RmLA2a31h/Pi7denmaz7s37zcq37TGt/moD7BhpjBEYGzHBsvZvmh0 qQjtMJeHNoAdnw6JPjj8/j8bb/tZsm3e60OMnKJ0K+WM8mgRmSIc248L7wixDtX4NWoQ L+KZvQxITcpuoh6riQwPWYtsG6M2YYj31T9j7E4WOxPiz0sPRC0wNgIVo9fkKcukBSxs xRRA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o69-v6si23844285pfo.342.2018.08.14.10.43.36; Tue, 14 Aug 2018 10:43:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390331AbeHNUaR (ORCPT + 99 others); Tue, 14 Aug 2018 16:30:17 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:59344 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2390232AbeHNUaQ (ORCPT ); Tue, 14 Aug 2018 16:30:16 -0400 Received: from localhost (unknown [194.244.16.108]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 956C6C7B; Tue, 14 Aug 2018 17:42:00 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Konstantin Khlebnikov , Al Viro , "Eric W. Biederman" Subject: [PATCH 4.9 016/107] proc/sysctl: Dont grab i_lock under sysctl_lock. Date: Tue, 14 Aug 2018 19:16:39 +0200 Message-Id: <20180814171521.727718884@linuxfoundation.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180814171520.883143803@linuxfoundation.org> References: <20180814171520.883143803@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Eric W. Biederman commit ace0c791e6c3cf5ef37cad2df69f0d90ccc40ffb upstream. Konstantin Khlebnikov writes: > This patch has locking problem. I've got lockdep splat under LTP. > > [ 6633.115456] ====================================================== > [ 6633.115502] [ INFO: possible circular locking dependency detected ] > [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L > [ 6633.115584] ------------------------------------------------------- > [ 6633.115627] ksm02/284980 is trying to acquire lock: > [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [] igrab+0x1e/0x80 > [ 6633.115834] but task is already holding lock: > [ 6633.115882] (sysctl_lock){+.+...}, at: [] unregister_sysctl_table+0x6b/0x110 > [ 6633.116026] which lock already depends on the new lock. > [ 6633.116026] > [ 6633.116080] > [ 6633.116080] the existing dependency chain (in reverse order) is: > [ 6633.116117] > -> #2 (sysctl_lock){+.+...}: > -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}: > -> #0 (&sb->s_type->i_lock_key#4){+.+...}: > > d_lock nests inside i_lock > sysctl_lock nests inside d_lock in d_compare > > This patch adds i_lock nesting inside sysctl_lock. Al Viro replied: > Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd > try something like this - use rcu_read_lock() in proc_sys_prune_dcache(), > drop sysctl_lock() before it and regain after. Make sure that no inodes > are added to the list ones ->unregistering has been set and use RCU list > primitives for modifying the inode list, with sysctl_lock still used to > serialize its modifications. > > Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing > igrab() is safe there. Since we don't drop inode reference until after we'd > passed beyond it in the list, list_for_each_entry_rcu() should be fine. I agree with Al Viro's analsysis of the situtation. Fixes: d6cffbbe9a7e ("proc/sysctl: prune stale dentries during unregistering") Reported-by: Konstantin Khlebnikov Tested-by: Konstantin Khlebnikov Suggested-by: Al Viro Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman --- fs/proc/proc_sysctl.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -266,21 +266,19 @@ static void proc_sys_prune_dcache(struct struct inode *inode, *prev = NULL; struct proc_inode *ei; - list_for_each_entry(ei, &head->inodes, sysctl_inodes) { + rcu_read_lock(); + list_for_each_entry_rcu(ei, &head->inodes, sysctl_inodes) { inode = igrab(&ei->vfs_inode); if (inode) { - spin_unlock(&sysctl_lock); + rcu_read_unlock(); iput(prev); prev = inode; d_prune_aliases(inode); - spin_lock(&sysctl_lock); + rcu_read_lock(); } } - if (prev) { - spin_unlock(&sysctl_lock); - iput(prev); - spin_lock(&sysctl_lock); - } + rcu_read_unlock(); + iput(prev); } /* called under sysctl_lock, will reacquire if has to wait */ @@ -296,10 +294,10 @@ static void start_unregistering(struct c p->unregistering = &wait; spin_unlock(&sysctl_lock); wait_for_completion(&wait); - spin_lock(&sysctl_lock); } else { /* anything non-NULL; we'll never dereference it */ p->unregistering = ERR_PTR(-EINVAL); + spin_unlock(&sysctl_lock); } /* * Prune dentries for unregistered sysctls: namespaced sysctls @@ -310,6 +308,7 @@ static void start_unregistering(struct c * do not remove from the list until nobody holds it; walking the * list in do_sysctl() relies on that. */ + spin_lock(&sysctl_lock); erase_header(p); } @@ -455,11 +454,17 @@ static struct inode *proc_sys_make_inode inode->i_ino = get_next_ino(); ei = PROC_I(inode); - ei->sysctl = head; - ei->sysctl_entry = table; spin_lock(&sysctl_lock); - list_add(&ei->sysctl_inodes, &head->inodes); + if (unlikely(head->unregistering)) { + spin_unlock(&sysctl_lock); + iput(inode); + inode = NULL; + goto out; + } + ei->sysctl = head; + ei->sysctl_entry = table; + list_add_rcu(&ei->sysctl_inodes, &head->inodes); head->count++; spin_unlock(&sysctl_lock); @@ -487,7 +492,7 @@ out: void proc_sys_evict_inode(struct inode *inode, struct ctl_table_header *head) { spin_lock(&sysctl_lock); - list_del(&PROC_I(inode)->sysctl_inodes); + list_del_rcu(&PROC_I(inode)->sysctl_inodes); if (!--head->count) kfree_rcu(head, rcu); spin_unlock(&sysctl_lock);