Subject: Re: [PATCHv2] cephfs: Fix scheduler warning due to nested blocking
From: "Yan, Zheng"
Date: Wed, 12 Oct 2016 11:05:13 +0800
To: Nikolay Borisov
Cc: idryomov@gmail.com, linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org
In-Reply-To: <1476177371-13652-1-git-send-email-kernel@kyup.com>
References: <1476176649-13393-1-git-send-email-kernel@kyup.com> <1476177371-13652-1-git-send-email-kernel@kyup.com>

> On 11 Oct 2016, at 17:16, Nikolay Borisov wrote:
>
> try_get_cap_refs can be used as a condition in a wait_event* calls.
> This is all fine until it has to call __ceph_do_pending_vmtruncate,
> which in turn acquires the i_truncate_mutex. This leads to a situation
> in which a task's state is !TASK_RUNNING and at the same time it's
> trying to acquire a sleeping primitive. In essence a nested sleeping
> primitives are being used.
> This causes the following warning:
>
> WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0()
> do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait_event+0x5d/0x110
> ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6
> CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6
> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015
> 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0
> ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c
> 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0
> Call Trace:
> [] dump_stack+0x67/0x9e
> [] warn_slowpath_common+0x86/0xc0
> [] warn_slowpath_fmt+0x4c/0x50
> [] ? prepare_to_wait_event+0x5d/0x110
> [] ? prepare_to_wait_event+0x5d/0x110
> [] __might_sleep+0x9f/0xb0
> [] mutex_lock+0x20/0x40
> [] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph]
> [] try_get_cap_refs+0xa2/0x320 [ceph]
> [] ceph_get_caps+0x255/0x2b0 [ceph]
> [] ? wait_woken+0xb0/0xb0
> [] ceph_write_iter+0x2b1/0xde0 [ceph]
> [] ? schedule_timeout+0x202/0x260
> [] ? kmem_cache_free+0x1ea/0x200
> [] ? iput+0x9e/0x230
> [] ? __might_sleep+0x52/0xb0
> [] ? __might_fault+0x37/0x40
> [] ? cp_new_stat+0x153/0x170
> [] __vfs_write+0xaa/0xe0
> [] vfs_write+0xa9/0x190
> [] ? set_close_on_exec+0x31/0x70
> [] SyS_write+0x46/0xa0
>
> This happens since wait_event_interruptible can interfere with the
> mutex locking code, since they both fiddle with the task state.
>
> Fix the issue by using the newly-added nested blocking infrastructure
> in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with
> nested blocking")
>
> Link: https://lwn.net/Articles/628628/
> Signed-off-by: Nikolay Borisov
> ---
>  fs/ceph/caps.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index c69e1253b47b..9d401520b981 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2467,6 +2467,7 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want,
>  		  loff_t endoff, int *got, struct page **pinned_page)
>  {
>  	int _got, ret, err = 0;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>
>  	ret = ceph_pool_perm_check(ci, need);
>  	if (ret < 0)
> @@ -2486,9 +2487,14 @@ int ceph_get_caps(struct ceph_inode_info *ci, int need, int want,
>  			if (err < 0)
>  				return err;
>  		} else {
> -			ret = wait_event_interruptible(ci->i_cap_wq,
> -					try_get_cap_refs(ci, need, want, endoff,
> -							 true, &_got, &err));
> +			add_wait_queue(&ci->i_cap_wq, &wait);
> +
> +			while (!try_get_cap_refs(ci, need, want, endoff,
> +						 true, &_got, &err))
> +				wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
> +
> +			remove_wait_queue(&ci->i_cap_wq, &wait);
> +
>  			if (err == -EAGAIN)
>  				continue;
>  			if (err < 0)
> --
> 2.5.0
>

Applied, thanks

Yan, Zheng
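
For readers unfamiliar with the pattern the patch switches to, here is a rough userspace analogy using pthreads. It is not kernel code and not part of the patch. Every name in it (struct woken_wait, wait_woken_sim, woken_wake, try_get, get_caps, put_caps, run_demo) is hypothetical and only loosely mirrors wait_woken() and woken_wake_function. The idea it tries to show: the waker records the wakeup in a flag, so the waiter can evaluate its condition, including taking a blocking lock, while it is not committed to sleeping, and a wakeup that arrives during that evaluation is not lost. The sketch assumes a single waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wait entry carrying a "woken" flag. */
struct woken_wait {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool woken;
};

/* Stand-in for the inode: a sleeping lock plus a counter the waiter needs. */
struct resource {
	pthread_mutex_t truncate_mutex;
	int caps;
	struct woken_wait wait;
};

/* Waker side: record the wakeup in the flag, then signal the waiter. */
static void woken_wake(struct woken_wait *w)
{
	pthread_mutex_lock(&w->lock);
	w->woken = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Waiter side: sleep only if no wakeup is pending, then consume the flag. */
static void wait_woken_sim(struct woken_wait *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->woken)
		pthread_cond_wait(&w->cond, &w->lock);
	w->woken = false;
	pthread_mutex_unlock(&w->lock);
}

/* The condition takes a blocking lock; it runs outside the sleep step. */
static bool try_get(struct resource *r)
{
	bool ok;

	pthread_mutex_lock(&r->truncate_mutex);
	ok = r->caps > 0;
	if (ok)
		r->caps--;
	pthread_mutex_unlock(&r->truncate_mutex);
	return ok;
}

/* Same loop shape as the patched code: test the condition, else wait. */
static void get_caps(struct resource *r)
{
	while (!try_get(r))
		wait_woken_sim(&r->wait);
}

/* Make one unit available and wake the waiter. */
static void put_caps(struct resource *r)
{
	pthread_mutex_lock(&r->truncate_mutex);
	r->caps++;
	pthread_mutex_unlock(&r->truncate_mutex);
	woken_wake(&r->wait);
}

static void *releaser(void *arg)
{
	put_caps(arg);
	return NULL;
}

/* One releaser thread, one waiter; returns the units left (expected 0). */
static int run_demo(void)
{
	struct resource r;
	pthread_t t;
	int rc;

	pthread_mutex_init(&r.truncate_mutex, NULL);
	pthread_mutex_init(&r.wait.lock, NULL);
	pthread_cond_init(&r.wait.cond, NULL);
	r.wait.woken = false;
	r.caps = 0;

	rc = pthread_create(&t, NULL, releaser, &r);
	assert(rc == 0);
	get_caps(&r);
	rc = pthread_join(t, NULL);
	assert(rc == 0);
	return r.caps;
}
```

Calling run_demo() from a main() should return 0 whichever thread runs first: if the release happens before the first check, try_get() succeeds immediately; if it happens after a failed check, the pending flag makes wait_woken_sim() return without sleeping and the loop rechecks.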