Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S941337AbcJGRPc (ORCPT );
	Fri, 7 Oct 2016 13:15:32 -0400
Received: from mx1.redhat.com ([209.132.183.28]:54228 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S964900AbcJGRPV (ORCPT );
	Fri, 7 Oct 2016 13:15:21 -0400
Date: Fri, 7 Oct 2016 19:15:18 +0200
From: Oleg Nesterov
To: Dave Chinner
Cc: Jan Kara , Al Viro , Nikolay Borisov , "Paul E. McKenney" ,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	fstests@vger.kernel.org
Subject: Re: [PATCH V2 2/2] fs/super.c: don't fool lockdep in freeze_super()
	and thaw_super() paths
Message-ID: <20161007171517.GA23721@redhat.com>
References: <20160927065135.GA1139@quack2.suse.cz>
	<20160927172901.GA11879@redhat.com>
	<20160930171434.GA2373@redhat.com>
	<20161002214225.GS9806@dastard>
	<20161003164435.GB6634@redhat.com>
	<20161004114341.GA8572@redhat.com>
	<20161004194435.GW9806@dastard>
	<20161005164432.GB15121@redhat.com>
	<20161006171758.GA21707@redhat.com>
	<20161006215920.GE9806@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161006215920.GE9806@dastard>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.29]); Fri, 07 Oct 2016 17:15:21 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 15202
Lines: 294

On 10/07, Dave Chinner wrote:
>
> On Thu, Oct 06, 2016 at 07:17:58PM +0200, Oleg Nesterov wrote:
> > Probably false positive? Although when I look at the comment above xfs_sync_sb()
> > I think that maybe something like below makes sense, but I know absolutely nothing
> > about fs/ and XFS in particular.
> >
> > Oleg.
> >
> > --- x/fs/xfs/xfs_trans.c
> > +++ x/fs/xfs/xfs_trans.c
> > @@ -245,7 +245,8 @@ xfs_trans_alloc(
> >  	atomic_inc(&mp->m_active_trans);
> >
> >  	tp = kmem_zone_zalloc(xfs_trans_zone,
> > -		(flags & XFS_TRANS_NOFS) ? KM_NOFS : KM_SLEEP);
> > +		(flags & (XFS_TRANS_NOFS | XFS_TRANS_NO_WRITECOUNT))
> > +		? KM_NOFS : KM_SLEEP);
> >  	tp->t_magic = XFS_TRANS_HEADER_MAGIC;
> >  	tp->t_flags = flags;
> >  	tp->t_mountp = mp;
>
> Brief examination says caller should set XFS_TRANS_NOFS, not change
> the implementation to make XFS_TRANS_NO_WRITECOUNT flag to also mean
> XFS_TRANS_NOFS.

I didn't mean that the change above can fix the problem, and I don't really
understand your suggestion. Obviously any GFP_FS allocation in the
xfs_fs_freeze() paths will trigger the same warning.

I added this hack

	--- a/fs/xfs/xfs_super.c
	+++ b/fs/xfs/xfs_super.c
	@@ -1333,10 +1333,15 @@ xfs_fs_freeze(
	 	struct super_block *sb)
	 {
	 	struct xfs_mount *mp = XFS_M(sb);
	+	int ret;

	+	current->flags |= PF_FSTRANS; // tell kmem_flags_convert() to remove GFP_FS
	 	xfs_save_resvblks(mp);
	 	xfs_quiesce_attr(mp);
	-	return xfs_sync_sb(mp, true);
	+	ret = xfs_sync_sb(mp, true);
	+	current->flags &= ~PF_FSTRANS;
	+
	+	return ret;
	 }

just for testing purposes, and after that I got another warning, below.
I didn't read it carefully yet, but _at first glance_ it looks like the
lock inversion uncovered by 2/2, although I can easily be wrong:
cancel_delayed_work_sync(l_work) under sb_internal can hang if
xfs_log_worker() waits for this rwsem?

Oleg.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.8.0+ #10 Tainted: G        W
---------------------------------------------------------
kswapd0/32 just changed the state of lock:
 (sb_internal){++++.?}, at: [] __sb_start_write+0xb7/0xf0
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 ((&(&log->l_work)->work)){+.+.+.}

and interrupts could create inverse lock ordering between them.
other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((&(&log->l_work)->work));
                               local_irq_disable();
                               lock(sb_internal);
                               lock((&(&log->l_work)->work));
  lock(sb_internal);

 *** DEADLOCK ***

no locks held by kswapd0/32.

the shortest dependencies between 2nd lock and 1st lock:

-> ((&(&log->l_work)->work)){+.+.+.} ops: 803 {
   HARDIRQ-ON-W at:
                    [] __lock_acquire+0x611/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] process_one_work+0x1c9/0x690
                    [] worker_thread+0x4e/0x480
                    [] kthread+0xf3/0x110
                    [] ret_from_fork+0x1f/0x40
   SOFTIRQ-ON-W at:
                    [] __lock_acquire+0x643/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] process_one_work+0x1c9/0x690
                    [] worker_thread+0x4e/0x480
                    [] kthread+0xf3/0x110
                    [] ret_from_fork+0x1f/0x40
   RECLAIM_FS-ON-W at:
                    [] mark_held_locks+0x6f/0xa0
                    [] lockdep_trace_alloc+0xd3/0x120
                    [] kmem_cache_alloc+0x2f/0x280
                    [] kmem_zone_alloc+0x81/0x120
                    [] xfs_trans_alloc+0x6c/0x130
                    [] xfs_sync_sb+0x39/0x80
                    [] xfs_log_worker+0xd4/0xf0
                    [] process_one_work+0x1f0/0x690
                    [] worker_thread+0x4e/0x480
                    [] kthread+0xf3/0x110
                    [] ret_from_fork+0x1f/0x40
   INITIAL USE at:
                    [] __lock_acquire+0x24b/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] process_one_work+0x1c9/0x690
                    [] worker_thread+0x4e/0x480
                    [] kthread+0xf3/0x110
                    [] ret_from_fork+0x1f/0x40
 }
 ... key at: [] __key.130101+0x0/0x8
 ...
 acquired at:
   [] lock_acquire+0x10d/0x200
   [] flush_work+0x4c/0x2c0
   [] __cancel_work_timer+0x131/0x1f0
   [] cancel_delayed_work_sync+0x13/0x20
   [] xfs_log_quiesce+0x34/0x3b0
   [] xfs_quiesce_attr+0x6d/0xb0
   [] xfs_fs_freeze+0x33/0x50
   [] freeze_super+0xcf/0x190
   [] do_vfs_ioctl+0x55f/0x6c0
   [] SyS_ioctl+0x79/0x90
   [] entry_SYSCALL_64_fastpath+0x1f/0xbd

-> (sb_internal){++++.?} ops: 28351355 {
   HARDIRQ-ON-W at:
                    [] __lock_acquire+0x611/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] down_write+0x36/0x70
                    [] percpu_down_write+0x33/0x100
                    [] freeze_super+0xbd/0x190
                    [] do_vfs_ioctl+0x55f/0x6c0
                    [] SyS_ioctl+0x79/0x90
                    [] entry_SYSCALL_64_fastpath+0x1f/0xbd
   HARDIRQ-ON-R at:
                    [] __lock_acquire+0x30d/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] percpu_down_read_trylock+0x48/0xa0
                    [] __sb_start_write+0x7a/0xf0
                    [] xfs_trans_alloc+0xe3/0x130
                    [] xfs_vn_update_time+0x7a/0x210
                    [] touch_atime+0xa8/0xd0
                    [] generic_file_read_iter+0x703/0x8d0
                    [] xfs_file_buffered_aio_read+0x63/0x170
                    [] xfs_file_read_iter+0x68/0xc0
                    [] __vfs_read+0xe0/0x150
                    [] vfs_read+0x95/0x140
                    [] SyS_read+0x58/0xc0
                    [] do_syscall_64+0x6c/0x1e0
                    [] return_from_SYSCALL_64+0x0/0x7a
   SOFTIRQ-ON-W at:
                    [] __lock_acquire+0x643/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] down_write+0x36/0x70
                    [] percpu_down_write+0x33/0x100
                    [] freeze_super+0xbd/0x190
                    [] do_vfs_ioctl+0x55f/0x6c0
                    [] SyS_ioctl+0x79/0x90
                    [] entry_SYSCALL_64_fastpath+0x1f/0xbd
   SOFTIRQ-ON-R at:
                    [] __lock_acquire+0x643/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] percpu_down_read_trylock+0x48/0xa0
                    [] __sb_start_write+0x7a/0xf0
                    [] xfs_trans_alloc+0xe3/0x130
                    [] xfs_vn_update_time+0x7a/0x210
                    [] touch_atime+0xa8/0xd0
                    [] generic_file_read_iter+0x703/0x8d0
                    [] xfs_file_buffered_aio_read+0x63/0x170
                    [] xfs_file_read_iter+0x68/0xc0
                    [] __vfs_read+0xe0/0x150
                    [] vfs_read+0x95/0x140
                    [] SyS_read+0x58/0xc0
                    [] do_syscall_64+0x6c/0x1e0
                    [] return_from_SYSCALL_64+0x0/0x7a
   IN-RECLAIM_FS-R at:
                    [] __lock_acquire+0x36d/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] percpu_down_read+0x3c/0x90
                    [] __sb_start_write+0xb7/0xf0
                    [] xfs_trans_alloc+0xe3/0x130
                    [] xfs_iomap_write_allocate+0x1f7/0x380
                    [] xfs_map_blocks+0x22a/0x380
                    [] xfs_do_writepage+0x188/0x6c0
                    [] xfs_vm_writepage+0x3b/0x70
                    [] pageout.isra.46+0x190/0x380
                    [] shrink_page_list+0x9ab/0xa70
                    [] shrink_inactive_list+0x252/0x5d0
                    [] shrink_node_memcg+0x5af/0x790
                    [] shrink_node+0xe1/0x320
                    [] kswapd+0x387/0x8b0
                    [] kthread+0xf3/0x110
                    [] ret_from_fork+0x1f/0x40
   RECLAIM_FS-ON-R at:
                    [] mark_held_locks+0x6f/0xa0
                    [] lockdep_trace_alloc+0xd3/0x120
                    [] kmem_cache_alloc+0x2f/0x280
                    [] kmem_zone_alloc+0x81/0x120
                    [] xfs_trans_alloc+0x6c/0x130
                    [] xfs_vn_update_time+0x7a/0x210
                    [] touch_atime+0xa8/0xd0
                    [] generic_file_read_iter+0x703/0x8d0
                    [] xfs_file_buffered_aio_read+0x63/0x170
                    [] xfs_file_read_iter+0x68/0xc0
                    [] __vfs_read+0xe0/0x150
                    [] vfs_read+0x95/0x140
                    [] SyS_read+0x58/0xc0
                    [] do_syscall_64+0x6c/0x1e0
                    [] return_from_SYSCALL_64+0x0/0x7a
   INITIAL USE at:
                    [] __lock_acquire+0x24b/0x1870
                    [] lock_acquire+0x10d/0x200
                    [] percpu_down_read_trylock+0x48/0xa0
                    [] __sb_start_write+0x7a/0xf0
                    [] xfs_trans_alloc+0xe3/0x130
                    [] xfs_vn_update_time+0x7a/0x210
                    [] touch_atime+0xa8/0xd0
                    [] generic_file_read_iter+0x703/0x8d0
                    [] xfs_file_buffered_aio_read+0x63/0x170
                    [] xfs_file_read_iter+0x68/0xc0
                    [] __vfs_read+0xe0/0x150
                    [] vfs_read+0x95/0x140
                    [] SyS_read+0x58/0xc0
                    [] do_syscall_64+0x6c/0x1e0
                    [] return_from_SYSCALL_64+0x0/0x7a
 }
 ... key at: [] xfs_fs_type+0x60/0x80
 ...
 acquired at:
   [] check_usage_forwards+0x151/0x160
   [] mark_lock+0x34a/0x650
   [] __lock_acquire+0x36d/0x1870
   [] lock_acquire+0x10d/0x200
   [] percpu_down_read+0x3c/0x90
   [] __sb_start_write+0xb7/0xf0
   [] xfs_trans_alloc+0xe3/0x130
   [] xfs_iomap_write_allocate+0x1f7/0x380
   [] xfs_map_blocks+0x22a/0x380
   [] xfs_do_writepage+0x188/0x6c0
   [] xfs_vm_writepage+0x3b/0x70
   [] pageout.isra.46+0x190/0x380
   [] shrink_page_list+0x9ab/0xa70
   [] shrink_inactive_list+0x252/0x5d0
   [] shrink_node_memcg+0x5af/0x790
   [] shrink_node+0xe1/0x320
   [] kswapd+0x387/0x8b0
   [] kthread+0xf3/0x110
   [] ret_from_fork+0x1f/0x40

stack backtrace:
CPU: 0 PID: 32 Comm: kswapd0 Tainted: G        W       4.8.0+ #10
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 0000000000000086 0000000020cdca74 ffff880136c5b490 ffffffffb95b7a73
 ffffffffbae7ed80 ffff880136c5b4f0 ffff880136c5b4d0 ffffffffb91e7c5a
 ffff880136c3aee8 ffff880136c3aee8 ffff880136c3a680 ffffffffb9e8bbdd
Call Trace:
 [] dump_stack+0x85/0xc2
 [] print_irq_inversion_bug.part.36+0x1a4/0x1b0
 [] check_usage_forwards+0x151/0x160
 [] mark_lock+0x34a/0x650
 [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
 [] __lock_acquire+0x36d/0x1870
 [] ? __lock_acquire+0x475/0x1870
 [] lock_acquire+0x10d/0x200
 [] ? __sb_start_write+0xb7/0xf0
 [] percpu_down_read+0x3c/0x90
 [] ? __sb_start_write+0xb7/0xf0
 [] __sb_start_write+0xb7/0xf0
 [] xfs_trans_alloc+0xe3/0x130
 [] xfs_iomap_write_allocate+0x1f7/0x380
 [] ? xfs_map_blocks+0xe3/0x380
 [] ? rcu_read_lock_sched_held+0x58/0x60
 [] xfs_map_blocks+0x22a/0x380
 [] xfs_do_writepage+0x188/0x6c0
 [] xfs_vm_writepage+0x3b/0x70
 [] pageout.isra.46+0x190/0x380
 [] shrink_page_list+0x9ab/0xa70
 [] shrink_inactive_list+0x252/0x5d0
 [] shrink_node_memcg+0x5af/0x790
 [] shrink_node+0xe1/0x320
 [] kswapd+0x387/0x8b0
 [] ? mem_cgroup_shrink_node+0x2e0/0x2e0
 [] kthread+0xf3/0x110
 [] ret_from_fork+0x1f/0x40
 [] ? kthread_create_on_node+0x230/0x230