Subject: Re: [4.15-rc9] fs_reclaim lockdep trace
To: Linus Torvalds, Dave Jones, Peter Zijlstra
References: <20180124013651.GA1718@codemonkey.org.uk> <20180127222433.GA24097@codemonkey.org.uk>
Cc: Linux Kernel, linux-mm, Network Development
From: Tetsuo Handa
Date: Sun, 28 Jan 2018 10:16:02 +0900
Linus Torvalds wrote:
> On Sat, Jan 27, 2018 at 2:24 PM, Dave Jones wrote:
>> On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote:
>> > Just triggered this on a server I was rsync'ing to.
>>
>> Actually, I can trigger this really easily, even with an rsync from one
>> disk to another. Though that also smells a little like networking in
>> the traces. Maybe netdev has ideas.
>
> Is this new to 4.15? Or is it just that you're testing something new?
>
> If it's new and easy to repro, can you just bisect it? And if it isn't
> new, can you perhaps check whether it's new to 4.14 (ie 4.13 being
> ok)?
>
> Because that fs_reclaim_acquire/release() debugging isn't new to 4.15,
> but it was rewritten for 4.14.. I'm wondering if that remodeling ended
> up triggering something.
--- linux-4.13.16/mm/page_alloc.c
+++ linux-4.14.15/mm/page_alloc.c
@@ -3527,53 +3519,12 @@
 			return true;
 	}
 
 	return false;
 }
 #endif /* CONFIG_COMPACTION */
 
-#ifdef CONFIG_LOCKDEP
-struct lockdep_map __fs_reclaim_map =
-	STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
-
-static bool __need_fs_reclaim(gfp_t gfp_mask)
-{
-	gfp_mask = current_gfp_context(gfp_mask);
-
-	/* no reclaim without waiting on it */
-	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
-		return false;
-
-	/* this guy won't enter reclaim */
-	if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
-		return false;
-
-	/* We're only interested __GFP_FS allocations for now */
-	if (!(gfp_mask & __GFP_FS))
-		return false;
-
-	if (gfp_mask & __GFP_NOLOCKDEP)
-		return false;
-
-	return true;
-}
-
-void fs_reclaim_acquire(gfp_t gfp_mask)
-{
-	if (__need_fs_reclaim(gfp_mask))
-		lock_map_acquire(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_acquire);
-
-void fs_reclaim_release(gfp_t gfp_mask)
-{
-	if (__need_fs_reclaim(gfp_mask))
-		lock_map_release(&__fs_reclaim_map);
-}
-EXPORT_SYMBOL_GPL(fs_reclaim_release);
-#endif
-
 /* Perform direct synchronous page reclaim */
 static int __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 					const struct alloc_context *ac)
 {
 	struct reclaim_state reclaim_state;
@@ -3582,21 +3533,21 @@
 	cond_resched();
 
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
 	noreclaim_flag = memalloc_noreclaim_save();
-	fs_reclaim_acquire(gfp_mask);
+	lockdep_set_current_reclaim_state(gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
 	current->reclaim_state = &reclaim_state;
 
 	progress = try_to_free_pages(ac->zonelist, order, gfp_mask,
 								ac->nodemask);
 
 	current->reclaim_state = NULL;
-	fs_reclaim_release(gfp_mask);
+	lockdep_clear_current_reclaim_state();
 	memalloc_noreclaim_restore(noreclaim_flag);
 
 	cond_resched();
 
 	return progress;
 }

>
> Adding PeterZ to the participants list in case he has ideas.
> I'm not seeing what would be the problem in that call chain from hell.
>
>               Linus

Dave Jones wrote:
> ============================================
> WARNING: possible recursive locking detected
> 4.15.0-rc9-backup-debug+ #1 Not tainted
> --------------------------------------------
> sshd/24800 is trying to acquire lock:
>  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> but task is already holding lock:
>  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(fs_reclaim);
>   lock(fs_reclaim);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 2 locks held by sshd/24800:
>  #0:  (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
>  #1:  (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
>
> stack backtrace:
> CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
> Call Trace:
>  dump_stack+0xbc/0x13f
>  __lock_acquire+0xa09/0x2040
>  lock_acquire+0x12e/0x350
>  fs_reclaim_acquire.part.102+0x29/0x30
>  kmem_cache_alloc+0x3d/0x2c0
>  alloc_extent_state+0xa7/0x410
>  __clear_extent_bit+0x3ea/0x570
>  try_release_extent_mapping+0x21a/0x260
>  __btrfs_releasepage+0xb0/0x1c0
>  btrfs_releasepage+0x161/0x170
>  try_to_release_page+0x162/0x1c0
>  shrink_page_list+0x1d5a/0x2fb0
>  shrink_inactive_list+0x451/0x940
>  shrink_node_memcg.constprop.88+0x4c9/0x5e0
>  shrink_node+0x12d/0x260
>  try_to_free_pages+0x418/0xaf0
>  __alloc_pages_slowpath+0x976/0x1790
>  __alloc_pages_nodemask+0x52c/0x5c0
>  new_slab+0x374/0x3f0
>  ___slab_alloc.constprop.81+0x47e/0x5a0
>  __slab_alloc.constprop.80+0x32/0x60
>  __kmalloc_track_caller+0x267/0x310
>  __kmalloc_reserve.isra.40+0x29/0x80
>  __alloc_skb+0xee/0x390
>  sk_stream_alloc_skb+0xb8/0x340
>  tcp_sendmsg_locked+0x8e6/0x1d30
>  tcp_sendmsg+0x27/0x40
>  inet_sendmsg+0xd0/0x310
>  sock_write_iter+0x17a/0x240
>  __vfs_write+0x2ab/0x380
>  vfs_write+0xfb/0x260
>  SyS_write+0xb6/0x140
>  do_syscall_64+0x1e5/0xc05
>  entry_SYSCALL64_slow_path+0x25/0x25
>
> ============================================
> WARNING: possible recursive locking detected
> 4.15.0-rc9-backup-debug+ #7 Not tainted
> --------------------------------------------
> snmpd/892 is trying to acquire lock:
>  (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> but task is already holding lock:
>  (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(fs_reclaim);
>   lock(fs_reclaim);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 2 locks held by snmpd/892:
>  #0:  (rtnl_mutex){+.+.}, at: [<00000000dcd3ba2f>] netlink_dump+0x89/0x520
>  #1:  (fs_reclaim){+.+.}, at: [<0000000002e4c185>] fs_reclaim_acquire.part.101+0x5/0x30
>
> stack backtrace:
> CPU: 5 PID: 892 Comm: snmpd Not tainted 4.15.0-rc9-backup-debug+ #7
> Call Trace:
>  dump_stack+0xbc/0x13f
>  __lock_acquire+0xa09/0x2040
>  lock_acquire+0x12e/0x350
>  fs_reclaim_acquire.part.101+0x29/0x30
>  kmem_cache_alloc+0x3d/0x2c0
>  alloc_extent_state+0xa7/0x410
>  __clear_extent_bit+0x3ea/0x570
>  try_release_extent_mapping+0x21a/0x260
>  __btrfs_releasepage+0xb0/0x1c0
>  btrfs_releasepage+0x161/0x170
>  try_to_release_page+0x162/0x1c0
>  shrink_page_list+0x1d5a/0x2fb0
>  shrink_inactive_list+0x451/0x940
>  shrink_node_memcg.constprop.84+0x4c9/0x5e0
>  shrink_node+0x1c2/0x510
>  try_to_free_pages+0x425/0xb90
>  __alloc_pages_slowpath+0x955/0x1a00
>  __alloc_pages_nodemask+0x52c/0x5c0
>  new_slab+0x374/0x3f0
>  ___slab_alloc.constprop.81+0x47e/0x5a0
>  __slab_alloc.constprop.80+0x32/0x60
>  __kmalloc_track_caller+0x267/0x310
>  __kmalloc_reserve.isra.40+0x29/0x80
>  __alloc_skb+0xee/0x390
>  netlink_dump+0x2e1/0x520
>  __netlink_dump_start+0x201/0x280
>  rtnetlink_rcv_msg+0x6d6/0xa90
>  netlink_rcv_skb+0xb6/0x1d0
>  netlink_unicast+0x298/0x320
>  netlink_sendmsg+0x57e/0x630
>  SYSC_sendto+0x296/0x320
>  do_syscall_64+0x1e5/0xc05
>  entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7f204299f54d
> RSP: 002b:00007ffc49024fd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f204299f54d
> RDX: 0000000000000018 RSI: 00007ffc49025010 RDI: 0000000000000012
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000012
> R13: 00007ffc49029550 R14: 000055e31307a250 R15: 00007ffc49029530

Both traces are identical, and no fs locks are held, right? Therefore, doing a GFP_KERNEL allocation from that point should be safe (as long as the PF_MEMALLOC safeguard prevents infinite recursion), shouldn't it? Then I think that "git bisect" should reach commit d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation").