Date: Thu, 3 Jan 2019 16:37:50 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Dmitry Vyukov
Cc: Vlastimil Babka, syzbot, Andrea Arcangeli, Andrew Morton,
	"Kirill A. Shutemov", LKML, Linux-MM, linux@dominikbrodowski.net,
	Michal Hocko, David Rientjes, syzkaller-bugs, xieyisheng1@huawei.com,
	zhong jiang, Peter Zijlstra, Ingo Molnar, Qian Cai
Subject: Re: possible deadlock in __wake_up_common_lock
Message-ID: <20190103163750.GH31517@techsingularity.net>
References: <000000000000f67ca2057e75bec3@google.com>
 <1194004c-f176-6253-a5fd-682472dccacc@suse.cz>
 <20190102180611.GE31517@techsingularity.net>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Wed, Jan 02, 2019 at 07:29:43PM +0100, Dmitry Vyukov wrote:
> > > This wakeup_kswapd is new due to Mel's 1c30844d2dfe ("mm: reclaim small
> > > amounts of memory when an external fragmentation event occurs") so CC Mel.
> >
> > New year new bugs :(
>
> Old too :(
> https://syzkaller.appspot.com/#upstream-open

Well, that can ruin a day! Let's see if we can knock one off the list.

> > While I recognise there is no test case available, how often does this
> > trigger in syzbot as it would be nice to have some confirmation any
> > patch is really fixing the problem.
>
> This info is always available over the "dashboard link" in the report:
> https://syzkaller.appspot.com/bug?extid=93d94a001cfbce9e60e1

Noted for future reference.

> In this case it's 1. I don't know why. Lock inversions are easier to
> trigger in some sense as information accumulates globally. Maybe one
> of these stacks is hard to trigger, or maybe all these stacks are
> rarely triggered on one machine. While the info accumulates globally,
> none of the machines are actually run for any prolonged time: they all
> crash right away on hundreds of known bugs.
>
> So good that Qian can reproduce this.

I think this might simply be hard to reproduce. I tried for hours on two
separate machines and failed.
Nevertheless this should still fix it and hopefully syzbot picks this up
automatically when cc'd. If I hear nothing, I'll send the patch
unconditionally (and cc syzbot). Hopefully Qian can give it a whirl too.

Thanks

--8<--
mm, page_alloc: Do not wake kswapd with zone lock held

syzbot reported the following and it was confirmed by Qian Cai that a
similar bug was visible from a different context.

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
syz-executor0/8529 is trying to acquire lock:
000000005e7fb829 (&pgdat->kswapd_wait){....}, at: __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

but task is already holding lock:
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock include/linux/spinlock.h:329 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk mm/page_alloc.c:2548 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist mm/page_alloc.c:3021 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist mm/page_alloc.c:3050 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue mm/page_alloc.c:3072 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

It appears to be a false positive in that the only way the lock ordering
should be inverted is if kswapd is waking itself and the wakeup allocates
debugging objects which should already be allocated if it's kswapd doing
the waking. Nevertheless, the possibility exists and so it's best to
avoid the problem.

This patch flags a zone as needing a kswapd wakeup using the,
surprisingly, unused zone flag field. The flag is read without the lock
held to do the wakeup. It's possible that the flag setting context is
not the same as the flag clearing context or for small races to occur.
However, each race possibility is harmless and there is no visible
degradation in fragmentation treatment.

While zone->flags could have continued to be unused, there is potential
for moving some existing fields into the flags field instead.
Particularly read-mostly ones like zone->initialized and
zone->contiguous.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mmzone.h | 6 ++++++
 mm/page_alloc.c        | 8 +++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cc4a507d7ca4..842f9189537b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -520,6 +520,12 @@ enum pgdat_flags {
 	PGDAT_RECLAIM_LOCKED,		/* prevents concurrent reclaim */
 };
 
+enum zone_flags {
+	ZONE_BOOSTED_WATERMARK,		/* zone recently boosted watermarks.
+					 * Cleared when kswapd is woken.
+					 */
+};
+
 static inline unsigned long zone_managed_pages(struct zone *zone)
 {
 	return (unsigned long)atomic_long_read(&zone->managed_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cde5dac6229a..d295c9bc01a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2214,7 +2214,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
 	 */
 	boost_watermark(zone);
 	if (alloc_flags & ALLOC_KSWAPD)
-		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
+		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
 	/* We are not allowed to try stealing from the whole block */
 	if (!whole_block)
@@ -3102,6 +3102,12 @@ struct page *rmqueue(struct zone *preferred_zone,
 	local_irq_restore(flags);
 
 out:
+	/* Separate test+clear to avoid unnecessary atomics */
+	if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
+		clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
+		wakeup_kswapd(zone, 0, 0, zone_idx(zone));
+	}
+
 	VM_BUG_ON_PAGE(page && bad_range(zone, page), page);
 	return page;