From: Johannes Weiner
To: Andrew Morton
Cc: Mel Gorman, Vlastimil Babka, Michal Hocko, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock
Date: Fri, 19 May 2023 13:13:59 +0200
Message-Id: <20230519111359.40475-1-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.40.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List:
 linux-kernel@vger.kernel.org

During stress testing with higher-order allocations, a deadlock
scenario was observed in compaction: One GFP_NOFS allocation was
sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
the system were busy with compactors spinning on buffer locks held by
the sleeping GFP_NOFS allocation.

Reclaim is susceptible to this same deadlock; we fixed it by granting
GFP_NOFS allocations additional LRU isolation headroom, to ensure it
makes forward progress while holding fs locks that other reclaimers
might acquire. Do the same here.

This code has been like this since compaction was initially merged,
and I only managed to trigger this with out-of-tree patches that
dramatically increase the contexts that do GFP_NOFS compaction. While
the issue is real, it seems theoretical in nature given existing
allocation sites. Worth fixing now, but no Fixes tag or stable CC.

Signed-off-by: Johannes Weiner
---
 mm/compaction.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

v2:
- clarify too_many_isolated() comment (Mel)
- split isolation deadlock from no-contiguous-anon lockups as that's
  a different scenario and deserves its own patch

diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..c9a4b6dffcf2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,17 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 			node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * Allow GFP_NOFS to isolate past the limit set for regular
+	 * compaction runs. This prevents an ABBA deadlock when other
+	 * compactors have already isolated to the limit, but are
+	 * blocked on filesystem locks held by the GFP_NOFS thread.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +818,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
 	 * delay for some time until fewer pages are isolated
 	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
-- 
2.40.0
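
For readers who want to see the headroom arithmetic in isolation, here
is a rough userspace sketch of the check as changed by the patch. It is
not kernel code: MODEL_GFP_FS and model_too_many_isolated() are
hypothetical names invented for this illustration, the flag value is
arbitrary, and the LRU counters are passed in as plain arguments
instead of being read with node_page_state().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_FS bit; the value is
 * arbitrary and only meaningful inside this sketch. */
#define MODEL_GFP_FS 0x1u

/*
 * Userspace model of the patched threshold. Callers that may enter the
 * filesystem (FS bit set) are held to one eighth of the usual limit,
 * which leaves the remaining headroom to GFP_NOFS callers (bit clear).
 */
static bool model_too_many_isolated(unsigned int gfp_mask,
				    unsigned long active,
				    unsigned long inactive,
				    unsigned long isolated)
{
	if (gfp_mask & MODEL_GFP_FS) {
		inactive >>= 3;
		active >>= 3;
	}

	return isolated > (inactive + active) / 2;
}
```

With 8000 active and 8000 inactive pages and 1500 already isolated, a
caller with the FS bit set sees a limit of (1000 + 1000) / 2 = 1000 and
is throttled, while a NOFS caller sees (8000 + 8000) / 2 = 8000 and
keeps going.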