Subject: Re: [PATCH v2] mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
To: Andrew Morton
CC: Sidhartha Kumar
References: <20240412025754.1897615-1-linmiaohe@huawei.com>
 <48647e5b-d15b-457b-9879-fb1b6bbaee27@oracle.com>
 <8d186776-f3b1-5d9a-2f94-fa249dee7d5f@huawei.com>
 <20240412162111.10f67ad0f001734464b53ad8@linux-foundation.org>
From: Miaohe Lin
Date: Tue, 16 Apr 2024 11:23:01 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20240412162111.10f67ad0f001734464b53ad8@linux-foundation.org>

On 2024/4/13 7:21, Andrew Morton wrote:
> On Fri, 12 Apr 2024 16:11:52 +0800 Miaohe Lin wrote:
>
>>> I recently sent a patch[1] to convert dissolve_free_huge_page() to folios which changes the function name and the name referenced in the comment so this will conflict with my patch. It's in mm-unstable now, would you be able to rebase to that in a new version?
>>>
>
> This patch is a hotfixes, cc:stable one so the mm-unstable material will be
> based on top of this change.
>
> I've queued this change up as a -fix against v1. And I've retained
> this changelog addition:
>
> : This issue won't occur until commit a6b40850c442 ("mm: hugetlb: replace
> : hugetlb_free_vmemmap_enabled with a static_key"). As it introduced
> : rlock(cpu_hotplug_lock) in dissolve_free_huge_page() code path while
> : lock(pcp_batch_high_lock) is already in the __page_handle_poison().
>
> And I've queued another -fix to reflow that block comment to 80 columns.
>
> --- a/mm/memory-failure.c~mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled-v2-fix
> +++ a/mm/memory-failure.c
> @@ -155,14 +155,16 @@ static int __page_handle_poison(struct p
>  	int ret;
>
>  	/*
> -	 * zone_pcp_disable() can't be used here. It will hold pcp_batch_high_lock and
> -	 * dissolve_free_huge_page() might hold cpu_hotplug_lock via static_key_slow_dec()
> -	 * when hugetlb vmemmap optimization is enabled. This will break current lock
> -	 * dependency chain and leads to deadlock.
> -	 * Disabling pcp before dissolving the page was a deterministic approach because
> -	 * we made sure that those pages cannot end up in any PCP list. Draining PCP lists
> -	 * expels those pages to the buddy system, but nothing guarantees that those pages
> -	 * do not get back to a PCP queue if we need to refill those.
> +	 * zone_pcp_disable() can't be used here. It will
> +	 * hold pcp_batch_high_lock and dissolve_free_huge_page() might hold
> +	 * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap
> +	 * optimization is enabled. This will break current lock dependency
> +	 * chain and leads to deadlock.
> +	 * Disabling pcp before dissolving the page was a deterministic
> +	 * approach because we made sure that those pages cannot end up in any
> +	 * PCP list. Draining PCP lists expels those pages to the buddy system,
> +	 * but nothing guarantees that those pages do not get back to a PCP
> +	 * queue if we need to refill those.
>  	 */
>  	ret = dissolve_free_huge_page(page);
>  	if (!ret) {
> _
>

Many thanks for doing this. :)
.

> .
>