From: Yang Shi
Date: Mon, 21 Mar 2022 17:46:48 -0700
Subject: Re: [PATCH v5] mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
To: Naoya Horiguchi
Cc: Linux MM, Andrew Morton, Mike Kravetz, Miaohe Lin, Naoya Horiguchi, Linux Kernel Mailing List
References: <20220318051612.271802-1-naoya.horiguchi@linux.dev>
In-Reply-To: <20220318051612.271802-1-naoya.horiguchi@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 17, 2022 at 10:16 PM Naoya Horiguchi wrote:
>
> From: Naoya Horiguchi
>
> There is a race condition between memory_failure_hugetlb() and hugetlb
> free/demotion, which causes setting PageHWPoison flag on the wrong page.
> The one simple result is that wrong processes can be killed, but another
> (more serious) one is that the actual error is left unhandled, so no one
> prevents later access to it, and that might lead to more serious results
> like consuming corrupted data.
>
> Think about the below race window:
>
>   CPU 1                                   CPU 2
>   memory_failure_hugetlb
>   struct page *head = compound_head(p);
>                                           hugetlb page might be freed to
>                                           buddy, or even changed to another
>                                           compound page.
>
>   get_hwpoison_page -- page is not what we want now...
>
> The compound_head is called outside hugetlb_lock, so the head is not
> reliable.
>
> So set PageHWPoison flag after passing prechecks. And to detect
> potential violation, this patch also introduces a new action type
> MF_MSG_DIFFERENT_PAGE_SIZE.
>
> Reported-by: Mike Kravetz
> Signed-off-by: Naoya Horiguchi
> Signed-off-by: Miaohe Lin
> Cc:
> ---
> ChangeLog v4 -> v5:
> - call TestSetPageHWPoison() when page_handle_poison() fails.
> - call TestSetPageHWPoison() for unhandlable cases (MF_MSG_UNKNOWN and
>   MF_MSG_DIFFERENT_PAGE_SIZE).
> - Set PageHWPoison on the head page only when the error page is surely
>   a hugepage, otherwise set the flag on the raw page.
> - rebased onto v5.17-rc8-mmotm-2022-03-16-17-42
>
> ChangeLog v3 -> v4:
> - squash with "mm/memory-failure.c: fix race with changing page
>   compound again".
> - update patch subject and description based on it.
>
> ChangeLog v2 -> v3:
> - rename the patch because page lock is not the primary factor to
>   solve the reported issue.
> - updated description in the same manner.
> - call page_handle_poison() instead of __page_handle_poison() for
>   free hugepage case.
> - reorder put_page and unlock_page (thanks to Miaohe Lin)
>
> ChangeLog v1 -> v2:
> - pass subpage to get_hwpoison_huge_page() instead of head page.
> - call compound_head() in hugetlb_lock to avoid race with hugetlb
>   demotion/free.
> ---
>  mm/hugetlb.c        |  8 +++--
>  mm/memory-failure.c | 75 +++++++++++++++++++++++++++------------------
>  2 files changed, 51 insertions(+), 32 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index fbf598bbc4e3..d8ef67c049e4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6777,14 +6777,16 @@ bool isolate_huge_page(struct page *page, struct list_head *list)
>
>  int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
>  {
> +	struct page *head;
>  	int ret = 0;
>
>  	*hugetlb = false;
>  	spin_lock_irq(&hugetlb_lock);
> -	if (PageHeadHuge(page)) {
> +	head = compound_head(page);
> +	if (PageHeadHuge(head)) {
>  		*hugetlb = true;
> -		if (HPageFreed(page) || HPageMigratable(page))
> -			ret = get_page_unless_zero(page);
> +		if (HPageFreed(head) || HPageMigratable(head))
> +			ret = get_page_unless_zero(head);
>  		else
>  			ret = -EBUSY;
>  	}
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e939719c0765..9323a5653dec 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1194,7 +1194,7 @@ static int __get_hwpoison_page(struct page *page, unsigned long flags)
>  	int ret = 0;
>  	bool hugetlb = false;
>
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>  	if (hugetlb)
>  		return ret;
>
> @@ -1281,11 +1281,10 @@ static int get_any_page(struct page *p, unsigned long flags)
>
>  static int __get_unpoison_page(struct page *page)
>  {
> -	struct page *head = compound_head(page);
>  	int ret = 0;
>  	bool hugetlb = false;
>
> -	ret = get_hwpoison_huge_page(head, &hugetlb);
> +	ret = get_hwpoison_huge_page(page, &hugetlb);
>  	if (hugetlb)
>  		return ret;
>
> @@ -1504,39 +1503,38 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	struct page *head = compound_head(p);
>  	int res;
>  	unsigned long page_flags;
> -
> -	if (TestSetPageHWPoison(head)) {
> -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> -		       pfn);
> -		res = -EHWPOISON;
> -		if (flags & MF_ACTION_REQUIRED)
> -			res = kill_accessing_process(current, page_to_pfn(head), flags);
> -		return res;
> -	}
> -
> -	num_poisoned_pages_inc();
> +	bool put = false;
> +	unsigned long already_hwpoisoned = 0;
>
>  	if (!(flags & MF_COUNT_INCREASED)) {
>  		res = get_hwpoison_page(p, flags);
>  		if (!res) {
>  			lock_page(head);
>  			if (hwpoison_filter(p)) {
> -				if (TestClearPageHWPoison(head))
> -					num_poisoned_pages_dec();
>  				unlock_page(head);
>  				return -EOPNOTSUPP;
>  			}
>  			unlock_page(head);
> -			res = MF_FAILED;
> -			if (__page_handle_poison(p)) {
> -				page_ref_inc(p);
> +			if (page_handle_poison(p, true, false)) {
>  				res = MF_RECOVERED;
> +			} else {
> +				if (TestSetPageHWPoison(head))
> +					already_hwpoisoned = page_to_pfn(head);
> +				else
> +					num_poisoned_pages_inc();
> +				res = MF_FAILED;
>  			}
>  			action_result(pfn, MF_MSG_FREE_HUGE, res);
> -			return res == MF_RECOVERED ? 0 : -EBUSY;
> +			res = res == MF_RECOVERED ? 0 : -EBUSY;
> +			goto out;
>  		} else if (res < 0) {
> +			if (TestSetPageHWPoison(p))
> +				already_hwpoisoned = pfn;
> +			else
> +				num_poisoned_pages_inc();
>  			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> -			return -EBUSY;
> +			res = -EBUSY;
> +			goto out;
>  		}
>  	}
>
> @@ -1547,21 +1545,31 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	 * If this happens just bail out.
>  	 */
>  	if (!PageHuge(p) || compound_head(p) != head) {
> +		if (TestSetPageHWPoison(p))
> +			already_hwpoisoned = pfn;
> +		else
> +			num_poisoned_pages_inc();
>  		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);

The commit log says "this patch also introduces a new action type
MF_MSG_DIFFERENT_PAGE_SIZE", but it is not defined in the patch and it
is called here. Did I miss something?

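For illustration only, a definition of the new action type would presumably follow the existing pattern: one entry in enum mf_action_page_type plus a matching string in the action_page_types[] table. The standalone user-space sketch below mimics that pattern. The enum is abbreviated, all strings (including "different page size") are assumed wording, and mf_msg_name() is a hypothetical helper; none of this is taken from the patch.

```c
#include <stddef.h>

/* Abbreviated user-space stand-in for the kernel's action type enum. */
enum mf_action_page_type {
	MF_MSG_FREE_HUGE,
	MF_MSG_NON_PMD_HUGE,
	MF_MSG_UNMAP_FAILED,
	MF_MSG_DIFFERENT_PAGE_SIZE,	/* the type the commit log calls new */
	MF_MSG_UNKNOWN,
};

/* One description per action type, indexed by enum value (assumed strings). */
static const char * const action_page_types[] = {
	[MF_MSG_FREE_HUGE]		= "free huge page",
	[MF_MSG_NON_PMD_HUGE]		= "non-pmd-sized huge page",
	[MF_MSG_UNMAP_FAILED]		= "unmapping failed page",
	[MF_MSG_DIFFERENT_PAGE_SIZE]	= "different page size",
	[MF_MSG_UNKNOWN]		= "unknown page",
};

/* Hypothetical helper: look up the description, guarding against gaps. */
static const char *mf_msg_name(enum mf_action_page_type type)
{
	size_t n = sizeof(action_page_types) / sizeof(action_page_types[0]);

	if ((size_t)type >= n || !action_page_types[type])
		return "invalid";
	return action_page_types[type];
}
```

Without both halves (enum entry and table string), a call such as action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED) would either fail to build or report an empty description.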
>  		res = -EBUSY;
> -		goto out;
> +		goto unlock_page;
>  	}
>
>  	page_flags = head->flags;
>
>  	if (hwpoison_filter(p)) {
> -		if (TestClearPageHWPoison(head))
> -			num_poisoned_pages_dec();
> -		put_page(p);
> +		put = true;
>  		res = -EOPNOTSUPP;
> -		goto out;
> +		goto unlock_page;
> +	}
> +
> +	if (TestSetPageHWPoison(head)) {
> +		put = true;
> +		already_hwpoisoned = page_to_pfn(head);
> +		goto unlock_page;
>  	}
>
> +	num_poisoned_pages_inc();
> +
>  	/*
>  	 * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>  	 * simply disable it. In order to make it work properly, we need
> @@ -1574,18 +1582,27 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>  	if (huge_page_size(page_hstate(head)) > PMD_SIZE) {
>  		action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED);
>  		res = -EBUSY;
> -		goto out;
> +		goto unlock_page;
>  	}
>
>  	if (!hwpoison_user_mappings(p, pfn, flags, head)) {
>  		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
>  		res = -EBUSY;
> -		goto out;
> +		goto unlock_page;
>  	}
>
>  	return identify_page_state(pfn, p, page_flags);
> -out:
> +unlock_page:
>  	unlock_page(head);
> +out:
> +	if (put)
> +		put_page(p);
> +	if (already_hwpoisoned) {
> +		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> +		res = -EHWPOISON;
> +		if (flags & MF_ACTION_REQUIRED)
> +			res = kill_accessing_process(current, already_hwpoisoned, flags);
> +	}
>  	return res;
>  }
>
> --
> 2.25.1
>
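
The mm/hugetlb.c hunk above resolves the head page only while hugetlb_lock is held, so a concurrent free/demotion cannot change it between lookup and use. A simplified user-space model of that pattern might look like the sketch below. Everything prefixed toy_ is hypothetical: struct toy_page, the atomic_flag spinlock, and toy_dissolve() are stand-ins for struct page, hugetlb_lock, and hugetlb free/demotion, not kernel APIs.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy page: 'head' is the compound head (the page itself for a base page). */
struct toy_page {
	struct toy_page *head;
	bool huge;		/* meaningful on the head page only */
	bool migratable;	/* stands in for HPageFreed/HPageMigratable */
	int refcount;
};

/* Toy spinlock standing in for hugetlb_lock. */
static atomic_flag toy_hugetlb_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_hugetlb_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&toy_hugetlb_lock, memory_order_release);
}

/* Take a reference unless the count is already zero. */
static int toy_get_unless_zero(struct toy_page *page)
{
	if (page->refcount == 0)
		return 0;
	page->refcount++;
	return 1;
}

/* Same shape as the patched function: the head is looked up under the lock. */
static int toy_get_hwpoison_huge_page(struct toy_page *page, bool *hugetlb)
{
	struct toy_page *head;
	int ret = 0;

	*hugetlb = false;
	toy_lock();
	head = page->head;	/* stable only while the lock is held */
	if (head->huge) {
		*hugetlb = true;
		if (head->migratable)
			ret = toy_get_unless_zero(head);
		else
			ret = -EBUSY;
	}
	toy_unlock();
	return ret;
}

/* Models free/demotion: every subpage becomes its own (non-huge) head. */
static void toy_dissolve(struct toy_page *pages, int nr)
{
	toy_lock();
	for (int i = 0; i < nr; i++) {
		pages[i].head = &pages[i];
		pages[i].huge = false;
		pages[i].migratable = false;
	}
	toy_unlock();
}
```

In this model a caller passes the raw subpage, as the patch does, and never caches a head pointer across the lock. After toy_dissolve() the same call reports hugetlb == false instead of acting on a stale head.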