From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Naoya Horiguchi, Muchun Song,
 Mike Kravetz, Oscar Salvador, Michal Hocko, Tony Luck, Andrew Morton,
 Linus Torvalds
Subject: [PATCH 5.12 166/178] mm,hwpoison: fix race with hugetlb page allocation
Date: Mon, 21 Jun 2021 18:16:20 +0200
Message-Id: <20210621154928.395678837@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210621154921.212599475@linuxfoundation.org>
References: <20210621154921.212599475@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Naoya Horiguchi

commit 25182f05ffed0b45602438693e4eed5d7f3ebadd upstream.

When hugetlb page fault (under overcommitting situation) and
memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following
race:

        CPU0:                           CPU1:

                                        gather_surplus_pages()
                                          page = alloc_surplus_huge_page()
        memory_failure_hugetlb()
          get_hwpoison_page(page)
            __get_hwpoison_page(page)
              get_page_unless_zero(page)
                                          zero = put_page_testzero(page)
                                          VM_BUG_ON_PAGE(!zero, page)
                                          enqueue_huge_page(h, page)
          put_page(page)

__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is not enough because
there's a time window where compound pages have non-zero refcount during
hugetlb page initialization.

So make __get_hwpoison_page() check page status a bit more for hugetlb
pages with get_hwpoison_huge_page().  Checking hugetlb-specific flags
under hugetlb_lock makes sure that the hugetlb page is not transitive.
It's notable that another new function, HWPoisonHandlable(), is helpful
to prevent a race against other transitive page states (like a generic
compound page just before PageHuge becomes true).
Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com
Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi
Reported-by: Muchun Song
Acked-by: Mike Kravetz
Cc: Oscar Salvador
Cc: Michal Hocko
Cc: Tony Luck
Cc: stable@vger.kernel.org	[5.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/hugetlb.h |    6 ++++++
 mm/hugetlb.c            |   15 +++++++++++++++
 mm/memory-failure.c     |   29 +++++++++++++++++++++++++++--
 3 files changed, 48 insertions(+), 2 deletions(-)

--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -145,6 +145,7 @@ bool hugetlb_reserve_pages(struct inode
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
 bool isolate_huge_page(struct page *page, struct list_head *list);
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb);
 void putback_active_hugepage(struct page *page);
 void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason);
 void free_huge_page(struct page *page);
@@ -330,6 +331,11 @@ static inline bool isolate_huge_page(str
 	return false;
 }
 
+static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	return 0;
+}
+
 static inline void putback_active_hugepage(struct page *page)
 {
 }
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5664,6 +5664,21 @@ unlock:
 	return ret;
 }
 
+int get_hwpoison_huge_page(struct page *page, bool *hugetlb)
+{
+	int ret = 0;
+
+	*hugetlb = false;
+	spin_lock_irq(&hugetlb_lock);
+	if (PageHeadHuge(page)) {
+		*hugetlb = true;
+		if (HPageFreed(page) || HPageMigratable(page))
+			ret = get_page_unless_zero(page);
+	}
+	spin_unlock_irq(&hugetlb_lock);
+	return ret;
+}
+
 void putback_active_hugepage(struct page *page)
 {
 	spin_lock(&hugetlb_lock);
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -949,6 +949,17 @@ static int page_action(struct page_state
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
+/*
+ * Return true if a page type of a given page is supported by hwpoison
+ * mechanism (while handling could fail), otherwise false.  This function
+ * does not return true for hugetlb or device memory pages, so it's assumed
+ * to be called only in the context where we never have such pages.
+ */
+static inline bool HWPoisonHandlable(struct page *page)
+{
+	return PageLRU(page) || __PageMovable(page);
+}
+
 /**
  * __get_hwpoison_page() - Get refcount for memory error handling:
  * @page: raw error page (hit by memory error)
@@ -959,8 +970,22 @@ static int page_action(struct page_state
 static int __get_hwpoison_page(struct page *page)
 {
 	struct page *head = compound_head(page);
+	int ret = 0;
+	bool hugetlb = false;
+
+	ret = get_hwpoison_huge_page(head, &hugetlb);
+	if (hugetlb)
+		return ret;
+
+	/*
+	 * This check prevents from calling get_hwpoison_unless_zero()
+	 * for any unsupported type of page in order to reduce the risk of
+	 * unexpected races caused by taking a page refcount.
+	 */
+	if (!HWPoisonHandlable(head))
+		return 0;
 
-	if (!PageHuge(head) && PageTransHuge(head)) {
+	if (PageTransHuge(head)) {
 		/*
 		 * Non anonymous thp exists only in allocation/free time. We
 		 * can't handle such a case correctly, so let's give it up.
@@ -1017,7 +1042,7 @@ try_again:
 			ret = -EIO;
 		}
 	} else {
-		if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+		if (PageHuge(p) || HWPoisonHandlable(p)) {
 			ret = 1;
 		} else {
 			/*
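
A footnote for readers not familiar with the hugetlb code: below is a small,
self-contained userspace sketch of the race and of the locking pattern the
patch relies on.  It is not kernel code; every name in it (toy_page,
get_hwpoison_racy(), get_hwpoison_fixed(), run_once(), and so on) is invented
for illustration, and the real handling of free huge pages in memory_failure()
is more involved than this model suggests.

/*
 * Toy userspace model of the race above -- not kernel code.  One thread
 * plays the allocation path (gather_surplus_pages()): it holds the only
 * reference to a fresh page, expects put_page_testzero() to drop the count
 * to zero, and only then marks the page as a free huge page.  The other
 * thread plays __get_hwpoison_page().  The "racy" getter takes a reference
 * whenever the refcount is non-zero (old behaviour); the "fixed" getter
 * first checks, under a lock, that the page is already in a stable freed
 * state, mirroring what get_hwpoison_huge_page() does with HPageFreed()/
 * HPageMigratable() under hugetlb_lock.
 *
 * Build with: cc -std=c11 -pthread toy_race.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_page {
	atomic_int refcount;
	bool freed;			/* stands in for HPageFreed() */
	pthread_mutex_t lock;		/* stands in for hugetlb_lock */
};

static struct toy_page page;
static _Atomic bool use_fixed;

/* get_page_unless_zero(): take a reference only if the count is non-zero. */
static bool get_unless_zero(struct toy_page *p)
{
	int ref = atomic_load(&p->refcount);

	while (ref != 0 &&
	       !atomic_compare_exchange_weak(&p->refcount, &ref, ref + 1))
		;
	return ref != 0;
}

/* Old behaviour: the refcount is the only thing consulted. */
static bool get_hwpoison_racy(struct toy_page *p)
{
	return get_unless_zero(p);
}

/* New behaviour: only touch the refcount of a page in a stable state. */
static bool get_hwpoison_fixed(struct toy_page *p)
{
	bool ret = false;

	pthread_mutex_lock(&p->lock);
	if (p->freed)
		ret = get_unless_zero(p);
	pthread_mutex_unlock(&p->lock);
	return ret;
}

static void *hwpoison_thread(void *arg)
{
	(void)arg;
	if (use_fixed ? get_hwpoison_fixed(&page) : get_hwpoison_racy(&page))
		atomic_fetch_sub(&page.refcount, 1);	/* put_page() */
	return NULL;
}

static void run_once(bool fixed)
{
	pthread_t t;

	use_fixed = fixed;
	page.freed = false;
	atomic_store(&page.refcount, 1);	/* allocation-time reference */
	pthread_create(&t, NULL, hwpoison_thread, NULL);

	/* put_page_testzero(): the allocator expects the count to hit zero. */
	if (atomic_fetch_sub(&page.refcount, 1) != 1)
		puts(fixed ? "fixed: unexpected extra reference"
			   : "racy: VM_BUG_ON_PAGE(!zero, page) would fire");

	pthread_mutex_lock(&page.lock);
	page.freed = true;			/* enqueue_huge_page() */
	pthread_mutex_unlock(&page.lock);
	pthread_join(t, NULL);
}

int main(void)
{
	pthread_mutex_init(&page.lock, NULL);
	for (int i = 0; i < 200000; i++)
		run_once(false);	/* may occasionally report the race */
	for (int i = 0; i < 200000; i++)
		run_once(true);		/* never does */
	puts("done");
	return 0;
}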