Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp1212489pxy; Thu, 6 May 2021 03:06:48 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz/mNrDOUzNE7lKB1FOYFNZFY4qIKppb0vSPZid8oSr7uUL8C7ha5P1RGpIS/YzDmah/YSX X-Received: by 2002:a05:6402:2283:: with SMTP id cw3mr4158751edb.122.1620295608717; Thu, 06 May 2021 03:06:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620295608; cv=none; d=google.com; s=arc-20160816; b=0+H1EGLuo0lEKOzTauJSxgXbFq+4816poGefCcGyn3SKy4q0WJDOBO8RZUzLZT2cm/ xA8NST9hj14SKtB6hvUizEsCieaAwnnbUGT6vkUkhGcpt+60EBPjTIWvqrWLTY5gcMuk zLswJ62scP8k44Tpvhhb9L+gmPkhXmSGK5uHzYtOr6EWETeCiCjm8kofN34/ruLulnrX O9FegJybQXy0vXw9v7eC8qW4whbexDNbYyNpCtJ+RC9m8k7DS+Hr+qc3CBp6XfS4ehtR xXETKE5dloFQY+lf9bt7lSlFrp/9+R+9BPh7tesN1ssn77nmst0RzauQ2Puk7mifObK8 Qz1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=YIpEo11129CJ4rw/TN9z5pkaRyJ95s3BWLyTSKj56qA=; b=RC0RnmepKtX1EEKp4Syk0NZLn5DbJOPmqSs0dUJPbBumvBtUNr95b9cLJozsQ+g1tV asNwvWZnouro9sWjbdNmD+B8OcEEctbth0DG6volR1y0RX9XgYuSbD3Cz7cG2vUqJceo XIAHBZKqwl1piLO9pwC1aqHNOduqULTedH3nBMFA/hjCKpV82Ax27Ihqz27VH/9TxSdK LB2eGmgeeMzRrf+nlirAR+k1O0XNabdvQl0HOKaIojkZZ6D9cnmR2f7PqzZFYPDsr/Lz zyAlUWGCQ21hw/sPIrIqqXtupq6MkKTlBWiaJUkUOsWoZOk7zdad1aaDIqkP/NQL87Q+ QCfw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bn4si1876500ejb.702.2021.05.06.03.06.25; Thu, 06 May 2021 03:06:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233830AbhEFIwg (ORCPT + 99 others); Thu, 6 May 2021 04:52:36 -0400 Received: from mx2.suse.de ([195.135.220.15]:47532 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233464AbhEFIwe (ORCPT ); Thu, 6 May 2021 04:52:34 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id C5426AC47; Thu, 6 May 2021 08:51:35 +0000 (UTC) Date: Thu, 6 May 2021 10:51:33 +0200 From: Oscar Salvador To: Naoya Horiguchi Cc: Mike Kravetz , Michal Hocko , Muchun Song , "akpm@linux-foundation.org" , HORIGUCHI =?utf-8?B?TkFPWUEo5aCA5Y+j44CA55u05LmfKQ==?= , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v2] mm,hwpoison: fix race with compound page allocation Message-ID: References: <20210423080153.GA78658@hori.linux.bs1.fc.nec.co.jp> <20210428074654.GA2093897@u2004> <20210428082344.GA29213@linux> <20210428091835.GA273940@hori.linux.bs1.fc.nec.co.jp> <20210506013122.GA2240524@u2004> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210506013122.GA2240524@u2004> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 06, 2021 at 10:31:22AM +0900, Naoya Horiguchi wrote: > From: Naoya Horiguchi > Date: Thu, 6 May 2021 09:54:39 +0900 > Subject: [PATCH] mm,hwpoison: fix race with compound page allocation > > When hugetlb page fault (under overcommiting situation) and memory_failure() > race, VM_BUG_ON_PAGE() is triggered by the following race: > > CPU0: CPU1: > > gather_surplus_pages() > page = alloc_surplus_huge_page() > memory_failure_hugetlb() > get_hwpoison_page(page) > __get_hwpoison_page(page) > get_page_unless_zero(page) > zero = put_page_testzero(page) > VM_BUG_ON_PAGE(!zero, page) > enqueue_huge_page(h, page) > put_page(page) > > __get_hwpoison_page() only checks page refcount before taking additional > one for memory error handling, which is wrong because there's time > windows where compound pages have non-zero refcount during initialization. > > So makes __get_hwpoison_page() check more page status for a few types > of compound pages. PageSlab() check is added because otherwise > "non anonymous thp" path is wrongly chosen. > > Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling") > Signed-off-by: Naoya Horiguchi > Reported-by: Muchun Song > Cc: stable@vger.kernel.org # 5.12+ Hi Naoya, thanks for the patch. I have some concerns though, more below: > --- > mm/memory-failure.c | 53 +++++++++++++++++++++++++++------------------ > 1 file changed, 32 insertions(+), 21 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index a3659619d293..966a1d6b0bc8 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1095,30 +1095,41 @@ static int __get_hwpoison_page(struct page *page) > { > struct page *head = compound_head(page); > > - if (!PageHuge(head) && PageTransHuge(head)) { > - /* > - * Non anonymous thp exists only in allocation/free time. We > - * can't handle such a case correctly, so let's give it up. > - * This should be better than triggering BUG_ON when kernel > - * tries to touch the "partially handled" page. > - */ > - if (!PageAnon(head)) { > - pr_err("Memory failure: %#lx: non anonymous thp\n", > - page_to_pfn(page)); > - return 0; > + if (PageCompound(page)) { > + if (PageSlab(page)) { > + return get_page_unless_zero(page); > + } else if (PageHuge(head)) { > + int ret = 0; > + > + spin_lock(&hugetlb_lock); > + if (HPageFreed(head) || HPageMigratable(head)) > + ret = get_page_unless_zero(head); > + spin_unlock(&hugetlb_lock); > + return ret; Ok, I am probably overthinking this but should we re-check under the lock wehther the page is a hugetlb page? My concern is, what would happen if: CPU0 CPU1 __get_hwpoison_page PageHuge(head) == T dissolve hugetlb page hugetlb_lock In that case, by the time we get to check hugetlb flags, those checks might return false, and we do not get a refcount. So, I guess my question is: Should we re-check under the lock, and if it is not, do a "goto try_to_get_ref" that starts right at the beginning, or goes directly to the get_page_unless_zero at the end (the former probably better)? As I said, I might be overthinking this, but well. -- Oscar Salvador SUSE L3