Subject: Re: [PATCH v2 2/2] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
From: Mike Kravetz
To: Naoya Horiguchi, linux-mm@kvack.org
Cc: Andrew Morton, Michal Hocko, xishi.qiuxishi@alibaba-inc.com, "Chen, Jerry T", "Zhuo, Qiuxu", linux-kernel@vger.kernel.org
Date: Tue, 11 Jun 2019 10:16:03 -0700
Message-ID: <039dd97d-83f5-f71a-e78f-a451b0064903@oracle.com>
In-Reply-To: <1560154686-18497-3-git-send-email-n-horiguchi@ah.jp.nec.com>
On 6/10/19 1:18 AM, Naoya Horiguchi wrote:
> madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
> for hugepages with overcommitting enabled. That is caused by suboptimal
> logic in the current soft-offline code. See the following part:
>
> 	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> 				MIGRATE_SYNC, MR_MEMORY_FAILURE);
> 	if (ret) {
> 		...
> 	} else {
> 		/*
> 		 * We set PG_hwpoison only when the migration source hugepage
> 		 * was successfully dissolved, because otherwise hwpoisoned
> 		 * hugepage remains on free hugepage list, then userspace will
> 		 * find it as SIGBUS by allocation failure. That's not expected
> 		 * in soft-offlining.
> 		 */
> 		ret = dissolve_free_huge_page(page);
> 		if (!ret) {
> 			if (set_hwpoison_free_buddy_page(page))
> 				num_poisoned_pages_inc();
> 		}
> 	}
> 	return ret;
>
> Here dissolve_free_huge_page() returns -EBUSY if the migration source page
> was freed into buddy in migrate_pages(), but even in that case we actually
> have a chance that set_hwpoison_free_buddy_page() succeeds. So the current
> code gives up offlining too early.
>
> dissolve_free_huge_page() checks that a given hugepage is suitable for
> dissolving, and it should return success in the !PageHuge() case because
> the given hugepage is then considered already dissolved.
>
> This change also affects other callers of dissolve_free_huge_page(),
> which are cleaned up together.
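The return-code argument above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct sim_page and the sim_* functions are hypothetical stand-ins (a bool for PageHuge(), an int for page_count(), a bool for PG_hwpoison), and the poisoning step that set_hwpoison_free_buddy_page() performs in the real code is assumed to always succeed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model page, not struct page:
 * 'huge' ~ PageHuge(), 'refcount' ~ page_count(), 'poisoned' ~ PG_hwpoison.
 */
struct sim_page {
	bool huge;
	int refcount;
	bool poisoned;
};

/* Pre-patch rule: anything that is not a free hugepage yields -EBUSY. */
static int sim_dissolve_old(struct sim_page *p)
{
	assert(p != NULL);
	if (p->huge && p->refcount == 0) {
		p->huge = false;	/* dissolved into buddy pages */
		return 0;
	}
	return -EBUSY;
}

/* Patched rule: a page that is no longer huge counts as already dissolved. */
static int sim_dissolve_new(struct sim_page *p)
{
	assert(p != NULL);
	if (!p->huge)
		return 0;
	if (p->refcount == 0) {
		p->huge = false;
		return 0;
	}
	return -EBUSY;		/* hugepage is still in use */
}

/*
 * Tail of the soft-offline path after migrate_pages() succeeded.
 * Assumption: the buddy-side poisoning step always succeeds in this model.
 */
static int sim_soft_offline_tail(struct sim_page *p,
				 int (*dissolve)(struct sim_page *))
{
	int ret = dissolve(p);

	if (!ret)
		p->poisoned = true;
	return ret;
}
```

With the pre-patch rule, a source page that migrate_pages() already returned to buddy gets -EBUSY and is never poisoned; with the patched rule it is treated as dissolved, so the poisoning step gets its chance to run.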
>
> Reported-by: Chen, Jerry T
> Tested-by: Chen, Jerry T
> Signed-off-by: Naoya Horiguchi
> Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
> Cc: # v4.19+
> ---
>  mm/hugetlb.c        | 15 +++++++++------
>  mm/memory-failure.c |  5 +----
>  2 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git v5.2-rc3/mm/hugetlb.c v5.2-rc3_patched/mm/hugetlb.c
> index ac843d3..048d071 100644
> --- v5.2-rc3/mm/hugetlb.c
> +++ v5.2-rc3_patched/mm/hugetlb.c
> @@ -1519,7 +1519,12 @@ int dissolve_free_huge_page(struct page *page)

Please update the function description for dissolve_free_huge_page() as
well. It currently says, "Returns -EBUSY if the dissolution fails because
a give page is not a free hugepage", which is no longer true as a result
of this change.

>  	int rc = -EBUSY;
>  
>  	spin_lock(&hugetlb_lock);
> -	if (PageHuge(page) && !page_count(page)) {
> +	if (!PageHuge(page)) {
> +		rc = 0;
> +		goto out;
> +	}
> +
> +	if (!page_count(page)) {
>  		struct page *head = compound_head(page);
>  		struct hstate *h = page_hstate(head);
>  		int nid = page_to_nid(head);
> @@ -1564,11 +1569,9 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
>  		page = pfn_to_page(pfn);
> -		if (PageHuge(page) && !page_count(page)) {
> -			rc = dissolve_free_huge_page(page);
> -			if (rc)
> -				break;
> -		}

We may want to consider keeping at least the PageHuge(page) check before
calling dissolve_free_huge_page(). dissolve_free_huge_pages() is called
as part of memory offline processing, and we do not know whether the
memory to be offlined contains huge pages. With your changes, we take
hugetlb_lock on every call to dissolve_free_huge_page() just to discover
that the page is not a huge page.

You 'could' add a PageHuge(page) check to dissolve_free_huge_page() before
taking the lock. However, you would need to check again after taking the
lock, because the page can change state before the lock is acquired.
-- 
Mike Kravetz
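The suggestion of an unlocked PageHuge() check followed by a re-check under hugetlb_lock can be sketched in userspace as follows. This is a hypothetical model, not the kernel implementation: a pthread mutex stands in for the hugetlb_lock spinlock, C11 atomics stand in for the page flag and refcount, and model_lock_acquisitions exists only to show that non-huge pages no longer cost a lock round-trip. All names are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: 'huge' ~ PageHuge(),
 * 'refcount' ~ page_count(), model_hugetlb_lock ~ hugetlb_lock.
 */
struct model_page {
	atomic_bool huge;
	atomic_int refcount;
};

static pthread_mutex_t model_hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;
/* Incremented only while holding the lock; counts lock acquisitions. */
static unsigned long model_lock_acquisitions;

static int model_dissolve_free_huge_page(struct model_page *page)
{
	int rc = 0;

	assert(page != NULL);

	/* Unlocked fast path: cheap, but only a hint, since state may change. */
	if (!atomic_load(&page->huge))
		return 0;

	pthread_mutex_lock(&model_hugetlb_lock);
	model_lock_acquisitions++;

	/* Re-check under the lock: the page may have been dissolved meanwhile. */
	if (!atomic_load(&page->huge))
		goto out;

	if (atomic_load(&page->refcount) == 0)
		atomic_store(&page->huge, false);	/* dissolve the free hugepage */
	else
		rc = -EBUSY;				/* hugepage is in use */
out:
	pthread_mutex_unlock(&model_hugetlb_lock);
	return rc;
}

/* Walk a range as memory offlining would, stopping at the first failure. */
static int model_dissolve_range(struct model_page *pages, size_t n)
{
	int rc = 0;

	for (size_t i = 0; i < n; i++) {
		rc = model_dissolve_free_huge_page(&pages[i]);
		if (rc)
			break;
	}
	return rc;
}
```

In a range of mostly ordinary pages, only the hugepages reach the lock; the second check is what keeps the result correct if another thread dissolves or frees the page between the unlocked test and the lock acquisition.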