From: Naoya Horiguchi
To: Mike Kravetz
CC: "linux-mm@kvack.org", Andrew Morton, Michal Hocko,
    "xishi.qiuxishi@alibaba-inc.com", "Chen, Jerry T", "Zhuo, Qiuxu",
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v1] mm: hugetlb: soft-offline: fix wrong return value of soft offline
Date: Thu, 30 May 2019 01:35:44 +0000
Message-ID: <20190530013549.GA28893@hori.linux.bs1.fc.nec.co.jp>
References: <1558937200-18544-1-git-send-email-n-horiguchi@ah.jp.nec.com>
            <81a37f9c-4a85-c18d-b882-f361c4998d45@oracle.com>
In-Reply-To: <81a37f9c-4a85-c18d-b882-f361c4998d45@oracle.com>

Hi Mike,

On Wed, May 29, 2019 at 11:44:50AM -0700, Mike Kravetz wrote:
> On 5/26/19 11:06 PM, Naoya Horiguchi wrote:
> > Soft offline events for hugetlb pages return -EBUSY when page migration
> > succeeded and dissolve_free_huge_page() failed, which can happen when
> > there're surplus hugepages. We should judge pass/fail of soft offline by
> > checking whether the raw error page was finally contained or not (i.e.
> > the result of set_hwpoison_free_buddy_page()), so this behavior is wrong.
> >
> > This problem was introduced by the following change of commit 6bc9b56433b76
> > ("mm: fix race on soft-offlining"):
> >
> >                 if (ret > 0)
> >                         ret = -EIO;
> >         } else {
> > -               if (PageHuge(page))
> > -                       dissolve_free_huge_page(page);
> > +               /*
> > +                * We set PG_hwpoison only when the migration source hugepage
> > +                * was successfully dissolved, because otherwise hwpoisoned
> > +                * hugepage remains on free hugepage list, then userspace will
> > +                * find it as SIGBUS by allocation failure. That's not expected
> > +                * in soft-offlining.
> > +                */
> > +               ret = dissolve_free_huge_page(page);
> > +               if (!ret) {
> > +                       if (set_hwpoison_free_buddy_page(page))
> > +                               num_poisoned_pages_inc();
> > +               }
> >         }
> >         return ret;
> >  }
> >
> > , so a simple fix is to restore the PageHuge precheck, but my code
> > reading shows that we already have PageHuge check in
> > dissolve_free_huge_page() with hugetlb_lock, which is a better place to
> > check it. And currently dissolve_free_huge_page() returns -EBUSY for
> > !PageHuge but that's simply wrong because that case should be
> > considered as success (meaning that "the given hugetlb was already
> > dissolved.")
>
> Hello Naoya,
>
> I am having a little trouble understanding the situation. The code above is
> in the routine soft_offline_huge_page, and occurs immediately after a call to
> migrate_pages() with 'page' being the only one on the list of pages to be
> migrated. In addition, since we are in soft_offline_huge_page, we know that
> page is a huge page (PageHuge) before the call to migrate_pages.
>
> IIUC, the issue is that the migrate_pages call results in 'page' being
> dissolved into regular base pages. Therefore, the call to
> dissolve_free_huge_page returns -EBUSY and we never end up setting PageHWPoison
> on the (base) page which had the error.
>
> It seems that for the original page to be dissolved, it must go through the
> free_huge_page routine. Once that happens, it is possible for the (dissolved)
> pages to be allocated again. Is that just a known race, or am I missing
> something?

No, your understanding is right. I found that the last (and most important)
part of the patch description ("this behavior is wrong") might itself be
wrong. Sorry about that, and let me correct myself:

- before commit 6bc9b56433b76, the return value of soft offline is the
  return value of migrate_pages(). dissolve_free_huge_page()'s return value
  is ignored.
- after commit 6bc9b56433b76, soft_offline_huge_page() returns success only
  if dissolve_free_huge_page() returns success.
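
To make the two cases above concrete, here is a small userspace sketch of the
return-value logic only. It is not kernel code: struct model_page,
model_dissolve() and the two model_soft_offline_*() functions are hypothetical
stand-ins, and the real functions take locks and inspect real page state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the migration source page after migrate_pages(). */
struct model_page {
	bool still_huge;     /* stands in for PageHuge(page) */
	bool dissolve_fails; /* e.g. dissolving is refused for this hugepage */
	bool poisoned;       /* stands in for PG_hwpoison being set */
};

/* Models dissolve_free_huge_page() as discussed: -EBUSY for !PageHuge too. */
static int model_dissolve(struct model_page *p)
{
	if (!p->still_huge || p->dissolve_fails)
		return -EBUSY;
	p->still_huge = false;
	return 0;
}

/* Before the commit: the dissolve result is ignored. */
static int model_soft_offline_before(struct model_page *p, int migrate_ret)
{
	int ret = migrate_ret;

	if (ret) {
		if (ret > 0)
			ret = -EIO;
	} else {
		if (p->still_huge)
			(void)model_dissolve(p);
	}
	return ret;
}

/* After the commit: success requires a successful dissolve. */
static int model_soft_offline_after(struct model_page *p, int migrate_ret)
{
	int ret = migrate_ret;

	if (ret) {
		if (ret > 0)
			ret = -EIO;
	} else {
		ret = model_dissolve(p);
		if (!ret)
			p->poisoned = true; /* models set_hwpoison_free_buddy_page() */
	}
	return ret;
}
```

In this model, a page whose dissolve fails after a successful migration
yields 0 from the "before" variant and -EBUSY from the "after" variant, which
is the user-visible difference described in the two bullets.
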

This change is *mainly OK* (meaning nothing is broken), but there is still
room for improvement: even in the "dissolved from free_huge_page()" case, we
can try to call set_hwpoison_free_buddy_page() to contain the 4kB error page.
We don't try it now because dissolve_free_huge_page() returns -EBUSY for the
!PageHuge case.

> >
> > This change affects other callers of dissolve_free_huge_page(),
> > which are also cleaned up by this patch.
>
> It may just be me, but I am having a hard time separating the fix for this
> issue from the change to the dissolve_free_huge_page routine. Would it be
> more clear or possible to create separate patches for these?

Yes, the change is actually an 'improvement' purely related to hugetlb, and
does not seem to be a 'bug fix'. So I'll update the description. There is
probably no need to split it into separate patches.

Thanks,
Naoya Horiguchi
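
The improvement discussed in this message (treating an already-dissolved
hugepage as success so that the base page can still be poisoned) can be
sketched in userspace as follows. This is only an illustration under stated
assumptions: struct sketch_page, the sketch_dissolve_*() functions and
sketch_contain_error() are hypothetical names, and base_page_free is a
simplification standing in for whether set_hwpoison_free_buddy_page() would
succeed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the page that carries the error. */
struct sketch_page {
	bool is_huge;        /* stands in for PageHuge(page) */
	bool in_use;         /* hugepage cannot be dissolved right now */
	bool base_page_free; /* error base page is still free in the buddy allocator */
	bool poisoned;       /* stands in for PG_hwpoison being set */
};

/* Behaviour described in the thread: !PageHuge is reported as -EBUSY. */
static int sketch_dissolve_current(struct sketch_page *p)
{
	if (!p->is_huge || p->in_use)
		return -EBUSY;
	p->is_huge = false;
	return 0;
}

/* Proposed behaviour: an already-dissolved hugepage counts as success. */
static int sketch_dissolve_proposed(struct sketch_page *p)
{
	if (!p->is_huge)
		return 0;
	if (p->in_use)
		return -EBUSY;
	p->is_huge = false;
	return 0;
}

typedef int (*sketch_dissolve_fn)(struct sketch_page *);

/* Mirrors the else-branch of the quoted diff: poison only after a successful dissolve. */
static int sketch_contain_error(struct sketch_page *p, sketch_dissolve_fn dissolve)
{
	int ret = dissolve(p);

	if (!ret && p->base_page_free)
		p->poisoned = true;
	return ret;
}
```

With a page that was already dissolved during migration, the current variant
returns -EBUSY and never attempts to poison, while the proposed variant
returns 0 and poisons the base page if it is still free. If the base page has
been reallocated in the meantime (the race raised above), neither variant
marks it poisoned in this model.
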