Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp1010023pxy; Wed, 5 May 2021 21:04:17 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzaxuN5dY+H0J3v43TAhyFnqunCbIAYR57nND+IZCCGXEuYj2JcfGxHwJhZGgbfYbQCLWXq X-Received: by 2002:a63:3242:: with SMTP id y63mr2135271pgy.87.1620273857668; Wed, 05 May 2021 21:04:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620273857; cv=none; d=google.com; s=arc-20160816; b=mTiSUj+FkkeYkaUOMEWZCaz2qHvga49tIg0YSeMVzqCcnAVPj3YMxzWTw8fszeB/sT a+4vl5Fxnhc1R/knWhh8kT+L+Y/KjEAgPIBt2RrYXbM5fVqdjYJn9nqW7C/PubLBi6EN +cfInprb0GOdZkHYhLFWz2Cxo0LdkbZre9bbmWI1dqERMZwXm6LkEeMEDcifnyAW4Hc8 q8K5SrwfGYCN2hlARnmTmowRxjw9xWu7ZFEkigh4PJD/GOAZJulWDiE3qxK6hXcGxcmq wUj6bcUM3bJH19M9D0+Y4hGLnEGEkL7d0SG7TDxor98EZwInjvrh6n7t5325nVBIgNaJ Z2uQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=YZ7wUA2bBHVgCEAIs4TNiIYXwaloHLheASAnqSI7sFs=; b=ALxGSWj40IfyJ39j50DyE7oLOvCGdhRfSSfwqq3b0n4m6eLh7SzxlOOla4mOpEnx36 7mx+Aw3/a1a0eccR9rQypzXsn43qurToNn/vIbtlfP12rEc9H1K7+KR+BydikgD6m14g TBIEYLyPujEqnVQToY7es3bzLwmK4UM2SU2kPXtNm3QBOnXbE15/sX08p0N3ZTRaWgJe ieM2D/LGx82HLn8REBcSR62D1LnZF4ZWMcpSQZxSEk0Fs9AeVrm7g070R75IFJCiDM0C MBbL/PGUP/T2M4v7AXN9OCYKBIABp5oDodSdpbSr1xeMdS0BPnONv1eH6ABmPICr/sJk Z76Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sangfor.com.cn Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b7si1429565plr.271.2021.05.05.21.04.00; Wed, 05 May 2021 21:04:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sangfor.com.cn Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230185AbhEFEDl (ORCPT + 99 others); Thu, 6 May 2021 00:03:41 -0400 Received: from mail-m121142.qiye.163.com ([115.236.121.142]:44566 "EHLO mail-m121142.qiye.163.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231991AbhEFECy (ORCPT ); Thu, 6 May 2021 00:02:54 -0400 Received: from [0.0.0.0] (unknown [14.154.28.254]) by mail-m121142.qiye.163.com (Hmail) with ESMTPA id 8CE89802A7; Thu, 6 May 2021 12:01:36 +0800 (CST) Subject: Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy To: =?UTF-8?B?SE9SSUdVQ0hJIE5BT1lBKOWggOWPoyDnm7TkuZ8p?= , David Hildenbrand Cc: "akpm@linux-foundation.org" , "osalvador@suse.de" , "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" References: <20210421020426.5914-1-dinghui@sangfor.com.cn> <20210506024943.GA1777607@hori.linux.bs1.fc.nec.co.jp> From: Ding Hui Message-ID: <33be44ea-f377-c049-03ff-3b45289ab5f7@sangfor.com.cn> Date: Thu, 6 May 2021 12:01:34 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: <20210506024943.GA1777607@hori.linux.bs1.fc.nec.co.jp> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-HM-Spam-Status: e1kfGhgUHx5ZQUtXWQgYFAkeWUFZS1VLWVdZKFlBSE83V1ktWUFJV1kPCR oVCBIfWUFZQk4aHlYaQkMYSUJOTRgeHRhVEwETFhoSFyQUDg9ZV1kWGg8SFR0UWUFZT0tIVUpKS0 9ISVVLWQY+ X-HM-Sender-Digest: e1kMHhlZQR0aFwgeV1kSHx4VD1lBWUc6Py46Ljo6GD8WFkMUThoMHzcz MA4KFCFVSlVKTUlLSUxITUJMSkJMVTMWGhIXVR8SFRwTDhI7CBoVHB0UCVUYFBZVGBVFWVdZEgtZ QVlKT1VKTk9VSUNVSU5PWVdZCAFZQU1LSUI3Bg++ X-HM-Tid: 0a793fd69030b037kuuu8ce89802a7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: >> On 21.04.21 04:04, Ding Hui wrote: >>> Recently we found there is a lot MemFree left in /proc/meminfo after >>> do a lot of pages soft offline. >>> >>> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. >>> After take_page_off_buddy, the page is no longer belong to buddy >>> allocator, and will not be used any more, but we maybe missed accounting >>> NR_FREE_PAGES in this situation. >>> >>> Signed-off-by: Ding Hui >>> --- >>> mm/page_alloc.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index cfc72873961d..8d65b62784d8 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) >>> del_page_from_free_list(page_head, zone, page_order); >>> break_down_buddy_pages(zone, page_head, page, 0, >>> page_order, migratetype); >>> + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); >>> ret = true; >>> break; >>> } >>> >> >> Should this use __mod_zone_freepage_state() instead? > > Yes, __mod_zone_freepage_state() looks better to me. > > And I think that maybe an additional __mod_zone_freepage_state() in > unpoison_memory() is necessary to cancel the decrement. I thought of the > following, but it doesn't build because get_pfnblock_migratetype() is > available only in mm/page_alloc.c, so you might want to add a small exported > routine in mm/page_alloc.c and let it called from unpoison_memory(). > > @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) > } > > if (!get_hwpoison_page(p, flags, 0)) { > - if (TestClearPageHWPoison(p)) > + if (TestClearPageHWPoison(p)) { > + int migratetype = get_pfnblock_migratetype(p, pfn); > + > num_poisoned_pages_dec(); > + __mod_zone_freepage_state(page_zone(p), 1, migratetype); > + } > unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > pfn, &unpoison_rs); > return 0; > I think there is another problem: In normal case, we keep the last refcount of the hwpoison page, so get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when call put_page. At race condition, we maybe leak the page because we does not put it back to buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages, NR_FREE_PAGES is adjusted correctly. CPU0 CPU1 soft_offline_page soft_offline_free_page page_handle_poison take_page_off_buddy SetPageHWPoison unpoison_memory if (!get_hwpoison_page(p)) TestClearPageHWPoison num_poisoned_pages_dec __mod_zone_freepage_state return 0 /* miss put the page back to buddy */ page_ref_inc num_poisoned_pages_inc How about do nothing and return -EBUSY (so the caller can retry) if unpoison a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages") does ? @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn) } if (!get_hwpoison_page(p, flags, 0)) { - if (TestClearPageHWPoison(p)) - num_poisoned_pages_dec(); - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n", pfn, &unpoison_rs); - return 0; + return -EBUSY; } lock_page(page); -- Thanks, - Ding Hui