Date: Thu, 19 Jul 2018 10:27:43 +0200
From: Michal Hocko
To: Naoya Horiguchi
Cc: "linux-mm@kvack.org", Andrew Morton, "xishi.qiuxishi@alibaba-inc.com",
 "zy.zhengyi@alibaba-inc.com", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v2 1/2] mm: fix race on soft-offlining free huge pages
Message-ID: <20180719082743.GN7193@dhcp22.suse.cz>
References: <1531805552-19547-1-git-send-email-n-horiguchi@ah.jp.nec.com>
 <1531805552-19547-2-git-send-email-n-horiguchi@ah.jp.nec.com>
 <20180717142743.GJ7193@dhcp22.suse.cz>
 <20180718005528.GA12184@hori1.linux.bs1.fc.nec.co.jp>
 <20180718085032.GS7193@dhcp22.suse.cz>
 <20180719061945.GB22154@hori1.linux.bs1.fc.nec.co.jp>
 <20180719071516.GK7193@dhcp22.suse.cz>
 <20180719080804.GA32756@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To:
 <20180719080804.GA32756@hori1.linux.bs1.fc.nec.co.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 19-07-18 08:08:05, Naoya Horiguchi wrote:
> On Thu, Jul 19, 2018 at 09:15:16AM +0200, Michal Hocko wrote:
> > On Thu 19-07-18 06:19:45, Naoya Horiguchi wrote:
> > > On Wed, Jul 18, 2018 at 10:50:32AM +0200, Michal Hocko wrote:
[...]
> > > > Why do we even need HWPoison flag here? Everything can be completely
> > > > transparent to the application. It shouldn't fail from what I
> > > > understood.
> > >
> > > PageHWPoison flag is used to the 'remove from the allocator' part
> > > which is like below:
> > >
> > >   static inline
> > >   struct page *rmqueue(
> > >           ...
> > >           do {
> > >                   page = NULL;
> > >                   if (alloc_flags & ALLOC_HARDER) {
> > >                           page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > >                           if (page)
> > >                                   trace_mm_page_alloc_zone_locked(page, order, migratetype);
> > >                   }
> > >                   if (!page)
> > >                           page = __rmqueue(zone, order, migratetype);
> > >           } while (page && check_new_pages(page, order));
> > >
> > > check_new_pages() returns true if the page taken from free list has
> > > a hwpoison page so that the allocator iterates another round to get
> > > another page.
> > >
> > > There's no function that can be called from outside allocator to remove
> > > a page in allocator. So actual page removal is done at allocation time,
> > > not at error handling time. That's the reason why we need PageHWPoison.
> >
> > hwpoison is an internal mm functionality so why cannot we simply add a
> > function that would do that?
>
> That's one possible solution.

I would prefer that much more than add an overhead (albeit small) into
the page allocator directly. HWPoison should be a really rare event so
why should everybody pay the price? I would much rather see that the
poison path pays the additional price.

> I know about another downside in current implementation.
> If a hwpoison page is found during high order page allocation,
> all 2^order pages (not only hwpoison page) are removed from
> buddy because of the above quoted code. And these leaked pages
> are never returned to freelist even with unpoison_memory().
> If we have a page removal function which properly splits high order
> free pages into lower order pages, this problem is avoided.

Even more reason to move to a new scheme.

> OTOH PageHWPoison still has a role to report error to userspace.
> Without it unpoison_memory() doesn't work.

Sure but we do not really need a special page flag for that. We know the
page is not reachable other than via pfn walkers. If you make the page
reserved and note the fact it has been poisoned in the past then you can
emulate the missing functionality.

Btw. do we really need unpoisoning functionality? Who is really using
it, other than some tests? How does the memory become OK again? Don't we
really need to go through physical hotremove & hotadd to clean the
poison status?

Thanks!
-- 
Michal Hocko
SUSE Labs
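
A minimal userspace sketch of the "page removal function which properly
splits high order free pages into lower order pages" idea discussed in
the thread. This is not kernel code and not a proposed patch: the names
free_area_sketch, isolate_bad_page and the MAX_ORDER value are
hypothetical, and the free lists are reduced to per-order counters. It
only shows that a split-based removal takes a single page off the lists
rather than the whole 2^order block.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 11 /* hypothetical limit for this sketch */

/* Per-order count of free blocks (stand-in for real free lists). */
struct free_area_sketch {
    size_t nr_free[MAX_ORDER];
};

/*
 * Take one free block of 2^order pages starting at base_pfn, keep only
 * bad_pfn off the lists, and hand every other page back as the largest
 * possible buddy blocks. Returns the number of pages handed back.
 */
static size_t isolate_bad_page(struct free_area_sketch *fa,
                               unsigned long base_pfn, unsigned int order,
                               unsigned long bad_pfn)
{
    size_t returned = 0;

    assert(order < MAX_ORDER);
    assert(bad_pfn >= base_pfn && bad_pfn < base_pfn + (1UL << order));

    while (order > 0) {
        unsigned long half;

        order--;
        half = 1UL << order;

        /* Keep splitting whichever half contains the bad page. */
        if (bad_pfn >= base_pfn + half)
            base_pfn += half;

        /* The other half goes back as one free block of this order. */
        fa->nr_free[order]++;
        returned += half;
    }

    /* Here base_pfn == bad_pfn; that single page stays off the lists. */
    return returned;
}
```

For an order-3 block with one bad page, this hands back 7 pages (one
block each of order 2, 1 and 0), where the retry loop quoted above would
drop all 8.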