Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1423967AbdD1Ga5 (ORCPT ); Fri, 28 Apr 2017 02:30:57 -0400 Received: from mx2.suse.de ([195.135.220.15]:36785 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1164118AbdD1Gav (ORCPT ); Fri, 28 Apr 2017 02:30:51 -0400 Date: Fri, 28 Apr 2017 08:30:48 +0200 From: Michal Hocko To: Naoya Horiguchi Cc: Balbir Singh , Laurent Dufour , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "akpm@linux-foundation.org" , Wen Congyang Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages Message-ID: <20170428063048.GA9399@dhcp22.suse.cz> References: <1493130472-22843-1-git-send-email-ldufour@linux.vnet.ibm.com> <1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com> <1493172615.4828.3.camel@gmail.com> <20170426031255.GB11619@hori1.linux.bs1.fc.nec.co.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170426031255.GB11619@hori1.linux.bs1.fc.nec.co.jp> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2222 Lines: 48 On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote: > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote: > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote: > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when > > > offlining pages") skip the HWPoisoned pages when offlining pages, but > > > this should be skipped when onlining the pages too. > > > > > > Signed-off-by: Laurent Dufour > > > --- > > > mm/memory_hotplug.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > > index 6fa7208bcd56..741ddb50e7d2 100644 > > > --- a/mm/memory_hotplug.c > > > +++ b/mm/memory_hotplug.c > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages, > > > if (PageReserved(pfn_to_page(start_pfn))) > > > for (i = 0; i < nr_pages; i++) { > > > page = pfn_to_page(start_pfn + i); > > > + if (PageHWPoison(page)) { > > > + ClearPageReserved(page); > > > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it > > was never offlined to begin with? Or do you expect this to be set on newly > > hotplugged memory? Also don't we need to skip the entire pageblock? > > If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means > that we skip the page status check for hwpoisoned pages *not* to prevent > memory offlining for memblocks with hwpoisoned pages. That means that > hwpoisoned pages can be offlined. Is this patch actually correct? I am trying to wrap my head around it but it smells like it tries to avoid the problem rather than fix it properly. I might be wrong here of course but to me it sounds like poisoned page should simply be offlined and keep its poison state all the time. If the memory is hot-removed and added again we have lost the struct page along with the state which is the expected behavior. If it is still broken we will re-poison it. Anyway a patch to skip over poisoned pages during online makes perfect sense to me. The PageReserved fiddling around much less so. Or am I missing something. Let's CC Wen Congyang for the clarification here. -- Michal Hocko SUSE Labs