Date: Wed, 17 Jan 2018 15:03:59 -0800
From: Andrew Morton
To: Michal Hocko
Cc: Naoya Horiguchi, Balbir Singh, Laurent Dufour, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Wen Congyang
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
Message-Id: <20180117150359.655bb93d8f1d663a2cd48c33@linux-foundation.org>
In-Reply-To: <20170428063048.GA9399@dhcp22.suse.cz>
On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko wrote:

> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
> > On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
> > > On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
> > > > The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
> > > > offlining pages") skipped the HWPoisoned pages when offlining pages, but
> > > > they should be skipped when onlining the pages too.
> > > >
> > > > Signed-off-by: Laurent Dufour
> > > > ---
> > > >  mm/memory_hotplug.c | 4 ++++
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 6fa7208bcd56..741ddb50e7d2 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
> > > >  	if (PageReserved(pfn_to_page(start_pfn)))
> > > >  		for (i = 0; i < nr_pages; i++) {
> > > >  			page = pfn_to_page(start_pfn + i);
> > > > +			if (PageHWPoison(page)) {
> > > > +				ClearPageReserved(page);
> > >
> > > Why do we clear page reserved? Also if the page is marked PageHWPoison, it
> > > was never offlined to begin with? Or do you expect this to be set on newly
> > > hotplugged memory? Also don't we need to skip the entire pageblock?
> >
> > If I read correctly, to "skip HWPoisoned page" in commit b023f46813cd means
> > that we skip the page status check for hwpoisoned pages *not* to prevent
> > memory offlining for memblocks with hwpoisoned pages. That means that
> > hwpoisoned pages can be offlined.
>
> Is this patch actually correct? I am trying to wrap my head around it
> but it smells like it tries to avoid the problem rather than fix it
> properly. I might be wrong here of course but to me it sounds like
> poisoned page should simply be offlined and keep its poison state all
> the time.
> If the memory is hot-removed and added again we have lost the
> struct page along with the state which is the expected behavior. If it
> is still broken we will re-poison it.
>
> Anyway a patch to skip over poisoned pages during online makes perfect
> sense to me. The PageReserved fiddling around much less so.
>
> Or am I missing something? Let's CC Wen Congyang for the clarification
> here.

Wen Congyang appears to have disappeared and this fix isn't yet
finalized. Can we all please revisit it and have a think about
Michal's questions?

Thanks.


From: Laurent Dufour
Subject: mm: skip HWPoisoned pages when onlining pages

b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
skipped the HWPoisoned pages when offlining pages, but they should be
skipped when onlining the pages too.

n-horiguchi@ah.jp.nec.com said:

: If I read correctly, to "skip HWPoisoned page" in commit b023f46813cd
: means that we skip the page status check for hwpoisoned pages *not* to
: prevent memory offlining for memblocks with hwpoisoned pages. That
: means that hwpoisoned pages can be offlined.
:
: And another reason to clear PageReserved is that we could reuse the
: hwpoisoned page after onlining back with replacing the broken DIMM. In
: this use case, we first do unpoisoning to clear PageHWPoison, but it
: doesn't work if PageReserved is set.
: My simple testing shows the BUG below in unpoisoning (without the
: ClearPageReserved):
:
:   Unpoison: Software-unpoisoned page 0x18000
:   BUG: Bad page state in process page-types pfn:18000
:   page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
:   flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
:   raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
:   raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
:   page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
:   bad because of flags: 0x800(reserved)

Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour
Cc: Naoya Horiguchi
Cc: Andrey Vagin
Cc: Glauber Costa
Cc: Vladimir Davydov
Cc: Balbir Singh
Signed-off-by: Andrew Morton
---

 mm/memory_hotplug.c |    4 ++++
 1 file changed, 4 insertions(+)

diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
+++ a/mm/memory_hotplug.c
@@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
 	if (PageReserved(pfn_to_page(start_pfn)))
 		for (i = 0; i < nr_pages; i++) {
 			page = pfn_to_page(start_pfn + i);
+			if (PageHWPoison(page)) {
+				ClearPageReserved(page);
+				continue;
+			}
 			(*online_page_callback)(page);
 			onlined_pages++;
 		}
_