From: Laurent Dufour
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages
To: Andrew Morton, Michal Hocko
Cc: Naoya Horiguchi, Balbir Singh, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", Wen Congyang
Date: Tue, 23 Jan 2018 19:15:35 +0100
Message-Id: <75179fb1-eb83-15b8-b7ba-d405745e1566@linux.vnet.ibm.com>
In-Reply-To: <20180117150359.655bb93d8f1d663a2cd48c33@linux-foundation.org>
References: <1493130472-22843-1-git-send-email-ldufour@linux.vnet.ibm.com> <1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com> <1493172615.4828.3.camel@gmail.com> <20170426031255.GB11619@hori1.linux.bs1.fc.nec.co.jp> <20170428063048.GA9399@dhcp22.suse.cz> <20180117150359.655bb93d8f1d663a2cd48c33@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Andrew,

On 18/01/2018 00:03, Andrew Morton wrote:
> On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko wrote:
>
>> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
>>> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
>>>> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
>>>>> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
>>>>> offlining pages") skip the HWPoisoned pages when offlining pages, but
>>>>> this should be skipped when onlining the pages too.
>>>>>
>>>>> Signed-off-by: Laurent Dufour
>>>>> ---
>>>>>  mm/memory_hotplug.c | 4 ++++
>>>>>  1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 6fa7208bcd56..741ddb50e7d2 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>>>  	if (PageReserved(pfn_to_page(start_pfn)))
>>>>>  		for (i = 0; i < nr_pages; i++) {
>>>>>  			page = pfn_to_page(start_pfn + i);
>>>>> +			if (PageHWPoison(page)) {
>>>>> +				ClearPageReserved(page);
>>>>
>>>> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
>>>> was never offlined to begin with? Or do you expect this to be set on newly
>>>> hotplugged memory? Also don't we need to skip the entire pageblock?
>>>
>>> If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd means
>>> that we skip the page status check for hwpoisoned pages *not* to prevent
>>> memory offlining for memblocks with hwpoisoned pages.
>>> That means that hwpoisoned pages can be offlined.
>>
>> Is this patch actually correct? I am trying to wrap my head around it
>> but it smells like it tries to avoid the problem rather than fix it
>> properly. I might be wrong here of course but to me it sounds like
>> poisoned page should simply be offlined and keep its poison state all
>> the time. If the memory is hot-removed and added again we have lost the
>> struct page along with the state which is the expected behavior. If it
>> is still broken we will re-poison it.
>>
>> Anyway a patch to skip over poisoned pages during online makes perfect
>> sense to me. The PageReserved fiddling around much less so.
>>
>> Or am I missing something. Let's CC Wen Congyang for the clarification
>> here.
>
> Wen Congyang appears to have disappeared and this fix isn't yet
> finalized. Can we all please revisit it and have a think about
> Michal's questions?

I tried to recreate the original issue, but a lot of changes have been
made in this area since last April.
I was not able to offline a poisoned page because isolate_movable_page()
is failing. I'll investigate that further...

Cheers,
Laurent.

> Thanks.
>
>
> From: Laurent Dufour
> Subject: mm: skip HWPoisoned pages when onlining pages
>
> b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> skipped the HWPoisoned pages when offlining pages, but this should be
> skipped when onlining the pages too.
>
> n-horiguchi@ah.jp.nec.com said:
>
> : If I read correctly, to "skip HWPoiosned page" in commit b023f46813cd
> : means that we skip the page status check for hwpoisoned pages *not* to
> : prevent memory offlining for memblocks with hwpoisoned pages. That
> : means that hwpoisoned pages can be offlined.
> :
> : And another reason to clear PageReserved is that we could reuse the
> : hwpoisoned page after onlining back with replacing the broken DIMM.
> : In this usecase, we first do unpoisoning to clear PageHWPoison, but it
> : doesn't work if PageReserved is set. My simple testing shows the BUG
> : below in unpoisoning (without the ClearPageReserved):
> :
> : Unpoison: Software-unpoisoned page 0x18000
> : BUG: Bad page state in process page-types pfn:18000
> : page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
> : flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
> : raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
> : raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> : page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> : bad because of flags: 0x800(reserved)
>
> Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
> Signed-off-by: Laurent Dufour
> Cc: Naoya Horiguchi
> Cc: Andrey Vagin
> Cc: Glauber Costa
> Cc: Vladimir Davydov
> Cc: Balbir Singh
> Signed-off-by: Andrew Morton
> ---
>
>  mm/memory_hotplug.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
> +++ a/mm/memory_hotplug.c
> @@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
>  	if (PageReserved(pfn_to_page(start_pfn)))
>  		for (i = 0; i < nr_pages; i++) {
>  			page = pfn_to_page(start_pfn + i);
> +			if (PageHWPoison(page)) {
> +				ClearPageReserved(page);
> +				continue;
> +			}
>  			(*online_page_callback)(page);
>  			onlined_pages++;
>  		}
> _
>
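
For illustration only, here is a small userspace model of the loop in the
patch quoted above. It is a sketch, not kernel code. The names struct
model_page, the MODEL_PG_* bits, model_online_page() and
model_online_pages_range() are all hypothetical, and the bit values are
not the kernel's page-flags layout. It also leaves out the outer
PageReserved(pfn_to_page(start_pfn)) check and assumes that the online
callback drops the Reserved bit when it releases a page. The point it
shows is only the behaviour under discussion: a poisoned page has
Reserved cleared, but it is neither handed to the callback nor counted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for this model only; NOT the kernel's layout. */
#define MODEL_PG_RESERVED (1u << 0)
#define MODEL_PG_HWPOISON (1u << 1)
#define MODEL_PG_ONLINE   (1u << 2)

struct model_page {
	unsigned int flags;
};

/*
 * Stand-in for (*online_page_callback)(page).
 * Model assumption: onlining a page drops Reserved and marks it online.
 */
static void model_online_page(struct model_page *page)
{
	page->flags &= ~MODEL_PG_RESERVED;
	page->flags |= MODEL_PG_ONLINE;
}

/*
 * Mirrors the patched loop: a poisoned page has its Reserved bit cleared
 * and is then skipped, so it is never onlined and never counted.
 * Returns the number of pages actually onlined.
 */
static unsigned long model_online_pages_range(struct model_page *pages,
					      unsigned long nr_pages)
{
	unsigned long i, onlined_pages = 0;

	assert(pages != NULL);

	for (i = 0; i < nr_pages; i++) {
		struct model_page *page = &pages[i];

		if (page->flags & MODEL_PG_HWPOISON) {
			page->flags &= ~MODEL_PG_RESERVED;
			continue;
		}
		model_online_page(page);
		onlined_pages++;
	}
	return onlined_pages;
}
```

Calling model_online_pages_range() on three reserved pages, the middle
one also poisoned, onlines the two healthy pages and leaves the poisoned
one with only its poison bit set. That is the state in which, per the
commit message above, a later unpoison no longer trips the "reserved"
bad-page check.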