Message-ID: <571760F3.2040305@huawei.com>
Date: Wed, 20 Apr 2016 18:58:59 +0800
From: Xishi Qiu
To: Naoya Horiguchi
CC: Linux MM, LKML
Subject: Re: mce: a question about memory_failure_early_kill in memory_failure()
References: <571612DE.8020908@huawei.com> <20160420070735.GA10125@hori1.linux.bs1.fc.nec.co.jp> <57175F30.6050300@huawei.com>
In-Reply-To: <57175F30.6050300@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/4/20 18:51, Xishi Qiu wrote:

> On 2016/4/20 15:07, Naoya Horiguchi wrote:
>
>> On Tue, Apr 19, 2016 at 07:13:34PM +0800, Xishi Qiu wrote:
>>> /proc/sys/vm/memory_failure_early_kill
>>>
>>> 1: means kill all processes that have the corrupted and not reloadable page mapped.
>>> 0: means only unmap the corrupted page from all processes and only kill a process
>>> who tries to access it.
>>>
>>> If memory_failure_early_kill is set to 0 and memory_failure() has been called:
>>> memory_failure()
>>>     hwpoison_user_mappings()
>>>         collect_procs()  // the task (with no PF_MCE_PROCESS flag) is not in the tokill list
>>>         try_to_unmap()
>>>
>>> If the task accesses the memory, there will be a page fault,
>>> so the task cannot access the original page again, right?
>>
>> Yes, right. That's the behavior in default "late kill" case.
>>
>
> Hi Naoya,
>
> Thanks for your reply. My confusion is that after try_to_unmap(), there will be a
> page fault if the task accesses the memory, and we will allocate a new page for it.
>

Hi Naoya,

If we allocate a new page, the task won't access the poisoned page again, so it won't be
killed by mce (late kill), right?
If the poisoned page is anon, we will lose data, right?

Thanks,
Xishi Qiu

> So how does the hardware (mce) know that this page fault is related to the poisoned
> page which was unmapped from the task?
>
> Will we record something in the pte after try_to_unmap() in memory_failure()?
>
> Thanks,
> Xishi Qiu
>
>> I'm guessing that you might have a more specific problem around this code.
>> If so, please feel free to ask with detail.
>>
>> Thanks,
>> Naoya Horiguchi
>>
>
>
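
For anyone following the thread, here is a minimal sketch of checking which policy a machine is currently using. It only reads the sysctl file named above and prints the meaning as described in this thread. The helper names (parse_early_kill, describe_early_kill, read_early_kill) are made up for illustration, and the sketch assumes the file contains a single integer, 0 or 1.

```python
from pathlib import Path

# Path as given in this thread.
SYSCTL_PATH = "/proc/sys/vm/memory_failure_early_kill"

# Meanings as described in the quoted text of this thread.
POLICY = {
    0: "late kill: unmap the corrupted page from all processes and "
       "kill only a process that tries to access it",
    1: "early kill: kill all processes that have the corrupted and "
       "not reloadable page mapped",
}


def parse_early_kill(text):
    """Parse the sysctl file contents (assumed to be '0' or '1')."""
    value = int(text.strip())
    if value not in POLICY:
        raise ValueError(f"unexpected memory_failure_early_kill value: {value}")
    return value


def describe_early_kill(value):
    """Return the human-readable meaning of a parsed value."""
    return POLICY[value]


def read_early_kill(path=SYSCTL_PATH):
    """Read and parse the sysctl; raises OSError if the file is missing."""
    return parse_early_kill(Path(path).read_text())


if __name__ == "__main__":
    try:
        value = read_early_kill()
    except OSError as exc:
        print(f"cannot read {SYSCTL_PATH}: {exc}")
    else:
        print(f"{SYSCTL_PATH} = {value} ({describe_early_kill(value)})")
```

This only reports the system-wide setting. It does not show whether a particular task would end up in the tokill list, which is the part the questions above are about.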