Date: Tue, 5 Dec 2017 08:57:27 -0800
From: Andi Kleen
To: gengdongjiu
Cc: "tony.luck@intel.com", Naoya Horiguchi, "npiggin@gmail.com", "vbabka@suse.cz", "akpm@linux-foundation.org", "linux-mm@kvack.org", Linux Kernel Mailing List, "wangxiongfeng (C)", Huangshaoyu, Wuquanming
Subject: Re: [question] handle the page table RAS error
Message-ID: <20171205165727.GG3070@tassilo.jf.intel.com>
In-Reply-To: <0184EA26B2509940AA629AE1405DD7F2019C8B36@DGGEMA503-MBS.china.huawei.com>

On Sun, Dec 03, 2017 at 01:22:25PM +0000, gengdongjiu wrote:
> Hi all,
>
> Sorry to disturb you. ARM64 now supports RAS, and with this feature
> enabled we have run into an issue. If a user space application hits a
> RAS error in a page table, the memory error handler (memory_failure())
> does nothing except set the poisoned page flag, and the fault handler
> in arch/arm64/mm/fault.c delivers a signal to kill the application.
> When the application exits, it calls unmap_vmas() to release its VMA
> resources, which touches the bad page table again and triggers another
> RAS error. As a result the application cannot be killed and the system
> panics; the log is shown in [2].
>
> As the stack in [1] shows, unmap_page_range() touches the bad page
> table, so the system panics. Is this panic the expected behavior? How
> does x86 handle page table RAS errors? If a user space application hits
> a page table RAS error, I think the expected behavior is to kill the
> application rather than panic the OS. In the current code, I do not see
> any check for a poisoned page table page when the application's VMA
> resources are released. Could you give me some suggestions on how to
> handle this case? Thanks a lot.

x86 doesn't handle it. There are lots of memory types that are not
handled by MCE recovery because it is just too difficult. In general,
MCE recovery focuses on memory types that use up a significant
percentage of total memory. Page tables are normally not that big, so
they are not really worth handling.

I wouldn't bother about them unless you measure them to be a significant
portion of memory on a real world workload.

-Andi
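
One rough way to take the measurement suggested above is to compare the
PageTables and MemTotal fields of /proc/meminfo while the workload is
running. The sketch below is not part of the original message. The
function names are hypothetical, and it assumes that the PageTables
field is an adequate proxy for page table memory on the system being
measured.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; fields without a unit (for example
    the HugePages_* counters) are stored as plain counts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def page_table_percent(text):
    """Return PageTables as a percentage of MemTotal for the given meminfo text."""
    fields = parse_meminfo(text)
    return 100.0 * fields["PageTables"] / fields["MemTotal"]


def read_page_table_percent(path="/proc/meminfo"):
    """Read the given meminfo file (Linux only) and return the page table percentage."""
    with open(path) as f:
        return page_table_percent(f.read())
```

Calling read_page_table_percent() gives a single snapshot, so it would
need to be sampled repeatedly under a representative load before
drawing any conclusion about whether page tables are a significant
share of memory.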