Date: Wed, 10 Mar 2021 16:01:07 +0800
From: Aili Yao
To: "HORIGUCHI NAOYA(堀口 直也)"
CC: "Luck, Tony", Oscar Salvador, "david@redhat.com", "akpm@linux-foundation.org", "bp@alien8.de", "tglx@linutronix.de", "mingo@redhat.com", "hpa@zytor.com", "x86@kernel.org", "linux-edac@vger.kernel.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: Re: [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned
Message-ID: <20210310160107.7383a6f4@alex-virtual-machine>
In-Reply-To: <20210309082824.GA1793@hori.linux.bs1.fc.nec.co.jp>

On Tue, 9 Mar 2021 08:28:24 +0000
HORIGUCHI NAOYA(堀口 直也) wrote:

> On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote:
> > When the page is already poisoned, another memory_failure() call on the
> > same page now returns 0, meaning OK. For nested memory MCE handling, this
> > behavior may lead to MCE looping. Example:
> >
> > 1. When LMCE is enabled, and two processes A and B are running on
> > different cores X and Y respectively and access the same page, the
> > page becomes corrupted when process A accesses it; an MCE is raised on
> > core X and error processing is under way.
> >
> > 2. Then B accesses the page and triggers another MCE on core Y. It also
> > does error processing, sees TestSetPageHWPoison return true, and 0 is
> > returned.
> >
> > 3. kill_me_maybe() will check the return value:
> >
> > static void kill_me_maybe(struct callback_head *cb)
> > {
> >         ...
> >         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> >             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> >                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT,
> >                                p->mce_whole_page);
> >                 sync_core();
> >                 return;
> >         }
> >         ...
> > }
> >
> > 4. The error processing for B ends, and nothing may happen if
> > kill-early is not set. Process B then re-executes the instruction and
> > hits the MCE again, and a loop results. The set_mce_nospec() call
> > here is also not appropriate; see commit fd0e786d9d09 ("x86/mm,
> > mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").
> >
> > Other callers that care about the return value of memory_failure() should
> > check why they want to process a memory error which has already been
> > processed. This behavior seems reasonable.
>
> Other reviewers shared ideas about the returned value, but actually
> I'm not sure which one is best (EBUSY is not that bad).
> What we need to fix the reported issue is to return a non-zero value
> for the "already poisoned" case (the value itself is not so important).
>
> Other callers of memory_failure() (mostly test programs) could see
> the change in return value, but they can already see EBUSY now, and
> anyway they should check dmesg for more detail about why it failed,
> so the impact of the change is not so big.
>
> >
> > Signed-off-by: Aili Yao
>
> Reviewed-by: Naoya Horiguchi
>
> Thanks,
> Naoya Horiguchi

Thanks!

And I found my mail was lost on the mailing list!

--
Thanks!
Aili Yao
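
To make the return-value discussion above concrete, here is a small user-space model of the race, written as a sketch only. It is not kernel code: every model_* name is hypothetical, the page array stands in for the PageHWPoison flag, the MCE_IN_KERNEL_COPYIN check is left out, and it assumes that the part of kill_me_maybe() not quoted above signals or kills the task when memory_failure() returns non-zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_PAGES 16	/* hypothetical size of the toy page array */

/* Stands in for the per-page PageHWPoison flag. */
static bool model_poisoned[MODEL_NR_PAGES];

/* Model of TestSetPageHWPoison(): set the flag, return its previous value. */
static bool model_test_set_poison(unsigned long pfn)
{
	bool was_set = model_poisoned[pfn];

	model_poisoned[pfn] = true;
	return was_set;
}

/*
 * Model of memory_failure(). already_poisoned_ret is what gets returned
 * when the page was poisoned by an earlier call: 0 models the behaviour
 * described as the problem, -EBUSY models the proposed change.
 */
static int model_memory_failure(unsigned long pfn, int already_poisoned_ret)
{
	assert(pfn < MODEL_NR_PAGES);

	if (model_test_set_poison(pfn))
		return already_poisoned_ret;

	/* First reporter: assume recovery of the page succeeds. */
	return 0;
}

enum model_outcome {
	MODEL_RESUME,	/* task returns and re-executes the faulting access */
	MODEL_KILL	/* assumed fall-through: task is signalled or killed */
};

/* Model of the return-value check in kill_me_maybe(). */
static enum model_outcome model_kill_me_maybe(unsigned long pfn,
					      int already_poisoned_ret)
{
	if (!model_memory_failure(pfn, already_poisoned_ret))
		return MODEL_RESUME;

	return MODEL_KILL;
}
```

Calling model_kill_me_maybe() twice on the same pfn plays the roles of process A and then process B. With already_poisoned_ret set to 0 the second call also resumes, which is the looping case from step 4; with -EBUSY the second call takes the other branch instead.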