Date: Thu, 22 Apr 2021 19:02:04 +0200
From: Borislav Petkov
To: Naoya Horiguchi
Cc: linux-mm@kvack.org, Tony Luck, Aili Yao, Andrew Morton, Oscar Salvador,
 David Hildenbrand, Andy Lutomirski, Naoya Horiguchi, Jue Wang,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already
Message-ID: <20210422170204.GD7021@zn.tnic>
References: <20210421005728.1994268-1-nao.horiguchi@gmail.com>
 <20210421005728.1994268-3-nao.horiguchi@gmail.com>
In-Reply-To: <20210421005728.1994268-3-nao.horiguchi@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 21, 2021 at 09:57:27AM +0900, Naoya Horiguchi wrote:
> From: Aili Yao
> Subject: Re: [PATCH v3 2/3] mm,hwpoison: return -EHWPOISON when page already

"... Return -EHWPOISON to denote that the page has already been poisoned"

> When the page is already poisoned, another memory_failure() call in the
> same page now returns 0, meaning OK. For nested memory mce handling, this
> behavior may lead to one mce looping,

s/mce/MCE/g

> Example:

For example:

> 1. When LCME is enabled, and there are two processes A && B running on
> different core X && Y separately, which will access one same page, then

which access the same page...

s/&&/and/g

> the page corrupted when process A access it, a MCE will be rasied to
> core X and the error process is just underway.

... and you lost me here. I don't understand what that is trying to say.

Is that trying to say that when process A encounters the error, the MCE
will be raised on CPU X?

> 2. Then B access the page and trigger another MCE to core Y, it will also
> do error process, it will see TestSetPageHWPoison be true, and 0 is
> returned.

That sentence needs massaging.

> 3. The kill_me_maybe will check the return:
>
> 1244 static void kill_me_maybe(struct callback_head *cb)
> 1245 {
> ...
> 1254         if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> 1255             !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> 1256                 set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
> 1257                 sync_core();
> 1258                 return;
> 1259         }
> ...
> 1267 }

No need for the line numbers.

> 4. The error process for B will end, and may nothing happened if
> kill-early is not set, The process B will re-excute instruction and get
> into mce again and then loop happens. And also the set_mce_nospec()
> here is not proper, may refer to commit fd0e786d9d09 ("x86/mm,
> mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages").

That needs massaging too.

> For other cases which care the return value of memory_failure() should
> check why they want to process a memory error which have already been
> processed. This behavior seems reasonable.

This whole commit message needs sanitizing.

Also, looking at the next patch, you can merge this one into the next
because the next one is acting on -EHWPOISON so it all belongs together
in a single patch.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
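
For readers following the thread, the return-value problem in the quoted commit message can be modelled in user space. This is only a rough sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the page is reduced to a single flag, and the caller check loosely mirrors the quoted kill_me_maybe() condition (treating a 0 return as "handled, resume the task").

```c
#include <errno.h>
#include <stdbool.h>

/* EHWPOISON is Linux-specific; fallback value so the sketch builds elsewhere. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical stand-in for a page; only the HWPoison flag is modelled. */
struct toy_page {
	bool hwpoison;
};

/* Models the test-and-set on the poison flag: set it, return the old value. */
static bool toy_test_set_hwpoison(struct toy_page *p)
{
	bool was_set = p->hwpoison;

	p->hwpoison = true;
	return was_set;
}

/* Behaviour described in the quoted text: already poisoned still yields 0. */
static int toy_memory_failure_old(struct toy_page *p)
{
	if (toy_test_set_hwpoison(p))
		return 0;	/* second reporter cannot tell it was a repeat */
	return 0;		/* first reporter: handling assumed to succeed */
}

/* Behaviour the patch subject proposes: a repeat report gets -EHWPOISON. */
static int toy_memory_failure_new(struct toy_page *p)
{
	if (toy_test_set_hwpoison(p))
		return -EHWPOISON;
	return 0;
}

/*
 * Simplified caller check: a 0 return means the task resumes and
 * re-executes the faulting access (the COPYIN flag is ignored here).
 */
static bool toy_task_resumes(int ret)
{
	return ret == 0;
}
```

With the old variant, the second call on the same page returns 0, so the modelled caller lets the task resume and hit the same page again. With the new variant, the second call returns -EHWPOISON and the caller can tell the two cases apart.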