Date: Thu, 14 Jan 2021 13:05:08 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: x86@kernel.org, Andrew Morton, Peter Zijlstra, Darren Hart, Andy Lutomirski, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID:
<20210114210508.GA20224@agluck-desk2.amr.corp.intel.com>
References: <20210108222251.14391-1-tony.luck@intel.com>
 <20210111214452.1826-1-tony.luck@intel.com>
 <20210111214452.1826-2-tony.luck@intel.com>
 <20210114202213.GI12284@zn.tnic>
In-Reply-To: <20210114202213.GI12284@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 14, 2021 at 09:22:13PM +0100, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 01:44:50PM -0800, Tony Luck wrote:
> > @@ -1431,8 +1433,11 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  				mce_panic("Failed kernel mode recovery", &m, msg);
> >  		}
> >
> > -		if (m.kflags & MCE_IN_KERNEL_COPYIN)
> > +		if (m.kflags & MCE_IN_KERNEL_COPYIN) {
> > +			if (current->mce_busy)
> > +				mce_panic("Multiple copyin", &m, msg);
> So this: we're currently busy handling the first MCE, why do we must
> panic?
>
> Can we simply ignore all follow-up MCEs to that page?

If we s/all/some/ you are saying the same as Andy:

> So I tend to think that the machine check code should arrange to
> survive some reasonable number of duplicate machine checks.

> I.e., the page will get poisoned eventually and that poisoning is
> currently executing so all following MCEs are simply nothing new and we
> can ignore them.
>
> It's not like we're going to corrupt more data - we already are
> "corrupting" whole 4K.
>
> Am I making sense?
>
> Because if we do this, we won't have to pay attention to any get_user()
> callers and whatnot - we simply ignore and the solution is simple and
> you won't have to touch any get_user() callers...

Changing get_user() is a can of worms. I don't think it's a very big can.
Perhaps two or three dozen places where code needs to change to account
for the -ENXIO return ... but since that touches a bunch of different
subsystems, it is likely to take a while to get everyone in agreement.
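To make the "survive some reasonable number of duplicate machine checks" idea concrete, here is a rough userspace sketch in C. It is not the patch under discussion and not kernel code: `struct task_mce_state`, `MCE_DUP_LIMIT`, `mce_copyin_verdict()` and `mce_recovery_done()` are hypothetical names made up for illustration, the limit of 10 is arbitrary, and panicking when a second machine check hits a different page is an assumption, since the thread only talks about follow-up MCEs to the same page.

```c
/* Hypothetical limit on machine checks tolerated while recovery is pending. */
#define MCE_DUP_LIMIT 10

/* Hypothetical stand-in for per-task state (not the real task_struct). */
struct task_mce_state {
	unsigned long pfn;	/* page frame of the first, still pending, error */
	int count;		/* machine checks seen since recovery was queued */
};

enum mce_verdict {
	MCE_QUEUE_RECOVERY,	/* first hit: queue the poison/kill work */
	MCE_IGNORE_DUP,		/* same page again: recovery is already pending */
	MCE_PANIC,		/* different page, or too many repeats */
};

/* Decide what to do with a copy-from-user machine check on page 'pfn'. */
static enum mce_verdict mce_copyin_verdict(struct task_mce_state *t,
					   unsigned long pfn)
{
	/* First machine check for this task: remember the page and recover. */
	if (t->count++ == 0) {
		t->pfn = pfn;
		return MCE_QUEUE_RECOVERY;
	}

	/* Assumption: a hit on a *different* page is not a duplicate. */
	if (t->pfn != pfn)
		return MCE_PANIC;

	/* Same page, but we appear to be stuck in a loop. */
	if (t->count > MCE_DUP_LIMIT)
		return MCE_PANIC;

	/* Same page, recovery already queued: nothing new to do. */
	return MCE_IGNORE_DUP;
}

/* Called once the queued recovery work has run for this task. */
static void mce_recovery_done(struct task_mce_state *t)
{
	t->count = 0;
}
```

With these made-up definitions, the first machine check on a page queues recovery, the next nine on the same page are ignored, and the eleventh is treated as an infinite loop. Callers of get_user() are not involved at all, which is the attraction of this direction.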

I'll try out this new approach, and if it works, I'll post a v3 patch.

Thanks

-Tony