Date: Thu, 22 Jul 2021 08:19:30 -0700
From: "Luck, Tony"
To: Jue Wang
Cc: Borislav Petkov, dinghui@sangfor.com.cn, huangcun@sangfor.com.cn,
    linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
    HORIGUCHI NAOYA (堀口 直也), Oscar Salvador, x86, "Song, Youquan"
Subject: Re: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery
Message-ID: <20210722151930.GA1453521@agluck-desk2.amr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 22, 2021 at 06:54:37AM -0700, Jue Wang wrote:
> This patch assumes the UC error consumed in the kernel is always the same UC.
>
> Yet it's possible two UCs on different pages are consumed in a row.
> The patch below will panic on the 2nd MCE. How can we make the code work
> on multiple UC errors?
>
> > +	int count = ++current->mce_count;
> > +
> > +	/* First call, save all the details */
> > +	if (count == 1) {
> > +		current->mce_addr = m->addr;
> > +		current->mce_kflags = m->kflags;
> > +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> > +		current->mce_whole_page = whole_page(m);
> > +		current->mce_kill_me.func = func;
> > +	}
> > ......
> > +	/* Second or later call, make sure page address matches the one from first call */
> > +	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
> > +		mce_panic("Machine checks to different user pages", m, msg);

The issue is getting the information about the location of the error from
the machine check handler to the "task_work" function that processes it.

Currently there is a single place to store the address of the error in
the task structure:

	current->mce_addr = m->addr;

Plausibly that could be made into an array, indexed by current->mce_count,
to save multiple addresses (perhaps mce_kflags, mce_ripv, etc. would also
need to become arrays).

But I don't want to pre-emptively make such a change without some data
showing that situations with multiple errors to different addresses:

1) Actually occur.

2) Would be recovered if we made the change.

The first would be indicated by seeing the:

	"Machine checks to different user pages"

panic. You'd have to code up the change to use arrays to confirm that it
would fix the problem.

-Tony
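[Editorial note: the array approach Tony describes could be sketched roughly as below. This is a standalone userspace model, not kernel code: the struct, the bound MCE_MAX_PER_TASK, and the helper record_mce_addr() are all hypothetical names invented for illustration, and the model tracks distinct faulting pages rather than reproducing the kernel's exact call-count bookkeeping.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12
#define MCE_MAX_PER_TASK 4 /* hypothetical cap on recorded error pages */

/* Simplified stand-in for the task_struct fields under discussion:
 * mce_addr becomes an array instead of a single field. */
struct task_mce {
	int mce_count;
	uint64_t mce_addr[MCE_MAX_PER_TASK];
};

/* Record the faulting address for the current task.
 * Returns 0 if the error hit an already-recorded page (the benign,
 * repeated-UC case) or was stored in a free slot; returns -1 when
 * no slot is left, which corresponds to the case where the posted
 * patch would call mce_panic(). */
static int record_mce_addr(struct task_mce *t, uint64_t addr)
{
	int i;

	for (i = 0; i < t->mce_count; i++)
		if ((t->mce_addr[i] >> PAGE_SHIFT) == (addr >> PAGE_SHIFT))
			return 0; /* same page as an earlier machine check */

	if (t->mce_count >= MCE_MAX_PER_TASK)
		return -1; /* table full: caller would have to panic */

	t->mce_addr[t->mce_count++] = addr;
	return 0;
}
```

With this shape, two UCs on different pages no longer trip the "different user pages" check: the second address simply lands in the next slot, and only exhausting all slots forces the panic path.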